[PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized

Posted by Sergey Senozhatsky 6 months, 3 weeks ago
ath11k_hal_srng_deinit() frees rdp and wrp which are used
by srng lists.  Mark srng lists as not-initialized.  This
makes sense, for instance, when device fails to resume
and the driver calls ath11k_hal_srng_deinit() from
ath11k_core_reconfigure_on_crash().

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---

v2: fixed subject line and updated commit message

 drivers/net/wireless/ath/ath11k/hal.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
index 8cb1505a5a0c..cab11a35f911 100644
--- a/drivers/net/wireless/ath/ath11k/hal.c
+++ b/drivers/net/wireless/ath/ath11k/hal.c
@@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
 void ath11k_hal_srng_deinit(struct ath11k_base *ab)
 {
 	struct ath11k_hal *hal = &ab->hal;
+	int i;
+
+	for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
+		ab->hal.srng_list[i].initialized = 0;
 
 	ath11k_hal_unregister_srng_key(ab);
 	ath11k_hal_free_cont_rdp(ab);
-- 
2.49.0.1204.g71687c7c1d-goog
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Baochen Qiang 6 months, 1 week ago

On 5/29/2025 11:56 AM, Sergey Senozhatsky wrote:
> ath11k_hal_srng_deinit() frees rdp and wrp which are used
> by srng lists.  Mark srng lists as not-initialized.  This
> makes sense, for instance, when device fails to resume
> and the driver calls ath11k_hal_srng_deinit() from
> ath11k_core_reconfigure_on_crash().

Did you see any issue without your change?

> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
> 
> v2: fixed subject line and updated commit message
> 
>  drivers/net/wireless/ath/ath11k/hal.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
> index 8cb1505a5a0c..cab11a35f911 100644
> --- a/drivers/net/wireless/ath/ath11k/hal.c
> +++ b/drivers/net/wireless/ath/ath11k/hal.c
> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
>  void ath11k_hal_srng_deinit(struct ath11k_base *ab)
>  {
>  	struct ath11k_hal *hal = &ab->hal;
> +	int i;
> +
> +	for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
> +		ab->hal.srng_list[i].initialized = 0;

With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().

>  
>  	ath11k_hal_unregister_srng_key(ab);
>  	ath11k_hal_free_cont_rdp(ab);
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Sergey Senozhatsky 6 months, 1 week ago
On (25/06/12 11:30), Baochen Qiang wrote:
> On 5/29/2025 11:56 AM, Sergey Senozhatsky wrote:
> > ath11k_hal_srng_deinit() frees rdp and wrp which are used
> > by srng lists.  Mark srng lists as not-initialized.  This
> > makes sense, for instance, when device fails to resume
> > and the driver calls ath11k_hal_srng_deinit() from
> > ath11k_core_reconfigure_on_crash().
> 
> Did you see any issue without your change?

We do see some issues, yes, on LTS kernels.

[..]
> > diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
> > index 8cb1505a5a0c..cab11a35f911 100644
> > --- a/drivers/net/wireless/ath/ath11k/hal.c
> > +++ b/drivers/net/wireless/ath/ath11k/hal.c
> > @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
> >  void ath11k_hal_srng_deinit(struct ath11k_base *ab)
> >  {
> >  	struct ath11k_hal *hal = &ab->hal;
> > +	int i;
> > +
> > +	for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
> > +		ab->hal.srng_list[i].initialized = 0;
> 
> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().

I think un-initialized lists should not be dumped.

ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
causing things like:

<1>[173154.396775] BUG: unable to handle page fault for address: ffffb4e4c046f010
<1>[173154.396778] #PF: supervisor read access in kernel mode
<1>[173154.396781] #PF: error_code(0x0000) - not-present page
<4>[173154.396824] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
<4>[173154.396839] Code: 88 c0 44 89 f2 89 c1 e8 3a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 a3 a0
<4>[173154.396842] RSP: 0018:ffffb4e4dceefc50 EFLAGS: 00010246
<4>[173154.396846] RAX: ffffb4e4c046f010 RBX: ffff90d1c3040000 RCX: a0009634a5d28c00
<4>[173154.396849] RDX: ffffffffb0279d80 RSI: ffffffffb0279d80 RDI: ffff90d2e5d17488
<4>[173154.396851] RBP: ffffb4e4dceefc90 R08: ffffffffb0249d80 R09: 0000000000003b82
<4>[173154.396854] R10: 0000000000000004 R11: 00000000ffffffea R12: ffff90d1c3041c90
<4>[173154.396856] R13: ffff90d1c3040000 R14: 0000000000002828 R15: 0000000000000000
<4>[173154.396859] FS: 0000000000000000(0000) GS:ffff90d2e5d00000(0000) knlGS:0000000000000000
<4>[173154.396862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[173154.396865] CR2: ffffb4e4c046f010 CR3: 000000005ca24000 CR4: 0000000000750ee0
<4>[173154.396868] PKRU: 55555554
<4>[173154.396870] Call Trace:
<4>[173154.396874] <TASK>
<4>[173154.396883] ? __die_body+0xae/0xb0
<4>[173154.396890] ? page_fault_oops+0x381/0x3e0
<4>[173154.396896] ? exc_page_fault+0x69/0xa0
<4>[173154.396901] ? asm_exc_page_fault+0x22/0x30
<4>[173154.396908] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:3de7 4)]
<4>[173154.396923] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:3de7 4)]
<4>[173154.396942] worker_thread+0x390/0x960
<4>[173154.396949] kthread+0x149/0x170
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Baochen Qiang 6 months ago
[+ kernel mm list]

On 6/12/2025 1:04 PM, Sergey Senozhatsky wrote:
> On (25/06/12 11:30), Baochen Qiang wrote:
>> On 5/29/2025 11:56 AM, Sergey Senozhatsky wrote:
>>> ath11k_hal_srng_deinit() frees rdp and wrp which are used
>>> by srng lists.  Mark srng lists as not-initialized.  This
>>> makes sense, for instance, when device fails to resume
>>> and the driver calls ath11k_hal_srng_deinit() from
>>> ath11k_core_reconfigure_on_crash().
>>
>> Did you see any issue without your change?
> 
> We do see some issues, yes, on LTS kernels.
> 
> [..]
>>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
>>> index 8cb1505a5a0c..cab11a35f911 100644
>>> --- a/drivers/net/wireless/ath/ath11k/hal.c
>>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
>>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
>>>  void ath11k_hal_srng_deinit(struct ath11k_base *ab)
>>>  {
>>>  	struct ath11k_hal *hal = &ab->hal;
>>> +	int i;
>>> +
>>> +	for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
>>> +		ab->hal.srng_list[i].initialized = 0;
>>
>> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
> 
> I think un-initialized lists should not be dumped.
> 
> ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
> accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
> as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
> causing things like:
> 
> <1>[173154.396775] BUG: unable to handle page fault for address: ffffb4e4c046f010
> <1>[173154.396778] #PF: supervisor read access in kernel mode
> <1>[173154.396781] #PF: error_code(0x0000) - not-present page

I am confused here: if the root cause is the driver trying to read freed memory, it should
not result in a page fault, because even if freed, the page is still there and still
mapped in the kernel page table.

Andrew, any insights?

> <4>[173154.396824] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
> <4>[173154.396839] Code: 88 c0 44 89 f2 89 c1 e8 3a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 a3 a0
> <4>[173154.396842] RSP: 0018:ffffb4e4dceefc50 EFLAGS: 00010246
> <4>[173154.396846] RAX: ffffb4e4c046f010 RBX: ffff90d1c3040000 RCX: a0009634a5d28c00
> <4>[173154.396849] RDX: ffffffffb0279d80 RSI: ffffffffb0279d80 RDI: ffff90d2e5d17488
> <4>[173154.396851] RBP: ffffb4e4dceefc90 R08: ffffffffb0249d80 R09: 0000000000003b82
> <4>[173154.396854] R10: 0000000000000004 R11: 00000000ffffffea R12: ffff90d1c3041c90
> <4>[173154.396856] R13: ffff90d1c3040000 R14: 0000000000002828 R15: 0000000000000000
> <4>[173154.396859] FS: 0000000000000000(0000) GS:ffff90d2e5d00000(0000) knlGS:0000000000000000
> <4>[173154.396862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[173154.396865] CR2: ffffb4e4c046f010 CR3: 000000005ca24000 CR4: 0000000000750ee0
> <4>[173154.396868] PKRU: 55555554
> <4>[173154.396870] Call Trace:
> <4>[173154.396874] <TASK>
> <4>[173154.396883] ? __die_body+0xae/0xb0
> <4>[173154.396890] ? page_fault_oops+0x381/0x3e0
> <4>[173154.396896] ? exc_page_fault+0x69/0xa0
> <4>[173154.396901] ? asm_exc_page_fault+0x22/0x30
> <4>[173154.396908] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:3de7 4)]
> <4>[173154.396923] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:3de7 4)]
> <4>[173154.396942] worker_thread+0x390/0x960
> <4>[173154.396949] kthread+0x149/0x170
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Pedro Falcato 6 months ago
On Thu, Jun 12, 2025 at 04:01:49PM +0800, Baochen Qiang wrote:
> [+ kernel mm list]
> 
> On 6/12/2025 1:04 PM, Sergey Senozhatsky wrote:
> > On (25/06/12 11:30), Baochen Qiang wrote:
> >> On 5/29/2025 11:56 AM, Sergey Senozhatsky wrote:
> >>> ath11k_hal_srng_deinit() frees rdp and wrp which are used
> >>> by srng lists.  Mark srng lists as not-initialized.  This
> >>> makes sense, for instance, when device fails to resume
> >>> and the driver calls ath11k_hal_srng_deinit() from
> >>> ath11k_core_reconfigure_on_crash().
> >>
> >> Did you see any issue without your change?
> > 
> > We do see some issues, yes, on LTS kernels.
> > 
> > [..]
> >>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
> >>> index 8cb1505a5a0c..cab11a35f911 100644
> >>> --- a/drivers/net/wireless/ath/ath11k/hal.c
> >>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
> >>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
> >>>  void ath11k_hal_srng_deinit(struct ath11k_base *ab)
> >>>  {
> >>>  	struct ath11k_hal *hal = &ab->hal;
> >>> +	int i;
> >>> +
> >>> +	for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
> >>> +		ab->hal.srng_list[i].initialized = 0;
> >>
> >> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
> > 
> > I think un-initialized lists should not be dumped.
> > 
> > ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
> > accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
> > as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
> > causing things like:
> > 
> > <1>[173154.396775] BUG: unable to handle page fault for address: ffffb4e4c046f010
> > <1>[173154.396778] #PF: supervisor read access in kernel mode
> > <1>[173154.396781] #PF: error_code(0x0000) - not-present page
> 
> I am confused here: if the root cause is the driver trying to read freed memory, it should
> not result in a page fault, because even if freed, the page is still there and still
> mapped in the kernel page table.
> 

Any memory that is virtually-mapped (read: vmalloc, vmap, vm_map_ram, and others)
will be unmapped on its subsequent free. I'm not familiar with the DMA subsystem,
but the address ffffb4e4c046f010 is vmalloc-like.
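
As a rough userspace analogy only (this is not kernel or ath11k code, and the helper names map_one_page() and range_is_mapped() are made up for illustration), the same "mapping goes away on free" behaviour can be observed with mmap()/munmap(). The sketch assumes Linux, where msync() fails with ENOMEM on an unmapped range:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether a page-aligned range is currently
 * mapped in this process. msync() returns ENOMEM if any part of the
 * range is not mapped.
 */
static bool range_is_mapped(void *addr, size_t len)
{
	if (msync(addr, len, MS_ASYNC) == 0)
		return true;
	return errno != ENOMEM;
}

/*
 * Hypothetical helper: map one anonymous page, loosely standing in for
 * a virtually-mapped allocation. Returns NULL on failure.
 */
static void *map_one_page(size_t *len_out)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	assert(page > 0);
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	*len_out = (size_t)page;
	return p;
}
```

Calling range_is_mapped() on the page before and after munmap() shows the range going from mapped to unmapped, so a later dereference of the old pointer would fault instead of reading stale data.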

> 
> > <4>[173154.396824] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
> > <4>[173154.396839] Code: 88 c0 44 89 f2 89 c1 e8 3a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 a3 a0
> > <4>[173154.396842] RSP: 0018:ffffb4e4dceefc50 EFLAGS: 00010246
> > <4>[173154.396846] RAX: ffffb4e4c046f010 RBX: ffff90d1c3040000 RCX: a0009634a5d28c00
> > <4>[173154.396849] RDX: ffffffffb0279d80 RSI: ffffffffb0279d80 RDI: ffff90d2e5d17488
> > <4>[173154.396851] RBP: ffffb4e4dceefc90 R08: ffffffffb0249d80 R09: 0000000000003b82
> > <4>[173154.396854] R10: 0000000000000004 R11: 00000000ffffffea R12: ffff90d1c3041c90
> > <4>[173154.396856] R13: ffff90d1c3040000 R14: 0000000000002828 R15: 0000000000000000
> > <4>[173154.396859] FS: 0000000000000000(0000) GS:ffff90d2e5d00000(0000) knlGS:0000000000000000
> > <4>[173154.396862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4>[173154.396865] CR2: ffffb4e4c046f010 CR3: 000000005ca24000 CR4: 0000000000750ee0
> > <4>[173154.396868] PKRU: 55555554
> > <4>[173154.396870] Call Trace:
> > <4>[173154.396874] <TASK>
> > <4>[173154.396883] ? __die_body+0xae/0xb0
> > <4>[173154.396890] ? page_fault_oops+0x381/0x3e0
> > <4>[173154.396896] ? exc_page_fault+0x69/0xa0
> > <4>[173154.396901] ? asm_exc_page_fault+0x22/0x30
> > <4>[173154.396908] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:3de7 4)]
> > <4>[173154.396923] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:3de7 4)]
> > <4>[173154.396942] worker_thread+0x390/0x960
> > <4>[173154.396949] kthread+0x149/0x170
> 

-- 
Pedro
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Baochen Qiang 6 months ago

On 6/12/2025 6:01 PM, Pedro Falcato wrote:
> On Thu, Jun 12, 2025 at 04:01:49PM +0800, Baochen Qiang wrote:
>> [+ kernel mm list]
>>
>> On 6/12/2025 1:04 PM, Sergey Senozhatsky wrote:
>>> On (25/06/12 11:30), Baochen Qiang wrote:
>>>> On 5/29/2025 11:56 AM, Sergey Senozhatsky wrote:
>>>>> ath11k_hal_srng_deinit() frees rdp and wrp which are used
>>>>> by srng lists.  Mark srng lists as not-initialized.  This
>>>>> makes sense, for instance, when device fails to resume
>>>>> and the driver calls ath11k_hal_srng_deinit() from
>>>>> ath11k_core_reconfigure_on_crash().
>>>>
>>>> Did you see any issue without your change?
>>>
>>> We do see some issues, yes, on LTS kernels.
>>>
>>> [..]
>>>>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
>>>>> index 8cb1505a5a0c..cab11a35f911 100644
>>>>> --- a/drivers/net/wireless/ath/ath11k/hal.c
>>>>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
>>>>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
>>>>>  void ath11k_hal_srng_deinit(struct ath11k_base *ab)
>>>>>  {
>>>>>  	struct ath11k_hal *hal = &ab->hal;
>>>>> +	int i;
>>>>> +
>>>>> +	for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
>>>>> +		ab->hal.srng_list[i].initialized = 0;
>>>>
>>>> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
>>>
>>> I think un-initialized lists should not be dumped.
>>>
>>> ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
>>> accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
>>> as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
>>> causing things like:
>>>
>>> <1>[173154.396775] BUG: unable to handle page fault for address: ffffb4e4c046f010
>>> <1>[173154.396778] #PF: supervisor read access in kernel mode
>>> <1>[173154.396781] #PF: error_code(0x0000) - not-present page
>>
>> I am confused here: if the root cause is the driver trying to read freed memory, it should
>> not result in a page fault, because even if freed, the page is still there and still
>> mapped in the kernel page table.
>>
> 
> Any memory that is virtually-mapped (read: vmalloc, vmap, vm_map_ram, and others)
> will be unmapped on its subsequent free. I'm not familiar with the DMA subsystem,
> but the address ffffb4e4c046f010 is vmalloc-like.

OK, I forgot the vmalloc case. And indeed, when an IOMMU is present, the DMA subsystem
prefers vmalloc'ed memory.

Thank you, Pedro!

> 
>>
>>> <4>[173154.396824] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
>>> <4>[173154.396839] Code: 88 c0 44 89 f2 89 c1 e8 3a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 a3 a0
>>> <4>[173154.396842] RSP: 0018:ffffb4e4dceefc50 EFLAGS: 00010246
>>> <4>[173154.396846] RAX: ffffb4e4c046f010 RBX: ffff90d1c3040000 RCX: a0009634a5d28c00
>>> <4>[173154.396849] RDX: ffffffffb0279d80 RSI: ffffffffb0279d80 RDI: ffff90d2e5d17488
>>> <4>[173154.396851] RBP: ffffb4e4dceefc90 R08: ffffffffb0249d80 R09: 0000000000003b82
>>> <4>[173154.396854] R10: 0000000000000004 R11: 00000000ffffffea R12: ffff90d1c3041c90
>>> <4>[173154.396856] R13: ffff90d1c3040000 R14: 0000000000002828 R15: 0000000000000000
>>> <4>[173154.396859] FS: 0000000000000000(0000) GS:ffff90d2e5d00000(0000) knlGS:0000000000000000
>>> <4>[173154.396862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> <4>[173154.396865] CR2: ffffb4e4c046f010 CR3: 000000005ca24000 CR4: 0000000000750ee0
>>> <4>[173154.396868] PKRU: 55555554
>>> <4>[173154.396870] Call Trace:
>>> <4>[173154.396874] <TASK>
>>> <4>[173154.396883] ? __die_body+0xae/0xb0
>>> <4>[173154.396890] ? page_fault_oops+0x381/0x3e0
>>> <4>[173154.396896] ? exc_page_fault+0x69/0xa0
>>> <4>[173154.396901] ? asm_exc_page_fault+0x22/0x30
>>> <4>[173154.396908] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:3de7 4)]
>>> <4>[173154.396923] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:3de7 4)]
>>> <4>[173154.396942] worker_thread+0x390/0x960
>>> <4>[173154.396949] kthread+0x149/0x170
>>
>
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Baochen Qiang 6 months, 1 week ago

On 6/12/2025 1:04 PM, Sergey Senozhatsky wrote:
> On (25/06/12 11:30), Baochen Qiang wrote:
>> On 5/29/2025 11:56 AM, Sergey Senozhatsky wrote:
>>> ath11k_hal_srng_deinit() frees rdp and wrp which are used
>>> by srng lists.  Mark srng lists as not-initialized.  This
>>> makes sense, for instance, when device fails to resume
>>> and the driver calls ath11k_hal_srng_deinit() from
>>> ath11k_core_reconfigure_on_crash().
>>
>> Did you see any issue without your change?
> 
> We do see some issues, yes, on LTS kernels.
> 
> [..]
>>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
>>> index 8cb1505a5a0c..cab11a35f911 100644
>>> --- a/drivers/net/wireless/ath/ath11k/hal.c
>>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
>>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
>>>  void ath11k_hal_srng_deinit(struct ath11k_base *ab)
>>>  {
>>>  	struct ath11k_hal *hal = &ab->hal;
>>> +	int i;
>>> +
>>> +	for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
>>> +		ab->hal.srng_list[i].initialized = 0;
>>
>> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
> 
> I think un-initialized lists should not be dumped.
> 
> ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
> accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
> as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
> causing things like:

But ath11k_hal_dump_srng_stats() is called before ath11k_hal_srng_deinit(), right?

The sequence is: ath11k_hal_dump_srng_stats() is called in the reset process, then
restart_work is queued, and in ath11k_core_restart() we call
ath11k_core_reconfigure_on_crash(), where ath11k_hal_srng_deinit() is called, right?

> 
> <1>[173154.396775] BUG: unable to handle page fault for address: ffffb4e4c046f010
> <1>[173154.396778] #PF: supervisor read access in kernel mode
> <1>[173154.396781] #PF: error_code(0x0000) - not-present page
> <4>[173154.396824] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
> <4>[173154.396839] Code: 88 c0 44 89 f2 89 c1 e8 3a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 a3 a0
> <4>[173154.396842] RSP: 0018:ffffb4e4dceefc50 EFLAGS: 00010246
> <4>[173154.396846] RAX: ffffb4e4c046f010 RBX: ffff90d1c3040000 RCX: a0009634a5d28c00
> <4>[173154.396849] RDX: ffffffffb0279d80 RSI: ffffffffb0279d80 RDI: ffff90d2e5d17488
> <4>[173154.396851] RBP: ffffb4e4dceefc90 R08: ffffffffb0249d80 R09: 0000000000003b82
> <4>[173154.396854] R10: 0000000000000004 R11: 00000000ffffffea R12: ffff90d1c3041c90
> <4>[173154.396856] R13: ffff90d1c3040000 R14: 0000000000002828 R15: 0000000000000000
> <4>[173154.396859] FS: 0000000000000000(0000) GS:ffff90d2e5d00000(0000) knlGS:0000000000000000
> <4>[173154.396862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[173154.396865] CR2: ffffb4e4c046f010 CR3: 000000005ca24000 CR4: 0000000000750ee0
> <4>[173154.396868] PKRU: 55555554
> <4>[173154.396870] Call Trace:
> <4>[173154.396874] <TASK>
> <4>[173154.396883] ? __die_body+0xae/0xb0
> <4>[173154.396890] ? page_fault_oops+0x381/0x3e0
> <4>[173154.396896] ? exc_page_fault+0x69/0xa0
> <4>[173154.396901] ? asm_exc_page_fault+0x22/0x30
> <4>[173154.396908] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:3de7 4)]
> <4>[173154.396923] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:3de7 4)]
> <4>[173154.396942] worker_thread+0x390/0x960
> <4>[173154.396949] kthread+0x149/0x170
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Sergey Senozhatsky 6 months, 1 week ago
On (25/06/12 13:47), Baochen Qiang wrote:
> > [..]
> >>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
> >>> index 8cb1505a5a0c..cab11a35f911 100644
> >>> --- a/drivers/net/wireless/ath/ath11k/hal.c
> >>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
> >>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
> >>>  void ath11k_hal_srng_deinit(struct ath11k_base *ab)
> >>>  {
> >>>  	struct ath11k_hal *hal = &ab->hal;
> >>> +	int i;
> >>> +
> >>> +	for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
> >>> +		ab->hal.srng_list[i].initialized = 0;
> >>
> >> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
> > 
> > I think un-initialized lists should not be dumped.
> > 
> > ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
> > accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
> > as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
> > causing things like:
> 
> But ath11k_hal_dump_srng_stats() is called before ath11k_hal_srng_deinit(), right?
> 
> The sequence is: ath11k_hal_dump_srng_stats() is called in the reset process, then
> restart_work is queued, and in ath11k_core_restart() we call
> ath11k_core_reconfigure_on_crash(), where ath11k_hal_srng_deinit() is called, right?

My understanding is that the driver first fails to reconfigure

<4>[163874.555825] ath11k_pci 0000:01:00.0: already resetting count 2
<4>[163884.606490] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110
<4>[163884.606508] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110
<3>[163884.606550] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery

so ath11k_core_reconfigure_on_crash() calls ath11k_hal_srng_deinit(),
which destroys the srng lists, but leaves the stale initialized flag.
So next time ath11k_hal_dump_srng_stats() is called everything looks ok,
but in fact everything is not quite ok.

Regardless of that, I do think that resetting the initialized flag
when srng list is de-initialized/destroyed is the right thing to do.
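
To illustrate the stale-flag scenario described above, here is a minimal standalone model. It is only a sketch: struct model_hal, struct model_srng, MODEL_RING_ID_MAX and the model_hal_*() functions are hypothetical stand-ins, not the driver's real definitions, and plain calloc()/free() replaces the DMA allocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
#define MODEL_RING_ID_MAX 4

struct model_srng {
	bool initialized;
	unsigned int *hp_addr;	/* points into the shared pointer area */
};

struct model_hal {
	unsigned int *rdp;	/* stands in for rdp.vaddr */
	struct model_srng srng_list[MODEL_RING_ID_MAX];
};

static int model_hal_init(struct model_hal *hal)
{
	int i;

	assert(hal != NULL);
	hal->rdp = calloc(MODEL_RING_ID_MAX, sizeof(*hal->rdp));
	if (!hal->rdp)
		return -1;

	for (i = 0; i < MODEL_RING_ID_MAX; i++) {
		hal->srng_list[i].hp_addr = &hal->rdp[i];
		hal->srng_list[i].initialized = true;
	}
	return 0;
}

static void model_hal_deinit(struct model_hal *hal)
{
	int i;

	/* Clear the flags so no ring still claims to own freed memory. */
	for (i = 0; i < MODEL_RING_ID_MAX; i++)
		hal->srng_list[i].initialized = false;

	free(hal->rdp);
	hal->rdp = NULL;
}

/* Dump every initialized ring and return how many were dumped. */
static int model_hal_dump(const struct model_hal *hal)
{
	int i, dumped = 0;

	for (i = 0; i < MODEL_RING_ID_MAX; i++) {
		const struct model_srng *srng = &hal->srng_list[i];

		if (!srng->initialized)
			continue;	/* hp_addr may point at freed memory */

		printf("srng id %d hp %u\n", i, *srng->hp_addr);
		dumped++;
	}
	return dumped;
}
```

In this model, a dump after model_hal_deinit() skips every ring. Without the flag reset in deinit, the same dump would dereference hp_addr pointers into memory that has already been freed.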
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Baochen Qiang 6 months ago

On 6/12/2025 3:02 PM, Sergey Senozhatsky wrote:
> On (25/06/12 13:47), Baochen Qiang wrote:
>>> [..]
>>>>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
>>>>> index 8cb1505a5a0c..cab11a35f911 100644
>>>>> --- a/drivers/net/wireless/ath/ath11k/hal.c
>>>>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
>>>>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
>>>>>  void ath11k_hal_srng_deinit(struct ath11k_base *ab)
>>>>>  {
>>>>>  	struct ath11k_hal *hal = &ab->hal;
>>>>> +	int i;
>>>>> +
>>>>> +	for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
>>>>> +		ab->hal.srng_list[i].initialized = 0;
>>>>
>>>> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
>>>
>>> I think un-initialized lists should not be dumped.
>>>
>>> ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
>>> accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
>>> as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
>>> causing things like:
>>
>> But ath11k_hal_dump_srng_stats() is called before ath11k_hal_srng_deinit(), right?
>>
>> The sequence is: ath11k_hal_dump_srng_stats() is called in the reset process, then
>> restart_work is queued, and in ath11k_core_restart() we call
>> ath11k_core_reconfigure_on_crash(), where ath11k_hal_srng_deinit() is called, right?
> 
> My understanding is that the driver first fails to reconfigure
> 
> <4>[163874.555825] ath11k_pci 0000:01:00.0: already resetting count 2
> <4>[163884.606490] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110
> <4>[163884.606508] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110
> <3>[163884.606550] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery
> 
> so ath11k_core_reconfigure_on_crash() calls ath11k_hal_srng_deinit(),
> which destroys the srng lists, but leaves the stale initialized flag.
> So next time ath11k_hal_dump_srng_stats() is called everything looks ok,
> but in fact everything is not quite ok.

OK, so we have a second crash while recovery from the first crash is still in progress. I
guess the first recovery fails, such that srng is not reinitialized. Then, after the
wait-for-first-recovery timeout, the second recovery starts, which results in
ath11k_hal_dump_srng_stats() getting called and hence the kernel crash.

Could you please share the complete verbose kernel log? You can enable it with

	modprobe ath11k debug_mask=0xffffffff
	modprobe ath11k_pci

> 
> Regardless of that, I do think that resetting the initialized flag
> when srng list is de-initialized/destroyed is the right thing to do.

Yeah, correct.
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Sergey Senozhatsky 6 months ago
On (25/06/12 15:49), Baochen Qiang wrote:
> > My understanding is that the driver first fails to reconfigure
> > 
> > <4>[163874.555825] ath11k_pci 0000:01:00.0: already resetting count 2
> > <4>[163884.606490] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110
> > <4>[163884.606508] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110
> > <3>[163884.606550] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery
> > 
> > so ath11k_core_reconfigure_on_crash() calls ath11k_hal_srng_deinit(),
> > which destroys the srng lists, but leaves the stale initialized flag.
> > So next time ath11k_hal_dump_srng_stats() is called everything looks ok,
> > but in fact everything is not quite ok.
> 
> OK, so we have a second crash while recovery from the first crash is still in progress. I
> guess the first recovery fails, such that srng is not reinitialized. Then, after the
> wait-for-first-recovery timeout, the second recovery starts, which results in
> ath11k_hal_dump_srng_stats() getting called and hence the kernel crash.
> 
> Could you please share the complete verbose kernel log? You can enable it with
> 
> 	modprobe ath11k debug_mask=0xffffffff
> 	modprobe ath11k_pci

Unfortunately I don't have a reproducer.  We just see that some of the
consumer devices crash on resume and all we get is ramoops.  We can't
do any debugging on consumer devices.

This is all I have:

<3>[23518.302240] Last interrupt received for each group:
<3>[23518.302243] ath11k_pci 0000:01:00.0: group_id 0 22511ms before
<3>[23518.302246] ath11k_pci 0000:01:00.0: group_id 1 14440788ms before
<3>[23518.302248] ath11k_pci 0000:01:00.0: group_id 2 14440788ms before
<3>[23518.302250] ath11k_pci 0000:01:00.0: group_id 3 14440788ms before
<3>[23518.302252] ath11k_pci 0000:01:00.0: group_id 4 14736571ms before
<3>[23518.302253] ath11k_pci 0000:01:00.0: group_id 5 14736571ms before
<3>[23518.302261] ath11k_pci 0000:01:00.0: group_id 6 14440789ms before
<3>[23518.302263] ath11k_pci 0000:01:00.0: group_id 7 22541ms before
<3>[23518.302265] ath11k_pci 0000:01:00.0: group_id 8 24724ms before
<3>[23518.302266] ath11k_pci 0000:01:00.0: group_id 9 23315ms before
<3>[23518.302268] ath11k_pci 0000:01:00.0: group_id 10 25238ms before
<3>[23518.302270] ath11k_pci 0000:01:00.0: dst srng id 0 tp 5312, cur hp 5312, cached hp 5312 last hp 5312 napi processed before 22541ms
<3>[23518.302272] ath11k_pci 0000:01:00.0: dst srng id 1 tp 27664, cur hp 27664, cached hp 27664 last hp 27664 napi processed before 24724ms
<3>[23518.302274] ath11k_pci 0000:01:00.0: dst srng id 2 tp 12432, cur hp 12432, cached hp 12432 last hp 12432 napi processed before 23315ms
<3>[23518.302276] ath11k_pci 0000:01:00.0: dst srng id 3 tp 1424, cur hp 1424, cached hp 1424 last hp 1424 napi processed before 25238ms
<3>[23518.302278] ath11k_pci 0000:01:00.0: dst srng id 4 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 22512ms
<3>[23518.302280] ath11k_pci 0000:01:00.0: src srng id 5 hp 0, reap_hp 248, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms
<3>[23518.302282] ath11k_pci 0000:01:00.0: src srng id 8 hp 950, reap_hp 950, cur tp 950, cached tp 280 last tp 280 napi processed before 22512ms
<3>[23518.302284] ath11k_pci 0000:01:00.0: dst srng id 9 tp 19526, cur hp 19526, cached hp 19526 last hp 19526 napi processed before 22512ms
<3>[23518.302286] ath11k_pci 0000:01:00.0: src srng id 16 hp 3832, reap_hp 3832, cur tp 3832, cached tp 3824 last tp 3824 napi processed before 22758ms
<3>[23518.302288] ath11k_pci 0000:01:00.0: src srng id 24 hp 0, reap_hp 248, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms
<3>[23518.302290] ath11k_pci 0000:01:00.0: dst srng id 25 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 14440789ms
<3>[23518.302292] ath11k_pci 0000:01:00.0: src srng id 32 hp 12, reap_hp 8, cur tp 12, cached tp 12 last tp 8 napi processed before 14736834ms
<3>[23518.302294] ath11k_pci 0000:01:00.0: src srng id 35 hp 96, reap_hp 88, cur tp 92, cached tp 92 last tp 92 napi processed before 21573ms
<3>[23518.302296] ath11k_pci 0000:01:00.0: src srng id 36 hp 176, reap_hp 164, cur tp 176, cached tp 168 last tp 168 napi processed before 22447ms
<3>[23518.302298] ath11k_pci 0000:01:00.0: src srng id 39 hp 0, reap_hp 124, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms
<3>[23518.302300] ath11k_pci 0000:01:00.0: src srng id 57 hp 54, reap_hp 54, cur tp 58, cached tp 58 last tp 58 napi processed before 22485ms
<3>[23518.302302] ath11k_pci 0000:01:00.0: src srng id 58 hp 584, reap_hp 584, cur tp 588, cached tp 588 last tp 588 napi processed before 22429ms
<3>[23518.302304] ath11k_pci 0000:01:00.0: src srng id 61 hp 1020, reap_hp 1020, cur tp 0, cached tp 0 last tp 0 napi processed before 14736834ms
<3>[23518.302306] ath11k_pci 0000:01:00.0: dst srng id 81 tp 116, cur hp 116, cached hp 116 last hp 116 napi processed before 22485ms
<3>[23518.302308] ath11k_pci 0000:01:00.0: dst srng id 82 tp 1176, cur hp 1176, cached hp 1176 last hp 1176 napi processed before 22429ms
<3>[23518.302309] ath11k_pci 0000:01:00.0: dst srng id 85 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 14440789ms
<3>[23518.302311] ath11k_pci 0000:01:00.0: src srng id 104 hp 65532, reap_hp 65532, cur tp 0, cached tp 0 last tp 0 napi processed before 14736836ms
<3>[23518.302313] ath11k_pci 0000:01:00.0: src srng id 105 hp 0, reap_hp 504, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms
<3>[23518.302315] ath11k_pci 0000:01:00.0: dst srng id 106 tp 245496, cur hp 245496, cached hp 245496 last hp 245496 napi processed before 22512ms
<3>[23518.302317] ath11k_pci 0000:01:00.0: dst srng id 109 tp 5704, cur hp 5704, cached hp 5704 last hp 5704 napi processed before 22512ms
<3>[23518.302319] ath11k_pci 0000:01:00.0: src srng id 128 hp 3182, reap_hp 3182, cur tp 7428, cached tp 7428 last tp 7428 napi processed before 22541ms
<3>[23518.302321] ath11k_pci 0000:01:00.0: src srng id 129 hp 0, reap_hp 2046, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms
<3>[23518.302323] ath11k_pci 0000:01:00.0: src srng id 132 hp 1690, reap_hp 1690, cur tp 1692, cached tp 1692 last tp 1692 napi processed before 22429ms
<3>[23518.302324] ath11k_pci 0000:01:00.0: dst srng id 133 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 22512ms
<3>[23518.302326] ath11k_pci 0000:01:00.0: src srng id 144 hp 0, reap_hp 2046, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms
<3>[23518.302328] ath11k_pci 0000:01:00.0: src srng id 147 hp 1948, reap_hp 1948, cur tp 1950, cached tp 1950 last tp 1950 napi processed before 22429ms
<3>[23518.302330] ath11k_pci 0000:01:00.0: dst srng id 148 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 14440789ms
<4>[23519.369310] ath11k_pci 0000:01:00.0: failed to receive control response completion, polling..
<3>[23520.393292] ath11k_pci 0000:01:00.0: Service connect timeout
<3>[23520.393302] ath11k_pci 0000:01:00.0: failed to connect to HTT: -110
<3>[23520.394087] ath11k_pci 0000:01:00.0: failed to start core: -110
<4>[23520.710478] ath11k_pci 0000:01:00.0: firmware crashed: MHI_CB_EE_RDDM
<4>[23520.710550] ath11k_pci 0000:01:00.0: already resetting count 2
<4>[23530.761544] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110
<4>[23530.761562] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110
<3>[23530.761595] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery
<6>[23561.813605] mhi mhi0: Requested to power ON
<6>[23561.813627] mhi mhi0: Power on setup success
<6>[23561.899318] mhi mhi0: Wait for device to enter SBL or Mission mode
<6>[23562.530990] ath11k_pci 0000:01:00.0: chip_id 0x12 chip_family 0xb board_id 0xff soc_id 0x400c1211
<6>[23562.531010] ath11k_pci 0000:01:00.0: fw_version 0x11088c35 fw_build_timestamp 2024-04-17 08:34 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
<3>[23562.575723] ath11k_pci 0000:01:00.0: Last interrupt received for each CE:
<3>[23562.575742] ath11k_pci 0000:01:00.0: CE_id 0 pipe_num 0 14781107ms before
<3>[23562.575751] ath11k_pci 0000:01:00.0: CE_id 1 pipe_num 1 66758ms before
<3>[23562.575756] ath11k_pci 0000:01:00.0: CE_id 2 pipe_num 2 66702ms before
<3>[23562.575759] ath11k_pci 0000:01:00.0: CE_id 3 pipe_num 3 66720ms before
<3>[23562.575763] ath11k_pci 0000:01:00.0: CE_id 5 pipe_num 5 14485062ms before
<3>[23562.575766] ath11k_pci 0000:01:00.0: CE_id 7 pipe_num 7 14485062ms before
<3>[23562.575770] ath11k_pci 0000:01:00.0: CE_id 8 pipe_num 8 14485062ms before
<3>[23562.575773] ath11k_pci 0000:01:00.0:
<3>[23562.575773] Last interrupt received for each group:
<3>[23562.575778] ath11k_pci 0000:01:00.0: group_id 0 66785ms before
<3>[23562.575781] ath11k_pci 0000:01:00.0: group_id 1 14485062ms before
<3>[23562.575785] ath11k_pci 0000:01:00.0: group_id 2 14485062ms before
<3>[23562.575788] ath11k_pci 0000:01:00.0: group_id 3 14485062ms before
<3>[23562.575791] ath11k_pci 0000:01:00.0: group_id 4 14780845ms before
<3>[23562.575795] ath11k_pci 0000:01:00.0: group_id 5 14780845ms before
<3>[23562.575798] ath11k_pci 0000:01:00.0: group_id 6 14485062ms before
<3>[23562.575801] ath11k_pci 0000:01:00.0: group_id 7 66814ms before
<3>[23562.575805] ath11k_pci 0000:01:00.0: group_id 8 68997ms before
<3>[23562.575808] ath11k_pci 0000:01:00.0: group_id 9 67588ms before
<3>[23562.575812] ath11k_pci 0000:01:00.0: group_id 10 69511ms before
<1>[23562.575828] BUG: unable to handle page fault for address: ffffa007404eb010
<1>[23562.575833] #PF: supervisor read access in kernel mode
<1>[23562.575837] #PF: error_code(0x0000) - not-present page
<6>[23562.575842] PGD 100000067 P4D 100000067 PUD 10022d067 PMD 100b01067 PTE 0
<4>[23562.575852] Oops: 0000 [#1] PREEMPT SMP NOPTI
<4>[23562.575873] Workqueue: ath11k_qmi_driver_event ath11k_qmi_driver_event_work [ath11k]
<4>[23562.575896] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
<4>[23562.575916] Code: 6b c0 44 89 f2 89 c1 e8 4a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 93 fd
<4>[23562.575922] RSP: 0018:ffffa00759ed3c50 EFLAGS: 00010246
<4>[23562.575926] RAX: ffffa007404eb010 RBX: ffff9eab4ea60000 RCX: 382d128991c49600
<4>[23562.575930] RDX: 00000000ffffffea RSI: ffffa00759ed3998 RDI: ffff9eac66017488
<4>[23562.575934] RBP: ffffa00759ed3c90 R08: ffffffffbd649d80 R09: 0000000000005ffd
<4>[23562.575937] R10: 0000000000000004 R11: 00000000ffffdfff R12: ffff9eab4ea61c90
<4>[23562.575941] R13: ffff9eab4ea60000 R14: 0000000000002828 R15: 0000000000000000
<4>[23562.575945] FS: 0000000000000000(0000) GS:ffff9eac66000000(0000) knlGS:0000000000000000
<4>[23562.575949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[23562.575953] CR2: ffffa007404eb010 CR3: 0000000136e24000 CR4: 0000000000750ee0
<4>[23562.575956] PKRU: 55555554
<4>[23562.575959] Call Trace:
<4>[23562.575965] <TASK>
<4>[23562.575978] ? __die_body+0xae/0xb0
<4>[23562.575987] ? page_fault_oops+0x381/0x3e0
<4>[23562.575995] ? exc_page_fault+0x69/0xa0
<4>[23562.576003] ? asm_exc_page_fault+0x22/0x30
<4>[23562.576015] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:6cea 4)]
<4>[23562.576034] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:6cea 4)]
<4>[23562.576058] worker_thread+0x389/0x930
<4>[23562.576065] kthread+0x149/0x170
<4>[23562.576074] ? start_flush_work+0x130/0x130
<4>[23562.576078] ? kthread_associate_blkcg+0xb0/0xb0
<4>[23562.576084] ret_from_fork+0x3b/0x50
<4>[23562.576090] ? kthread_associate_blkcg+0xb0/0xb0
<4>[23562.576096] ret_from_fork_asm+0x11/0x20


There are clearly two ath11k_hal_dump_srng_stats() calls: the first
one happens before crash recovery, the second happens right after it
and presumably causes a UAF, because the ->initialized flag is not cleared.
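
To make the suspected failure mode concrete, here is a minimal user-space
sketch of the stale-flag pattern. It is not the driver code: struct hal,
struct ring, RING_MAX, hal_init(), hal_deinit() and hal_dump() are
hypothetical stand-ins, and it assumes the dump path skips any ring whose
initialized flag is 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define RING_MAX 4	/* hypothetical; stands in for HAL_SRNG_RING_ID_MAX */

struct ring {
	int initialized;
	uint32_t *tp_addr;	/* points into the shared pointer block */
};

struct hal {
	uint32_t *rdp;		/* shared pointer block, one slot per ring */
	struct ring ring_list[RING_MAX];
};

/* Allocate the shared block and point every ring at its slot. */
static int hal_init(struct hal *hal)
{
	int i;

	hal->rdp = calloc(RING_MAX, sizeof(*hal->rdp));
	if (!hal->rdp)
		return -1;

	for (i = 0; i < RING_MAX; i++) {
		hal->ring_list[i].tp_addr = &hal->rdp[i];
		hal->ring_list[i].initialized = 1;
	}
	return 0;
}

/*
 * Free the shared block.  Clearing the flags first is the point of the
 * sketch: without that loop, tp_addr would dangle while initialized
 * still claims the ring is usable.
 */
static void hal_deinit(struct hal *hal)
{
	int i;

	for (i = 0; i < RING_MAX; i++)
		hal->ring_list[i].initialized = 0;

	free(hal->rdp);
	hal->rdp = NULL;
}

/* Dump every initialized ring; return how many rings were dumped. */
static int hal_dump(const struct hal *hal)
{
	int i, dumped = 0;

	for (i = 0; i < RING_MAX; i++) {
		const struct ring *r = &hal->ring_list[i];

		if (!r->initialized)
			continue;

		/* Only safe if the memory behind tp_addr is still allocated. */
		printf("ring %d tp %u\n", i, (unsigned int)*r->tp_addr);
		dumped++;
	}
	return dumped;
}
```

With the flag-clearing loop in hal_deinit(), a dump issued after deinit
skips every ring. Without it, the same dump would dereference freed
memory, which is the pattern the trace above appears to show.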
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Baochen Qiang 6 months ago

On 6/12/2025 4:00 PM, Sergey Senozhatsky wrote:
> On (25/06/12 15:49), Baochen Qiang wrote:
>>> My understanding is that the driver first fails to reconfigure
>>>
>>> <4>[163874.555825] ath11k_pci 0000:01:00.0: already resetting count 2
>>> <4>[163884.606490] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110
>>> <4>[163884.606508] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110
>>> <3>[163884.606550] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery
>>>
>>> so ath11k_core_reconfigure_on_crash() calls ath11k_hal_srng_deinit(),
>>> which destroys the srng lists, but leaves the stale initialized flag.
>>> So next time ath11k_hal_dump_srng_stats() is called everything looks ok,
>>> but in fact everything is not quite ok.
>>
>> OK, we have a second crash while the first crash is still in recovering. And guess the
>> first recovery fails such that srng is not reinitialized. Then after a
>> wait-for-first-recovery time out, the second recovery starts, this results in
>> ath11k_hal_dump_srng_stats() getting called and hence the kernel crash.
>>
>> Could you please share complete verbose kernel log? you may enable it with
>>
>> 	modprobe ath11k debug_mask=0xffffffff
>> 	modprobe ath11k_pci
> 
> Unfortunately I don't have a reproducer.  We just see that some of the
> consumer devices crash on resume and all we get is ramoops.  We can't
> do any debugging on consumer devices.

No worries, the log below is enough.

> 
> This is all I have:
> 
> <3>[23518.302240] Last interrupt received for each group:
> <3>[23518.302243] ath11k_pci 0000:01:00.0: group_id 0 22511ms before
> <3>[23518.302246] ath11k_pci 0000:01:00.0: group_id 1 14440788ms before
> <3>[23518.302248] ath11k_pci 0000:01:00.0: group_id 2 14440788ms before
> <3>[23518.302250] ath11k_pci 0000:01:00.0: group_id 3 14440788ms before
> <3>[23518.302252] ath11k_pci 0000:01:00.0: group_id 4 14736571ms before
> <3>[23518.302253] ath11k_pci 0000:01:00.0: group_id 5 14736571ms before
> <3>[23518.302261] ath11k_pci 0000:01:00.0: group_id 6 14440789ms before
> <3>[23518.302263] ath11k_pci 0000:01:00.0: group_id 7 22541ms before
> <3>[23518.302265] ath11k_pci 0000:01:00.0: group_id 8 24724ms before
> <3>[23518.302266] ath11k_pci 0000:01:00.0: group_id 9 23315ms before
> <3>[23518.302268] ath11k_pci 0000:01:00.0: group_id 10 25238ms before
> <3>[23518.302270] ath11k_pci 0000:01:00.0: dst srng id 0 tp 5312, cur hp 5312, cached hp 5312 last hp 5312 napi processed before 22541ms
> <3>[23518.302272] ath11k_pci 0000:01:00.0: dst srng id 1 tp 27664, cur hp 27664, cached hp 27664 last hp 27664 napi processed before 24724ms
> <3>[23518.302274] ath11k_pci 0000:01:00.0: dst srng id 2 tp 12432, cur hp 12432, cached hp 12432 last hp 12432 napi processed before 23315ms
> <3>[23518.302276] ath11k_pci 0000:01:00.0: dst srng id 3 tp 1424, cur hp 1424, cached hp 1424 last hp 1424 napi processed before 25238ms
> <3>[23518.302278] ath11k_pci 0000:01:00.0: dst srng id 4 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 22512ms
> <3>[23518.302280] ath11k_pci 0000:01:00.0: src srng id 5 hp 0, reap_hp 248, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms
> <3>[23518.302282] ath11k_pci 0000:01:00.0: src srng id 8 hp 950, reap_hp 950, cur tp 950, cached tp 280 last tp 280 napi processed before 22512ms
> <3>[23518.302284] ath11k_pci 0000:01:00.0: dst srng id 9 tp 19526, cur hp 19526, cached hp 19526 last hp 19526 napi processed before 22512ms
> <3>[23518.302286] ath11k_pci 0000:01:00.0: src srng id 16 hp 3832, reap_hp 3832, cur tp 3832, cached tp 3824 last tp 3824 napi processed before 22758ms
> <3>[23518.302288] ath11k_pci 0000:01:00.0: src srng id 24 hp 0, reap_hp 248, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms
> <3>[23518.302290] ath11k_pci 0000:01:00.0: dst srng id 25 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 14440789ms
> <3>[23518.302292] ath11k_pci 0000:01:00.0: src srng id 32 hp 12, reap_hp 8, cur tp 12, cached tp 12 last tp 8 napi processed before 14736834ms
> <3>[23518.302294] ath11k_pci 0000:01:00.0: src srng id 35 hp 96, reap_hp 88, cur tp 92, cached tp 92 last tp 92 napi processed before 21573ms
> <3>[23518.302296] ath11k_pci 0000:01:00.0: src srng id 36 hp 176, reap_hp 164, cur tp 176, cached tp 168 last tp 168 napi processed before 22447ms
> <3>[23518.302298] ath11k_pci 0000:01:00.0: src srng id 39 hp 0, reap_hp 124, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms
> <3>[23518.302300] ath11k_pci 0000:01:00.0: src srng id 57 hp 54, reap_hp 54, cur tp 58, cached tp 58 last tp 58 napi processed before 22485ms
> <3>[23518.302302] ath11k_pci 0000:01:00.0: src srng id 58 hp 584, reap_hp 584, cur tp 588, cached tp 588 last tp 588 napi processed before 22429ms
> <3>[23518.302304] ath11k_pci 0000:01:00.0: src srng id 61 hp 1020, reap_hp 1020, cur tp 0, cached tp 0 last tp 0 napi processed before 14736834ms
> <3>[23518.302306] ath11k_pci 0000:01:00.0: dst srng id 81 tp 116, cur hp 116, cached hp 116 last hp 116 napi processed before 22485ms
> <3>[23518.302308] ath11k_pci 0000:01:00.0: dst srng id 82 tp 1176, cur hp 1176, cached hp 1176 last hp 1176 napi processed before 22429ms
> <3>[23518.302309] ath11k_pci 0000:01:00.0: dst srng id 85 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 14440789ms
> <3>[23518.302311] ath11k_pci 0000:01:00.0: src srng id 104 hp 65532, reap_hp 65532, cur tp 0, cached tp 0 last tp 0 napi processed before 14736836ms
> <3>[23518.302313] ath11k_pci 0000:01:00.0: src srng id 105 hp 0, reap_hp 504, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms
> <3>[23518.302315] ath11k_pci 0000:01:00.0: dst srng id 106 tp 245496, cur hp 245496, cached hp 245496 last hp 245496 napi processed before 22512ms
> <3>[23518.302317] ath11k_pci 0000:01:00.0: dst srng id 109 tp 5704, cur hp 5704, cached hp 5704 last hp 5704 napi processed before 22512ms
> <3>[23518.302319] ath11k_pci 0000:01:00.0: src srng id 128 hp 3182, reap_hp 3182, cur tp 7428, cached tp 7428 last tp 7428 napi processed before 22541ms
> <3>[23518.302321] ath11k_pci 0000:01:00.0: src srng id 129 hp 0, reap_hp 2046, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms
> <3>[23518.302323] ath11k_pci 0000:01:00.0: src srng id 132 hp 1690, reap_hp 1690, cur tp 1692, cached tp 1692 last tp 1692 napi processed before 22429ms
> <3>[23518.302324] ath11k_pci 0000:01:00.0: dst srng id 133 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 22512ms
> <3>[23518.302326] ath11k_pci 0000:01:00.0: src srng id 144 hp 0, reap_hp 2046, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms
> <3>[23518.302328] ath11k_pci 0000:01:00.0: src srng id 147 hp 1948, reap_hp 1948, cur tp 1950, cached tp 1950 last tp 1950 napi processed before 22429ms
> <3>[23518.302330] ath11k_pci 0000:01:00.0: dst srng id 148 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 14440789ms
> <4>[23519.369310] ath11k_pci 0000:01:00.0: failed to receive control response completion, polling..
> <3>[23520.393292] ath11k_pci 0000:01:00.0: Service connect timeout
> <3>[23520.393302] ath11k_pci 0000:01:00.0: failed to connect to HTT: -110
> <3>[23520.394087] ath11k_pci 0000:01:00.0: failed to start core: -110
> <4>[23520.710478] ath11k_pci 0000:01:00.0: firmware crashed: MHI_CB_EE_RDDM
> <4>[23520.710550] ath11k_pci 0000:01:00.0: already resetting count 2
> <4>[23530.761544] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110
> <4>[23530.761562] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110
> <3>[23530.761595] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery
> <6>[23561.813605] mhi mhi0: Requested to power ON
> <6>[23561.813627] mhi mhi0: Power on setup success
> <6>[23561.899318] mhi mhi0: Wait for device to enter SBL or Mission mode
> <6>[23562.530990] ath11k_pci 0000:01:00.0: chip_id 0x12 chip_family 0xb board_id 0xff soc_id 0x400c1211
> <6>[23562.531010] ath11k_pci 0000:01:00.0: fw_version 0x11088c35 fw_build_timestamp 2024-04-17 08:34 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
> <3>[23562.575723] ath11k_pci 0000:01:00.0: Last interrupt received for each CE:
> <3>[23562.575742] ath11k_pci 0000:01:00.0: CE_id 0 pipe_num 0 14781107ms before
> <3>[23562.575751] ath11k_pci 0000:01:00.0: CE_id 1 pipe_num 1 66758ms before
> <3>[23562.575756] ath11k_pci 0000:01:00.0: CE_id 2 pipe_num 2 66702ms before
> <3>[23562.575759] ath11k_pci 0000:01:00.0: CE_id 3 pipe_num 3 66720ms before
> <3>[23562.575763] ath11k_pci 0000:01:00.0: CE_id 5 pipe_num 5 14485062ms before
> <3>[23562.575766] ath11k_pci 0000:01:00.0: CE_id 7 pipe_num 7 14485062ms before
> <3>[23562.575770] ath11k_pci 0000:01:00.0: CE_id 8 pipe_num 8 14485062ms before
> <3>[23562.575773] ath11k_pci 0000:01:00.0:
> <3>[23562.575773] Last interrupt received for each group:
> <3>[23562.575778] ath11k_pci 0000:01:00.0: group_id 0 66785ms before
> <3>[23562.575781] ath11k_pci 0000:01:00.0: group_id 1 14485062ms before
> <3>[23562.575785] ath11k_pci 0000:01:00.0: group_id 2 14485062ms before
> <3>[23562.575788] ath11k_pci 0000:01:00.0: group_id 3 14485062ms before
> <3>[23562.575791] ath11k_pci 0000:01:00.0: group_id 4 14780845ms before
> <3>[23562.575795] ath11k_pci 0000:01:00.0: group_id 5 14780845ms before
> <3>[23562.575798] ath11k_pci 0000:01:00.0: group_id 6 14485062ms before
> <3>[23562.575801] ath11k_pci 0000:01:00.0: group_id 7 66814ms before
> <3>[23562.575805] ath11k_pci 0000:01:00.0: group_id 8 68997ms before
> <3>[23562.575808] ath11k_pci 0000:01:00.0: group_id 9 67588ms before
> <3>[23562.575812] ath11k_pci 0000:01:00.0: group_id 10 69511ms before
> <1>[23562.575828] BUG: unable to handle page fault for address: ffffa007404eb010
> <1>[23562.575833] #PF: supervisor read access in kernel mode
> <1>[23562.575837] #PF: error_code(0x0000) - not-present page
> <6>[23562.575842] PGD 100000067 P4D 100000067 PUD 10022d067 PMD 100b01067 PTE 0
> <4>[23562.575852] Oops: 0000 [#1] PREEMPT SMP NOPTI
> <4>[23562.575873] Workqueue: ath11k_qmi_driver_event ath11k_qmi_driver_event_work [ath11k]
> <4>[23562.575896] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
> <4>[23562.575916] Code: 6b c0 44 89 f2 89 c1 e8 4a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 93 fd
> <4>[23562.575922] RSP: 0018:ffffa00759ed3c50 EFLAGS: 00010246
> <4>[23562.575926] RAX: ffffa007404eb010 RBX: ffff9eab4ea60000 RCX: 382d128991c49600
> <4>[23562.575930] RDX: 00000000ffffffea RSI: ffffa00759ed3998 RDI: ffff9eac66017488
> <4>[23562.575934] RBP: ffffa00759ed3c90 R08: ffffffffbd649d80 R09: 0000000000005ffd
> <4>[23562.575937] R10: 0000000000000004 R11: 00000000ffffdfff R12: ffff9eab4ea61c90
> <4>[23562.575941] R13: ffff9eab4ea60000 R14: 0000000000002828 R15: 0000000000000000
> <4>[23562.575945] FS: 0000000000000000(0000) GS:ffff9eac66000000(0000) knlGS:0000000000000000
> <4>[23562.575949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[23562.575953] CR2: ffffa007404eb010 CR3: 0000000136e24000 CR4: 0000000000750ee0
> <4>[23562.575956] PKRU: 55555554
> <4>[23562.575959] Call Trace:
> <4>[23562.575965] <TASK>
> <4>[23562.575978] ? __die_body+0xae/0xb0
> <4>[23562.575987] ? page_fault_oops+0x381/0x3e0
> <4>[23562.575995] ? exc_page_fault+0x69/0xa0
> <4>[23562.576003] ? asm_exc_page_fault+0x22/0x30
> <4>[23562.576015] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:6cea 4)]
> <4>[23562.576034] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:6cea 4)]
> <4>[23562.576058] worker_thread+0x389/0x930
> <4>[23562.576065] kthread+0x149/0x170
> <4>[23562.576074] ? start_flush_work+0x130/0x130
> <4>[23562.576078] ? kthread_associate_blkcg+0xb0/0xb0
> <4>[23562.576084] ret_from_fork+0x3b/0x50
> <4>[23562.576090] ? kthread_associate_blkcg+0xb0/0xb0
> <4>[23562.576096] ret_from_fork_asm+0x11/0x20
> 
> 
> There are clearly two ath11k_hal_dump_srng_stats() calls, the first
> one happens before crash recovery, the second happens right after
> and presumably causes UAF, because ->initialized flag is not cleared.

So with the above we can confirm our guess.

Could you refine your commit message with these details so that readers have a clear
understanding of this issue?
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Sergey Senozhatsky 6 months ago
On (25/06/12 16:14), Baochen Qiang wrote:
> > <4>[23562.576034] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:6cea 4)]
> > <4>[23562.576058] worker_thread+0x389/0x930
> > <4>[23562.576065] kthread+0x149/0x170
> > <4>[23562.576074] ? start_flush_work+0x130/0x130
> > <4>[23562.576078] ? kthread_associate_blkcg+0xb0/0xb0
> > <4>[23562.576084] ret_from_fork+0x3b/0x50
> > <4>[23562.576090] ? kthread_associate_blkcg+0xb0/0xb0
> > <4>[23562.576096] ret_from_fork_asm+0x11/0x20
> > 
> > 
> > There are clearly two ath11k_hal_dump_srng_stats() calls, the first
> > one happens before crash recovery, the second happens right after
> > and presumably causes UAF, because ->initialized flag is not cleared.
> 
> So with above we can confirm our guess.
> 
> Could you refine your commit message with these details such that readers have a clear
> understanding of this issue?

Sure, I can do that.  I didn't want to throw my guesses into the commit
message; the stale ->initialized flag looked like a good enough justification
for the patch.  But I can send out v3 with a more detailed commit message.
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Baochen Qiang 6 months ago

On 6/12/2025 4:31 PM, Sergey Senozhatsky wrote:
> On (25/06/12 16:14), Baochen Qiang wrote:
>>> <4>[23562.576034] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:6cea 4)]
>>> <4>[23562.576058] worker_thread+0x389/0x930
>>> <4>[23562.576065] kthread+0x149/0x170
>>> <4>[23562.576074] ? start_flush_work+0x130/0x130
>>> <4>[23562.576078] ? kthread_associate_blkcg+0xb0/0xb0
>>> <4>[23562.576084] ret_from_fork+0x3b/0x50
>>> <4>[23562.576090] ? kthread_associate_blkcg+0xb0/0xb0
>>> <4>[23562.576096] ret_from_fork_asm+0x11/0x20
>>>
>>>
>>> There are clearly two ath11k_hal_dump_srng_stats() calls, the first
>>> one happens before crash recovery, the second happens right after
>>> and presumably causes UAF, because ->initialized flag is not cleared.
>>
>> So with above we can confirm our guess.
>>
>> Could you refine your commit message with these details such that readers have a clear
>> understanding of this issue?
> 
> Sure, I can do that.   I didn't want to throw my guesses into the commit
> message, stale ->initialized flag looked like a good enough justification

Yeah, it is indeed enough. But it would be better to disclose any known issue caused by it.

> for the patch.  But I can send out v3 with a more detailed commit message.

Thanks.
Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized
Posted by Sergey Senozhatsky 6 months, 1 week ago
On (25/05/29 12:56), Sergey Senozhatsky wrote:
> ath11k_hal_srng_deinit() frees rdp and wrp which are used
> by srng lists.  Mark srng lists as not-initialized.  This
> makes sense, for instance, when device fails to resume
> and the driver calls ath11k_hal_srng_deinit() from
> ath11k_core_reconfigure_on_crash().
[..]

Gentle ping.