drivers/net/wireless/ath/ath11k/hal.c | 4 ++++ 1 file changed, 4 insertions(+)
ath11k_hal_srng_deinit() frees rdp and wrp which are used
by srng lists. Mark srng lists as not-initialized. This
makes sense, for instance, when device fails to resume
and the driver calls ath11k_hal_srng_deinit() from
ath11k_core_reconfigure_on_crash().
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
v2: fixed subject line and updated commit message
drivers/net/wireless/ath/ath11k/hal.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
index 8cb1505a5a0c..cab11a35f911 100644
--- a/drivers/net/wireless/ath/ath11k/hal.c
+++ b/drivers/net/wireless/ath/ath11k/hal.c
@@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
void ath11k_hal_srng_deinit(struct ath11k_base *ab)
{
struct ath11k_hal *hal = &ab->hal;
+ int i;
+
+ for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
+ ab->hal.srng_list[i].initialized = 0;
ath11k_hal_unregister_srng_key(ab);
ath11k_hal_free_cont_rdp(ab);
--
2.49.0.1204.g71687c7c1d-goog
On 5/29/2025 11:56 AM, Sergey Senozhatsky wrote:
> ath11k_hal_srng_deinit() frees rdp and wrp which are used
> by srng lists. Mark srng lists as not-initialized. This
> makes sense, for instance, when device fails to resume
> and the driver calls ath11k_hal_srng_deinit() from
> ath11k_core_reconfigure_on_crash().
Did you see any issue without your change?
>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>
> v2: fixed subject line and updated commit message
>
> drivers/net/wireless/ath/ath11k/hal.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
> index 8cb1505a5a0c..cab11a35f911 100644
> --- a/drivers/net/wireless/ath/ath11k/hal.c
> +++ b/drivers/net/wireless/ath/ath11k/hal.c
> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
> void ath11k_hal_srng_deinit(struct ath11k_base *ab)
> {
> struct ath11k_hal *hal = &ab->hal;
> + int i;
> +
> + for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
> + ab->hal.srng_list[i].initialized = 0;
With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
>
> ath11k_hal_unregister_srng_key(ab);
> ath11k_hal_free_cont_rdp(ab);
On (25/06/12 11:30), Baochen Qiang wrote:
> On 5/29/2025 11:56 AM, Sergey Senozhatsky wrote:
> > ath11k_hal_srng_deinit() frees rdp and wrp which are used
> > by srng lists. Mark srng lists as not-initialized. This
> > makes sense, for instance, when device fails to resume
> > and the driver calls ath11k_hal_srng_deinit() from
> > ath11k_core_reconfigure_on_crash().
>
> Did you see any issue without your change?
We do see some issues, yes, on LTS kernels.
[..]
> > diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
> > index 8cb1505a5a0c..cab11a35f911 100644
> > --- a/drivers/net/wireless/ath/ath11k/hal.c
> > +++ b/drivers/net/wireless/ath/ath11k/hal.c
> > @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
> > void ath11k_hal_srng_deinit(struct ath11k_base *ab)
> > {
> > struct ath11k_hal *hal = &ab->hal;
> > + int i;
> > +
> > + for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
> > + ab->hal.srng_list[i].initialized = 0;
>
> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
I think un-initialized lists should not be dumped.
ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
causing things like:
<1>[173154.396775] BUG: unable to handle page fault for address: ffffb4e4c046f010
<1>[173154.396778] #PF: supervisor read access in kernel mode
<1>[173154.396781] #PF: error_code(0x0000) - not-present page
<4>[173154.396824] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
<4>[173154.396839] Code: 88 c0 44 89 f2 89 c1 e8 3a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 a3 a0
<4>[173154.396842] RSP: 0018:ffffb4e4dceefc50 EFLAGS: 00010246
<4>[173154.396846] RAX: ffffb4e4c046f010 RBX: ffff90d1c3040000 RCX: a0009634a5d28c00
<4>[173154.396849] RDX: ffffffffb0279d80 RSI: ffffffffb0279d80 RDI: ffff90d2e5d17488
<4>[173154.396851] RBP: ffffb4e4dceefc90 R08: ffffffffb0249d80 R09: 0000000000003b82
<4>[173154.396854] R10: 0000000000000004 R11: 00000000ffffffea R12: ffff90d1c3041c90
<4>[173154.396856] R13: ffff90d1c3040000 R14: 0000000000002828 R15: 0000000000000000
<4>[173154.396859] FS: 0000000000000000(0000) GS:ffff90d2e5d00000(0000) knlGS:0000000000000000
<4>[173154.396862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[173154.396865] CR2: ffffb4e4c046f010 CR3: 000000005ca24000 CR4: 0000000000750ee0
<4>[173154.396868] PKRU: 55555554
<4>[173154.396870] Call Trace:
<4>[173154.396874] <TASK>
<4>[173154.396883] ? __die_body+0xae/0xb0
<4>[173154.396890] ? page_fault_oops+0x381/0x3e0
<4>[173154.396896] ? exc_page_fault+0x69/0xa0
<4>[173154.396901] ? asm_exc_page_fault+0x22/0x30
<4>[173154.396908] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:3de7 4)]
<4>[173154.396923] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:3de7 4)]
<4>[173154.396942] worker_thread+0x390/0x960
<4>[173154.396949] kthread+0x149/0x170
[+ kernel mm list]
On 6/12/2025 1:04 PM, Sergey Senozhatsky wrote:
> On (25/06/12 11:30), Baochen Qiang wrote:
>> On 5/29/2025 11:56 AM, Sergey Senozhatsky wrote:
>>> ath11k_hal_srng_deinit() frees rdp and wrp which are used
>>> by srng lists. Mark srng lists as not-initialized. This
>>> makes sense, for instance, when device fails to resume
>>> and the driver calls ath11k_hal_srng_deinit() from
>>> ath11k_core_reconfigure_on_crash().
>>
>> Did you see any issue without your change?
>
> We do see some issues, yes, on LTS kernels.
>
> [..]
>>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
>>> index 8cb1505a5a0c..cab11a35f911 100644
>>> --- a/drivers/net/wireless/ath/ath11k/hal.c
>>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
>>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
>>> void ath11k_hal_srng_deinit(struct ath11k_base *ab)
>>> {
>>> struct ath11k_hal *hal = &ab->hal;
>>> + int i;
>>> +
>>> + for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
>>> + ab->hal.srng_list[i].initialized = 0;
>>
>> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
>
> I think un-initialized lists should not be dumped.
>
> ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
> accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
> as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
> causing things like:
>
> <1>[173154.396775] BUG: unable to handle page fault for address: ffffb4e4c046f010
> <1>[173154.396778] #PF: supervisor read access in kernel mode
> <1>[173154.396781] #PF: error_code(0x0000) - not-present page
I am confused here: if the root cause is driver trying to read a freed memory, it should
not result in a PF issue. Because even if freed, the page is there and still mapped in
kernel page table.
Andrew, any insights?
> <4>[173154.396824] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
> <4>[173154.396839] Code: 88 c0 44 89 f2 89 c1 e8 3a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 a3 a0
> <4>[173154.396842] RSP: 0018:ffffb4e4dceefc50 EFLAGS: 00010246
> <4>[173154.396846] RAX: ffffb4e4c046f010 RBX: ffff90d1c3040000 RCX: a0009634a5d28c00
> <4>[173154.396849] RDX: ffffffffb0279d80 RSI: ffffffffb0279d80 RDI: ffff90d2e5d17488
> <4>[173154.396851] RBP: ffffb4e4dceefc90 R08: ffffffffb0249d80 R09: 0000000000003b82
> <4>[173154.396854] R10: 0000000000000004 R11: 00000000ffffffea R12: ffff90d1c3041c90
> <4>[173154.396856] R13: ffff90d1c3040000 R14: 0000000000002828 R15: 0000000000000000
> <4>[173154.396859] FS: 0000000000000000(0000) GS:ffff90d2e5d00000(0000) knlGS:0000000000000000
> <4>[173154.396862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[173154.396865] CR2: ffffb4e4c046f010 CR3: 000000005ca24000 CR4: 0000000000750ee0
> <4>[173154.396868] PKRU: 55555554
> <4>[173154.396870] Call Trace:
> <4>[173154.396874] <TASK>
> <4>[173154.396883] ? __die_body+0xae/0xb0
> <4>[173154.396890] ? page_fault_oops+0x381/0x3e0
> <4>[173154.396896] ? exc_page_fault+0x69/0xa0
> <4>[173154.396901] ? asm_exc_page_fault+0x22/0x30
> <4>[173154.396908] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:3de7 4)]
> <4>[173154.396923] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:3de7 4)]
> <4>[173154.396942] worker_thread+0x390/0x960
> <4>[173154.396949] kthread+0x149/0x170
On Thu, Jun 12, 2025 at 04:01:49PM +0800, Baochen Qiang wrote:
> [+ kernel mm list]
>
> On 6/12/2025 1:04 PM, Sergey Senozhatsky wrote:
> > On (25/06/12 11:30), Baochen Qiang wrote:
> >> On 5/29/2025 11:56 AM, Sergey Senozhatsky wrote:
> >>> ath11k_hal_srng_deinit() frees rdp and wrp which are used
> >>> by srng lists. Mark srng lists as not-initialized. This
> >>> makes sense, for instance, when device fails to resume
> >>> and the driver calls ath11k_hal_srng_deinit() from
> >>> ath11k_core_reconfigure_on_crash().
> >>
> >> Did you see any issue without your change?
> >
> > We do see some issues, yes, on LTS kernels.
> >
> > [..]
> >>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
> >>> index 8cb1505a5a0c..cab11a35f911 100644
> >>> --- a/drivers/net/wireless/ath/ath11k/hal.c
> >>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
> >>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
> >>> void ath11k_hal_srng_deinit(struct ath11k_base *ab)
> >>> {
> >>> struct ath11k_hal *hal = &ab->hal;
> >>> + int i;
> >>> +
> >>> + for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
> >>> + ab->hal.srng_list[i].initialized = 0;
> >>
> >> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
> >
> > I think un-initialized lists should not be dumped.
> >
> > ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
> > accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
> > as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
> > causing things like:
> >
> > <1>[173154.396775] BUG: unable to handle page fault for address: ffffb4e4c046f010
> > <1>[173154.396778] #PF: supervisor read access in kernel mode
> > <1>[173154.396781] #PF: error_code(0x0000) - not-present page
>
> I am confused here: if the root cause is driver trying to read a freed memory, it should
> not result in a PF issue. Because even if freed, the page is there and still mapped in
> kernel page table.
>
Any memory that is virtually-mapped (read: vmalloc, vmap, vm_map_ram, and others)
will be unmapped on its subsequent free. I'm not familiar with the DMA subsystem,
but the address ffffb4e4c046f010 is vmalloc-like.
>
> > <4>[173154.396824] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
> > <4>[173154.396839] Code: 88 c0 44 89 f2 89 c1 e8 3a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 a3 a0
> > <4>[173154.396842] RSP: 0018:ffffb4e4dceefc50 EFLAGS: 00010246
> > <4>[173154.396846] RAX: ffffb4e4c046f010 RBX: ffff90d1c3040000 RCX: a0009634a5d28c00
> > <4>[173154.396849] RDX: ffffffffb0279d80 RSI: ffffffffb0279d80 RDI: ffff90d2e5d17488
> > <4>[173154.396851] RBP: ffffb4e4dceefc90 R08: ffffffffb0249d80 R09: 0000000000003b82
> > <4>[173154.396854] R10: 0000000000000004 R11: 00000000ffffffea R12: ffff90d1c3041c90
> > <4>[173154.396856] R13: ffff90d1c3040000 R14: 0000000000002828 R15: 0000000000000000
> > <4>[173154.396859] FS: 0000000000000000(0000) GS:ffff90d2e5d00000(0000) knlGS:0000000000000000
> > <4>[173154.396862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4>[173154.396865] CR2: ffffb4e4c046f010 CR3: 000000005ca24000 CR4: 0000000000750ee0
> > <4>[173154.396868] PKRU: 55555554
> > <4>[173154.396870] Call Trace:
> > <4>[173154.396874] <TASK>
> > <4>[173154.396883] ? __die_body+0xae/0xb0
> > <4>[173154.396890] ? page_fault_oops+0x381/0x3e0
> > <4>[173154.396896] ? exc_page_fault+0x69/0xa0
> > <4>[173154.396901] ? asm_exc_page_fault+0x22/0x30
> > <4>[173154.396908] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:3de7 4)]
> > <4>[173154.396923] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:3de7 4)]
> > <4>[173154.396942] worker_thread+0x390/0x960
> > <4>[173154.396949] kthread+0x149/0x170
>
--
Pedro
On 6/12/2025 6:01 PM, Pedro Falcato wrote:
> On Thu, Jun 12, 2025 at 04:01:49PM +0800, Baochen Qiang wrote:
>> [+ kernel mm list]
>>
>> On 6/12/2025 1:04 PM, Sergey Senozhatsky wrote:
>>> On (25/06/12 11:30), Baochen Qiang wrote:
>>>> On 5/29/2025 11:56 AM, Sergey Senozhatsky wrote:
>>>>> ath11k_hal_srng_deinit() frees rdp and wrp which are used
>>>>> by srng lists. Mark srng lists as not-initialized. This
>>>>> makes sense, for instance, when device fails to resume
>>>>> and the driver calls ath11k_hal_srng_deinit() from
>>>>> ath11k_core_reconfigure_on_crash().
>>>>
>>>> Did you see any issue without your change?
>>>
>>> We do see some issues, yes, on LTS kernels.
>>>
>>> [..]
>>>>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
>>>>> index 8cb1505a5a0c..cab11a35f911 100644
>>>>> --- a/drivers/net/wireless/ath/ath11k/hal.c
>>>>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
>>>>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
>>>>> void ath11k_hal_srng_deinit(struct ath11k_base *ab)
>>>>> {
>>>>> struct ath11k_hal *hal = &ab->hal;
>>>>> + int i;
>>>>> +
>>>>> + for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
>>>>> + ab->hal.srng_list[i].initialized = 0;
>>>>
>>>> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
>>>
>>> I think un-initialized lists should not be dumped.
>>>
>>> ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
>>> accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
>>> as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
>>> causing things like:
>>>
>>> <1>[173154.396775] BUG: unable to handle page fault for address: ffffb4e4c046f010
>>> <1>[173154.396778] #PF: supervisor read access in kernel mode
>>> <1>[173154.396781] #PF: error_code(0x0000) - not-present page
>>
>> I am confused here: if the root cause is driver trying to read a freed memory, it should
>> not result in a PF issue. Because even if freed, the page is there and still mapped in
>> kernel page table.
>>
>
> Any memory that is virtually-mapped (read: vmalloc, vmap, vm_map_ram, and others)
> will be unmapped on its subsequent free. I'm not familiar with the DMA subsystem,
> but the address ffffb4e4c046f010 is vmalloc-like.
OK, I forget the vmalloc case. And indeed in case IOMMU present, the DMA subsystem is
preferring vmalloc'ed memory.
Thank you, Pedro!
>
>>
>>> <4>[173154.396824] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
>>> <4>[173154.396839] Code: 88 c0 44 89 f2 89 c1 e8 3a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 a3 a0
>>> <4>[173154.396842] RSP: 0018:ffffb4e4dceefc50 EFLAGS: 00010246
>>> <4>[173154.396846] RAX: ffffb4e4c046f010 RBX: ffff90d1c3040000 RCX: a0009634a5d28c00
>>> <4>[173154.396849] RDX: ffffffffb0279d80 RSI: ffffffffb0279d80 RDI: ffff90d2e5d17488
>>> <4>[173154.396851] RBP: ffffb4e4dceefc90 R08: ffffffffb0249d80 R09: 0000000000003b82
>>> <4>[173154.396854] R10: 0000000000000004 R11: 00000000ffffffea R12: ffff90d1c3041c90
>>> <4>[173154.396856] R13: ffff90d1c3040000 R14: 0000000000002828 R15: 0000000000000000
>>> <4>[173154.396859] FS: 0000000000000000(0000) GS:ffff90d2e5d00000(0000) knlGS:0000000000000000
>>> <4>[173154.396862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> <4>[173154.396865] CR2: ffffb4e4c046f010 CR3: 000000005ca24000 CR4: 0000000000750ee0
>>> <4>[173154.396868] PKRU: 55555554
>>> <4>[173154.396870] Call Trace:
>>> <4>[173154.396874] <TASK>
>>> <4>[173154.396883] ? __die_body+0xae/0xb0
>>> <4>[173154.396890] ? page_fault_oops+0x381/0x3e0
>>> <4>[173154.396896] ? exc_page_fault+0x69/0xa0
>>> <4>[173154.396901] ? asm_exc_page_fault+0x22/0x30
>>> <4>[173154.396908] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:3de7 4)]
>>> <4>[173154.396923] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:3de7 4)]
>>> <4>[173154.396942] worker_thread+0x390/0x960
>>> <4>[173154.396949] kthread+0x149/0x170
>>
>
On 6/12/2025 1:04 PM, Sergey Senozhatsky wrote:
> On (25/06/12 11:30), Baochen Qiang wrote:
>> On 5/29/2025 11:56 AM, Sergey Senozhatsky wrote:
>>> ath11k_hal_srng_deinit() frees rdp and wrp which are used
>>> by srng lists. Mark srng lists as not-initialized. This
>>> makes sense, for instance, when device fails to resume
>>> and the driver calls ath11k_hal_srng_deinit() from
>>> ath11k_core_reconfigure_on_crash().
>>
>> Did you see any issue without your change?
>
> We do see some issues, yes, on LTS kernels.
>
> [..]
>>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
>>> index 8cb1505a5a0c..cab11a35f911 100644
>>> --- a/drivers/net/wireless/ath/ath11k/hal.c
>>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
>>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
>>> void ath11k_hal_srng_deinit(struct ath11k_base *ab)
>>> {
>>> struct ath11k_hal *hal = &ab->hal;
>>> + int i;
>>> +
>>> + for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
>>> + ab->hal.srng_list[i].initialized = 0;
>>
>> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
>
> I think un-initialized lists should not be dumped.
>
> ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
> accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
> as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
> causing things like:
But ath11k_hal_dump_srng_stats() is called before ath11k_hal_srng_deinit(), right?
The sequence is ath11k_hal_dump_srng_stats() is called in reset process, then restart_work
is queued and in ath11k_core_restart() we call ath11k_core_reconfigure_on_crash(), there
ath11k_hal_srng_deinit() is called, right?
>
> <1>[173154.396775] BUG: unable to handle page fault for address: ffffb4e4c046f010
> <1>[173154.396778] #PF: supervisor read access in kernel mode
> <1>[173154.396781] #PF: error_code(0x0000) - not-present page
> <4>[173154.396824] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
> <4>[173154.396839] Code: 88 c0 44 89 f2 89 c1 e8 3a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 a3 a0
> <4>[173154.396842] RSP: 0018:ffffb4e4dceefc50 EFLAGS: 00010246
> <4>[173154.396846] RAX: ffffb4e4c046f010 RBX: ffff90d1c3040000 RCX: a0009634a5d28c00
> <4>[173154.396849] RDX: ffffffffb0279d80 RSI: ffffffffb0279d80 RDI: ffff90d2e5d17488
> <4>[173154.396851] RBP: ffffb4e4dceefc90 R08: ffffffffb0249d80 R09: 0000000000003b82
> <4>[173154.396854] R10: 0000000000000004 R11: 00000000ffffffea R12: ffff90d1c3041c90
> <4>[173154.396856] R13: ffff90d1c3040000 R14: 0000000000002828 R15: 0000000000000000
> <4>[173154.396859] FS: 0000000000000000(0000) GS:ffff90d2e5d00000(0000) knlGS:0000000000000000
> <4>[173154.396862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[173154.396865] CR2: ffffb4e4c046f010 CR3: 000000005ca24000 CR4: 0000000000750ee0
> <4>[173154.396868] PKRU: 55555554
> <4>[173154.396870] Call Trace:
> <4>[173154.396874] <TASK>
> <4>[173154.396883] ? __die_body+0xae/0xb0
> <4>[173154.396890] ? page_fault_oops+0x381/0x3e0
> <4>[173154.396896] ? exc_page_fault+0x69/0xa0
> <4>[173154.396901] ? asm_exc_page_fault+0x22/0x30
> <4>[173154.396908] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:3de7 4)]
> <4>[173154.396923] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:3de7 4)]
> <4>[173154.396942] worker_thread+0x390/0x960
> <4>[173154.396949] kthread+0x149/0x170
On (25/06/12 13:47), Baochen Qiang wrote:
> > [..]
> >>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
> >>> index 8cb1505a5a0c..cab11a35f911 100644
> >>> --- a/drivers/net/wireless/ath/ath11k/hal.c
> >>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
> >>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
> >>> void ath11k_hal_srng_deinit(struct ath11k_base *ab)
> >>> {
> >>> struct ath11k_hal *hal = &ab->hal;
> >>> + int i;
> >>> +
> >>> + for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
> >>> + ab->hal.srng_list[i].initialized = 0;
> >>
> >> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
> >
> > I think un-initialized lists should not be dumped.
> >
> > ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
> > accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
> > as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
> > causing things like:
>
> But ath11k_hal_dump_srng_stats() is called before ath11k_hal_srng_deinit(), right?
>
> The sequence is ath11k_hal_dump_srng_stats() is called in reset process, then restart_work
> is queued and in ath11k_core_restart() we call ath11k_core_reconfigure_on_crash(), there
> ath11k_hal_srng_deinit() is called, right?
My understanding is that the driver first fails to reconfigure
<4>[163874.555825] ath11k_pci 0000:01:00.0: already resetting count 2
<4>[163884.606490] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110
<4>[163884.606508] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110
<3>[163884.606550] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery
so ath11k_core_reconfigure_on_crash() calls ath11k_hal_srng_deinit(),
which destroys the srng lists, but leaves the stale initialized flag.
So next time ath11k_hal_dump_srng_stats() is called everything looks ok,
but in fact everything is not quite ok.
Regardless of that, I do think that resetting the initialized flag
when srng list is de-initialized/destroyed is the right thing to do.
On 6/12/2025 3:02 PM, Sergey Senozhatsky wrote:
> On (25/06/12 13:47), Baochen Qiang wrote:
>>> [..]
>>>>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
>>>>> index 8cb1505a5a0c..cab11a35f911 100644
>>>>> --- a/drivers/net/wireless/ath/ath11k/hal.c
>>>>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
>>>>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
>>>>> void ath11k_hal_srng_deinit(struct ath11k_base *ab)
>>>>> {
>>>>> struct ath11k_hal *hal = &ab->hal;
>>>>> + int i;
>>>>> +
>>>>> + for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
>>>>> + ab->hal.srng_list[i].initialized = 0;
>>>>
>>>> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
>>>
>>> I think un-initialized lists should not be dumped.
>>>
>>> ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
>>> accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
>>> as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
>>> causing things like:
>>
>> But ath11k_hal_dump_srng_stats() is called before ath11k_hal_srng_deinit(), right?
>>
>> The sequence is ath11k_hal_dump_srng_stats() is called in reset process, then restart_work
>> is queued and in ath11k_core_restart() we call ath11k_core_reconfigure_on_crash(), there
>> ath11k_hal_srng_deinit() is called, right?
>
> My understanding is that the driver first fails to reconfigure
>
> <4>[163874.555825] ath11k_pci 0000:01:00.0: already resetting count 2
> <4>[163884.606490] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110
> <4>[163884.606508] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110
> <3>[163884.606550] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery
>
> so ath11k_core_reconfigure_on_crash() calls ath11k_hal_srng_deinit(),
> which destroys the srng lists, but leaves the stale initialized flag.
> So next time ath11k_hal_dump_srng_stats() is called everything looks ok,
> but in fact everything is not quite ok.
OK, we have a second crash while the first crash is still in recovering. And guess the
first recovery fails such that srng is not reinitialized. Then after a
wait-for-first-recovery time out, the second recovery starts, this results in
ath11k_hal_dump_srng_stats() getting called and hence the kernel crash.
Could you please share complete verbose kernel log? you may enable it with
modprobe ath11k debug_mask=0xffffffff
modprobe ath11k_pci
>
> Regardless of that, I do think that resetting the initialized flag
> when srng list is de-initialized/destroyed is the right thing to do.
Yeah, correct.
On (25/06/12 15:49), Baochen Qiang wrote: > > My understanding is that the driver first fails to reconfigure > > > > <4>[163874.555825] ath11k_pci 0000:01:00.0: already resetting count 2 > > <4>[163884.606490] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110 > > <4>[163884.606508] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110 > > <3>[163884.606550] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery > > > > so ath11k_core_reconfigure_on_crash() calls ath11k_hal_srng_deinit(), > > which destroys the srng lists, but leaves the stale initialized flag. > > So next time ath11k_hal_dump_srng_stats() is called everything looks ok, > > but in fact everything is not quite ok. > > OK, we have a second crash while the first crash is still in recovering. And guess the > first recovery fails such that srng is not reinitialized. Then after a > wait-for-first-recovery time out, the second recovery starts, this results in > ath11k_hal_dump_srng_stats() getting called and hence the kernel crash. > > Could you please share complete verbose kernel log? you may enable it with > > modprobe ath11k debug_mask=0xffffffff > modprobe ath11k_pci Unfortunately I don't have a reproducer. We just see that some of the consumer devices crash on resume and all we get is ramoops. We can't do any debugging on consumer devices. This is all I have: <3>[23518.302240] Last interrupt received for each group: <3>[23518.302243] ath11k_pci 0000:01:00.0: group_id 0 22511ms before <3>[23518.302246] ath11k_pci 0000:01:00.0: group_id 1 14440788ms before <3>[23518.302248] ath11k_pci 0000:01:00.0: group_id 2 14440788ms before <3>[23518.302250] ath11k_pci 0000:01:00.0: group_id 3 14440788ms before <3>[23518.302252] ath11k_pci 0000:01:00.0: group_id 4 14736571ms before <3>[23518.302253] ath11k_pci 0000:01:00.0: group_id 5 14736571ms before <3>[23518.302261] ath11k_pci 0000:01:00.0: group_id 6 14440789ms before <3>[23518.302263] ath11k_pci 0000:01:00.0: group_id 7 22541ms before <3>[23518.302265] ath11k_pci 0000:01:00.0: group_id 8 24724ms before <3>[23518.302266] ath11k_pci 0000:01:00.0: group_id 9 23315ms before <3>[23518.302268] ath11k_pci 0000:01:00.0: group_id 10 25238ms before <3>[23518.302270] ath11k_pci 0000:01:00.0: dst srng id 0 tp 5312, cur hp 5312, cached hp 5312 last hp 5312 napi processed before 22541ms <3>[23518.302272] ath11k_pci 0000:01:00.0: dst srng id 1 tp 27664, cur hp 27664, cached hp 27664 last hp 27664 napi processed before 24724ms <3>[23518.302274] ath11k_pci 0000:01:00.0: dst srng id 2 tp 12432, cur hp 12432, cached hp 12432 last hp 12432 napi processed before 23315ms <3>[23518.302276] ath11k_pci 0000:01:00.0: dst srng id 3 tp 1424, cur hp 1424, cached hp 1424 last hp 1424 napi processed before 25238ms <3>[23518.302278] ath11k_pci 0000:01:00.0: dst srng id 4 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 22512ms <3>[23518.302280] ath11k_pci 0000:01:00.0: src srng id 5 hp 0, reap_hp 248, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms <3>[23518.302282] ath11k_pci 0000:01:00.0: src srng id 8 hp 950, reap_hp 950, cur tp 950, cached tp 280 last tp 280 napi processed before 22512ms <3>[23518.302284] ath11k_pci 0000:01:00.0: dst srng id 9 tp 19526, cur hp 19526, cached hp 19526 last hp 19526 napi processed before 22512ms <3>[23518.302286] ath11k_pci 0000:01:00.0: src srng id 16 hp 3832, reap_hp 3832, cur tp 3832, cached tp 3824 last tp 3824 napi processed before 22758ms <3>[23518.302288] ath11k_pci 0000:01:00.0: src srng id 24 hp 0, reap_hp 248, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms <3>[23518.302290] ath11k_pci 0000:01:00.0: dst srng id 25 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 14440789ms <3>[23518.302292] ath11k_pci 0000:01:00.0: src srng id 32 hp 12, reap_hp 8, cur tp 12, cached tp 12 last tp 8 napi processed before 14736834ms <3>[23518.302294] ath11k_pci 0000:01:00.0: src srng id 35 hp 96, reap_hp 88, cur tp 92, cached tp 92 last tp 92 napi processed before 21573ms <3>[23518.302296] ath11k_pci 0000:01:00.0: src srng id 36 hp 176, reap_hp 164, cur tp 176, cached tp 168 last tp 168 napi processed before 22447ms <3>[23518.302298] ath11k_pci 0000:01:00.0: src srng id 39 hp 0, reap_hp 124, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms <3>[23518.302300] ath11k_pci 0000:01:00.0: src srng id 57 hp 54, reap_hp 54, cur tp 58, cached tp 58 last tp 58 napi processed before 22485ms <3>[23518.302302] ath11k_pci 0000:01:00.0: src srng id 58 hp 584, reap_hp 584, cur tp 588, cached tp 588 last tp 588 napi processed before 22429ms <3>[23518.302304] ath11k_pci 0000:01:00.0: src srng id 61 hp 1020, reap_hp 1020, cur tp 0, cached tp 0 last tp 0 napi processed before 14736834ms <3>[23518.302306] ath11k_pci 0000:01:00.0: dst srng id 81 tp 116, cur hp 116, cached hp 116 last hp 116 napi processed before 22485ms <3>[23518.302308] ath11k_pci 0000:01:00.0: dst srng id 82 tp 1176, cur hp 1176, cached hp 1176 last hp 1176 napi processed before 22429ms <3>[23518.302309] ath11k_pci 0000:01:00.0: dst srng id 85 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 14440789ms <3>[23518.302311] ath11k_pci 0000:01:00.0: src srng id 104 hp 65532, reap_hp 65532, cur tp 0, cached tp 0 last tp 0 napi processed before 14736836ms <3>[23518.302313] ath11k_pci 0000:01:00.0: src srng id 105 hp 0, reap_hp 504, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms <3>[23518.302315] ath11k_pci 0000:01:00.0: dst srng id 106 tp 245496, cur hp 245496, cached hp 245496 last hp 245496 napi processed before 22512ms <3>[23518.302317] ath11k_pci 0000:01:00.0: dst srng id 109 tp 5704, cur hp 5704, cached hp 5704 last hp 5704 napi processed before 22512ms <3>[23518.302319] ath11k_pci 0000:01:00.0: src srng id 128 hp 3182, reap_hp 3182, cur tp 7428, cached tp 7428 last tp 7428 napi processed before 22541ms <3>[23518.302321] ath11k_pci 0000:01:00.0: src srng id 129 hp 0, reap_hp 2046, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms <3>[23518.302323] ath11k_pci 0000:01:00.0: src srng id 132 hp 1690, reap_hp 1690, cur tp 1692, cached tp 1692 last tp 1692 napi processed before 22429ms <3>[23518.302324] ath11k_pci 0000:01:00.0: dst srng id 133 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 22512ms <3>[23518.302326] ath11k_pci 0000:01:00.0: src srng id 144 hp 0, reap_hp 2046, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms <3>[23518.302328] ath11k_pci 0000:01:00.0: src srng id 147 hp 1948, reap_hp 1948, cur tp 1950, cached tp 1950 last tp 1950 napi processed before 22429ms <3>[23518.302330] ath11k_pci 0000:01:00.0: dst srng id 148 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 14440789ms <4>[23519.369310] ath11k_pci 0000:01:00.0: failed to receive control response completion, polling.. <3>[23520.393292] ath11k_pci 0000:01:00.0: Service connect timeout <3>[23520.393302] ath11k_pci 0000:01:00.0: failed to connect to HTT: -110 <3>[23520.394087] ath11k_pci 0000:01:00.0: failed to start core: -110 <4>[23520.710478] ath11k_pci 0000:01:00.0: firmware crashed: MHI_CB_EE_RDDM <4>[23520.710550] ath11k_pci 0000:01:00.0: already resetting count 2 <4>[23530.761544] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110 <4>[23530.761562] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110 <3>[23530.761595] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery <6>[23561.813605] mhi mhi0: Requested to power ON <6>[23561.813627] mhi mhi0: Power on setup success <6>[23561.899318] mhi mhi0: Wait for device to enter SBL or Mission mode <6>[23562.530990] ath11k_pci 0000:01:00.0: chip_id 0x12 chip_family 0xb board_id 0xff soc_id 0x400c1211 <6>[23562.531010] ath11k_pci 0000:01:00.0: fw_version 0x11088c35 fw_build_timestamp 2024-04-17 08:34 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41 <3>[23562.575723] ath11k_pci 0000:01:00.0: Last interrupt received for each CE: <3>[23562.575742] ath11k_pci 0000:01:00.0: CE_id 0 pipe_num 0 14781107ms before <3>[23562.575751] ath11k_pci 0000:01:00.0: CE_id 1 pipe_num 1 66758ms before <3>[23562.575756] ath11k_pci 0000:01:00.0: CE_id 2 pipe_num 2 66702ms before <3>[23562.575759] ath11k_pci 0000:01:00.0: CE_id 3 pipe_num 3 66720ms before <3>[23562.575763] ath11k_pci 0000:01:00.0: CE_id 5 pipe_num 5 14485062ms before <3>[23562.575766] ath11k_pci 0000:01:00.0: CE_id 7 pipe_num 7 14485062ms before <3>[23562.575770] ath11k_pci 0000:01:00.0: CE_id 8 pipe_num 8 14485062ms before <3>[23562.575773] ath11k_pci 0000:01:00.0: <3>[23562.575773] Last interrupt received for each group: <3>[23562.575778] ath11k_pci 0000:01:00.0: group_id 0 66785ms before <3>[23562.575781] ath11k_pci 0000:01:00.0: group_id 1 14485062ms before <3>[23562.575785] ath11k_pci 0000:01:00.0: group_id 2 14485062ms before <3>[23562.575788] ath11k_pci 0000:01:00.0: group_id 3 14485062ms before <3>[23562.575791] ath11k_pci 0000:01:00.0: group_id 4 14780845ms before <3>[23562.575795] ath11k_pci 0000:01:00.0: group_id 5 14780845ms before <3>[23562.575798] ath11k_pci 0000:01:00.0: group_id 6 14485062ms before <3>[23562.575801] ath11k_pci 0000:01:00.0: group_id 7 66814ms before <3>[23562.575805] ath11k_pci 0000:01:00.0: group_id 8 68997ms before <3>[23562.575808] ath11k_pci 0000:01:00.0: group_id 9 67588ms before <3>[23562.575812] ath11k_pci 0000:01:00.0: group_id 10 69511ms before <1>[23562.575828] BUG: unable to handle page fault for address: ffffa007404eb010 <1>[23562.575833] #PF: supervisor read access in kernel mode <1>[23562.575837] #PF: error_code(0x0000) - not-present page <6>[23562.575842] PGD 100000067 P4D 100000067 PUD 10022d067 PMD 100b01067 PTE 0 <4>[23562.575852] Oops: 0000 [#1] PREEMPT SMP NOPTI <4>[23562.575873] Workqueue: ath11k_qmi_driver_event ath11k_qmi_driver_event_work [ath11k] <4>[23562.575896] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k] <4>[23562.575916] Code: 6b c0 44 89 f2 89 c1 e8 4a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 93 fd <4>[23562.575922] RSP: 0018:ffffa00759ed3c50 EFLAGS: 00010246 <4>[23562.575926] RAX: ffffa007404eb010 RBX: ffff9eab4ea60000 RCX: 382d128991c49600 <4>[23562.575930] RDX: 00000000ffffffea RSI: ffffa00759ed3998 RDI: ffff9eac66017488 <4>[23562.575934] RBP: ffffa00759ed3c90 R08: ffffffffbd649d80 R09: 0000000000005ffd <4>[23562.575937] R10: 0000000000000004 R11: 00000000ffffdfff R12: ffff9eab4ea61c90 <4>[23562.575941] R13: ffff9eab4ea60000 R14: 0000000000002828 R15: 0000000000000000 <4>[23562.575945] FS: 0000000000000000(0000) GS:ffff9eac66000000(0000) knlGS:0000000000000000 <4>[23562.575949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[23562.575953] CR2: ffffa007404eb010 CR3: 0000000136e24000 CR4: 0000000000750ee0 <4>[23562.575956] PKRU: 55555554 <4>[23562.575959] Call Trace: <4>[23562.575965] <TASK> <4>[23562.575978] ? __die_body+0xae/0xb0 <4>[23562.575987] ? page_fault_oops+0x381/0x3e0 <4>[23562.575995] ? exc_page_fault+0x69/0xa0 <4>[23562.576003] ? asm_exc_page_fault+0x22/0x30 <4>[23562.576015] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:6cea 4)] <4>[23562.576034] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:6cea 4)] <4>[23562.576058] worker_thread+0x389/0x930 <4>[23562.576065] kthread+0x149/0x170 <4>[23562.576074] ? start_flush_work+0x130/0x130 <4>[23562.576078] ? kthread_associate_blkcg+0xb0/0xb0 <4>[23562.576084] ret_from_fork+0x3b/0x50 <4>[23562.576090] ? kthread_associate_blkcg+0xb0/0xb0 <4>[23562.576096] ret_from_fork_asm+0x11/0x20 There are clearly two ath11k_hal_dump_srng_stats() calls, the first one happens before crash recovery, the second happens right after and presumably causes UAF, because ->initialized flag is not cleared.
On 6/12/2025 4:00 PM, Sergey Senozhatsky wrote: > On (25/06/12 15:49), Baochen Qiang wrote: >>> My understanding is that the driver first fails to reconfigure >>> >>> <4>[163874.555825] ath11k_pci 0000:01:00.0: already resetting count 2 >>> <4>[163884.606490] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110 >>> <4>[163884.606508] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110 >>> <3>[163884.606550] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery >>> >>> so ath11k_core_reconfigure_on_crash() calls ath11k_hal_srng_deinit(), >>> which destroys the srng lists, but leaves the stale initialized flag. >>> So next time ath11k_hal_dump_srng_stats() is called everything looks ok, >>> but in fact everything is not quite ok. >> >> OK, we have a second crash while the first crash is still in recovering. And guess the >> first recovery fails such that srng is not reinitialized. Then after a >> wait-for-first-recovery time out, the second recovery starts, this results in >> ath11k_hal_dump_srng_stats() getting called and hence the kernel crash. >> >> Could you please share complete verbose kernel log? you may enable it with >> >> modprobe ath11k debug_mask=0xffffffff >> modprobe ath11k_pci > > Unfortunately I don't have a reproducer. We just see that some of the > consumer devices crash on resume and all we get is ramoops. We can't > do any debugging on consumer devices. no worry, below log is enough. > > This is all I have: > > <3>[23518.302240] Last interrupt received for each group: > <3>[23518.302243] ath11k_pci 0000:01:00.0: group_id 0 22511ms before > <3>[23518.302246] ath11k_pci 0000:01:00.0: group_id 1 14440788ms before > <3>[23518.302248] ath11k_pci 0000:01:00.0: group_id 2 14440788ms before > <3>[23518.302250] ath11k_pci 0000:01:00.0: group_id 3 14440788ms before > <3>[23518.302252] ath11k_pci 0000:01:00.0: group_id 4 14736571ms before > <3>[23518.302253] ath11k_pci 0000:01:00.0: group_id 5 14736571ms before > <3>[23518.302261] ath11k_pci 0000:01:00.0: group_id 6 14440789ms before > <3>[23518.302263] ath11k_pci 0000:01:00.0: group_id 7 22541ms before > <3>[23518.302265] ath11k_pci 0000:01:00.0: group_id 8 24724ms before > <3>[23518.302266] ath11k_pci 0000:01:00.0: group_id 9 23315ms before > <3>[23518.302268] ath11k_pci 0000:01:00.0: group_id 10 25238ms before > <3>[23518.302270] ath11k_pci 0000:01:00.0: dst srng id 0 tp 5312, cur hp 5312, cached hp 5312 last hp 5312 napi processed before 22541ms > <3>[23518.302272] ath11k_pci 0000:01:00.0: dst srng id 1 tp 27664, cur hp 27664, cached hp 27664 last hp 27664 napi processed before 24724ms > <3>[23518.302274] ath11k_pci 0000:01:00.0: dst srng id 2 tp 12432, cur hp 12432, cached hp 12432 last hp 12432 napi processed before 23315ms > <3>[23518.302276] ath11k_pci 0000:01:00.0: dst srng id 3 tp 1424, cur hp 1424, cached hp 1424 last hp 1424 napi processed before 25238ms > <3>[23518.302278] ath11k_pci 0000:01:00.0: dst srng id 4 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 22512ms > <3>[23518.302280] ath11k_pci 0000:01:00.0: src srng id 5 hp 0, reap_hp 248, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms > <3>[23518.302282] ath11k_pci 0000:01:00.0: src srng id 8 hp 950, reap_hp 950, cur tp 950, cached tp 280 last tp 280 napi processed before 22512ms > <3>[23518.302284] ath11k_pci 0000:01:00.0: dst srng id 9 tp 19526, cur hp 19526, cached hp 19526 last hp 19526 napi processed before 22512ms > <3>[23518.302286] ath11k_pci 0000:01:00.0: src srng id 16 hp 3832, reap_hp 3832, cur tp 3832, cached tp 3824 last tp 3824 napi processed before 22758ms > <3>[23518.302288] ath11k_pci 0000:01:00.0: src srng id 24 hp 0, reap_hp 248, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms > <3>[23518.302290] ath11k_pci 0000:01:00.0: dst srng id 25 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 14440789ms > <3>[23518.302292] ath11k_pci 0000:01:00.0: src srng id 32 hp 12, reap_hp 8, cur tp 12, cached tp 12 last tp 8 napi processed before 14736834ms > <3>[23518.302294] ath11k_pci 0000:01:00.0: src srng id 35 hp 96, reap_hp 88, cur tp 92, cached tp 92 last tp 92 napi processed before 21573ms > <3>[23518.302296] ath11k_pci 0000:01:00.0: src srng id 36 hp 176, reap_hp 164, cur tp 176, cached tp 168 last tp 168 napi processed before 22447ms > <3>[23518.302298] ath11k_pci 0000:01:00.0: src srng id 39 hp 0, reap_hp 124, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms > <3>[23518.302300] ath11k_pci 0000:01:00.0: src srng id 57 hp 54, reap_hp 54, cur tp 58, cached tp 58 last tp 58 napi processed before 22485ms > <3>[23518.302302] ath11k_pci 0000:01:00.0: src srng id 58 hp 584, reap_hp 584, cur tp 588, cached tp 588 last tp 588 napi processed before 22429ms > <3>[23518.302304] ath11k_pci 0000:01:00.0: src srng id 61 hp 1020, reap_hp 1020, cur tp 0, cached tp 0 last tp 0 napi processed before 14736834ms > <3>[23518.302306] ath11k_pci 0000:01:00.0: dst srng id 81 tp 116, cur hp 116, cached hp 116 last hp 116 napi processed before 22485ms > <3>[23518.302308] ath11k_pci 0000:01:00.0: dst srng id 82 tp 1176, cur hp 1176, cached hp 1176 last hp 1176 napi processed before 22429ms > <3>[23518.302309] ath11k_pci 0000:01:00.0: dst srng id 85 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 14440789ms > <3>[23518.302311] ath11k_pci 0000:01:00.0: src srng id 104 hp 65532, reap_hp 65532, cur tp 0, cached tp 0 last tp 0 napi processed before 14736836ms > <3>[23518.302313] ath11k_pci 0000:01:00.0: src srng id 105 hp 0, reap_hp 504, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms > <3>[23518.302315] ath11k_pci 0000:01:00.0: dst srng id 106 tp 245496, cur hp 245496, cached hp 245496 last hp 245496 napi processed before 22512ms > <3>[23518.302317] ath11k_pci 0000:01:00.0: dst srng id 109 tp 5704, cur hp 5704, cached hp 5704 last hp 5704 napi processed before 22512ms > <3>[23518.302319] ath11k_pci 0000:01:00.0: src srng id 128 hp 3182, reap_hp 3182, cur tp 7428, cached tp 7428 last tp 7428 napi processed before 22541ms > <3>[23518.302321] ath11k_pci 0000:01:00.0: src srng id 129 hp 0, reap_hp 2046, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms > <3>[23518.302323] ath11k_pci 0000:01:00.0: src srng id 132 hp 1690, reap_hp 1690, cur tp 1692, cached tp 1692 last tp 1692 napi processed before 22429ms > <3>[23518.302324] ath11k_pci 0000:01:00.0: dst srng id 133 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 22512ms > <3>[23518.302326] ath11k_pci 0000:01:00.0: src srng id 144 hp 0, reap_hp 2046, cur tp 0, cached tp 0 last tp 0 napi processed before 14440789ms > <3>[23518.302328] ath11k_pci 0000:01:00.0: src srng id 147 hp 1948, reap_hp 1948, cur tp 1950, cached tp 1950 last tp 1950 napi processed before 22429ms > <3>[23518.302330] ath11k_pci 0000:01:00.0: dst srng id 148 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 14440789ms > <4>[23519.369310] ath11k_pci 0000:01:00.0: failed to receive control response completion, polling.. > <3>[23520.393292] ath11k_pci 0000:01:00.0: Service connect timeout > <3>[23520.393302] ath11k_pci 0000:01:00.0: failed to connect to HTT: -110 > <3>[23520.394087] ath11k_pci 0000:01:00.0: failed to start core: -110 > <4>[23520.710478] ath11k_pci 0000:01:00.0: firmware crashed: MHI_CB_EE_RDDM > <4>[23520.710550] ath11k_pci 0000:01:00.0: already resetting count 2 > <4>[23530.761544] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110 > <4>[23530.761562] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110 > <3>[23530.761595] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery > <6>[23561.813605] mhi mhi0: Requested to power ON > <6>[23561.813627] mhi mhi0: Power on setup success > <6>[23561.899318] mhi mhi0: Wait for device to enter SBL or Mission mode > <6>[23562.530990] ath11k_pci 0000:01:00.0: chip_id 0x12 chip_family 0xb board_id 0xff soc_id 0x400c1211 > <6>[23562.531010] ath11k_pci 0000:01:00.0: fw_version 0x11088c35 fw_build_timestamp 2024-04-17 08:34 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41 > <3>[23562.575723] ath11k_pci 0000:01:00.0: Last interrupt received for each CE: > <3>[23562.575742] ath11k_pci 0000:01:00.0: CE_id 0 pipe_num 0 14781107ms before > <3>[23562.575751] ath11k_pci 0000:01:00.0: CE_id 1 pipe_num 1 66758ms before > <3>[23562.575756] ath11k_pci 0000:01:00.0: CE_id 2 pipe_num 2 66702ms before > <3>[23562.575759] ath11k_pci 0000:01:00.0: CE_id 3 pipe_num 3 66720ms before > <3>[23562.575763] ath11k_pci 0000:01:00.0: CE_id 5 pipe_num 5 14485062ms before > <3>[23562.575766] ath11k_pci 0000:01:00.0: CE_id 7 pipe_num 7 14485062ms before > <3>[23562.575770] ath11k_pci 0000:01:00.0: CE_id 8 pipe_num 8 14485062ms before > <3>[23562.575773] ath11k_pci 0000:01:00.0: > <3>[23562.575773] Last interrupt received for each group: > <3>[23562.575778] ath11k_pci 0000:01:00.0: group_id 0 66785ms before > <3>[23562.575781] ath11k_pci 0000:01:00.0: group_id 1 14485062ms before > <3>[23562.575785] ath11k_pci 0000:01:00.0: group_id 2 14485062ms before > <3>[23562.575788] ath11k_pci 0000:01:00.0: group_id 3 14485062ms before > <3>[23562.575791] ath11k_pci 0000:01:00.0: group_id 4 14780845ms before > <3>[23562.575795] ath11k_pci 0000:01:00.0: group_id 5 14780845ms before > <3>[23562.575798] ath11k_pci 0000:01:00.0: group_id 6 14485062ms before > <3>[23562.575801] ath11k_pci 0000:01:00.0: group_id 7 66814ms before > <3>[23562.575805] ath11k_pci 0000:01:00.0: group_id 8 68997ms before > <3>[23562.575808] ath11k_pci 0000:01:00.0: group_id 9 67588ms before > <3>[23562.575812] ath11k_pci 0000:01:00.0: group_id 10 69511ms before > <1>[23562.575828] BUG: unable to handle page fault for address: ffffa007404eb010 > <1>[23562.575833] #PF: supervisor read access in kernel mode > <1>[23562.575837] #PF: error_code(0x0000) - not-present page > <6>[23562.575842] PGD 100000067 P4D 100000067 PUD 10022d067 PMD 100b01067 PTE 0 > <4>[23562.575852] Oops: 0000 [#1] PREEMPT SMP NOPTI > <4>[23562.575873] Workqueue: ath11k_qmi_driver_event ath11k_qmi_driver_event_work [ath11k] > <4>[23562.575896] RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k] > <4>[23562.575916] Code: 6b c0 44 89 f2 89 c1 e8 4a 14 06 00 41 be e8 25 00 00 eb 6e 42 0f b6 84 33 78 ff ff ff 89 45 d0 46 8b 7c 33 d8 4a 8b 44 33 e0 <44> 8b 20 46 8b 6c 33 e8 42 8b 04 33 48 89 45 c8 48 8b 3d 45 93 fd > <4>[23562.575922] RSP: 0018:ffffa00759ed3c50 EFLAGS: 00010246 > <4>[23562.575926] RAX: ffffa007404eb010 RBX: ffff9eab4ea60000 RCX: 382d128991c49600 > <4>[23562.575930] RDX: 00000000ffffffea RSI: ffffa00759ed3998 RDI: ffff9eac66017488 > <4>[23562.575934] RBP: ffffa00759ed3c90 R08: ffffffffbd649d80 R09: 0000000000005ffd > <4>[23562.575937] R10: 0000000000000004 R11: 00000000ffffdfff R12: ffff9eab4ea61c90 > <4>[23562.575941] R13: ffff9eab4ea60000 R14: 0000000000002828 R15: 0000000000000000 > <4>[23562.575945] FS: 0000000000000000(0000) GS:ffff9eac66000000(0000) knlGS:0000000000000000 > <4>[23562.575949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > <4>[23562.575953] CR2: ffffa007404eb010 CR3: 0000000136e24000 CR4: 0000000000750ee0 > <4>[23562.575956] PKRU: 55555554 > <4>[23562.575959] Call Trace: > <4>[23562.575965] <TASK> > <4>[23562.575978] ? __die_body+0xae/0xb0 > <4>[23562.575987] ? page_fault_oops+0x381/0x3e0 > <4>[23562.575995] ? exc_page_fault+0x69/0xa0 > <4>[23562.576003] ? asm_exc_page_fault+0x22/0x30 > <4>[23562.576015] ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:6cea 4)] > <4>[23562.576034] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:6cea 4)] > <4>[23562.576058] worker_thread+0x389/0x930 > <4>[23562.576065] kthread+0x149/0x170 > <4>[23562.576074] ? start_flush_work+0x130/0x130 > <4>[23562.576078] ? kthread_associate_blkcg+0xb0/0xb0 > <4>[23562.576084] ret_from_fork+0x3b/0x50 > <4>[23562.576090] ? kthread_associate_blkcg+0xb0/0xb0 > <4>[23562.576096] ret_from_fork_asm+0x11/0x20 > > > There are clearly two ath11k_hal_dump_srng_stats() calls, the first > one happens before crash recovery, the second happens right after > and presumably causes UAF, because ->initialized flag is not cleared. So with above we can confirm our guess. Could you refine your commit message with these details such that readers have a clear understanding of this issue?
On (25/06/12 16:14), Baochen Qiang wrote: > > <4>[23562.576034] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:6cea 4)] > > <4>[23562.576058] worker_thread+0x389/0x930 > > <4>[23562.576065] kthread+0x149/0x170 > > <4>[23562.576074] ? start_flush_work+0x130/0x130 > > <4>[23562.576078] ? kthread_associate_blkcg+0xb0/0xb0 > > <4>[23562.576084] ret_from_fork+0x3b/0x50 > > <4>[23562.576090] ? kthread_associate_blkcg+0xb0/0xb0 > > <4>[23562.576096] ret_from_fork_asm+0x11/0x20 > > > > > > There are clearly two ath11k_hal_dump_srng_stats() calls, the first > > one happens before crash recovery, the second happens right after > > and presumably causes UAF, because ->initialized flag is not cleared. > > So with above we can confirm our guess. > > Could you refine your commit message with these details such that readers have a clear > understanding of this issue? Sure, I can do that. I didn't want to throw my guesses into the commit message, stale ->initialized flag looked like a good enough justification for the patch. But I can send out v3 with a more detailed commit message.
On 6/12/2025 4:31 PM, Sergey Senozhatsky wrote: > On (25/06/12 16:14), Baochen Qiang wrote: >>> <4>[23562.576034] ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:6cea 4)] >>> <4>[23562.576058] worker_thread+0x389/0x930 >>> <4>[23562.576065] kthread+0x149/0x170 >>> <4>[23562.576074] ? start_flush_work+0x130/0x130 >>> <4>[23562.576078] ? kthread_associate_blkcg+0xb0/0xb0 >>> <4>[23562.576084] ret_from_fork+0x3b/0x50 >>> <4>[23562.576090] ? kthread_associate_blkcg+0xb0/0xb0 >>> <4>[23562.576096] ret_from_fork_asm+0x11/0x20 >>> >>> >>> There are clearly two ath11k_hal_dump_srng_stats() calls, the first >>> one happens before crash recovery, the second happens right after >>> and presumably causes UAF, because ->initialized flag is not cleared. >> >> So with above we can confirm our guess. >> >> Could you refine your commit message with these details such that readers have a clear >> understanding of this issue? > > Sure, I can do that. I didn't want to throw my guesses into the commit > message, stale ->initialized flag looked like a good enough justification Yeah, it is indeed enough. But would be better to disclose any known issue caused by it. > for the patch. But I can send out v3 with a more detailed commit message. Thanks.
On (25/05/29 12:56), Sergey Senozhatsky wrote: > ath11k_hal_srng_deinit() frees rdp and wrp which are used > by srng lists. Mark srng lists as not-initialized. This > makes sense, for instance, when device fails to resume > and the driver calls ath11k_hal_srng_deinit() from > ath11k_core_reconfigure_on_crash(). [..] Gentle ping.
© 2016 - 2025 Red Hat, Inc.