Don't deinitialize and reinitialize the HAL helpers. The dma memory is
deallocated and there is high possibility that we'll not be able to get
the same memory allocated from dma when there is high memory pressure.
Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
drivers/net/wireless/ath/ath11k/core.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
index 4488e4cdc5e9e..bc4930fe6a367 100644
--- a/drivers/net/wireless/ath/ath11k/core.c
+++ b/drivers/net/wireless/ath/ath11k/core.c
@@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
mutex_unlock(&ab->core_lock);
ath11k_dp_free(ab);
- ath11k_hal_srng_deinit(ab);
ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
- ret = ath11k_hal_srng_init(ab);
- if (ret)
- return ret;
-
clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
ret = ath11k_core_qmi_firmware_ready(ab);
--
2.39.5
On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote: > Don't deinitialize and reinitialize the HAL helpers. The dma memory is > deallocated and there is high possibility that we'll not be able to get > the same memory allocated from dma when there is high memory pressure. > > Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 > > Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") > Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> > --- > drivers/net/wireless/ath/ath11k/core.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c > index 4488e4cdc5e9e..bc4930fe6a367 100644 > --- a/drivers/net/wireless/ath/ath11k/core.c > +++ b/drivers/net/wireless/ath/ath11k/core.c > @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) > mutex_unlock(&ab->core_lock); > > ath11k_dp_free(ab); > - ath11k_hal_srng_deinit(ab); > > ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; > > - ret = ath11k_hal_srng_init(ab); > - if (ret) > - return ret; > - while I agree there is no need of a dealloc/realloc, we can not simply remove calling the _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g. avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the _init() needs to be kept as the later operation needs a clean state of them. > clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags); > > ret = ath11k_core_qmi_firmware_ready(ab); the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails, making it a little odd since there is no _init() anymore with this change, though this is the way of current logic (I mean the hal is currently deinit in the error path). Thinking it more, if we hit the error path, seems the only way is to remove ath11k module. In that case the _deinit() would be called again in ath11k_pci_remove(), leading to issues (at least I see a double free of hal->srng_config). But this is another topic which can be fixed in a separate patch.
Thank you for reviewing. On 7/2/25 8:50 AM, Baochen Qiang wrote: > > > On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote: >> Don't deinitialize and reinitialize the HAL helpers. The dma memory is >> deallocated and there is high possibility that we'll not be able to get >> the same memory allocated from dma when there is high memory pressure. >> >> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 >> >> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") >> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >> --- >> drivers/net/wireless/ath/ath11k/core.c | 5 ----- >> 1 file changed, 5 deletions(-) >> >> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c >> index 4488e4cdc5e9e..bc4930fe6a367 100644 >> --- a/drivers/net/wireless/ath/ath11k/core.c >> +++ b/drivers/net/wireless/ath/ath11k/core.c >> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) >> mutex_unlock(&ab->core_lock); >> >> ath11k_dp_free(ab); >> - ath11k_hal_srng_deinit(ab); >> >> ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; >> >> - ret = ath11k_hal_srng_init(ab); >> - if (ret) >> - return ret; >> - > > while I agree there is no need of a dealloc/realloc, we can not simply remove calling the > _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g. Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up in resume handler? So when device wakes up, its state is already correct. I'm not sure why it worked every time when I tested it on my device. > avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the > _init() needs to be kept as the later operation needs a clean state of them. So should we just memset these 3? > >> clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags); >> >> ret = ath11k_core_qmi_firmware_ready(ab); > > the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails, > making it a little odd since there is no _init() anymore with this change, though this is > the way of current logic (I mean the hal is currently deinit in the error path). > > Thinking it more, if we hit the error path, seems the only way is to remove ath11k module. > In that case the _deinit() would be called again in ath11k_pci_remove(), leading to issues > (at least I see a double free of hal->srng_config). But this is another topic which can be > fixed in a separate patch. I don't think this is the problem as HAL is already initialized when before the system has suspended. So by removing deinit() and init() pairs, the HAL still remains initialized. Or maybe I've missed something?
On 7/3/2025 12:12 AM, Muhammad Usama Anjum wrote: > Thank you for reviewing. > > On 7/2/25 8:50 AM, Baochen Qiang wrote: >> >> >> On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote: >>> Don't deinitialize and reinitialize the HAL helpers. The dma memory is >>> deallocated and there is high possibility that we'll not be able to get >>> the same memory allocated from dma when there is high memory pressure. >>> >>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 >>> >>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") >>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >>> --- >>> drivers/net/wireless/ath/ath11k/core.c | 5 ----- >>> 1 file changed, 5 deletions(-) >>> >>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c >>> index 4488e4cdc5e9e..bc4930fe6a367 100644 >>> --- a/drivers/net/wireless/ath/ath11k/core.c >>> +++ b/drivers/net/wireless/ath/ath11k/core.c >>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) >>> mutex_unlock(&ab->core_lock); >>> >>> ath11k_dp_free(ab); >>> - ath11k_hal_srng_deinit(ab); >>> >>> ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; >>> >>> - ret = ath11k_hal_srng_init(ab); >>> - if (ret) >>> - return ret; >>> - >> >> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the >> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g. > Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up > in resume handler? So when device wakes up, its state is already correct. > Hmm... not quite understand your question. Can you elaborate? > I'm not sure why it worked every time when I tested it on my device. > >> avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the >> _init() needs to be kept as the later operation needs a clean state of them. > So should we just memset these 3? more than them I think. We need to take care of all entries in hal since current code is memset them all. > > >> >>> clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags); >>> >>> ret = ath11k_core_qmi_firmware_ready(ab); >> >> the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails, >> making it a little odd since there is no _init() anymore with this change, though this is >> the way of current logic (I mean the hal is currently deinit in the error path). >> >> Thinking it more, if we hit the error path, seems the only way is to remove ath11k module. >> In that case the _deinit() would be called again in ath11k_pci_remove(), leading to issues >> (at least I see a double free of hal->srng_config). But this is another topic which can be >> fixed in a separate patch. > > I don't think this is the problem as HAL is already initialized when before the system has > suspended. So by removing deinit() and init() pairs, the HAL still remains initialized. Or > maybe I've missed something? Yeah, it is OK in normal path. However in error path we face issues. > > > >
On 7/3/25 6:59 AM, Baochen Qiang wrote: > > > On 7/3/2025 12:12 AM, Muhammad Usama Anjum wrote: >> Thank you for reviewing. >> >> On 7/2/25 8:50 AM, Baochen Qiang wrote: >>> >>> >>> On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote: >>>> Don't deinitialize and reinitialize the HAL helpers. The dma memory is >>>> deallocated and there is high possibility that we'll not be able to get >>>> the same memory allocated from dma when there is high memory pressure. >>>> >>>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 >>>> >>>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") >>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >>>> --- >>>> drivers/net/wireless/ath/ath11k/core.c | 5 ----- >>>> 1 file changed, 5 deletions(-) >>>> >>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c >>>> index 4488e4cdc5e9e..bc4930fe6a367 100644 >>>> --- a/drivers/net/wireless/ath/ath11k/core.c >>>> +++ b/drivers/net/wireless/ath/ath11k/core.c >>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) >>>> mutex_unlock(&ab->core_lock); >>>> >>>> ath11k_dp_free(ab); >>>> - ath11k_hal_srng_deinit(ab); >>>> >>>> ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; >>>> >>>> - ret = ath11k_hal_srng_init(ab); >>>> - if (ret) >>>> - return ret; >>>> - >>> >>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the >>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g. >> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up >> in resume handler? So when device wakes up, its state is already correct. >> > > Hmm... not quite understand your question. Can you elaborate? I'm trying to understand the possibility of cleanup of hal in suspend handler. For example: * The driver has been loaded and has been working fine. * The user called suspend. So all devices would be suspended. * In suspend handler of the ath11k, we should do the necessary cleanups of the states like hal. * When the device would resume after long time, the hal would have the correct state already. So we'll not need to deinit and init again. > >> I'm not sure why it worked every time when I tested it on my device. >> >>> avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the >>> _init() needs to be kept as the later operation needs a clean state of them. >> So should we just memset these 3? > > more than them I think. We need to take care of all entries in hal since current code is > memset them all. I see. So memset the whole ath11k hal structure other than the config. > >> >> >>> >>>> clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags); >>>> >>>> ret = ath11k_core_qmi_firmware_ready(ab); >>> >>> the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails, >>> making it a little odd since there is no _init() anymore with this change, though this is >>> the way of current logic (I mean the hal is currently deinit in the error path). >>> >>> Thinking it more, if we hit the error path, seems the only way is to remove ath11k module. >>> In that case the _deinit() would be called again in ath11k_pci_remove(), leading to issues >>> (at least I see a double free of hal->srng_config). But this is another topic which can be >>> fixed in a separate patch. >> >> I don't think this is the problem as HAL is already initialized when before the system has >> suspended. So by removing deinit() and init() pairs, the HAL still remains initialized. Or >> maybe I've missed something? > > Yeah, it is OK in normal path. However in error path we face issues. For example: * When driver was initialized the first time, the hal was init. * Then system was suspended and hal doesn't get deinit. * At system resume, the hal is already init. We can memset some status variables. But its initialized already from the first time. (considering this patch that deinit/init have been removed) * So at this stage if some error occurs and we can call the deinit in the error paths.
On 7/7/2025 4:19 PM, Muhammad Usama Anjum wrote: > On 7/3/25 6:59 AM, Baochen Qiang wrote: >> >> >> On 7/3/2025 12:12 AM, Muhammad Usama Anjum wrote: >>> Thank you for reviewing. >>> >>> On 7/2/25 8:50 AM, Baochen Qiang wrote: >>>> >>>> >>>> On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote: >>>>> Don't deinitialize and reinitialize the HAL helpers. The dma memory is >>>>> deallocated and there is high possibility that we'll not be able to get >>>>> the same memory allocated from dma when there is high memory pressure. >>>>> >>>>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 >>>>> >>>>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") >>>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >>>>> --- >>>>> drivers/net/wireless/ath/ath11k/core.c | 5 ----- >>>>> 1 file changed, 5 deletions(-) >>>>> >>>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c >>>>> index 4488e4cdc5e9e..bc4930fe6a367 100644 >>>>> --- a/drivers/net/wireless/ath/ath11k/core.c >>>>> +++ b/drivers/net/wireless/ath/ath11k/core.c >>>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) >>>>> mutex_unlock(&ab->core_lock); >>>>> >>>>> ath11k_dp_free(ab); >>>>> - ath11k_hal_srng_deinit(ab); >>>>> >>>>> ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; >>>>> >>>>> - ret = ath11k_hal_srng_init(ab); >>>>> - if (ret) >>>>> - return ret; >>>>> - >>>> >>>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the >>>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g. >>> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up >>> in resume handler? So when device wakes up, its state is already correct. >>> >> >> Hmm... not quite understand your question. Can you elaborate? > > I'm trying to understand the possibility of cleanup of hal in suspend handler. For example: > * The driver has been loaded and has been working fine. > * The user called suspend. So all devices would be suspended. > * In suspend handler of the ath11k, we should do the necessary cleanups of the states > like hal. > * When the device would resume after long time, the hal would have the correct state > already. So we'll not need to deinit and init again. The hal cleanup is not only needed by suspend/resume, but also a step of reset/recover process. So If we are moving the cleanup to suspend handler, similar stuff needs to be done for reset/recover as well. > >> >>> I'm not sure why it worked every time when I tested it on my device. >>> >>>> avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the >>>> _init() needs to be kept as the later operation needs a clean state of them. >>> So should we just memset these 3? >> >> more than them I think. We need to take care of all entries in hal since current code is >> memset them all. > I see. So memset the whole ath11k hal structure other than the config. > >> >>> >>> >>>> >>>>> clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags); >>>>> >>>>> ret = ath11k_core_qmi_firmware_ready(ab); >>>> >>>> the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails, >>>> making it a little odd since there is no _init() anymore with this change, though this is >>>> the way of current logic (I mean the hal is currently deinit in the error path). >>>> >>>> Thinking it more, if we hit the error path, seems the only way is to remove ath11k module. >>>> In that case the _deinit() would be called again in ath11k_pci_remove(), leading to issues >>>> (at least I see a double free of hal->srng_config). But this is another topic which can be >>>> fixed in a separate patch. >>> >>> I don't think this is the problem as HAL is already initialized when before the system has >>> suspended. So by removing deinit() and init() pairs, the HAL still remains initialized. Or >>> maybe I've missed something? >> >> Yeah, it is OK in normal path. However in error path we face issues. > For example: > * When driver was initialized the first time, the hal was init. > * Then system was suspended and hal doesn't get deinit. > * At system resume, the hal is already init. We can memset some status variables. But its > initialized already from the first time. (considering this patch that deinit/init have > been removed) > * So at this stage if some error occurs and we can call the deinit in the error paths. > >
On 7/7/25 2:00 PM, Baochen Qiang wrote: > > > On 7/7/2025 4:19 PM, Muhammad Usama Anjum wrote: >> On 7/3/25 6:59 AM, Baochen Qiang wrote: >>> >>> >>> On 7/3/2025 12:12 AM, Muhammad Usama Anjum wrote: >>>> Thank you for reviewing. >>>> >>>> On 7/2/25 8:50 AM, Baochen Qiang wrote: >>>>> >>>>> >>>>> On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote: >>>>>> Don't deinitialize and reinitialize the HAL helpers. The dma memory is >>>>>> deallocated and there is high possibility that we'll not be able to get >>>>>> the same memory allocated from dma when there is high memory pressure. >>>>>> >>>>>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 >>>>>> >>>>>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") >>>>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >>>>>> --- >>>>>> drivers/net/wireless/ath/ath11k/core.c | 5 ----- >>>>>> 1 file changed, 5 deletions(-) >>>>>> >>>>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c >>>>>> index 4488e4cdc5e9e..bc4930fe6a367 100644 >>>>>> --- a/drivers/net/wireless/ath/ath11k/core.c >>>>>> +++ b/drivers/net/wireless/ath/ath11k/core.c >>>>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) >>>>>> mutex_unlock(&ab->core_lock); >>>>>> >>>>>> ath11k_dp_free(ab); >>>>>> - ath11k_hal_srng_deinit(ab); >>>>>> >>>>>> ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; >>>>>> >>>>>> - ret = ath11k_hal_srng_init(ab); >>>>>> - if (ret) >>>>>> - return ret; >>>>>> - >>>>> >>>>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the >>>>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g. >>>> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up >>>> in resume handler? So when device wakes up, its state is already correct. >>>> >>> >>> Hmm... not quite understand your question. Can you elaborate? >> >> I'm trying to understand the possibility of cleanup of hal in suspend handler. For example: >> * The driver has been loaded and has been working fine. >> * The user called suspend. So all devices would be suspended. >> * In suspend handler of the ath11k, we should do the necessary cleanups of the states >> like hal. >> * When the device would resume after long time, the hal would have the correct state >> already. So we'll not need to deinit and init again. > > The hal cleanup is not only needed by suspend/resume, but also a step of reset/recover > process. So If we are moving the cleanup to suspend handler, similar stuff needs to be > done for reset/recover as well. It makes sense. So clearing the hal structure completely other than ab->hal.srn_config doesn't seem right. I've also tested it and it crashes the whole system. On contrary, with only the current patch applied, there is no abnormality. num_shadow_reg_configured and avail_blk_resource are non-zero. If I make them 0, driver still keeps on working. ab->hal.num_shadow_reg_configured = 0; ab->hal.avail_blk_resource = 0; ab->hal.current_blk_index = 0; As you have suggested setting these 3 to zero, is there any other variable in hal structure which should be set to zero? > >> >>> >>>> I'm not sure why it worked every time when I tested it on my device. >>>> >>>>> avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the >>>>> _init() needs to be kept as the later operation needs a clean state of them. >>>> So should we just memset these 3? >>> >>> more than them I think. We need to take care of all entries in hal since current code is >>> memset them all. >> I see. So memset the whole ath11k hal structure other than the config. >> >>> >>>> >>>> >>>>> >>>>>> clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags); >>>>>> >>>>>> ret = ath11k_core_qmi_firmware_ready(ab); >>>>> >>>>> the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails, >>>>> making it a little odd since there is no _init() anymore with this change, though this is >>>>> the way of current logic (I mean the hal is currently deinit in the error path). >>>>> >>>>> Thinking it more, if we hit the error path, seems the only way is to remove ath11k module. >>>>> In that case the _deinit() would be called again in ath11k_pci_remove(), leading to issues >>>>> (at least I see a double free of hal->srng_config). But this is another topic which can be >>>>> fixed in a separate patch. >>>> >>>> I don't think this is the problem as HAL is already initialized when before the system has >>>> suspended. So by removing deinit() and init() pairs, the HAL still remains initialized. Or >>>> maybe I've missed something? >>> >>> Yeah, it is OK in normal path. However in error path we face issues. >> For example: >> * When driver was initialized the first time, the hal was init. >> * Then system was suspended and hal doesn't get deinit. >> * At system resume, the hal is already init. We can memset some status variables. But its >> initialized already from the first time. (considering this patch that deinit/init have >> been removed) >> * So at this stage if some error occurs and we can call the deinit in the error paths. >> >> >
On 7/7/2025 9:11 PM, Muhammad Usama Anjum wrote: >>>>>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c >>>>>>> index 4488e4cdc5e9e..bc4930fe6a367 100644 >>>>>>> --- a/drivers/net/wireless/ath/ath11k/core.c >>>>>>> +++ b/drivers/net/wireless/ath/ath11k/core.c >>>>>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) >>>>>>> mutex_unlock(&ab->core_lock); >>>>>>> >>>>>>> ath11k_dp_free(ab); >>>>>>> - ath11k_hal_srng_deinit(ab); >>>>>>> >>>>>>> ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; >>>>>>> >>>>>>> - ret = ath11k_hal_srng_init(ab); >>>>>>> - if (ret) >>>>>>> - return ret; >>>>>>> - >>>>>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the >>>>>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g. >>>>> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up >>>>> in resume handler? So when device wakes up, its state is already correct. >>>>> >>>> Hmm... not quite understand your question. Can you elaborate? >>> I'm trying to understand the possibility of cleanup of hal in suspend handler. For example: >>> * The driver has been loaded and has been working fine. >>> * The user called suspend. So all devices would be suspended. >>> * In suspend handler of the ath11k, we should do the necessary cleanups of the states >>> like hal. >>> * When the device would resume after long time, the hal would have the correct state >>> already. So we'll not need to deinit and init again. >> The hal cleanup is not only needed by suspend/resume, but also a step of reset/recover >> process. So If we are moving the cleanup to suspend handler, similar stuff needs to be >> done for reset/recover as well. > It makes sense. > > So clearing the hal structure completely other than ab->hal.srn_config doesn't seem > right. I've also tested it and it crashes the whole system. > > On contrary, with only the current patch applied, there is no abnormality. > > num_shadow_reg_configured and avail_blk_resource are non-zero. If I make them 0, > driver still keeps on working. > > ab->hal.num_shadow_reg_configured = 0; > ab->hal.avail_blk_resource = 0; > ab->hal.current_blk_index = 0; > > As you have suggested setting these 3 to zero, is there any other variable in hal > structure which should be set to zero? IMO srng_config, rdp, wrp and srng_key may keep unchanged through suspend/reset, all other fields should be cleared/reinitialized.
On 7/8/25 6:43 AM, Baochen Qiang wrote: > > > On 7/7/2025 9:11 PM, Muhammad Usama Anjum wrote: >>>>>>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c >>>>>>>> index 4488e4cdc5e9e..bc4930fe6a367 100644 >>>>>>>> --- a/drivers/net/wireless/ath/ath11k/core.c >>>>>>>> +++ b/drivers/net/wireless/ath/ath11k/core.c >>>>>>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) >>>>>>>> mutex_unlock(&ab->core_lock); >>>>>>>> >>>>>>>> ath11k_dp_free(ab); >>>>>>>> - ath11k_hal_srng_deinit(ab); >>>>>>>> >>>>>>>> ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; >>>>>>>> >>>>>>>> - ret = ath11k_hal_srng_init(ab); >>>>>>>> - if (ret) >>>>>>>> - return ret; >>>>>>>> - >>>>>>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the >>>>>>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g. >>>>>> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up >>>>>> in resume handler? So when device wakes up, its state is already correct. >>>>>> >>>>> Hmm... not quite understand your question. Can you elaborate? >>>> I'm trying to understand the possibility of cleanup of hal in suspend handler. For example: >>>> * The driver has been loaded and has been working fine. >>>> * The user called suspend. So all devices would be suspended. >>>> * In suspend handler of the ath11k, we should do the necessary cleanups of the states >>>> like hal. >>>> * When the device would resume after long time, the hal would have the correct state >>>> already. So we'll not need to deinit and init again. >>> The hal cleanup is not only needed by suspend/resume, but also a step of reset/recover >>> process. So If we are moving the cleanup to suspend handler, similar stuff needs to be >>> done for reset/recover as well. >> It makes sense. >> >> So clearing the hal structure completely other than ab->hal.srn_config doesn't seem >> right. I've also tested it and it crashes the whole system. >> >> On contrary, with only the current patch applied, there is no abnormality. >> >> num_shadow_reg_configured and avail_blk_resource are non-zero. If I make them 0, >> driver still keeps on working. >> >> ab->hal.num_shadow_reg_configured = 0; >> ab->hal.avail_blk_resource = 0; >> ab->hal.current_blk_index = 0; >> >> As you have suggested setting these 3 to zero, is there any other variable in hal >> structure which should be set to zero? > > IMO srng_config, rdp, wrp and srng_key may keep unchanged through suspend/reset, all other > fields should be cleared/reinitialized. memseting srng_list and shadow_reg_addr causes crashes. Please can you confirm why do you think those should be memset. Here is WIP patch: --- a/drivers/net/wireless/ath/ath11k/core.c +++ b/drivers/net/wireless/ath/ath11k/core.c @@ -2213,14 +2213,10 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) mutex_unlock(&ab->core_lock); ath11k_dp_free(ab); - ath11k_hal_srng_deinit(ab); + ath11k_hal_srng_clear(ab); ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; - ret = ath11k_hal_srng_init(ab); - if (ret) - return ret; - clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags); ret = ath11k_core_qmi_firmware_ready(ab); diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c index b32de563d453a..d4be040acf2c8 100644 --- a/drivers/net/wireless/ath/ath11k/hal.c +++ b/drivers/net/wireless/ath/ath11k/hal.c @@ -1359,6 +1359,19 @@ void ath11k_hal_srng_deinit(struct ath11k_base *ab) } EXPORT_SYMBOL(ath11k_hal_srng_deinit); +void ath11k_hal_srng_clear(struct ath11k_base *ab) +{ +// --> both of these memset causes crashes +// memset(ab->hal.srng_list, 0, +// sizeof(ab->hal.srng_list) * HAL_SRNG_RING_ID_MAX); +// memset(ab->hal.shadow_reg_addr, 0, +// sizeof(ab->hal.shadow_reg_addr) * HAL_SHADOW_NUM_REGS); + ab->hal.avail_blk_resource = 0; + ab->hal.current_blk_index = 0; + ab->hal.num_shadow_reg_configured = 0; +} +EXPORT_SYMBOL(ath11k_hal_srng_clear); + void ath11k_hal_dump_srng_stats(struct ath11k_base *ab) { struct hal_srng *srng; diff --git a/drivers/net/wireless/ath/ath11k/hal.h b/drivers/net/wireless/ath/ath11k/hal.h index 601542410c752..839095af9267e 100644 --- a/drivers/net/wireless/ath/ath11k/hal.h +++ b/drivers/net/wireless/ath/ath11k/hal.h @@ -965,6 +965,7 @@ int ath11k_hal_srng_setup(struct ath11k_base *ab, enum hal_ring_type type, struct hal_srng_params *params); int ath11k_hal_srng_init(struct ath11k_base *ath11k); void ath11k_hal_srng_deinit(struct ath11k_base *ath11k); +void ath11k_hal_srng_clear(struct ath11k_base *ab); void ath11k_hal_dump_srng_stats(struct ath11k_base *ab); void ath11k_hal_srng_get_shadow_config(struct ath11k_base *ab, u32 **cfg, u32 *len);
On 7/8/2025 5:12 PM, Muhammad Usama Anjum wrote: > On 7/8/25 6:43 AM, Baochen Qiang wrote: >> >> >> On 7/7/2025 9:11 PM, Muhammad Usama Anjum wrote: >>>>>>>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c >>>>>>>>> index 4488e4cdc5e9e..bc4930fe6a367 100644 >>>>>>>>> --- a/drivers/net/wireless/ath/ath11k/core.c >>>>>>>>> +++ b/drivers/net/wireless/ath/ath11k/core.c >>>>>>>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) >>>>>>>>> mutex_unlock(&ab->core_lock); >>>>>>>>> >>>>>>>>> ath11k_dp_free(ab); >>>>>>>>> - ath11k_hal_srng_deinit(ab); >>>>>>>>> >>>>>>>>> ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; >>>>>>>>> >>>>>>>>> - ret = ath11k_hal_srng_init(ab); >>>>>>>>> - if (ret) >>>>>>>>> - return ret; >>>>>>>>> - >>>>>>>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the >>>>>>>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g. >>>>>>> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up >>>>>>> in resume handler? So when device wakes up, its state is already correct. >>>>>>> >>>>>> Hmm... not quite understand your question. Can you elaborate? >>>>> I'm trying to understand the possibility of cleanup of hal in suspend handler. For example: >>>>> * The driver has been loaded and has been working fine. >>>>> * The user called suspend. So all devices would be suspended. >>>>> * In suspend handler of the ath11k, we should do the necessary cleanups of the states >>>>> like hal. >>>>> * When the device would resume after long time, the hal would have the correct state >>>>> already. So we'll not need to deinit and init again. >>>> The hal cleanup is not only needed by suspend/resume, but also a step of reset/recover >>>> process. So If we are moving the cleanup to suspend handler, similar stuff needs to be >>>> done for reset/recover as well. >>> It makes sense. >>> >>> So clearing the hal structure completely other than ab->hal.srn_config doesn't seem >>> right. I've also tested it and it crashes the whole system. >>> >>> On contrary, with only the current patch applied, there is no abnormality. >>> >>> num_shadow_reg_configured and avail_blk_resource are non-zero. If I make them 0, >>> driver still keeps on working. >>> >>> ab->hal.num_shadow_reg_configured = 0; >>> ab->hal.avail_blk_resource = 0; >>> ab->hal.current_blk_index = 0; >>> >>> As you have suggested setting these 3 to zero, is there any other variable in hal >>> structure which should be set to zero? >> >> IMO srng_config, rdp, wrp and srng_key may keep unchanged through suspend/reset, all other >> fields should be cleared/reinitialized. > > memseting srng_list and shadow_reg_addr causes crashes. Please can you confirm why do you > think those should be memset. Here is WIP patch: We need to make sure they have a clean state while resume/recover. > > > --- a/drivers/net/wireless/ath/ath11k/core.c > +++ b/drivers/net/wireless/ath/ath11k/core.c > @@ -2213,14 +2213,10 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) > mutex_unlock(&ab->core_lock); > > ath11k_dp_free(ab); > - ath11k_hal_srng_deinit(ab); > + ath11k_hal_srng_clear(ab); > > ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; > > - ret = ath11k_hal_srng_init(ab); > - if (ret) > - return ret; > - > clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags); > > ret = ath11k_core_qmi_firmware_ready(ab); > diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c > index b32de563d453a..d4be040acf2c8 100644 > --- a/drivers/net/wireless/ath/ath11k/hal.c > +++ b/drivers/net/wireless/ath/ath11k/hal.c > @@ -1359,6 +1359,19 @@ void ath11k_hal_srng_deinit(struct ath11k_base *ab) > } > EXPORT_SYMBOL(ath11k_hal_srng_deinit); > > +void ath11k_hal_srng_clear(struct ath11k_base *ab) > +{ > +// --> both of these memset causes crashes > +// memset(ab->hal.srng_list, 0, > +// sizeof(ab->hal.srng_list) * HAL_SRNG_RING_ID_MAX); You are memset too much, just sizeof(ab->hal.srng_list) is OK. > +// memset(ab->hal.shadow_reg_addr, 0, > +// sizeof(ab->hal.shadow_reg_addr) * HAL_SHADOW_NUM_REGS); same here > + ab->hal.avail_blk_resource = 0; > + ab->hal.current_blk_index = 0; > + ab->hal.num_shadow_reg_configured = 0; > +} > +EXPORT_SYMBOL(ath11k_hal_srng_clear); > + > void ath11k_hal_dump_srng_stats(struct ath11k_base *ab) > { > struct hal_srng *srng; > diff --git a/drivers/net/wireless/ath/ath11k/hal.h b/drivers/net/wireless/ath/ath11k/hal.h > index 601542410c752..839095af9267e 100644 > --- a/drivers/net/wireless/ath/ath11k/hal.h > +++ b/drivers/net/wireless/ath/ath11k/hal.h > @@ -965,6 +965,7 @@ int ath11k_hal_srng_setup(struct ath11k_base *ab, enum hal_ring_type type, > struct hal_srng_params *params); > int ath11k_hal_srng_init(struct ath11k_base *ath11k); > void ath11k_hal_srng_deinit(struct ath11k_base *ath11k); > +void ath11k_hal_srng_clear(struct ath11k_base *ab); > void ath11k_hal_dump_srng_stats(struct ath11k_base *ab); > void ath11k_hal_srng_get_shadow_config(struct ath11k_base *ab, > u32 **cfg, u32 *len); > >
On 6/30/2025 12:43 AM, Muhammad Usama Anjum wrote: Subject has incorrect prefix, should be "wifi: ath11k: " And ideally it should mention HAL SRNG since it is specific to that allocation > Don't deinitialize and reinitialize the HAL helpers. The dma memory is > deallocated and there is high possibility that we'll not be able to get > the same memory allocated from dma when there is high memory pressure. > > Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 not quite the right format since it is missing hw version and bus > > Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") > Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> > --- > drivers/net/wireless/ath/ath11k/core.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c > index 4488e4cdc5e9e..bc4930fe6a367 100644 > --- a/drivers/net/wireless/ath/ath11k/core.c > +++ b/drivers/net/wireless/ath/ath11k/core.c > @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) > mutex_unlock(&ab->core_lock); > > ath11k_dp_free(ab); > - ath11k_hal_srng_deinit(ab); > > ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; > > - ret = ath11k_hal_srng_init(ab); > - if (ret) > - return ret; > - > clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags); > > ret = ath11k_core_qmi_firmware_ready(ab); does this patch have any dependency upon the 1/3 patch? if not, then it should be sent separately since it should go through the ath tree instead of through the mhi tree. /jeff
Thank you for reviewing! On 7/1/25 7:49 PM, Jeff Johnson wrote: > On 6/30/2025 12:43 AM, Muhammad Usama Anjum wrote: > > Subject has incorrect prefix, should be "wifi: ath11k: " > > And ideally it should mention HAL SRNG since it is specific to that allocation I'll fix the subject. > >> Don't deinitialize and reinitialize the HAL helpers. The dma memory is >> deallocated and there is high possibility that we'll not be able to get >> the same memory allocated from dma when there is high memory pressure. >> >> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 > > not quite the right format since it is missing hw version and bus I've been using the same tag from last accepted patches. How to construct the correct patch? > >> >> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") >> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >> --- >> drivers/net/wireless/ath/ath11k/core.c | 5 ----- >> 1 file changed, 5 deletions(-) >> >> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c >> index 4488e4cdc5e9e..bc4930fe6a367 100644 >> --- a/drivers/net/wireless/ath/ath11k/core.c >> +++ b/drivers/net/wireless/ath/ath11k/core.c >> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab) >> mutex_unlock(&ab->core_lock); >> >> ath11k_dp_free(ab); >> - ath11k_hal_srng_deinit(ab); >> >> ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1; >> >> - ret = ath11k_hal_srng_init(ab); >> - if (ret) >> - return ret; >> - >> clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags); >> >> ret = ath11k_core_qmi_firmware_ready(ab); > > does this patch have any dependency upon the 1/3 patch? No > if not, then it should be sent separately since it should go through the ath > tree instead of through the mhi tree. Ohh... I see. I'll send it separately. > > /jeff
On 7/2/2025 8:28 AM, Muhammad Usama Anjum wrote: > On 7/1/25 7:49 PM, Jeff Johnson wrote: >> On 6/30/2025 12:43 AM, Muhammad Usama Anjum wrote: >>> the same memory allocated from dma when there is high memory pressure. >>> >>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 >> >> not quite the right format since it is missing hw version and bus > I've been using the same tag from last accepted patches. How to construct the > correct patch? https://wireless.docs.kernel.org/en/latest/en/users/drivers/ath12k/submittingpatches.html#tested-on-tag
On Mon, Jun 30, 2025 at 12:43:29PM +0500, Muhammad Usama Anjum wrote: > Don't deinitialize and reinitialize the HAL helpers. The dma memory is > deallocated and there is high possibility that we'll not be able to get > the same memory allocated from dma when there is high memory pressure. > > Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 > > Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") No cc: stable?
On 7/1/25 3:25 PM, Greg Kroah-Hartman wrote: > On Mon, Jun 30, 2025 at 12:43:29PM +0500, Muhammad Usama Anjum wrote: >> Don't deinitialize and reinitialize the HAL helpers. The dma memory is >> deallocated and there is high possibility that we'll not be able to get >> the same memory allocated from dma when there is high memory pressure. >> >> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 >> >> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") > > No cc: stable? I'll add it to entire series.
© 2016 - 2025 Red Hat, Inc.