[PATCH v5] Bluetooth: qca: Fix delayed hw_error handling due to missing wakeup during SSR

Shuai Zhang posted 1 patch 2 months ago
There is a newer version of this series
[PATCH v5] Bluetooth: qca: Fix delayed hw_error handling due to missing wakeup during SSR
Posted by Shuai Zhang 2 months ago
When the Bluetooth controller encounters a coredump, it triggers
the Subsystem Restart (SSR) mechanism. The controller first
reports the coredump data, and once the data upload is complete,
it sends a hw_error event. The host relies on this event to
proceed with subsequent recovery actions.

If the host has not finished processing the coredump data when the
hw_error event is received, it sets a timer to wait until either the
data processing is complete or the timeout expires before handling the
event.

The current implementation lacks a wakeup trigger. As a result,
even if the coredump data has already been processed, the host
continues to wait until the timer expires, causing unnecessary
delays in handling the hw_error event.

To fix this issue, add a `wake_up_bit()` call after the host finishes
processing the coredump data. This ensures that the waiting thread is
promptly notified and can proceed to handle the hw_error event without
waiting for the timeout.

Test case:
- Trigger controller coredump using the command: `hcitool cmd 0x3f 0c 26`.
- Use `btmon` to capture HCI logs.
- Observe the time interval between receiving the hw_error event
  and the execution of the power-off sequence in the HCI log.

Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com>
---
Changes v5:
- Replace clear_and_wake_up_bit with wake_up_bit
- Link to v4
  https://lore.kernel.org/all/20260327083258.1398450-1-shuai.zhang@oss.qualcomm.com/

Changes v4:
- add Acked-by signoff
- Link to v3
  https://lore.kernel.org/all/20251107033924.3707495-1-quic_shuaz@quicinc.com/

Changes v3:
- add Fixes tag
- Link to v2
  https://lore.kernel.org/all/20251106140103.1406081-1-quic_shuaz@quicinc.com/

Changes v2:
- Split timeout conversion into a separate patch.
- Clarified commit messages and added test case description.
- Link to v1
  https://lore.kernel.org/all/20251104112601.2670019-1-quic_shuaz@quicinc.com/
---
 drivers/bluetooth/hci_qca.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index c17a462ae..9fffe665b 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -1108,7 +1108,7 @@ static void qca_controller_memdump(struct work_struct *work)
 				qca->qca_memdump = NULL;
 				qca->memdump_state = QCA_MEMDUMP_COLLECTED;
 				cancel_delayed_work(&qca->ctrl_memdump_timeout);
-				clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
+				wake_up_bit(&qca->flags, QCA_MEMDUMP_COLLECTION);
 				clear_bit(QCA_IBS_DISABLED, &qca->flags);
 				mutex_unlock(&qca->hci_memdump_lock);
 				return;
@@ -1186,7 +1186,7 @@ static void qca_controller_memdump(struct work_struct *work)
 			kfree(qca->qca_memdump);
 			qca->qca_memdump = NULL;
 			qca->memdump_state = QCA_MEMDUMP_COLLECTED;
-			clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
+			wake_up_bit(&qca->flags, QCA_MEMDUMP_COLLECTION);
 		}
 
 		mutex_unlock(&qca->hci_memdump_lock);
-- 
2.34.1
Re: [PATCH v5] Bluetooth: qca: Fix delayed hw_error handling due to missing wakeup during SSR
Posted by Bartosz Golaszewski 2 months ago
On Thu, 9 Apr 2026 13:22:33 +0200, Shuai Zhang
<shuai.zhang@oss.qualcomm.com> said:
> When the Bluetooth controller encounters a coredump, it triggers
> the Subsystem Restart (SSR) mechanism. The controller first
> reports the coredump data, and once the data upload is complete,
> it sends a hw_error event. The host relies on this event to
> proceed with subsequent recovery actions.
>
> If the host has not finished processing the coredump data
> when the hw_error event is received,
> it sets a timer to wait until either the data processing is complete
> or the timeout expires before handling the event.
>
> The current implementation lacks a wakeup trigger. As a result,
> even if the coredump data has already been processed, the host
> continues to wait until the timer expires, causing unnecessary
> delays in handling the hw_error event.
>
> To fix this issue, add a `wake_up_bit()` call after the host finishes
> processing the coredump data. This ensures that the waiting thread is
> promptly notified and can proceed to handle the hw_error event without
> waiting for the timeout.
>
> Test case:
> - Trigger controller coredump using the command: `hcitool cmd 0x3f 0c 26`.
> - Use `btmon` to capture HCI logs.
> - Observe the time interval between receiving the hw_error event
> and the execution of the power-off sequence in the HCI log.
>
> Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com>
> ---

Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Re: [PATCH v5] Bluetooth: qca: Fix delayed hw_error handling due to missing wakeup during SSR
Posted by Luiz Augusto von Dentz 2 months ago
Hi Shuai,

On Thu, Apr 9, 2026 at 7:22 AM Shuai Zhang <shuai.zhang@oss.qualcomm.com> wrote:
>
> When the Bluetooth controller encounters a coredump, it triggers
> the Subsystem Restart (SSR) mechanism. The controller first
> reports the coredump data, and once the data upload is complete,
> it sends a hw_error event. The host relies on this event to
> proceed with subsequent recovery actions.
>
> If the host has not finished processing the coredump data
> when the hw_error event is received,
> it sets a timer to wait until either the data processing is complete
> or the timeout expires before handling the event.
>
> The current implementation lacks a wakeup trigger. As a result,
> even if the coredump data has already been processed, the host
> continues to wait until the timer expires, causing unnecessary
> delays in handling the hw_error event.
>
> To fix this issue, add a `wake_up_bit()` call after the host finishes
> processing the coredump data. This ensures that the waiting thread is
> promptly notified and can proceed to handle the hw_error event without
> waiting for the timeout.
>
> Test case:
> - Trigger controller coredump using the command: `hcitool cmd 0x3f 0c 26`.
> - Use `btmon` to capture HCI logs.
> - Observe the time interval between receiving the hw_error event
> and the execution of the power-off sequence in the HCI log.
>
> Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com>
> ---
> Changes v5:
> - Replace clear_and_wake_up_bit with wake_up_bit
> - Link to v4
>   https://lore.kernel.org/all/20260327083258.1398450-1-shuai.zhang@oss.qualcomm.com/
>
> Changes v4:
> - add Acked-by signoff
> - Link to v3
>   https://lore.kernel.org/all/20251107033924.3707495-1-quic_shuaz@quicinc.com/
>
> Changes v3:
> - add Fixes tag
> - Link to v2
>   https://lore.kernel.org/all/20251106140103.1406081-1-quic_shuaz@quicinc.com/
>
> Changes v2:
> - Split timeout conversion into a separate patch.
> - Clarified commit messages and added test case description.
> - Link to v1
>   https://lore.kernel.org/all/20251104112601.2670019-1-quic_shuaz@quicinc.com/
> ---
>  drivers/bluetooth/hci_qca.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
>
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index c17a462ae..9fffe665b 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -1108,7 +1108,7 @@ static void qca_controller_memdump(struct work_struct *work)
>                                 qca->qca_memdump = NULL;
>                                 qca->memdump_state = QCA_MEMDUMP_COLLECTED;
>                                 cancel_delayed_work(&qca->ctrl_memdump_timeout);
> -                               clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
> +                               wake_up_bit(&qca->flags, QCA_MEMDUMP_COLLECTION);
>                                 clear_bit(QCA_IBS_DISABLED, &qca->flags);
>                                 mutex_unlock(&qca->hci_memdump_lock);
>                                 return;
> @@ -1186,7 +1186,7 @@ static void qca_controller_memdump(struct work_struct *work)
>                         kfree(qca->qca_memdump);
>                         qca->qca_memdump = NULL;
>                         qca->memdump_state = QCA_MEMDUMP_COLLECTED;
> -                       clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
> +                       wake_up_bit(&qca->flags, QCA_MEMDUMP_COLLECTION);
>                 }
>
>                 mutex_unlock(&qca->hci_memdump_lock);
> --
> 2.34.1

https://sashiko.dev/#/patchset/20260409112233.3326467-1-shuai.zhang%40oss.qualcomm.com

Lots of comments, which I'm not sure are really valid, but one is
considered critical, so we do have to pay attention: if it is valid,
then it should be addressed.

-- 
Luiz Augusto von Dentz
Re: [PATCH v5] Bluetooth: qca: Fix delayed hw_error handling due to missing wakeup during SSR
Posted by Shuai Zhang 2 months ago
Hi Luiz,

On 4/9/2026 11:21 PM, Luiz Augusto von Dentz wrote:
> Hi Shuai,
>
> On Thu, Apr 9, 2026 at 7:22 AM Shuai Zhang <shuai.zhang@oss.qualcomm.com> wrote:
>> When the Bluetooth controller encounters a coredump, it triggers
>> the Subsystem Restart (SSR) mechanism. The controller first
>> reports the coredump data, and once the data upload is complete,
>> it sends a hw_error event. The host relies on this event to
>> proceed with subsequent recovery actions.
>>
>> If the host has not finished processing the coredump data
>> when the hw_error event is received,
>> it sets a timer to wait until either the data processing is complete
>> or the timeout expires before handling the event.
>>
>> The current implementation lacks a wakeup trigger. As a result,
>> even if the coredump data has already been processed, the host
>> continues to wait until the timer expires, causing unnecessary
>> delays in handling the hw_error event.
>>
>> To fix this issue, add a `wake_up_bit()` call after the host finishes
>> processing the coredump data. This ensures that the waiting thread is
>> promptly notified and can proceed to handle the hw_error event without
>> waiting for the timeout.
>>
>> Test case:
>> - Trigger controller coredump using the command: `hcitool cmd 0x3f 0c 26`.
>> - Use `btmon` to capture HCI logs.
>> - Observe the time interval between receiving the hw_error event
>> and the execution of the power-off sequence in the HCI log.
>>
>> Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com>
>> ---
>> Changes v5:
>> - Replace clear_and_wake_up_bit with wake_up_bit
>> - Link to v4
>>    https://lore.kernel.org/all/20260327083258.1398450-1-shuai.zhang@oss.qualcomm.com/
>>
>> Changes v4:
>> - add Acked-by signoff
>> - Link to v3
>>    https://lore.kernel.org/all/20251107033924.3707495-1-quic_shuaz@quicinc.com/
>>
>> Changes v3:
>> - add Fixes tag
>> - Link to v2
>>    https://lore.kernel.org/all/20251106140103.1406081-1-quic_shuaz@quicinc.com/
>>
>> Changes v2:
>> - Split timeout conversion into a separate patch.
>> - Clarified commit messages and added test case description.
>> - Link to v1
>>    https://lore.kernel.org/all/20251104112601.2670019-1-quic_shuaz@quicinc.com/
>> ---
>>   drivers/bluetooth/hci_qca.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>>
>> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
>> index c17a462ae..9fffe665b 100644
>> --- a/drivers/bluetooth/hci_qca.c
>> +++ b/drivers/bluetooth/hci_qca.c
>> @@ -1108,7 +1108,7 @@ static void qca_controller_memdump(struct work_struct *work)
>>                                  qca->qca_memdump = NULL;
>>                                  qca->memdump_state = QCA_MEMDUMP_COLLECTED;
>>                                  cancel_delayed_work(&qca->ctrl_memdump_timeout);
>> -                               clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
>> +                               wake_up_bit(&qca->flags, QCA_MEMDUMP_COLLECTION);
>>                                  clear_bit(QCA_IBS_DISABLED, &qca->flags);
>>                                  mutex_unlock(&qca->hci_memdump_lock);
>>                                  return;
>> @@ -1186,7 +1186,7 @@ static void qca_controller_memdump(struct work_struct *work)
>>                          kfree(qca->qca_memdump);
>>                          qca->qca_memdump = NULL;
>>                          qca->memdump_state = QCA_MEMDUMP_COLLECTED;
>> -                       clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
>> +                       wake_up_bit(&qca->flags, QCA_MEMDUMP_COLLECTION);
>>                  }
>>
>>                  mutex_unlock(&qca->hci_memdump_lock);
>> --
>> 2.34.1
> https://sashiko.dev/#/patchset/20260409112233.3326467-1-shuai.zhang%40oss.qualcomm.com
>
> Lots of comments, which I'm not sure are really valid, but one is
> considered critical, so we do have to pay attention: if it is valid,
> then it should be addressed.


This is valid. clear_and_wake_up_bit() will be used.


Thanks,
Shuai
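
As an illustration of why the thread converges on clear_and_wake_up_bit(),
here is a minimal userspace sketch, not kernel code. It assumes the waiter
sleeps until it is woken or its timeout expires, and that it re-checks the
flag bit after each wakeup, as the kernel's wait-on-bit helpers do. The
enum, the function name waiter_resume_tick(), and the tick-based timing are
hypothetical and exist only for this model.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: one flag bit, one sleeping waiter. */
enum finish_action {
	FINISH_CLEAR_ONLY,	/* clear the bit, no wakeup (code before the patch) */
	FINISH_WAKE_ONLY,	/* wakeup, bit left set (what the v5 diff does) */
	FINISH_CLEAR_AND_WAKE,	/* clear the bit, then wake the waiter */
};

/*
 * Return the tick at which the waiter stops waiting.
 *
 * The waiter sleeps from tick 0, re-checks the bit only when it is woken,
 * and gives up at timeout_tick. The worker finishes processing the dump at
 * done_tick and then performs the given action.
 */
static int waiter_resume_tick(enum finish_action action, int done_tick,
			      int timeout_tick)
{
	bool collecting = true;	/* stands in for QCA_MEMDUMP_COLLECTION */
	int tick;

	for (tick = 0; tick < timeout_tick; tick++) {
		bool woken = false;

		if (tick == done_tick) {
			if (action != FINISH_WAKE_ONLY)
				collecting = false;
			if (action != FINISH_CLEAR_ONLY)
				woken = true;
		}

		/* A sleeping waiter notices the cleared bit only if woken. */
		if (woken && !collecting)
			return tick;
	}

	return timeout_tick;	/* nothing released the waiter: timed out */
}
```

In this model, with done_tick = 3 and timeout_tick = 100, only
FINISH_CLEAR_AND_WAKE lets the waiter resume at tick 3. The clear-only and
wake-only variants both resume at tick 100, that is, only once the timeout
expires.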
