[PATCH v8] Bluetooth: hci_qca: Fix SSR (SubSystem Restart) fail when BT_EN is pulled up by hw

Shuai Zhang posted 1 patch 1 month, 1 week ago
There is a newer version of this series
[PATCH v8] Bluetooth: hci_qca: Fix SSR (SubSystem Restart) fail when BT_EN is pulled up by hw
Posted by Shuai Zhang 1 month, 1 week ago
When the host actively triggers SSR and collects coredump data,
the Bluetooth stack sends a reset command to the controller. However, due
to the inability to clear the QCA_SSR_TRIGGERED and QCA_IBS_DISABLED bits,
the reset command times out.

To address this, this patch clears the QCA_SSR_TRIGGERED and
QCA_IBS_DISABLED flags and adds a 50ms delay after SSR, but only when
HCI_QUIRK_NON_PERSISTENT_SETUP is not set. This ensures the controller
completes the SSR process when BT_EN is always high due to hardware.

For the purpose of HCI_QUIRK_NON_PERSISTENT_SETUP, please refer to
the comment in `include/net/bluetooth/hci.h`.

The HCI_QUIRK_NON_PERSISTENT_SETUP quirk is associated with BT_EN,
and its presence can be used to determine whether BT_EN is defined in DTS.

After SSR, the host does not download the firmware again, so the
controller remains in the IBS_WAKE state. The host needs to
synchronize with the controller to maintain proper operation.

When SSR is triggered multiple times, only the first trigger generates a
coredump file, because memcoredump_flag is not cleared. Clear the coredump
flag when SSR completes.

When the SSR duration exceeds 2 seconds, the host tx_idle_timeout fires
and sets the host TX state to sleep. Because the hardware pulls bt_en up,
the firmware is not downloaded after the SSR, so the controller does not
enter sleep mode. When the host sends a command afterward, it sends 0xFD
to the controller, but the controller does not respond, leading to a
command timeout.

So reset tx_idle_timer after SSR to prevent the host from entering TX
IBS_Sleep mode.

Changes since v6-7:
- Merge the changes into a single patch.
- Update commit.

Changes since v1-5:
- Add an explanation for HCI_QUIRK_NON_PERSISTENT_SETUP.
- Add comments for msleep(50).
- Update format and commit.

Signed-off-by: Shuai Zhang <quic_shuaz@quicinc.com>
---
 drivers/bluetooth/hci_qca.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 4e56782b0..9dc59b002 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -1653,6 +1653,39 @@ static void qca_hw_error(struct hci_dev *hdev, u8 code)
 		skb_queue_purge(&qca->rx_memdump_q);
 	}
 
+	/*
+	 * If the BT chip's bt_en pin is connected to a 3.3V power supply via
+	 * hardware and always stays high, driver cannot control the bt_en pin.
+	 * As a result, during SSR (SubSystem Restart), QCA_SSR_TRIGGERED and
+	 * QCA_IBS_DISABLED flags cannot be cleared, which leads to a reset
+	 * command timeout.
+	 * Add an msleep delay to ensure controller completes the SSR process.
+	 *
+	 * Host will not download the firmware after SSR, controller to remain
+	 * in the IBS_WAKE state, and the host needs to synchronize with it
+	 *
+	 * Since the bluetooth chip has been reset, clear the memdump state.
+	 */
+	if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
+		/*
+		 * When the SSR (SubSystem Restart) duration exceeds 2 seconds,
+		 * it triggers host tx_idle_delay, which sets host TX state
+		 * to sleep. Reset tx_idle_timer after SSR to prevent
+		 * host enter TX IBS_Sleep mode.
+		 */
+		mod_timer(&qca->tx_idle_timer, jiffies +
+				  msecs_to_jiffies(qca->tx_idle_delay));
+
+		/* Controller reset completion time is 50ms */
+		msleep(50);
+
+		clear_bit(QCA_SSR_TRIGGERED, &qca->flags);
+		clear_bit(QCA_IBS_DISABLED, &qca->flags);
+
+		qca->tx_ibs_state = HCI_IBS_TX_AWAKE;
+		qca->memdump_state = QCA_MEMDUMP_IDLE;
+	}
+
 	clear_bit(QCA_HW_ERROR_EVENT, &qca->flags);
 }
 
-- 
2.34.1
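
The recovery path added by the patch can be modelled outside the kernel to see how the pieces interact. What follows is a minimal userspace sketch, not driver code. `struct ssr_model`, `first_wire_byte()`, and `recover_after_ssr()` are hypothetical names, and the `bt_en_controllable` field stands in for HCI_QUIRK_NON_PERSISTENT_SETUP being set. Dropping commands while the SSR flag is set is a simplifying assumption about the enqueue path. The timer re-arm and the 50ms sleep are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the state the patch touches. */
enum tx_ibs_state { TX_ASLEEP, TX_AWAKE };
enum memdump_state { MEMDUMP_IDLE, MEMDUMP_COLLECTED, MEMDUMP_TIMEOUT };

#define FLAG_SSR_TRIGGERED (1u << 0)
#define FLAG_IBS_DISABLED  (1u << 1)

#define WIRE_DROPPED  (-1)  /* nothing is transmitted */
#define WIRE_WAKE_IND 0xFD  /* wake byte named in the commit message */
#define WIRE_HCI_CMD  0x01  /* H4 packet indicator for an HCI command */

struct ssr_model {
	unsigned int flags;
	enum tx_ibs_state tx_state;
	enum memdump_state memdump;
	bool bt_en_controllable; /* models the quirk being set */
};

/* First byte the host would put on the wire when asked to send a command. */
static int first_wire_byte(const struct ssr_model *m)
{
	assert(m);
	if (m->flags & FLAG_SSR_TRIGGERED)
		return WIRE_DROPPED;  /* assumption: traffic is held back during SSR */
	if (m->tx_state == TX_ASLEEP)
		return WIRE_WAKE_IND; /* host believes it must wake the controller */
	return WIRE_HCI_CMD;
}

/* Mirrors the guarded block of the patch, minus mod_timer() and msleep(). */
static void recover_after_ssr(struct ssr_model *m)
{
	assert(m);
	if (m->bt_en_controllable)
		return; /* the normal power-cycle path handles recovery */

	m->flags &= ~(FLAG_SSR_TRIGGERED | FLAG_IBS_DISABLED);
	m->tx_state = TX_AWAKE;     /* controller stays in IBS_WAKE after SSR */
	m->memdump = MEMDUMP_IDLE;  /* allow the next SSR to produce a coredump */
}
```

In this model, a host whose flags were cleared but whose TX state was left asleep would still answer with `WIRE_WAKE_IND`, which is the 0xFD timeout case the commit message describes. Setting the TX state to awake in the same step avoids it.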
Re: [PATCH v8] Bluetooth: hci_qca: Fix SSR (SubSystem Restart) fail when BT_EN is pulled up by hw
Posted by kernel test robot 1 month ago
Hi Shuai,

kernel test robot noticed the following build errors:

[auto build test ERROR on bluetooth/master]
[also build test ERROR on bluetooth-next/master linus/master v6.17-rc4 next-20250901]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shuai-Zhang/Bluetooth-hci_qca-Fix-SSR-SubSystem-Restart-fail-when-BT_EN-is-pulled-up-by-hw/20250822-203836
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git master
patch link:    https://lore.kernel.org/r/20250822123605.757306-1-quic_shuaz%40quicinc.com
patch subject: [PATCH v8] Bluetooth: hci_qca: Fix SSR (SubSystem Restart) fail when BT_EN is pulled up by hw
config: arm64-defconfig (https://download.01.org/0day-ci/archive/20250902/202509020557.cSBn6IwZ-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250902/202509020557.cSBn6IwZ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509020557.cSBn6IwZ-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/kernel.h:23,
                    from drivers/bluetooth/hci_qca.c:18:
   drivers/bluetooth/hci_qca.c: In function 'qca_hw_error':
>> drivers/bluetooth/hci_qca.c:1669:60: error: 'struct hci_dev' has no member named 'quirks'
    1669 |         if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
         |                                                            ^~
   include/linux/bitops.h:44:44: note: in definition of macro 'bitop'
      44 |           __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) && \
         |                                            ^~~~
   drivers/bluetooth/hci_qca.c:1669:14: note: in expansion of macro 'test_bit'
    1669 |         if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
         |              ^~~~~~~~
>> drivers/bluetooth/hci_qca.c:1669:60: error: 'struct hci_dev' has no member named 'quirks'
    1669 |         if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
         |                                                            ^~
   include/linux/bitops.h:45:23: note: in definition of macro 'bitop'
      45 |           (uintptr_t)(addr) != (uintptr_t)NULL &&                       \
         |                       ^~~~
   drivers/bluetooth/hci_qca.c:1669:14: note: in expansion of macro 'test_bit'
    1669 |         if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
         |              ^~~~~~~~
>> drivers/bluetooth/hci_qca.c:1669:60: error: 'struct hci_dev' has no member named 'quirks'
    1669 |         if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
         |                                                            ^~
   include/linux/bitops.h:46:57: note: in definition of macro 'bitop'
      46 |           __builtin_constant_p(*(const unsigned long *)(addr))) ?       \
         |                                                         ^~~~
   drivers/bluetooth/hci_qca.c:1669:14: note: in expansion of macro 'test_bit'
    1669 |         if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
         |              ^~~~~~~~
>> drivers/bluetooth/hci_qca.c:1669:60: error: 'struct hci_dev' has no member named 'quirks'
    1669 |         if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
         |                                                            ^~
   include/linux/bitops.h:47:24: note: in definition of macro 'bitop'
      47 |          const##op(nr, addr) : op(nr, addr))
         |                        ^~~~
   drivers/bluetooth/hci_qca.c:1669:14: note: in expansion of macro 'test_bit'
    1669 |         if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
         |              ^~~~~~~~
>> drivers/bluetooth/hci_qca.c:1669:60: error: 'struct hci_dev' has no member named 'quirks'
    1669 |         if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
         |                                                            ^~
   include/linux/bitops.h:47:39: note: in definition of macro 'bitop'
      47 |          const##op(nr, addr) : op(nr, addr))
         |                                       ^~~~
   drivers/bluetooth/hci_qca.c:1669:14: note: in expansion of macro 'test_bit'
    1669 |         if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
         |              ^~~~~~~~


vim +1669 drivers/bluetooth/hci_qca.c

  1609	
  1610	static void qca_hw_error(struct hci_dev *hdev, u8 code)
  1611	{
  1612		struct hci_uart *hu = hci_get_drvdata(hdev);
  1613		struct qca_data *qca = hu->priv;
  1614	
  1615		set_bit(QCA_SSR_TRIGGERED, &qca->flags);
  1616		set_bit(QCA_HW_ERROR_EVENT, &qca->flags);
  1617		bt_dev_info(hdev, "mem_dump_status: %d", qca->memdump_state);
  1618	
  1619		if (qca->memdump_state == QCA_MEMDUMP_IDLE) {
  1620			/* If hardware error event received for other than QCA
  1621			 * soc memory dump event, then we need to crash the SOC
  1622			 * and wait here for 8 seconds to get the dump packets.
  1623			 * This will block main thread to be on hold until we
  1624			 * collect dump.
  1625			 */
  1626			set_bit(QCA_MEMDUMP_COLLECTION, &qca->flags);
  1627			qca_send_crashbuffer(hu);
  1628			qca_wait_for_dump_collection(hdev);
  1629		} else if (qca->memdump_state == QCA_MEMDUMP_COLLECTING) {
  1630			/* Let us wait here until memory dump collected or
  1631			 * memory dump timer expired.
  1632			 */
  1633			bt_dev_info(hdev, "waiting for dump to complete");
  1634			qca_wait_for_dump_collection(hdev);
  1635		}
  1636	
  1637		mutex_lock(&qca->hci_memdump_lock);
  1638		if (qca->memdump_state != QCA_MEMDUMP_COLLECTED) {
  1639			bt_dev_err(hu->hdev, "clearing allocated memory due to memdump timeout");
  1640			hci_devcd_abort(hu->hdev);
  1641			if (qca->qca_memdump) {
  1642				kfree(qca->qca_memdump);
  1643				qca->qca_memdump = NULL;
  1644			}
  1645			qca->memdump_state = QCA_MEMDUMP_TIMEOUT;
  1646			cancel_delayed_work(&qca->ctrl_memdump_timeout);
  1647		}
  1648		mutex_unlock(&qca->hci_memdump_lock);
  1649	
  1650		if (qca->memdump_state == QCA_MEMDUMP_TIMEOUT ||
  1651		    qca->memdump_state == QCA_MEMDUMP_COLLECTED) {
  1652			cancel_work_sync(&qca->ctrl_memdump_evt);
  1653			skb_queue_purge(&qca->rx_memdump_q);
  1654		}
  1655	
  1656		/*
  1657		 * If the BT chip's bt_en pin is connected to a 3.3V power supply via
  1658		 * hardware and always stays high, driver cannot control the bt_en pin.
  1659		 * As a result, during SSR (SubSystem Restart), QCA_SSR_TRIGGERED and
  1660		 * QCA_IBS_DISABLED flags cannot be cleared, which leads to a reset
  1661		 * command timeout.
  1662		 * Add an msleep delay to ensure controller completes the SSR process.
  1663		 *
  1664		 * Host will not download the firmware after SSR, controller to remain
  1665		 * in the IBS_WAKE state, and the host needs to synchronize with it
  1666		 *
  1667		 * Since the bluetooth chip has been reset, clear the memdump state.
  1668		 */
> 1669		if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
  1670			/*
  1671			 * When the SSR (SubSystem Restart) duration exceeds 2 seconds,
  1672			 * it triggers host tx_idle_delay, which sets host TX state
  1673			 * to sleep. Reset tx_idle_timer after SSR to prevent
  1674			 * host enter TX IBS_Sleep mode.
  1675			 */
  1676			mod_timer(&qca->tx_idle_timer, jiffies +
  1677					  msecs_to_jiffies(qca->tx_idle_delay));
  1678	
  1679			/* Controller reset completion time is 50ms */
  1680			msleep(50);
  1681	
  1682			clear_bit(QCA_SSR_TRIGGERED, &qca->flags);
  1683			clear_bit(QCA_IBS_DISABLED, &qca->flags);
  1684	
  1685			qca->tx_ibs_state = HCI_IBS_TX_AWAKE;
  1686			qca->memdump_state = QCA_MEMDUMP_IDLE;
  1687		}
  1688	
  1689		clear_bit(QCA_HW_ERROR_EVENT, &qca->flags);
  1690	}
  1691	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
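
The error above says that `struct hci_dev` in the tested trees no longer has a `quirks` member. To my understanding, recent Bluetooth trees store quirks in a `quirk_flags` bitmap and test them through a `hci_test_quirk(hdev, nr)` helper, so the condition would presumably become `if (!hci_test_quirk(hdev, HCI_QUIRK_NON_PERSISTENT_SETUP))`. Please verify this against `include/net/bluetooth/hci_core.h` in the actual base tree. The userspace sketch below only illustrates that accessor pattern. Every `mock_`/`MOCK_` name and the bit values are invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Mock values for illustration only; the real ones live in kernel headers. */
enum { MOCK_QUIRK_NON_PERSISTENT_SETUP = 3, MOCK_NUM_QUIRKS = 40 };

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define WORDS_FOR(n)  (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct mock_hci_dev {
	/* A bitmap instead of a single `quirks` word. */
	unsigned long quirk_flags[WORDS_FOR(MOCK_NUM_QUIRKS)];
};

static inline void mock_set_quirk(struct mock_hci_dev *hdev, unsigned int nr)
{
	assert(nr < MOCK_NUM_QUIRKS);
	hdev->quirk_flags[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

static inline bool mock_test_quirk(const struct mock_hci_dev *hdev,
				   unsigned int nr)
{
	assert(nr < MOCK_NUM_QUIRKS);
	return (hdev->quirk_flags[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/* Same condition as the patch, expressed through the accessor. */
static bool needs_ssr_workaround(const struct mock_hci_dev *hdev)
{
	return !mock_test_quirk(hdev, MOCK_QUIRK_NON_PERSISTENT_SETUP);
}
```

Going through an accessor rather than `&hdev->quirks` means callers do not depend on how the flags are stored, which is why code written against the old field stops compiling once the field is replaced.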
Re: [PATCH v8] Bluetooth: hci_qca: Fix SSR (SubSystem Restart) fail when BT_EN is pulled up by hw
Posted by Dmitry Baryshkov 1 month, 1 week ago
On Fri, Aug 22, 2025 at 08:36:05PM +0800, Shuai Zhang wrote:
> When the host actively triggers SSR and collects coredump data,
> the Bluetooth stack sends a reset command to the controller. However, due
> to the inability to clear the QCA_SSR_TRIGGERED and QCA_IBS_DISABLED bits,
> the reset command times out.
> 
> To address this, this patch clears the QCA_SSR_TRIGGERED and
> QCA_IBS_DISABLED flags and adds a 50ms delay after SSR, but only when
> HCI_QUIRK_NON_PERSISTENT_SETUP is not set. This ensures the controller
> completes the SSR process when BT_EN is always high due to hardware.
> 
> For the purpose of HCI_QUIRK_NON_PERSISTENT_SETUP, please refer to
> the comment in `include/net/bluetooth/hci.h`.
> 
> The HCI_QUIRK_NON_PERSISTENT_SETUP quirk is associated with BT_EN,
> and its presence can be used to determine whether BT_EN is defined in DTS.
> 
> After SSR, the host does not download the firmware again, so the
> controller remains in the IBS_WAKE state. The host needs to
> synchronize with the controller to maintain proper operation.
> 
> When SSR is triggered multiple times, only the first trigger generates a
> coredump file, because memcoredump_flag is not cleared. Clear the coredump
> flag when SSR completes.
> 
> When the SSR duration exceeds 2 seconds, the host tx_idle_timeout fires
> and sets the host TX state to sleep. Because the hardware pulls bt_en up,
> the firmware is not downloaded after the SSR, so the controller does not
> enter sleep mode. When the host sends a command afterward, it sends 0xFD
> to the controller, but the controller does not respond, leading to a
> command timeout.
> 
> So reset tx_idle_timer after SSR to prevent the host from entering TX
> IBS_Sleep mode.
> 
> Changes since v6-7:
> - Merge the changes into a single patch.
> - Update commit.
> 
> Changes since v1-5:
> - Add an explanation for HCI_QUIRK_NON_PERSISTENT_SETUP.
> - Add comments for msleep(50).
> - Update format and commit.

The changelog doesn't belong in the commit message. It should be placed
under the triple-dash.

> 
> Signed-off-by: Shuai Zhang <quic_shuaz@quicinc.com>
> ---
>  drivers/bluetooth/hci_qca.c | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index 4e56782b0..9dc59b002 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -1653,6 +1653,39 @@ static void qca_hw_error(struct hci_dev *hdev, u8 code)
>  		skb_queue_purge(&qca->rx_memdump_q);
>  	}
>  
> +	/*
> +	 * If the BT chip's bt_en pin is connected to a 3.3V power supply via
> +	 * hardware and always stays high, driver cannot control the bt_en pin.
> +	 * As a result, during SSR (SubSystem Restart), QCA_SSR_TRIGGERED and
> +	 * QCA_IBS_DISABLED flags cannot be cleared, which leads to a reset
> +	 * command timeout.
> +	 * Add an msleep delay to ensure controller completes the SSR process.
> +	 *
> +	 * Host will not download the firmware after SSR, controller to remain
> +	 * in the IBS_WAKE state, and the host needs to synchronize with it
> +	 *
> +	 * Since the bluetooth chip has been reset, clear the memdump state.
> +	 */
> +	if (!test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {

Still based on some old tree. Could you please stop doing that?


-- 
With best wishes
Dmitry