[BlueZ PATCH] Bluetooth: hci_core: Skip hci_cmd_work if hci_request is pending

Hsin-chen Chuang posted 1 patch 1 year, 12 months ago
net/bluetooth/hci_core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
[BlueZ PATCH] Bluetooth: hci_core: Skip hci_cmd_work if hci_request is pending
Posted by Hsin-chen Chuang 1 year, 12 months ago
hci_cmd_work overwrites the hdev->sent_cmd which contains the required
info for a hci_request to work. In the real world, it's observed that
a request from hci_le_ext_create_conn_sync could be interrupted by
the authentication (hci_conn_auth) caused by rfcomm_sock_connect. When
it happends, hci_le_ext_create_conn_sync hangs until timeout; If the
LE connection is triggered by MGMT, it freezes the whole MGMT interface.

Signed-off-by: Hsin-chen Chuang <chharry@chromium.org>
---

 net/bluetooth/hci_core.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 34c8dca2069f..e3706889976d 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -4213,8 +4213,11 @@ static void hci_cmd_work(struct work_struct *work)
 	BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name,
 	       atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q));
 
-	/* Send queued commands */
-	if (atomic_read(&hdev->cmd_cnt)) {
+	/* Send queued commands. Don't send the command when there is a pending
+	 * hci_request because the request callbacks would be overwritten.
+	 */
+	if (atomic_read(&hdev->cmd_cnt) &&
+	    !hci_dev_test_flag(hdev, HCI_CMD_PENDING)) {
 		skb = skb_dequeue(&hdev->cmd_q);
 		if (!skb)
 			return;
-- 
2.43.0.687.g38aa6559b0-goog
Re: [BlueZ PATCH] Bluetooth: hci_core: Skip hci_cmd_work if hci_request is pending
Posted by Luiz Augusto von Dentz 1 year, 11 months ago
Hi Hsin-chen,

On Tue, Feb 13, 2024 at 12:24 PM Hsin-chen Chuang <chharry@chromium.org> wrote:
>
> hci_cmd_work overwrites the hdev->sent_cmd which contains the required
> info for a hci_request to work. In the real world, it's observed that
> a request from hci_le_ext_create_conn_sync could be interrupted by
> the authentication (hci_conn_auth) caused by rfcomm_sock_connect. When
> it happends, hci_le_ext_create_conn_sync hangs until timeout; If the
> LE connection is triggered by MGMT, it freezes the whole MGMT interface.
>
> Signed-off-by: Hsin-chen Chuang <chharry@chromium.org>
> ---
>
>  net/bluetooth/hci_core.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 34c8dca2069f..e3706889976d 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -4213,8 +4213,11 @@ static void hci_cmd_work(struct work_struct *work)
>         BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name,
>                atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q));
>
> -       /* Send queued commands */
> -       if (atomic_read(&hdev->cmd_cnt)) {
> +       /* Send queued commands. Don't send the command when there is a pending
> +        * hci_request because the request callbacks would be overwritten.
> +        */
> +       if (atomic_read(&hdev->cmd_cnt) &&
> +           !hci_dev_test_flag(hdev, HCI_CMD_PENDING)) {
>                 skb = skb_dequeue(&hdev->cmd_q);
>                 if (!skb)
>                         return;
> --
> 2.43.0.687.g38aa6559b0-goog


This seems to be causing some mgmt-tester failures:

Pair Device - Sec Mode 3 Success 1                   Timed out   22.753 seconds
Pair Device - Sec Mode 3 Reject 1                    Timed out   22.533 seconds
Pair Device - Sec Mode 3 Reject 2                    Timed out   22.526 seconds

I think this is because we need to respond to an event with a command like:

< HCI Command: Create Conn.. (0x01|0x0005) plen 13  #241 [hci0] 16:25:38.699066
        Address: 00:AA:01:01:00:00 (Intel Corporation)
        Packet type: 0x0018
          DM1 may be used
          DH1 may be used
        Page scan repetition mode: R2 (0x02)
        Page scan mode: Mandatory (0x00)
        Clock offset: 0x0000
        Role switch: Allow peripheral (0x01)
> HCI Event: Command Status (0x0f) plen 4           #242 [hci0] 16:25:38.701881
      Create Connection (0x01|0x0005) ncmd 1
        Status: Success (0x00)
> HCI Event: Link Key Request (0x17) plen 6         #243 [hci0] 16:25:38.702375
        Address: 00:AA:01:01:00:00 (Intel Corporation)

But because Create Connection is pending we cannot respond to Link Key
Request, so it is actually a design problem if we cannot send commands
because something is pending so perhaps we need to redesign how we
store cmd_sent so we can have multiple outstanding commands rather
than just one.

-- 
Luiz Augusto von Dentz