[PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue

Heitor Alves de Siqueira posted 1 patch 4 weeks, 1 day ago
net/bluetooth/hci_core.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
[PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
Posted by Heitor Alves de Siqueira 4 weeks, 1 day ago
Syzbot reported a warning when L2CAP calls queue_work() on the hdev
workqueue while it's being drained. This can happen during device reset or
close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().

The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
  - hci_dev_close_sync() clears the HCI_UP bit before draining
  - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining

Add these checks before queuing tx_work, and free the SKB if it's not
queued for transmission.

Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
---
 net/bluetooth/hci_core.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index c46c1236ebfa..5d5f8ad7d1a8 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
 
 	BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
 
+	if (!test_bit(HCI_UP, &hdev->flags) ||
+	    hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+		kfree_skb(skb);
+		return;
+	}
+
 	hci_queue_acl(chan, &chan->data_q, skb, flags);
 
 	queue_work(hdev->workqueue, &hdev->tx_work);
@@ -3291,6 +3297,12 @@ void hci_send_sco(struct hci_conn *conn, struct sk_buff *skb)
 
 	BT_DBG("%s len %d", hdev->name, skb->len);
 
+	if (!test_bit(HCI_UP, &hdev->flags) ||
+	    hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+		kfree_skb(skb);
+		return;
+	}
+
 	hdr.handle = cpu_to_le16(conn->handle);
 	hdr.dlen   = skb->len;
 
@@ -3374,6 +3386,12 @@ void hci_send_iso(struct hci_conn *conn, struct sk_buff *skb)
 
 	BT_DBG("%s len %d", hdev->name, skb->len);
 
+	if (!test_bit(HCI_UP, &hdev->flags) ||
+	    hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+		kfree_skb(skb);
+		return;
+	}
+
 	hci_queue_iso(conn, &conn->data_q, skb);
 
 	queue_work(hdev->workqueue, &hdev->tx_work);

---
base-commit: 1f63dd8ca0dc05a8272bb8155f643c691d29bb11
change-id: 20260513-hci_send-640290de7acc

Best regards,
--  
Heitor Alves de Siqueira <halves@igalia.com>
Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
Posted by Hillf Danton 4 weeks, 1 day ago
On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
> workqueue while it's being drained. This can happen during device reset or
> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
> 
> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
>   - hci_dev_close_sync() clears the HCI_UP bit before draining
>   - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
> 
> Add these checks before queuing tx_work, and free the SKB if it's not
> queued for transmission.
> 
> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
> ---
>  net/bluetooth/hci_core.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index c46c1236ebfa..5d5f8ad7d1a8 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
>  
>  	BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
>  
> +	if (!test_bit(HCI_UP, &hdev->flags) ||
> +	    hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
> +		kfree_skb(skb);
> +		return;
> +	}
> +
>  	hci_queue_acl(chan, &chan->data_q, skb, flags);
>  
>  	queue_work(hdev->workqueue, &hdev->tx_work);
>
What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
Posted by Heitor Alves de Siqueira 4 weeks, 1 day ago
On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
> On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
>> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
>> workqueue while it's being drained. This can happen during device reset or
>> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
>> 
>> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
>>   - hci_dev_close_sync() clears the HCI_UP bit before draining
>>   - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
>> 
>> Add these checks before queuing tx_work, and free the SKB if it's not
>> queued for transmission.
>> 
>> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
>> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
>> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
>> ---
>>  net/bluetooth/hci_core.c | 18 ++++++++++++++++++
>>  1 file changed, 18 insertions(+)
>> 
>> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> index c46c1236ebfa..5d5f8ad7d1a8 100644
>> --- a/net/bluetooth/hci_core.c
>> +++ b/net/bluetooth/hci_core.c
>> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
>>  
>>  	BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
>>  
>> +	if (!test_bit(HCI_UP, &hdev->flags) ||
>> +	    hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
>> +		kfree_skb(skb);
>> +		return;
>> +	}
>> +
>>  	hci_queue_acl(chan, &chan->data_q, skb, flags);
>>  
>>  	queue_work(hdev->workqueue, &hdev->tx_work);
>>
> What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
> checked in hci_cmd_work(), and in hci_dev_do_reset() for why.

I see, I missed the RCU guards for the device flags. Sorry about that,
I'll add them to v2.
Thanks for the catch!

Best,
Heitor
Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
Posted by Luiz Augusto von Dentz 4 weeks ago
Hi Heitor,

On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
<halves@igalia.com> wrote:
>
> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
> >> workqueue while it's being drained. This can happen during device reset or
> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
> >>
> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
> >>   - hci_dev_close_sync() clears the HCI_UP bit before draining
> >>   - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
> >>
> >> Add these checks before queuing tx_work, and free the SKB if it's not
> >> queued for transmission.
> >>
> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
> >> ---
> >>  net/bluetooth/hci_core.c | 18 ++++++++++++++++++
> >>  1 file changed, 18 insertions(+)
> >>
> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
> >> --- a/net/bluetooth/hci_core.c
> >> +++ b/net/bluetooth/hci_core.c
> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
> >>
> >>      BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
> >>
> >> +    if (!test_bit(HCI_UP, &hdev->flags) ||
> >> +        hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
> >> +            kfree_skb(skb);
> >> +            return;
> >> +    }
> >> +
> >>      hci_queue_acl(chan, &chan->data_q, skb, flags);
> >>
> >>      queue_work(hdev->workqueue, &hdev->tx_work);
> >>
> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
>
> I see, I missed the RCU guards for the device flags. Sorry about that,
> I'll add them to v2.
> Thanks for the catch!

Actually this whole thing might be because we try to clean the
workqueue without actually closing the hdev, so I suspect that if we
just remove all the code from hci_dev_do_reset with
hci_dev_do_close+hci_dev_do_open, it might work better and align with
how things work over the MGMT interface:

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index c46c1236ebfa..6782bbc9b6a7 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)

        hci_req_sync_lock(hdev);

-       /* Drop queues */
-       skb_queue_purge(&hdev->rx_q);
-       skb_queue_purge(&hdev->cmd_q);
-
-       /* Cancel these to avoid queueing non-chained pending work */
-       hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
-       /* Wait for
-        *
-        *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
-        *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
-        *
-        * inside RCU section to see the flag or complete scheduling.
-        */
-       synchronize_rcu();
-       /* Explicitly cancel works in case scheduled after setting the flag. */
-       cancel_delayed_work(&hdev->cmd_timer);
-       cancel_delayed_work(&hdev->ncmd_timer);
-
-       /* Avoid potential lockdep warnings from the *_flush() calls by
-        * ensuring the workqueue is empty up front.
-        */
-       drain_workqueue(hdev->workqueue);
-
-       hci_dev_lock(hdev);
-       hci_inquiry_cache_flush(hdev);
-       hci_conn_hash_flush(hdev);
-       hci_dev_unlock(hdev);
-
-       if (hdev->flush)
-               hdev->flush(hdev);
-
-       hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
-
-       atomic_set(&hdev->cmd_cnt, 1);
-       hdev->acl_cnt = 0;
-       hdev->sco_cnt = 0;
-       hdev->le_cnt = 0;
-       hdev->iso_cnt = 0;
+       ret = hci_dev_close_sync(hdev);
+       if (ret)
+               goto done;

-       ret = hci_reset_sync(hdev);
+       ret = hci_dev_open_sync(hdev);

+done:
        hci_req_sync_unlock(hdev);
        return ret;
 }

Seem to work here and as a side effect we get notified over the MGMT
when a user uses hciconfig hci0 reset:

# tools/hciconfig hci0 reset
bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
has been disabled
bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
has been enabled
bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged

> Best,
> Heitor



-- 
Luiz Augusto von Dentz
Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
Posted by Heitor Alves de Siqueira 3 weeks, 4 days ago
Hi Luiz,

On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote:
> Hi Heitor,
>
> On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
> <halves@igalia.com> wrote:
>>
>> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
>> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
>> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
>> >> workqueue while it's being drained. This can happen during device reset or
>> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
>> >>
>> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
>> >>   - hci_dev_close_sync() clears the HCI_UP bit before draining
>> >>   - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
>> >>
>> >> Add these checks before queuing tx_work, and free the SKB if it's not
>> >> queued for transmission.
>> >>
>> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
>> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
>> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
>> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
>> >> ---
>> >>  net/bluetooth/hci_core.c | 18 ++++++++++++++++++
>> >>  1 file changed, 18 insertions(+)
>> >>
>> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
>> >> --- a/net/bluetooth/hci_core.c
>> >> +++ b/net/bluetooth/hci_core.c
>> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
>> >>
>> >>      BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
>> >>
>> >> +    if (!test_bit(HCI_UP, &hdev->flags) ||
>> >> +        hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
>> >> +            kfree_skb(skb);
>> >> +            return;
>> >> +    }
>> >> +
>> >>      hci_queue_acl(chan, &chan->data_q, skb, flags);
>> >>
>> >>      queue_work(hdev->workqueue, &hdev->tx_work);
>> >>
>> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
>> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
>>
>> I see, I missed the RCU guards for the device flags. Sorry about that,
>> I'll add them to v2.
>> Thanks for the catch!
>
> Actually this whole thing might be because we try to clean the
> workqueue without actually closing the hdev, so I suspect that if we
> just remove all the code from hci_dev_do_reset with
> hci_dev_do_close+hci_dev_do_open, it might work better and align with
> how things work over the MGMT interface:
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index c46c1236ebfa..6782bbc9b6a7 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
>
>         hci_req_sync_lock(hdev);
>
> -       /* Drop queues */
> -       skb_queue_purge(&hdev->rx_q);
> -       skb_queue_purge(&hdev->cmd_q);
> -
> -       /* Cancel these to avoid queueing non-chained pending work */
> -       hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> -       /* Wait for
> -        *
> -        *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> -        *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> -        *
> -        * inside RCU section to see the flag or complete scheduling.
> -        */
> -       synchronize_rcu();
> -       /* Explicitly cancel works in case scheduled after setting the flag. */
> -       cancel_delayed_work(&hdev->cmd_timer);
> -       cancel_delayed_work(&hdev->ncmd_timer);
> -
> -       /* Avoid potential lockdep warnings from the *_flush() calls by
> -        * ensuring the workqueue is empty up front.
> -        */
> -       drain_workqueue(hdev->workqueue);
> -
> -       hci_dev_lock(hdev);
> -       hci_inquiry_cache_flush(hdev);
> -       hci_conn_hash_flush(hdev);
> -       hci_dev_unlock(hdev);
> -
> -       if (hdev->flush)
> -               hdev->flush(hdev);
> -
> -       hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> -
> -       atomic_set(&hdev->cmd_cnt, 1);
> -       hdev->acl_cnt = 0;
> -       hdev->sco_cnt = 0;
> -       hdev->le_cnt = 0;
> -       hdev->iso_cnt = 0;
> +       ret = hci_dev_close_sync(hdev);
> +       if (ret)
> +               goto done;
>
> -       ret = hci_reset_sync(hdev);
> +       ret = hci_dev_open_sync(hdev);
>
> +done:
>         hci_req_sync_unlock(hdev);
>         return ret;
>  }
>
> Seem to work here and as a side effect we get notified over the MGMT
> when a user uses hciconfig hci0 reset:
>
> # tools/hciconfig hci0 reset
> bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
> bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
> bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
> has been disabled
> bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
> bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
> bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
> has been enabled
> bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
> bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged

Thank you for the review, and for the suggestion! There seems to be a
number of similar syzbot reports for the hdev workqueue, so maybe the
close+open approach is a simpler solution indeed. I've originally
considered implementing a wrapper/helper for workqueue submission in
hci_core.c, but if we can eliminate the race condition altogether that'd
be even better.

My only concern is that there seem to be slight differences between
what hci_dev_do_reset() does now and what we'd get with
hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use
HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be
suitable for the stable kernels? And if not, do you think the checks
from v1 would be appropriate, in that case?
Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
Posted by Luiz Augusto von Dentz 3 weeks, 3 days ago
Hi Heitor,

On Mon, May 18, 2026 at 11:21 AM Heitor Alves de Siqueira
<halves@igalia.com> wrote:
>
> Hi Luiz,
>
> On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote:
> > Hi Heitor,
> >
> > On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
> > <halves@igalia.com> wrote:
> >>
> >> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
> >> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
> >> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
> >> >> workqueue while it's being drained. This can happen during device reset or
> >> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
> >> >>
> >> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
> >> >>   - hci_dev_close_sync() clears the HCI_UP bit before draining
> >> >>   - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
> >> >>
> >> >> Add these checks before queuing tx_work, and free the SKB if it's not
> >> >> queued for transmission.
> >> >>
> >> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
> >> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
> >> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
> >> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
> >> >> ---
> >> >>  net/bluetooth/hci_core.c | 18 ++++++++++++++++++
> >> >>  1 file changed, 18 insertions(+)
> >> >>
> >> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> >> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
> >> >> --- a/net/bluetooth/hci_core.c
> >> >> +++ b/net/bluetooth/hci_core.c
> >> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
> >> >>
> >> >>      BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
> >> >>
> >> >> +    if (!test_bit(HCI_UP, &hdev->flags) ||
> >> >> +        hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
> >> >> +            kfree_skb(skb);
> >> >> +            return;
> >> >> +    }
> >> >> +
> >> >>      hci_queue_acl(chan, &chan->data_q, skb, flags);
> >> >>
> >> >>      queue_work(hdev->workqueue, &hdev->tx_work);
> >> >>
> >> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
> >> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
> >>
> >> I see, I missed the RCU guards for the device flags. Sorry about that,
> >> I'll add them to v2.
> >> Thanks for the catch!
> >
> > Actually this whole thing might be because we try to clean the
> > workqueue without actually closing the hdev, so I suspect that if we
> > just remove all the code from hci_dev_do_reset with
> > hci_dev_do_close+hci_dev_do_open, it might work better and align with
> > how things work over the MGMT interface:
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index c46c1236ebfa..6782bbc9b6a7 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
> >
> >         hci_req_sync_lock(hdev);
> >
> > -       /* Drop queues */
> > -       skb_queue_purge(&hdev->rx_q);
> > -       skb_queue_purge(&hdev->cmd_q);
> > -
> > -       /* Cancel these to avoid queueing non-chained pending work */
> > -       hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> > -       /* Wait for
> > -        *
> > -        *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> > -        *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> > -        *
> > -        * inside RCU section to see the flag or complete scheduling.
> > -        */
> > -       synchronize_rcu();
> > -       /* Explicitly cancel works in case scheduled after setting the flag. */
> > -       cancel_delayed_work(&hdev->cmd_timer);
> > -       cancel_delayed_work(&hdev->ncmd_timer);
> > -
> > -       /* Avoid potential lockdep warnings from the *_flush() calls by
> > -        * ensuring the workqueue is empty up front.
> > -        */
> > -       drain_workqueue(hdev->workqueue);
> > -
> > -       hci_dev_lock(hdev);
> > -       hci_inquiry_cache_flush(hdev);
> > -       hci_conn_hash_flush(hdev);
> > -       hci_dev_unlock(hdev);
> > -
> > -       if (hdev->flush)
> > -               hdev->flush(hdev);
> > -
> > -       hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> > -
> > -       atomic_set(&hdev->cmd_cnt, 1);
> > -       hdev->acl_cnt = 0;
> > -       hdev->sco_cnt = 0;
> > -       hdev->le_cnt = 0;
> > -       hdev->iso_cnt = 0;
> > +       ret = hci_dev_close_sync(hdev);
> > +       if (ret)
> > +               goto done;
> >
> > -       ret = hci_reset_sync(hdev);
> > +       ret = hci_dev_open_sync(hdev);
> >
> > +done:
> >         hci_req_sync_unlock(hdev);
> >         return ret;
> >  }
> >
> > Seem to work here and as a side effect we get notified over the MGMT
> > when a user uses hciconfig hci0 reset:
> >
> > # tools/hciconfig hci0 reset
> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> > bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
> > bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
> > has been disabled
> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> > bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
> > has been enabled
> > bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
> > bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged
>
> Thank you for the review, and for the suggestion! There seems to be a
> number of similar syzbot reports for the hdev workqueue, so maybe the
> close+open approach is a simpler solution indeed. I've originally
> considered implementing a wrapper/helper for workqueue submission in
> hci_core.c, but if we can eliminate the race condition altogether that'd
> be even better.
>
> My only concern is that there seem to be slight differences between
> what hci_dev_do_reset() does now and what we'd get with
> hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use
> HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be
> suitable for the stable kernels? And if not, do you think the checks
> from v1 would be appropriate, in that case?

I think, the HCI_Reset is a heavy hammer because it's a global reset.
Getting all the pieces to cooperate has been causing all sort of
issues, and in any case I think what hci_dev_do_reset does is simply
wrong or outdated; for instance it doesn't seem to affect LE, such as
cancelling scanning or advertising. This is likely because interfaces
like MGMT don't use it and require a power cycle to reset.

-- 
Luiz Augusto von Dentz
Re: [PATCH] Bluetooth: hci_core: Don't queue tx_work while draining workqueue
Posted by Heitor Alves de Siqueira 3 weeks, 2 days ago
On Mon May 18, 2026 at 5:48 PM -03, Luiz Augusto von Dentz wrote:
> Hi Heitor,
>
> On Mon, May 18, 2026 at 11:21 AM Heitor Alves de Siqueira
> <halves@igalia.com> wrote:
>>
>> Hi Luiz,
>>
>> On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote:
>> > Hi Heitor,
>> >
>> > On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
>> > <halves@igalia.com> wrote:
>> >>
>> >> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
>> >> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
>> >> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
>> >> >> workqueue while it's being drained. This can happen during device reset or
>> >> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
>> >> >>
>> >> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
>> >> >>   - hci_dev_close_sync() clears the HCI_UP bit before draining
>> >> >>   - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
>> >> >>
>> >> >> Add these checks before queuing tx_work, and free the SKB if it's not
>> >> >> queued for transmission.
>> >> >>
>> >> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
>> >> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
>> >> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
>> >> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
>> >> >> ---
>> >> >>  net/bluetooth/hci_core.c | 18 ++++++++++++++++++
>> >> >>  1 file changed, 18 insertions(+)
>> >> >>
>> >> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> >> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
>> >> >> --- a/net/bluetooth/hci_core.c
>> >> >> +++ b/net/bluetooth/hci_core.c
>> >> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
>> >> >>
>> >> >>      BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
>> >> >>
>> >> >> +    if (!test_bit(HCI_UP, &hdev->flags) ||
>> >> >> +        hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
>> >> >> +            kfree_skb(skb);
>> >> >> +            return;
>> >> >> +    }
>> >> >> +
>> >> >>      hci_queue_acl(chan, &chan->data_q, skb, flags);
>> >> >>
>> >> >>      queue_work(hdev->workqueue, &hdev->tx_work);
>> >> >>
>> >> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
>> >> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
>> >>
>> >> I see, I missed the RCU guards for the device flags. Sorry about that,
>> >> I'll add them to v2.
>> >> Thanks for the catch!
>> >
>> > Actually this whole thing might be because we try to clean the
>> > workqueue without actually closing the hdev, so I suspect that if we
>> > just remove all the code from hci_dev_do_reset with
>> > hci_dev_do_close+hci_dev_do_open, it might work better and align with
>> > how things work over the MGMT interface:
>> >
>> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> > index c46c1236ebfa..6782bbc9b6a7 100644
>> > --- a/net/bluetooth/hci_core.c
>> > +++ b/net/bluetooth/hci_core.c
>> > @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
>> >
>> >         hci_req_sync_lock(hdev);
>> >
>> > -       /* Drop queues */
>> > -       skb_queue_purge(&hdev->rx_q);
>> > -       skb_queue_purge(&hdev->cmd_q);
>> > -
>> > -       /* Cancel these to avoid queueing non-chained pending work */
>> > -       hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
>> > -       /* Wait for
>> > -        *
>> > -        *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
>> > -        *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
>> > -        *
>> > -        * inside RCU section to see the flag or complete scheduling.
>> > -        */
>> > -       synchronize_rcu();
>> > -       /* Explicitly cancel works in case scheduled after setting the flag. */
>> > -       cancel_delayed_work(&hdev->cmd_timer);
>> > -       cancel_delayed_work(&hdev->ncmd_timer);
>> > -
>> > -       /* Avoid potential lockdep warnings from the *_flush() calls by
>> > -        * ensuring the workqueue is empty up front.
>> > -        */
>> > -       drain_workqueue(hdev->workqueue);
>> > -
>> > -       hci_dev_lock(hdev);
>> > -       hci_inquiry_cache_flush(hdev);
>> > -       hci_conn_hash_flush(hdev);
>> > -       hci_dev_unlock(hdev);
>> > -
>> > -       if (hdev->flush)
>> > -               hdev->flush(hdev);
>> > -
>> > -       hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
>> > -
>> > -       atomic_set(&hdev->cmd_cnt, 1);
>> > -       hdev->acl_cnt = 0;
>> > -       hdev->sco_cnt = 0;
>> > -       hdev->le_cnt = 0;
>> > -       hdev->iso_cnt = 0;
>> > +       ret = hci_dev_close_sync(hdev);
>> > +       if (ret)
>> > +               goto done;
>> >
>> > -       ret = hci_reset_sync(hdev);
>> > +       ret = hci_dev_open_sync(hdev);
>> >
>> > +done:
>> >         hci_req_sync_unlock(hdev);
>> >         return ret;
>> >  }
>> >
>> > Seem to work here and as a side effect we get notified over the MGMT
>> > when a user uses hciconfig hci0 reset:
>> >
>> > # tools/hciconfig hci0 reset
>> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
>> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
>> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
>> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
>> > bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
>> > bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
>> > has been disabled
>> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
>> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
>> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
>> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
>> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
>> > bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
>> > has been enabled
>> > bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
>> > bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged
>>
>> Thank you for the review, and for the suggestion! There seems to be a
>> number of similar syzbot reports for the hdev workqueue, so maybe the
>> close+open approach is a simpler solution indeed. I've originally
>> considered implementing a wrapper/helper for workqueue submission in
>> hci_core.c, but if we can eliminate the race condition altogether that'd
>> be even better.
>>
>> My only concern is that there seem to be slight differences between
>> what hci_dev_do_reset() does now and what we'd get with
>> hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use
>> HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be
>> suitable for the stable kernels? And if not, do you think the checks
>> from v1 would be appropriate, in that case?
>
> I think, the HCI_Reset is a heavy hammer because it's a global reset.
> Getting all the pieces to cooperate has been causing all sort of
> issues, and in any case I think what hci_dev_do_reset does is simply
> wrong or outdated; for instance it doesn't seem to affect LE, such as
> cancelling scanning or advertising. This is likely because interfaces
> like MGMT don't use it and require a power cycle to reset.

ACK, that makes sense. Please disregard this patch, I'll send a
follow-up with the reset changes and any missing bits from the current
async reset path.

Thank you for the pointers, Luiz!