net/bluetooth/hci_core.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
Syzbot reported a warning when L2CAP calls queue_work() on the hdev
workqueue while it's being drained. This can happen during device reset or
close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
- hci_dev_close_sync() clears the HCI_UP bit before draining
- hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
Add these checks before queuing tx_work, and free the SKB if it's not
queued for transmission.
Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
---
net/bluetooth/hci_core.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index c46c1236ebfa..5d5f8ad7d1a8 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
+ if (!test_bit(HCI_UP, &hdev->flags) ||
+ hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+ kfree_skb(skb);
+ return;
+ }
+
hci_queue_acl(chan, &chan->data_q, skb, flags);
queue_work(hdev->workqueue, &hdev->tx_work);
@@ -3291,6 +3297,12 @@ void hci_send_sco(struct hci_conn *conn, struct sk_buff *skb)
BT_DBG("%s len %d", hdev->name, skb->len);
+ if (!test_bit(HCI_UP, &hdev->flags) ||
+ hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+ kfree_skb(skb);
+ return;
+ }
+
hdr.handle = cpu_to_le16(conn->handle);
hdr.dlen = skb->len;
@@ -3374,6 +3386,12 @@ void hci_send_iso(struct hci_conn *conn, struct sk_buff *skb)
BT_DBG("%s len %d", hdev->name, skb->len);
+ if (!test_bit(HCI_UP, &hdev->flags) ||
+ hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+ kfree_skb(skb);
+ return;
+ }
+
hci_queue_iso(conn, &conn->data_q, skb);
queue_work(hdev->workqueue, &hdev->tx_work);
---
base-commit: 1f63dd8ca0dc05a8272bb8155f643c691d29bb11
change-id: 20260513-hci_send-640290de7acc
Best regards,
--
Heitor Alves de Siqueira <halves@igalia.com>
On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
> workqueue while it's being drained. This can happen during device reset or
> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
>
> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
> - hci_dev_close_sync() clears the HCI_UP bit before draining
> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
>
> Add these checks before queuing tx_work, and free the SKB if it's not
> queued for transmission.
>
> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
> ---
> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index c46c1236ebfa..5d5f8ad7d1a8 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
>
> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
>
> + if (!test_bit(HCI_UP, &hdev->flags) ||
> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
> + kfree_skb(skb);
> + return;
> + }
> +
> hci_queue_acl(chan, &chan->data_q, skb, flags);
>
> queue_work(hdev->workqueue, &hdev->tx_work);
>
What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
> On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
>> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
>> workqueue while it's being drained. This can happen during device reset or
>> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
>>
>> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
>> - hci_dev_close_sync() clears the HCI_UP bit before draining
>> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
>>
>> Add these checks before queuing tx_work, and free the SKB if it's not
>> queued for transmission.
>>
>> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
>> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
>> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
>> ---
>> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>>
>> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> index c46c1236ebfa..5d5f8ad7d1a8 100644
>> --- a/net/bluetooth/hci_core.c
>> +++ b/net/bluetooth/hci_core.c
>> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
>>
>> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
>>
>> + if (!test_bit(HCI_UP, &hdev->flags) ||
>> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
>> + kfree_skb(skb);
>> + return;
>> + }
>> +
>> hci_queue_acl(chan, &chan->data_q, skb, flags);
>>
>> queue_work(hdev->workqueue, &hdev->tx_work);
>>
> What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
> checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
I see, I missed the RCU guards for the device flags. Sorry about that,
I'll add them to v2.
Thanks for the catch!
Best,
Heitor
Hi Heitor,
On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
<halves@igalia.com> wrote:
>
> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
> >> workqueue while it's being drained. This can happen during device reset or
> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
> >>
> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
> >> - hci_dev_close_sync() clears the HCI_UP bit before draining
> >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
> >>
> >> Add these checks before queuing tx_work, and free the SKB if it's not
> >> queued for transmission.
> >>
> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
> >> ---
> >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
> >> 1 file changed, 18 insertions(+)
> >>
> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
> >> --- a/net/bluetooth/hci_core.c
> >> +++ b/net/bluetooth/hci_core.c
> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
> >>
> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
> >>
> >> + if (!test_bit(HCI_UP, &hdev->flags) ||
> >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
> >> + kfree_skb(skb);
> >> + return;
> >> + }
> >> +
> >> hci_queue_acl(chan, &chan->data_q, skb, flags);
> >>
> >> queue_work(hdev->workqueue, &hdev->tx_work);
> >>
> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
>
> I see, I missed the RCU guards for the device flags. Sorry about that,
> I'll add them to v2.
> Thanks for the catch!
Actually this whole thing might be because we try to clean the
workqueue without actually closing the hdev, so I suspect that if we
just remove all the code from hci_dev_do_reset with
hci_dev_do_close+hci_dev_do_open, it might work better and align with
how things work over the MGMT interface:
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index c46c1236ebfa..6782bbc9b6a7 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
hci_req_sync_lock(hdev);
- /* Drop queues */
- skb_queue_purge(&hdev->rx_q);
- skb_queue_purge(&hdev->cmd_q);
-
- /* Cancel these to avoid queueing non-chained pending work */
- hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
- /* Wait for
- *
- * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
- * queue_delayed_work(&hdev->{cmd,ncmd}_timer)
- *
- * inside RCU section to see the flag or complete scheduling.
- */
- synchronize_rcu();
- /* Explicitly cancel works in case scheduled after setting the flag. */
- cancel_delayed_work(&hdev->cmd_timer);
- cancel_delayed_work(&hdev->ncmd_timer);
-
- /* Avoid potential lockdep warnings from the *_flush() calls by
- * ensuring the workqueue is empty up front.
- */
- drain_workqueue(hdev->workqueue);
-
- hci_dev_lock(hdev);
- hci_inquiry_cache_flush(hdev);
- hci_conn_hash_flush(hdev);
- hci_dev_unlock(hdev);
-
- if (hdev->flush)
- hdev->flush(hdev);
-
- hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
-
- atomic_set(&hdev->cmd_cnt, 1);
- hdev->acl_cnt = 0;
- hdev->sco_cnt = 0;
- hdev->le_cnt = 0;
- hdev->iso_cnt = 0;
+ ret = hci_dev_close_sync(hdev);
+ if (ret)
+ goto done;
- ret = hci_reset_sync(hdev);
+ ret = hci_dev_open_sync(hdev);
+done:
hci_req_sync_unlock(hdev);
return ret;
}
Seem to work here and as a side effect we get notified over the MGMT
when a user uses hciconfig hci0 reset:
# tools/hciconfig hci0 reset
bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
has been disabled
bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
has been enabled
bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged
> Best,
> Heitor
--
Luiz Augusto von Dentz
Hi Luiz,
On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote:
> Hi Heitor,
>
> On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
> <halves@igalia.com> wrote:
>>
>> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
>> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
>> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
>> >> workqueue while it's being drained. This can happen during device reset or
>> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
>> >>
>> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
>> >> - hci_dev_close_sync() clears the HCI_UP bit before draining
>> >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
>> >>
>> >> Add these checks before queuing tx_work, and free the SKB if it's not
>> >> queued for transmission.
>> >>
>> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
>> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
>> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
>> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
>> >> ---
>> >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
>> >> 1 file changed, 18 insertions(+)
>> >>
>> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
>> >> --- a/net/bluetooth/hci_core.c
>> >> +++ b/net/bluetooth/hci_core.c
>> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
>> >>
>> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
>> >>
>> >> + if (!test_bit(HCI_UP, &hdev->flags) ||
>> >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
>> >> + kfree_skb(skb);
>> >> + return;
>> >> + }
>> >> +
>> >> hci_queue_acl(chan, &chan->data_q, skb, flags);
>> >>
>> >> queue_work(hdev->workqueue, &hdev->tx_work);
>> >>
>> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
>> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
>>
>> I see, I missed the RCU guards for the device flags. Sorry about that,
>> I'll add them to v2.
>> Thanks for the catch!
>
> Actually this whole thing might be because we try to clean the
> workqueue without actually closing the hdev, so I suspect that if we
> just remove all the code from hci_dev_do_reset with
> hci_dev_do_close+hci_dev_do_open, it might work better and align with
> how things work over the MGMT interface:
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index c46c1236ebfa..6782bbc9b6a7 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
>
> hci_req_sync_lock(hdev);
>
> - /* Drop queues */
> - skb_queue_purge(&hdev->rx_q);
> - skb_queue_purge(&hdev->cmd_q);
> -
> - /* Cancel these to avoid queueing non-chained pending work */
> - hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> - /* Wait for
> - *
> - * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> - * queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> - *
> - * inside RCU section to see the flag or complete scheduling.
> - */
> - synchronize_rcu();
> - /* Explicitly cancel works in case scheduled after setting the flag. */
> - cancel_delayed_work(&hdev->cmd_timer);
> - cancel_delayed_work(&hdev->ncmd_timer);
> -
> - /* Avoid potential lockdep warnings from the *_flush() calls by
> - * ensuring the workqueue is empty up front.
> - */
> - drain_workqueue(hdev->workqueue);
> -
> - hci_dev_lock(hdev);
> - hci_inquiry_cache_flush(hdev);
> - hci_conn_hash_flush(hdev);
> - hci_dev_unlock(hdev);
> -
> - if (hdev->flush)
> - hdev->flush(hdev);
> -
> - hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> -
> - atomic_set(&hdev->cmd_cnt, 1);
> - hdev->acl_cnt = 0;
> - hdev->sco_cnt = 0;
> - hdev->le_cnt = 0;
> - hdev->iso_cnt = 0;
> + ret = hci_dev_close_sync(hdev);
> + if (ret)
> + goto done;
>
> - ret = hci_reset_sync(hdev);
> + ret = hci_dev_open_sync(hdev);
>
> +done:
> hci_req_sync_unlock(hdev);
> return ret;
> }
>
> Seem to work here and as a side effect we get notified over the MGMT
> when a user uses hciconfig hci0 reset:
>
> # tools/hciconfig hci0 reset
> bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
> bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
> bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
> has been disabled
> bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
> bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
> bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
> has been enabled
> bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
> bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged
Thank you for the review, and for the suggestion! There seems to be a
number of similar syzbot reports for the hdev workqueue, so maybe the
close+open approach is a simpler solution indeed. I've originally
considered implementing a wrapper/helper for workqueue submission in
hci_core.c, but if we can eliminate the race condition altogether that'd
be even better.
My only concern is that there seem to be slight differences between
what hci_dev_do_reset() does now and what we'd get with
hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use
HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be
suitable for the stable kernels? And if not, do you think the checks
from v1 would be appropriate, in that case?
Hi Heitor,
On Mon, May 18, 2026 at 11:21 AM Heitor Alves de Siqueira
<halves@igalia.com> wrote:
>
> Hi Luiz,
>
> On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote:
> > Hi Heitor,
> >
> > On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
> > <halves@igalia.com> wrote:
> >>
> >> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
> >> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
> >> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
> >> >> workqueue while it's being drained. This can happen during device reset or
> >> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
> >> >>
> >> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
> >> >> - hci_dev_close_sync() clears the HCI_UP bit before draining
> >> >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
> >> >>
> >> >> Add these checks before queuing tx_work, and free the SKB if it's not
> >> >> queued for transmission.
> >> >>
> >> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
> >> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
> >> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
> >> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
> >> >> ---
> >> >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
> >> >> 1 file changed, 18 insertions(+)
> >> >>
> >> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> >> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
> >> >> --- a/net/bluetooth/hci_core.c
> >> >> +++ b/net/bluetooth/hci_core.c
> >> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
> >> >>
> >> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
> >> >>
> >> >> + if (!test_bit(HCI_UP, &hdev->flags) ||
> >> >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
> >> >> + kfree_skb(skb);
> >> >> + return;
> >> >> + }
> >> >> +
> >> >> hci_queue_acl(chan, &chan->data_q, skb, flags);
> >> >>
> >> >> queue_work(hdev->workqueue, &hdev->tx_work);
> >> >>
> >> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
> >> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
> >>
> >> I see, I missed the RCU guards for the device flags. Sorry about that,
> >> I'll add them to v2.
> >> Thanks for the catch!
> >
> > Actually this whole thing might be because we try to clean the
> > workqueue without actually closing the hdev, so I suspect that if we
> > just remove all the code from hci_dev_do_reset with
> > hci_dev_do_close+hci_dev_do_open, it might work better and align with
> > how things work over the MGMT interface:
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index c46c1236ebfa..6782bbc9b6a7 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
> >
> > hci_req_sync_lock(hdev);
> >
> > - /* Drop queues */
> > - skb_queue_purge(&hdev->rx_q);
> > - skb_queue_purge(&hdev->cmd_q);
> > -
> > - /* Cancel these to avoid queueing non-chained pending work */
> > - hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> > - /* Wait for
> > - *
> > - * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> > - * queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> > - *
> > - * inside RCU section to see the flag or complete scheduling.
> > - */
> > - synchronize_rcu();
> > - /* Explicitly cancel works in case scheduled after setting the flag. */
> > - cancel_delayed_work(&hdev->cmd_timer);
> > - cancel_delayed_work(&hdev->ncmd_timer);
> > -
> > - /* Avoid potential lockdep warnings from the *_flush() calls by
> > - * ensuring the workqueue is empty up front.
> > - */
> > - drain_workqueue(hdev->workqueue);
> > -
> > - hci_dev_lock(hdev);
> > - hci_inquiry_cache_flush(hdev);
> > - hci_conn_hash_flush(hdev);
> > - hci_dev_unlock(hdev);
> > -
> > - if (hdev->flush)
> > - hdev->flush(hdev);
> > -
> > - hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> > -
> > - atomic_set(&hdev->cmd_cnt, 1);
> > - hdev->acl_cnt = 0;
> > - hdev->sco_cnt = 0;
> > - hdev->le_cnt = 0;
> > - hdev->iso_cnt = 0;
> > + ret = hci_dev_close_sync(hdev);
> > + if (ret)
> > + goto done;
> >
> > - ret = hci_reset_sync(hdev);
> > + ret = hci_dev_open_sync(hdev);
> >
> > +done:
> > hci_req_sync_unlock(hdev);
> > return ret;
> > }
> >
> > Seem to work here and as a side effect we get notified over the MGMT
> > when a user uses hciconfig hci0 reset:
> >
> > # tools/hciconfig hci0 reset
> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> > bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
> > bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
> > has been disabled
> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
> > bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
> > has been enabled
> > bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
> > bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged
>
> Thank you for the review, and for the suggestion! There seems to be a
> number of similar syzbot reports for the hdev workqueue, so maybe the
> close+open approach is a simpler solution indeed. I've originally
> considered implementing a wrapper/helper for workqueue submission in
> hci_core.c, but if we can eliminate the race condition altogether that'd
> be even better.
>
> My only concern is that there seem to be slight differences between
> what hci_dev_do_reset() does now and what we'd get with
> hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use
> HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be
> suitable for the stable kernels? And if not, do you think the checks
> from v1 would be appropriate, in that case?
I think, the HCI_Reset is a heavy hammer because it's a global reset.
Getting all the pieces to cooperate has been causing all sort of
issues, and in any case I think what hci_dev_do_reset does is simply
wrong or outdated; for instance it doesn't seem to affect LE, such as
cancelling scanning or advertising. This is likely because interfaces
like MGMT don't use it and require a power cycle to reset.
--
Luiz Augusto von Dentz
On Mon May 18, 2026 at 5:48 PM -03, Luiz Augusto von Dentz wrote:
> Hi Heitor,
>
> On Mon, May 18, 2026 at 11:21 AM Heitor Alves de Siqueira
> <halves@igalia.com> wrote:
>>
>> Hi Luiz,
>>
>> On Fri May 15, 2026 at 10:33 AM -03, Luiz Augusto von Dentz wrote:
>> > Hi Heitor,
>> >
>> > On Thu, May 14, 2026 at 10:54 AM Heitor Alves de Siqueira
>> > <halves@igalia.com> wrote:
>> >>
>> >> On Wed May 13, 2026 at 11:04 PM -03, Hillf Danton wrote:
>> >> > On Wed, 13 May 2026 15:55:23 -0300 Heitor Alves de Siqueira wrote:
>> >> >> Syzbot reported a warning when L2CAP calls queue_work() on the hdev
>> >> >> workqueue while it's being drained. This can happen during device reset or
>> >> >> close paths for hci_send_acl(), hci_send_sco() and hci_send_iso().
>> >> >>
>> >> >> The workqueue is drained in hci_dev_do_reset() and in hci_dev_close_sync():
>> >> >> - hci_dev_close_sync() clears the HCI_UP bit before draining
>> >> >> - hci_dev_do_reset() sets HCI_CMD_DRAIN_WORKQUEUE before draining
>> >> >>
>> >> >> Add these checks before queuing tx_work, and free the SKB if it's not
>> >> >> queued for transmission.
>> >> >>
>> >> >> Fixes: 3eff45eaf817 ("Bluetooth: convert tx_task to workqueue")
>> >> >> Reported-by: syzbot+97721dd81f792e838ba0@syzkaller.appspotmail.com
>> >> >> Closes: https://syzkaller.appspot.com/bug?extid=97721dd81f792e838ba0
>> >> >> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
>> >> >> ---
>> >> >> net/bluetooth/hci_core.c | 18 ++++++++++++++++++
>> >> >> 1 file changed, 18 insertions(+)
>> >> >>
>> >> >> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> >> >> index c46c1236ebfa..5d5f8ad7d1a8 100644
>> >> >> --- a/net/bluetooth/hci_core.c
>> >> >> +++ b/net/bluetooth/hci_core.c
>> >> >> @@ -3278,6 +3278,12 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags)
>> >> >>
>> >> >> BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
>> >> >>
>> >> >> + if (!test_bit(HCI_UP, &hdev->flags) ||
>> >> >> + hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
>> >> >> + kfree_skb(skb);
>> >> >> + return;
>> >> >> + }
>> >> >> +
>> >> >> hci_queue_acl(chan, &chan->data_q, skb, flags);
>> >> >>
>> >> >> queue_work(hdev->workqueue, &hdev->tx_work);
>> >> >>
>> >> > What you add is not enough, go and see how HCI_CMD_DRAIN_WORKQUEUE is
>> >> > checked in hci_cmd_work(), and in hci_dev_do_reset() for why.
>> >>
>> >> I see, I missed the RCU guards for the device flags. Sorry about that,
>> >> I'll add them to v2.
>> >> Thanks for the catch!
>> >
>> > Actually this whole thing might be because we try to clean the
>> > workqueue without actually closing the hdev, so I suspect that if we
>> > just remove all the code from hci_dev_do_reset with
>> > hci_dev_do_close+hci_dev_do_open, it might work better and align with
>> > how things work over the MGMT interface:
>> >
>> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> > index c46c1236ebfa..6782bbc9b6a7 100644
>> > --- a/net/bluetooth/hci_core.c
>> > +++ b/net/bluetooth/hci_core.c
>> > @@ -539,47 +539,13 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
>> >
>> > hci_req_sync_lock(hdev);
>> >
>> > - /* Drop queues */
>> > - skb_queue_purge(&hdev->rx_q);
>> > - skb_queue_purge(&hdev->cmd_q);
>> > -
>> > - /* Cancel these to avoid queueing non-chained pending work */
>> > - hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
>> > - /* Wait for
>> > - *
>> > - * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
>> > - * queue_delayed_work(&hdev->{cmd,ncmd}_timer)
>> > - *
>> > - * inside RCU section to see the flag or complete scheduling.
>> > - */
>> > - synchronize_rcu();
>> > - /* Explicitly cancel works in case scheduled after setting the flag. */
>> > - cancel_delayed_work(&hdev->cmd_timer);
>> > - cancel_delayed_work(&hdev->ncmd_timer);
>> > -
>> > - /* Avoid potential lockdep warnings from the *_flush() calls by
>> > - * ensuring the workqueue is empty up front.
>> > - */
>> > - drain_workqueue(hdev->workqueue);
>> > -
>> > - hci_dev_lock(hdev);
>> > - hci_inquiry_cache_flush(hdev);
>> > - hci_conn_hash_flush(hdev);
>> > - hci_dev_unlock(hdev);
>> > -
>> > - if (hdev->flush)
>> > - hdev->flush(hdev);
>> > -
>> > - hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
>> > -
>> > - atomic_set(&hdev->cmd_cnt, 1);
>> > - hdev->acl_cnt = 0;
>> > - hdev->sco_cnt = 0;
>> > - hdev->le_cnt = 0;
>> > - hdev->iso_cnt = 0;
>> > + ret = hci_dev_close_sync(hdev);
>> > + if (ret)
>> > + goto done;
>> >
>> > - ret = hci_reset_sync(hdev);
>> > + ret = hci_dev_open_sync(hdev);
>> >
>> > +done:
>> > hci_req_sync_unlock(hdev);
>> > return ret;
>> > }
>> >
>> > Seem to work here and as a side effect we get notified over the MGMT
>> > when a user uses hciconfig hci0 reset:
>> >
>> > # tools/hciconfig hci0 reset
>> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
>> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac0
>> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
>> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
>> > bluetoothd[34]: src/adapter.c:cancel_passive_scanning()
>> > bluetoothd[34]: src/adapter.c:adapter_stop() adapter /org/bluez/hci0
>> > has been disabled
>> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0008
>> > bluetoothd[34]: src/shared/mgmt.c:can_read_data() [0x0000] event 0x0006
>> > bluetoothd[34]: src/adapter.c:new_settings_callback() Settings: 0x01fc0ac1
>> > bluetoothd[34]: src/adapter.c:settings_changed() Changed settings: 0x00000001
>> > bluetoothd[34]: src/adapter.c:settings_changed() Pending settings: 0x00000000
>> > bluetoothd[34]: src/adapter.c:adapter_start() adapter /org/bluez/hci0
>> > has been enabled
>> > bluetoothd[34]: src/adapter.c:trigger_passive_scanning()
>> > bluetoothd[34]: [signal] org.freedesktop.DBus.Properties.PropertiesChanged
>>
>> Thank you for the review, and for the suggestion! There seems to be a
>> number of similar syzbot reports for the hdev workqueue, so maybe the
>> close+open approach is a simpler solution indeed. I've originally
>> considered implementing a wrapper/helper for workqueue submission in
>> hci_core.c, but if we can eliminate the race condition altogether that'd
>> be even better.
>>
>> My only concern is that there seem to be slight differences between
>> what hci_dev_do_reset() does now and what we'd get with
>> hci_dev_close_sync()+hci_dev_open_sync() (e.g. we wouldn't use
>> HCI_CMD_DRAIN_WORKQUEUE anymore). Would this alternative approach be
>> suitable for the stable kernels? And if not, do you think the checks
>> from v1 would be appropriate, in that case?
>
> I think, the HCI_Reset is a heavy hammer because it's a global reset.
> Getting all the pieces to cooperate has been causing all sort of
> issues, and in any case I think what hci_dev_do_reset does is simply
> wrong or outdated; for instance it doesn't seem to affect LE, such as
> cancelling scanning or advertising. This is likely because interfaces
> like MGMT don't use it and require a power cycle to reset.
ACK, that makes sense. Please disregard this patch, I'll send a
follow-up with the reset changes and any missing bits from the current
async reset path.
Thank you for the pointers, Luiz!
© 2016 - 2026 Red Hat, Inc.