[PATCH] Bluetooth: hci: fix refcounts in LE remote features command

Cihangir Akturk posted 1 patch 1 month, 3 weeks ago
net/bluetooth/hci_sync.c | 37 ++++++++++++++++++++++++++-----------
1 file changed, 26 insertions(+), 11 deletions(-)
[PATCH] Bluetooth: hci: fix refcounts in LE remote features command
Posted by Cihangir Akturk 1 month, 3 weeks ago
KASAN reported a slab-use-after-free in le_read_features_complete()
running from hci_cmd_sync_work.  le_read_features_complete() can run
after hci_rx_work/hci_conn_del() has removed the link, so the destroy
callback may touch a freed hci_conn and trigger a KASAN use-after-free.
Duplicate submissions also need to drop the references to avoid leaking
the hold and blocking teardown.

Fix this by taking hci_conn_get() before queueing, passing the conn
directly as work data, and balancing hci_conn_hold()/drop() and
hci_conn_get()/put() in the completion and all error/-EEXIST paths so
the connection stays referenced while the work runs and is released
afterwards.

Reported-by: syzbot+87badbb9094e008e0685@syzkaller.appspotmail.com
Signed-off-by: Cihangir Akturk <cakturk@gmail.com>
---
 net/bluetooth/hci_sync.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

I am not entirely sure that removing the err == -ECANCELED early return
is strictly correct. My assumption is that, with the changes in this
patch, there does not appear to be another cancellation path that
reliably balances hci_conn_drop() and hci_conn_put() for this command.
Based on that assumption, keeping the early return could leave the
references taken before queuing unbalanced on cancellation, so I opted
to remove it to keep the lifetime handling consistent.

diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index a9f5b1a68356..5a3d288e7015 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -776,14 +776,23 @@ _hci_cmd_sync_lookup_entry(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
  * - Lookup if an entry already exist and only if it doesn't creates a new entry
  *   and queue it.
  */
-int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
+static int __hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
 			    void *data, hci_cmd_sync_work_destroy_t destroy)
 {
 	if (hci_cmd_sync_lookup_entry(hdev, func, data, destroy))
-		return 0;
+		return -EEXIST;
 
 	return hci_cmd_sync_queue(hdev, func, data, destroy);
 }
+
+int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
+			    void *data, hci_cmd_sync_work_destroy_t destroy)
+{
+	int ret;
+
+	ret = __hci_cmd_sync_queue_once(hdev, func, data, destroy);
+	return ret == -EEXIST ? 0 : ret;
+}
 EXPORT_SYMBOL(hci_cmd_sync_queue_once);
 
 /* Run HCI command:
@@ -7338,10 +7347,8 @@ static void le_read_features_complete(struct hci_dev *hdev, void *data, int err)
 
 	bt_dev_dbg(hdev, "err %d", err);
 
-	if (err == -ECANCELED)
-		return;
-
 	hci_conn_drop(conn);
+	hci_conn_put(conn);
 }
 
 static int hci_le_read_all_remote_features_sync(struct hci_dev *hdev,
@@ -7408,12 +7415,20 @@ int hci_le_read_remote_features(struct hci_conn *conn)
 	 * role is possible. Otherwise just transition into the
 	 * connected state without requesting the remote features.
 	 */
-	if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES))
-		err = hci_cmd_sync_queue_once(hdev,
-					      hci_le_read_remote_features_sync,
-					      hci_conn_hold(conn),
-					      le_read_features_complete);
-	else
+	if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES)) {
+		hci_conn_get(conn);
+		hci_conn_hold(conn);
+		err = __hci_cmd_sync_queue_once(hdev,
+						hci_le_read_remote_features_sync,
+						conn,
+						le_read_features_complete);
+		if (err) {
+			hci_conn_drop(conn);
+			hci_conn_put(conn);
+			if (err == -EEXIST)
+				err = 0;
+		}
+	} else
 		err = -EOPNOTSUPP;
 
 	return err;
-- 
2.52.0
Re: [PATCH] Bluetooth: hci: fix refcounts in LE remote features command
Posted by Luiz Augusto von Dentz 1 month, 3 weeks ago
Hi Cihangir,

On Tue, Dec 16, 2025 at 2:13 PM Cihangir Akturk <cakturk@gmail.com> wrote:
>
> KASAN reported a slab-use-after-free in le_read_features_complete()
> running from hci_cmd_sync_work.  le_read_features_complete() can run
> after hci_rx_work/hci_conn_del() has removed the link, so the destroy
> callback may touch a freed hci_conn and trigger a KASAN use-after-free.
> Duplicate submissions also need to drop the references to avoid leaking
> the hold and blocking teardown.
>
> Fix this by taking hci_conn_get() before queueing, passing the conn
> directly as work data, and balancing hci_conn_hold()/drop() and
> hci_conn_get()/put() in the completion and all error/-EEXIST paths so
> the connection stays referenced while the work runs and is released
> afterwards.
>
> Reported-by: syzbot+87badbb9094e008e0685@syzkaller.appspotmail.com
> Signed-off-by: Cihangir Akturk <cakturk@gmail.com>
> ---
>  net/bluetooth/hci_sync.c | 37 ++++++++++++++++++++++++++-----------
>  1 file changed, 26 insertions(+), 11 deletions(-)
>
> I am not entirely sure that removing the err == -ECANCELED early return
> is strictly correct. My assumption is that, with the changes in this
> patch, there does not appear to be another cancellation path that
> reliably balances hci_conn_drop() and hci_conn_put() for this command.
> Based on that assumption, keeping the early return could leave the
> references taken before queuing unbalanced on cancellation, so I opted
> to remove it to keep the lifetime handling consistent.
>
> diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> index a9f5b1a68356..5a3d288e7015 100644
> --- a/net/bluetooth/hci_sync.c
> +++ b/net/bluetooth/hci_sync.c
> @@ -776,14 +776,23 @@ _hci_cmd_sync_lookup_entry(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
>   * - Lookup if an entry already exist and only if it doesn't creates a new entry
>   *   and queue it.
>   */
> -int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> +static int __hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
>                             void *data, hci_cmd_sync_work_destroy_t destroy)
>  {
>         if (hci_cmd_sync_lookup_entry(hdev, func, data, destroy))
> -               return 0;
> +               return -EEXIST;
>
>         return hci_cmd_sync_queue(hdev, func, data, destroy);
>  }
> +
> +int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> +                           void *data, hci_cmd_sync_work_destroy_t destroy)
> +{
> +       int ret;
> +
> +       ret = __hci_cmd_sync_queue_once(hdev, func, data, destroy);
> +       return ret == -EEXIST ? 0 : ret;
> +}
>  EXPORT_SYMBOL(hci_cmd_sync_queue_once);
>
>  /* Run HCI command:
> @@ -7338,10 +7347,8 @@ static void le_read_features_complete(struct hci_dev *hdev, void *data, int err)
>
>         bt_dev_dbg(hdev, "err %d", err);
>
> -       if (err == -ECANCELED)
> -               return;
> -
>         hci_conn_drop(conn);
> +       hci_conn_put(conn);
>  }
>
>  static int hci_le_read_all_remote_features_sync(struct hci_dev *hdev,
> @@ -7408,12 +7415,20 @@ int hci_le_read_remote_features(struct hci_conn *conn)
>          * role is possible. Otherwise just transition into the
>          * connected state without requesting the remote features.
>          */
> -       if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES))
> -               err = hci_cmd_sync_queue_once(hdev,
> -                                             hci_le_read_remote_features_sync,
> -                                             hci_conn_hold(conn),
> -                                             le_read_features_complete);
> -       else
> +       if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES)) {
> +               hci_conn_get(conn);
> +               hci_conn_hold(conn);
> +               err = __hci_cmd_sync_queue_once(hdev,
> +                                               hci_le_read_remote_features_sync,
> +                                               conn,
> +                                               le_read_features_complete);
> +               if (err) {
> +                       hci_conn_drop(conn);
> +                       hci_conn_put(conn);
> +                       if (err == -EEXIST)
> +                               err = 0;
> +               }
> +       } else

Sort of overkill, why do we have to use 2 references? Also we do have
code for dequeuing callbacks using conn as user_data so either that is
not working or there is something else at play here. Maybe we need to
change the order so that dequeue happens before hci_conn_cleanup:

diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
index dc085856f5e9..b64c0e53d9cd 100644
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -1232,15 +1232,15 @@ void hci_conn_del(struct hci_conn *conn)
        skb_queue_purge(&conn->data_q);
        skb_queue_purge(&conn->tx_q.queue);

+       /* Dequeue callbacks using connection pointer as data */
+       hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);
+
        /* Remove the connection from the list and cleanup its remaining
         * state. This is a separate function since for some cases like
         * BT_CONNECT_SCAN we *only* want the cleanup part without the
         * rest of hci_conn_del.
         */
        hci_conn_cleanup(conn);
-
-       /* Dequeue callbacks using connection pointer as data */
-       hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);
 }

 struct hci_dev *hci_get_route(bdaddr_t *dst, bdaddr_t *src, uint8_t src_type)


>                 err = -EOPNOTSUPP;
>
>         return err;
> --
> 2.52.0
>


-- 
Luiz Augusto von Dentz
Re: [PATCH] Bluetooth: hci: fix refcounts in LE remote features command
Posted by Cihangir Aktürk 1 month, 2 weeks ago
On Thu, Dec 18, 2025 at 12:36 AM Luiz Augusto von Dentz
<luiz.dentz@gmail.com> wrote:
>
> Hi Cihangir,

Hi Luiz,

> On Tue, Dec 16, 2025 at 2:13 PM Cihangir Akturk <cakturk@gmail.com> wrote:
> >
> > KASAN reported a slab-use-after-free in le_read_features_complete()
> > running from hci_cmd_sync_work.  le_read_features_complete() can run
> > after hci_rx_work/hci_conn_del() has removed the link, so the destroy
> > callback may touch a freed hci_conn and trigger a KASAN use-after-free.
> > Duplicate submissions also need to drop the references to avoid leaking
> > the hold and blocking teardown.
> >
> > Fix this by taking hci_conn_get() before queueing, passing the conn
> > directly as work data, and balancing hci_conn_hold()/drop() and
> > hci_conn_get()/put() in the completion and all error/-EEXIST paths so
> > the connection stays referenced while the work runs and is released
> > afterwards.
> >
> > Reported-by: syzbot+87badbb9094e008e0685@syzkaller.appspotmail.com
> > Signed-off-by: Cihangir Akturk <cakturk@gmail.com>
> > ---
> >  net/bluetooth/hci_sync.c | 37 ++++++++++++++++++++++++++-----------
> >  1 file changed, 26 insertions(+), 11 deletions(-)
> >
> > I am not entirely sure that removing the err == -ECANCELED early return
> > is strictly correct. My assumption is that, with the changes in this
> > patch, there does not appear to be another cancellation path that
> > reliably balances hci_conn_drop() and hci_conn_put() for this command.
> > Based on that assumption, keeping the early return could leave the
> > references taken before queuing unbalanced on cancellation, so I opted
> > to remove it to keep the lifetime handling consistent.
> >
> > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > index a9f5b1a68356..5a3d288e7015 100644
> > --- a/net/bluetooth/hci_sync.c
> > +++ b/net/bluetooth/hci_sync.c
> > @@ -776,14 +776,23 @@ _hci_cmd_sync_lookup_entry(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> >   * - Lookup if an entry already exist and only if it doesn't creates a new entry
> >   *   and queue it.
> >   */
> > -int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > +static int __hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> >                             void *data, hci_cmd_sync_work_destroy_t destroy)
> >  {
> >         if (hci_cmd_sync_lookup_entry(hdev, func, data, destroy))
> > -               return 0;
> > +               return -EEXIST;
> >
> >         return hci_cmd_sync_queue(hdev, func, data, destroy);
> >  }
> > +
> > +int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > +                           void *data, hci_cmd_sync_work_destroy_t destroy)
> > +{
> > +       int ret;
> > +
> > +       ret = __hci_cmd_sync_queue_once(hdev, func, data, destroy);
> > +       return ret == -EEXIST ? 0 : ret;
> > +}
> >  EXPORT_SYMBOL(hci_cmd_sync_queue_once);
> >
> >  /* Run HCI command:
> > @@ -7338,10 +7347,8 @@ static void le_read_features_complete(struct hci_dev *hdev, void *data, int err)
> >
> >         bt_dev_dbg(hdev, "err %d", err);
> >
> > -       if (err == -ECANCELED)
> > -               return;
> > -
> >         hci_conn_drop(conn);
> > +       hci_conn_put(conn);
> >  }
> >
> >  static int hci_le_read_all_remote_features_sync(struct hci_dev *hdev,
> > @@ -7408,12 +7415,20 @@ int hci_le_read_remote_features(struct hci_conn *conn)
> >          * role is possible. Otherwise just transition into the
> >          * connected state without requesting the remote features.
> >          */
> > -       if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES))
> > -               err = hci_cmd_sync_queue_once(hdev,
> > -                                             hci_le_read_remote_features_sync,
> > -                                             hci_conn_hold(conn),
> > -                                             le_read_features_complete);
> > -       else
> > +       if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES)) {
> > +               hci_conn_get(conn);
> > +               hci_conn_hold(conn);
> > +               err = __hci_cmd_sync_queue_once(hdev,
> > +                                               hci_le_read_remote_features_sync,
> > +                                               conn,
> > +                                               le_read_features_complete);
> > +               if (err) {
> > +                       hci_conn_drop(conn);
> > +                       hci_conn_put(conn);
> > +                       if (err == -EEXIST)
> > +                               err = 0;
> > +               }
> > +       } else
>
> Sort of overkill, why do we have to use 2 references? Also we do have
> code for dequeuing callbacks using conn as user_data so either that is
> not working or there is something else at play here. Maybe we need to
> change the order so that dequeue happens before hci_conn_cleanup:
>

From what I understand based on the KASAN trace, the issue happens
when a disconnect event is handled in hci_event_work while, at the
same time, hci_cmd_sync_work processes the LE Read Remote Features
command. So, le_read_features_complete() ends up calling
hci_conn_drop() on a connection that appears to have already been
freed.

Holding a reference with hci_conn_hold() alone does not seem
sufficient to prevent the disconnect path from removing and freeing
the hci_conn. That was the reason I tried taking an additional
reference with hci_conn_get(), to keep the connection object around
until the work finishes.

> diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> index dc085856f5e9..b64c0e53d9cd 100644
> --- a/net/bluetooth/hci_conn.c
> +++ b/net/bluetooth/hci_conn.c
> @@ -1232,15 +1232,15 @@ void hci_conn_del(struct hci_conn *conn)
>         skb_queue_purge(&conn->data_q);
>         skb_queue_purge(&conn->tx_q.queue);
>
> +       /* Dequeue callbacks using connection pointer as data */
> +       hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);
> +
>         /* Remove the connection from the list and cleanup its remaining
>          * state. This is a separate function since for some cases like
>          * BT_CONNECT_SCAN we *only* want the cleanup part without the
>          * rest of hci_conn_del.
>          */
>         hci_conn_cleanup(conn);
> -
> -       /* Dequeue callbacks using connection pointer as data */
> -       hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);
>  }
>
>  struct hci_dev *hci_get_route(bdaddr_t *dst, bdaddr_t *src, uint8_t src_type)
>
>
> >                 err = -EOPNOTSUPP;
> >
> >         return err;
> > --
> > 2.52.0
> >
>

I tried this change and tested the suggested ordering, but in my
testing the issue still appears to reproduce, so it does not seem to
fully address the problem.

>
> --
> Luiz Augusto von Dentz
Re: [PATCH] Bluetooth: hci: fix refcounts in LE remote features command
Posted by Pauli Virtanen 1 month, 2 weeks ago
Hi,

ke, 2025-12-17 kello 16:36 -0500, Luiz Augusto von Dentz kirjoitti:
> Hi Cihangir,
> 
> On Tue, Dec 16, 2025 at 2:13 PM Cihangir Akturk <cakturk@gmail.com> wrote:
> > 
> > KASAN reported a slab-use-after-free in le_read_features_complete()
> > running from hci_cmd_sync_work.  le_read_features_complete() can run
> > after hci_rx_work/hci_conn_del() has removed the link, so the destroy
> > callback may touch a freed hci_conn and trigger a KASAN use-after-free.
> > Duplicate submissions also need to drop the references to avoid leaking
> > the hold and blocking teardown.
> > 
> > Fix this by taking hci_conn_get() before queueing, passing the conn
> > directly as work data, and balancing hci_conn_hold()/drop() and
> > hci_conn_get()/put() in the completion and all error/-EEXIST paths so
> > the connection stays referenced while the work runs and is released
> > afterwards.
> > 
> > Reported-by: syzbot+87badbb9094e008e0685@syzkaller.appspotmail.com
> > Signed-off-by: Cihangir Akturk <cakturk@gmail.com>
> > ---
> >  net/bluetooth/hci_sync.c | 37 ++++++++++++++++++++++++++-----------
> >  1 file changed, 26 insertions(+), 11 deletions(-)
> > 
> > I am not entirely sure that removing the err == -ECANCELED early return
> > is strictly correct. My assumption is that, with the changes in this
> > patch, there does not appear to be another cancellation path that
> > reliably balances hci_conn_drop() and hci_conn_put() for this command.
> > Based on that assumption, keeping the early return could leave the
> > references taken before queuing unbalanced on cancellation, so I opted
> > to remove it to keep the lifetime handling consistent.
> > 
> > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > index a9f5b1a68356..5a3d288e7015 100644
> > --- a/net/bluetooth/hci_sync.c
> > +++ b/net/bluetooth/hci_sync.c
> > @@ -776,14 +776,23 @@ _hci_cmd_sync_lookup_entry(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> >   * - Lookup if an entry already exist and only if it doesn't creates a new entry
> >   *   and queue it.
> >   */
> > -int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > +static int __hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> >                             void *data, hci_cmd_sync_work_destroy_t destroy)
> >  {
> >         if (hci_cmd_sync_lookup_entry(hdev, func, data, destroy))
> > -               return 0;
> > +               return -EEXIST;
> > 
> >         return hci_cmd_sync_queue(hdev, func, data, destroy);
> >  }
> > +
> > +int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > +                           void *data, hci_cmd_sync_work_destroy_t destroy)
> > +{
> > +       int ret;
> > +
> > +       ret = __hci_cmd_sync_queue_once(hdev, func, data, destroy);
> > +       return ret == -EEXIST ? 0 : ret;
> > +}
> >  EXPORT_SYMBOL(hci_cmd_sync_queue_once);
> > 
> >  /* Run HCI command:
> > @@ -7338,10 +7347,8 @@ static void le_read_features_complete(struct hci_dev *hdev, void *data, int err)
> > 
> >         bt_dev_dbg(hdev, "err %d", err);
> > 
> > -       if (err == -ECANCELED)
> > -               return;
> > -
> >         hci_conn_drop(conn);
> > +       hci_conn_put(conn);
> >  }
> > 
> >  static int hci_le_read_all_remote_features_sync(struct hci_dev *hdev,
> > @@ -7408,12 +7415,20 @@ int hci_le_read_remote_features(struct hci_conn *conn)
> >          * role is possible. Otherwise just transition into the
> >          * connected state without requesting the remote features.
> >          */
> > -       if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES))
> > -               err = hci_cmd_sync_queue_once(hdev,
> > -                                             hci_le_read_remote_features_sync,
> > -                                             hci_conn_hold(conn),
> > -                                             le_read_features_complete);
> > -       else
> > +       if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES)) {
> > +               hci_conn_get(conn);
> > +               hci_conn_hold(conn);
> > +               err = __hci_cmd_sync_queue_once(hdev,
> > +                                               hci_le_read_remote_features_sync,
> > +                                               conn,
> > +                                               le_read_features_complete);
> > +               if (err) {
> > +                       hci_conn_drop(conn);
> > +                       hci_conn_put(conn);
> > +                       if (err == -EEXIST)
> > +                               err = 0;
> > +               }
> > +       } else
> 
> Sort of overkill, why do we have to use 2 references? Also we do have
> code for dequeuing callbacks using conn as user_data so either that is
> not working or there is something else at play here. Maybe we need to
> change the order so that dequeue happens before hci_conn_cleanup:
> diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> index dc085856f5e9..b64c0e53d9cd 100644
> --- a/net/bluetooth/hci_conn.c
> +++ b/net/bluetooth/hci_conn.c
> @@ -1232,15 +1232,15 @@ void hci_conn_del(struct hci_conn *conn)
>         skb_queue_purge(&conn->data_q);
>         skb_queue_purge(&conn->tx_q.queue);
> 
> +       /* Dequeue callbacks using connection pointer as data */
> +       hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);
> +

hci_cmd_sync_dequeue() does not (i) cancel + wait for a job that is
already running, (ii) prevent further jobs for this conn from being
queued. So it's not guaranteed to work here AFAICS.

For (i): note running hci_sync job may be blocked on taking hdev->lock,
which is held here, so trying to cancel + wait deadlocks. Does not seem
straightforward to fix.

For (ii): one would need to audit the places where these jobs are
queued, and make sure they are all done with hdev->lock held, to avoid
racing with the code here. Maybe doable with separate queueing function
that has lockdep asserts.

I suggested some time ago to always hold either refcount or lock to
keep the hci_conn alive everywhere, also in these hci_sync callbacks:

https://lore.kernel.org/linux-bluetooth/cover.1762100290.git.pav@iki.fi/

with similar changes as suggested in this patch. This may be the
simpler fix.

>         /* Remove the connection from the list and cleanup its remaining
>          * state. This is a separate function since for some cases like
>          * BT_CONNECT_SCAN we *only* want the cleanup part without the
>          * rest of hci_conn_del.
>          */
>         hci_conn_cleanup(conn);
> -
> -       /* Dequeue callbacks using connection pointer as data */
> -       hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);
>  }
>  struct hci_dev *hci_get_route(bdaddr_t *dst, bdaddr_t *src, uint8_t src_type)
> 
> 
> >                 err = -EOPNOTSUPP;
> > 
> >         return err;
> > --
> > 2.52.0
> > 
> 

-- 
Pauli Virtanen
Re: [PATCH] Bluetooth: hci: fix refcounts in LE remote features command
Posted by Luiz Augusto von Dentz 1 month, 2 weeks ago
Hi Pauli,

On Thu, Dec 18, 2025 at 11:33 AM Pauli Virtanen <pav@iki.fi> wrote:
>
> Hi,
>
> ke, 2025-12-17 kello 16:36 -0500, Luiz Augusto von Dentz kirjoitti:
> > Hi Cihangir,
> >
> > On Tue, Dec 16, 2025 at 2:13 PM Cihangir Akturk <cakturk@gmail.com> wrote:
> > >
> > > KASAN reported a slab-use-after-free in le_read_features_complete()
> > > running from hci_cmd_sync_work.  le_read_features_complete() can run
> > > after hci_rx_work/hci_conn_del() has removed the link, so the destroy
> > > callback may touch a freed hci_conn and trigger a KASAN use-after-free.
> > > Duplicate submissions also need to drop the references to avoid leaking
> > > the hold and blocking teardown.
> > >
> > > Fix this by taking hci_conn_get() before queueing, passing the conn
> > > directly as work data, and balancing hci_conn_hold()/drop() and
> > > hci_conn_get()/put() in the completion and all error/-EEXIST paths so
> > > the connection stays referenced while the work runs and is released
> > > afterwards.
> > >
> > > Reported-by: syzbot+87badbb9094e008e0685@syzkaller.appspotmail.com
> > > Signed-off-by: Cihangir Akturk <cakturk@gmail.com>
> > > ---
> > >  net/bluetooth/hci_sync.c | 37 ++++++++++++++++++++++++++-----------
> > >  1 file changed, 26 insertions(+), 11 deletions(-)
> > >
> > > I am not entirely sure that removing the err == -ECANCELED early return
> > > is strictly correct. My assumption is that, with the changes in this
> > > patch, there does not appear to be another cancellation path that
> > > reliably balances hci_conn_drop() and hci_conn_put() for this command.
> > > Based on that assumption, keeping the early return could leave the
> > > references taken before queuing unbalanced on cancellation, so I opted
> > > to remove it to keep the lifetime handling consistent.
> > >
> > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > > index a9f5b1a68356..5a3d288e7015 100644
> > > --- a/net/bluetooth/hci_sync.c
> > > +++ b/net/bluetooth/hci_sync.c
> > > @@ -776,14 +776,23 @@ _hci_cmd_sync_lookup_entry(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > >   * - Lookup if an entry already exist and only if it doesn't creates a new entry
> > >   *   and queue it.
> > >   */
> > > -int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > > +static int __hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > >                             void *data, hci_cmd_sync_work_destroy_t destroy)
> > >  {
> > >         if (hci_cmd_sync_lookup_entry(hdev, func, data, destroy))
> > > -               return 0;
> > > +               return -EEXIST;
> > >
> > >         return hci_cmd_sync_queue(hdev, func, data, destroy);
> > >  }
> > > +
> > > +int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > > +                           void *data, hci_cmd_sync_work_destroy_t destroy)
> > > +{
> > > +       int ret;
> > > +
> > > +       ret = __hci_cmd_sync_queue_once(hdev, func, data, destroy);
> > > +       return ret == -EEXIST ? 0 : ret;
> > > +}
> > >  EXPORT_SYMBOL(hci_cmd_sync_queue_once);
> > >
> > >  /* Run HCI command:
> > > @@ -7338,10 +7347,8 @@ static void le_read_features_complete(struct hci_dev *hdev, void *data, int err)
> > >
> > >         bt_dev_dbg(hdev, "err %d", err);
> > >
> > > -       if (err == -ECANCELED)
> > > -               return;
> > > -
> > >         hci_conn_drop(conn);
> > > +       hci_conn_put(conn);
> > >  }
> > >
> > >  static int hci_le_read_all_remote_features_sync(struct hci_dev *hdev,
> > > @@ -7408,12 +7415,20 @@ int hci_le_read_remote_features(struct hci_conn *conn)
> > >          * role is possible. Otherwise just transition into the
> > >          * connected state without requesting the remote features.
> > >          */
> > > -       if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES))
> > > -               err = hci_cmd_sync_queue_once(hdev,
> > > -                                             hci_le_read_remote_features_sync,
> > > -                                             hci_conn_hold(conn),
> > > -                                             le_read_features_complete);
> > > -       else
> > > +       if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES)) {
> > > +               hci_conn_get(conn);
> > > +               hci_conn_hold(conn);
> > > +               err = __hci_cmd_sync_queue_once(hdev,
> > > +                                               hci_le_read_remote_features_sync,
> > > +                                               conn,
> > > +                                               le_read_features_complete);
> > > +               if (err) {
> > > +                       hci_conn_drop(conn);
> > > +                       hci_conn_put(conn);
> > > +                       if (err == -EEXIST)
> > > +                               err = 0;
> > > +               }
> > > +       } else
> >
> > Sort of overkill, why do we have to use 2 references? Also we do have
> > code for dequeuing callbacks using conn as user_data so either that is
> > not working or there is something else at play here. Maybe we need to
> > change the order so that dequeue happens before hci_conn_cleanup:
> > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> > index dc085856f5e9..b64c0e53d9cd 100644
> > --- a/net/bluetooth/hci_conn.c
> > +++ b/net/bluetooth/hci_conn.c
> > @@ -1232,15 +1232,15 @@ void hci_conn_del(struct hci_conn *conn)
> >         skb_queue_purge(&conn->data_q);
> >         skb_queue_purge(&conn->tx_q.queue);
> >
> > +       /* Dequeue callbacks using connection pointer as data */
> > +       hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);
> > +
>
> hci_cmd_sync_dequeue() does not (i) cancel + wait for a job that is
> already running, (ii) prevent further jobs for this conn from being
> queued. So it's not guaranteed to work here AFAICS.
>
> For (i): note running hci_sync job may be blocked on taking hdev->lock,
> which is held here, so trying to cancel + wait deadlocks. Does not seem
> straightforward to fix.

Hmm, there is a lock though that is used when running the callbacks:

            hci_req_sync_lock(hdev);
            err = entry->func(hdev, entry->data);
            if (entry->destroy)
                entry->destroy(hdev, entry->data, err);
            hci_req_sync_unlock(hdev);

We could attempt to acquire hci_req_sync_lock on dequeue, but it looks
like there are code paths that already do call hci_conn_del with that
lock so the likes of mgmt-tester deadlock, anyway if there are code
paths already doing with hci_req_sync_lock held then it should be safe
to already require it when doing hci_conn_del or maybe rename
hci_conn_del to hci_conn_del_sync and then have hci_conn_del
performing the hci_req_sync_lock before calling hci_conn_del_sync then
work out the code paths where hci_req_sync_lock is already held to use
hci_conn_del_sync.

> For (ii): one would need to audit the places where these jobs are
> queued, and make sure they are all done with hdev->lock held, to avoid
> racing with the code here. Maybe doable with separate queueing function
> that has lockdep asserts.
>
> I suggested some time ago to always hold either refcount or lock to
> keep the hci_conn alive everywhere, also in these hci_sync callbacks:
>
> https://lore.kernel.org/linux-bluetooth/cover.1762100290.git.pav@iki.fi/
>
> with similar changes as suggested in this patch. This may be the
> simpler fix.
>
> >         /* Remove the connection from the list and cleanup its remaining
> >          * state. This is a separate function since for some cases like
> >          * BT_CONNECT_SCAN we *only* want the cleanup part without the
> >          * rest of hci_conn_del.
> >          */
> >         hci_conn_cleanup(conn);
> > -
> > -       /* Dequeue callbacks using connection pointer as data */
> > -       hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);
> >  }
> >  struct hci_dev *hci_get_route(bdaddr_t *dst, bdaddr_t *src, uint8_t src_type)
> >
> >
> > >                 err = -EOPNOTSUPP;
> > >
> > >         return err;
> > > --
> > > 2.52.0
> > >
> >
>
> --
> Pauli Virtanen



-- 
Luiz Augusto von Dentz
Re: [PATCH] Bluetooth: hci: fix refcounts in LE remote features command
Posted by Pauli Virtanen 1 month, 2 weeks ago
Hi,

to, 2025-12-18 kello 12:09 -0500, Luiz Augusto von Dentz kirjoitti:
> Hi Pauli,
> 
> On Thu, Dec 18, 2025 at 11:33 AM Pauli Virtanen <pav@iki.fi> wrote:
> > 
> > Hi,
> > 
> > ke, 2025-12-17 kello 16:36 -0500, Luiz Augusto von Dentz kirjoitti:
> > > Hi Cihangir,
> > > 
> > > On Tue, Dec 16, 2025 at 2:13 PM Cihangir Akturk <cakturk@gmail.com> wrote:
> > > > 
> > > > KASAN reported a slab-use-after-free in le_read_features_complete()
> > > > running from hci_cmd_sync_work.  le_read_features_complete() can run
> > > > after hci_rx_work/hci_conn_del() has removed the link, so the destroy
> > > > callback may touch a freed hci_conn and trigger a KASAN use-after-free.
> > > > Duplicate submissions also need to drop the references to avoid leaking
> > > > the hold and blocking teardown.
> > > > 
> > > > Fix this by taking hci_conn_get() before queueing, passing the conn
> > > > directly as work data, and balancing hci_conn_hold()/drop() and
> > > > hci_conn_get()/put() in the completion and all error/-EEXIST paths so
> > > > the connection stays referenced while the work runs and is released
> > > > afterwards.
> > > > 
> > > > Reported-by: syzbot+87badbb9094e008e0685@syzkaller.appspotmail.com
> > > > Signed-off-by: Cihangir Akturk <cakturk@gmail.com>
> > > > ---
> > > >  net/bluetooth/hci_sync.c | 37 ++++++++++++++++++++++++++-----------
> > > >  1 file changed, 26 insertions(+), 11 deletions(-)
> > > > 
> > > > I am not entirely sure that removing the err == -ECANCELED early return
> > > > is strictly correct. My assumption is that, with the changes in this
> > > > patch, there does not appear to be another cancellation path that
> > > > reliably balances hci_conn_drop() and hci_conn_put() for this command.
> > > > Based on that assumption, keeping the early return could leave the
> > > > references taken before queuing unbalanced on cancellation, so I opted
> > > > to remove it to keep the lifetime handling consistent.
> > > > 
> > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > > > index a9f5b1a68356..5a3d288e7015 100644
> > > > --- a/net/bluetooth/hci_sync.c
> > > > +++ b/net/bluetooth/hci_sync.c
> > > > @@ -776,14 +776,23 @@ _hci_cmd_sync_lookup_entry(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > > >   * - Lookup if an entry already exist and only if it doesn't creates a new entry
> > > >   *   and queue it.
> > > >   */
> > > > -int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > > > +static int __hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > > >                             void *data, hci_cmd_sync_work_destroy_t destroy)
> > > >  {
> > > >         if (hci_cmd_sync_lookup_entry(hdev, func, data, destroy))
> > > > -               return 0;
> > > > +               return -EEXIST;
> > > > 
> > > >         return hci_cmd_sync_queue(hdev, func, data, destroy);
> > > >  }
> > > > +
> > > > +int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > > > +                           void *data, hci_cmd_sync_work_destroy_t destroy)
> > > > +{
> > > > +       int ret;
> > > > +
> > > > +       ret = __hci_cmd_sync_queue_once(hdev, func, data, destroy);
> > > > +       return ret == -EEXIST ? 0 : ret;
> > > > +}
> > > >  EXPORT_SYMBOL(hci_cmd_sync_queue_once);
> > > > 
> > > >  /* Run HCI command:
> > > > @@ -7338,10 +7347,8 @@ static void le_read_features_complete(struct hci_dev *hdev, void *data, int err)
> > > > 
> > > >         bt_dev_dbg(hdev, "err %d", err);
> > > > 
> > > > -       if (err == -ECANCELED)
> > > > -               return;
> > > > -
> > > >         hci_conn_drop(conn);
> > > > +       hci_conn_put(conn);
> > > >  }
> > > > 
> > > >  static int hci_le_read_all_remote_features_sync(struct hci_dev *hdev,
> > > > @@ -7408,12 +7415,20 @@ int hci_le_read_remote_features(struct hci_conn *conn)
> > > >          * role is possible. Otherwise just transition into the
> > > >          * connected state without requesting the remote features.
> > > >          */
> > > > -       if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES))
> > > > -               err = hci_cmd_sync_queue_once(hdev,
> > > > -                                             hci_le_read_remote_features_sync,
> > > > -                                             hci_conn_hold(conn),
> > > > -                                             le_read_features_complete);
> > > > -       else
> > > > +       if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES)) {
> > > > +               hci_conn_get(conn);
> > > > +               hci_conn_hold(conn);
> > > > +               err = __hci_cmd_sync_queue_once(hdev,
> > > > +                                               hci_le_read_remote_features_sync,
> > > > +                                               conn,
> > > > +                                               le_read_features_complete);
> > > > +               if (err) {
> > > > +                       hci_conn_drop(conn);
> > > > +                       hci_conn_put(conn);
> > > > +                       if (err == -EEXIST)
> > > > +                               err = 0;
> > > > +               }
> > > > +       } else
> > > 
> > > Sort of overkill, why do we have to use 2 references? Also we do have
> > > code for dequeuing callbacks using conn as user_data so either that is
> > > not working or there is something else at play here. Maybe we need to
> > > change the order so that dequeue happens before hci_conn_cleanup:
> > > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> > > index dc085856f5e9..b64c0e53d9cd 100644
> > > --- a/net/bluetooth/hci_conn.c
> > > +++ b/net/bluetooth/hci_conn.c
> > > @@ -1232,15 +1232,15 @@ void hci_conn_del(struct hci_conn *conn)
> > >         skb_queue_purge(&conn->data_q);
> > >         skb_queue_purge(&conn->tx_q.queue);
> > > 
> > > +       /* Dequeue callbacks using connection pointer as data */
> > > +       hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);
> > > +
> > 
> > hci_cmd_sync_dequeue() does not (i) cancel + wait for a job that is
> > already running, (ii) prevent further jobs for this conn from being
> > queued. So it's not guaranteed to work here AFAICS.
> > 
> > For (i): note running hci_sync job may be blocked on taking hdev->lock,
> > which is held here, so trying to cancel + wait deadlocks. Does not seem
> > straightforward to fix.
> 
> Hmm, there is a lock though that is used when running the callbacks:
> 
>             hci_req_sync_lock(hdev);
>             err = entry->func(hdev, entry->data);
>             if (entry->destroy)
>                 entry->destroy(hdev, entry->data, err);
>             hci_req_sync_unlock(hdev);
> 
> We could attempt to acquire hci_req_sync_lock on dequeue, but it looks
> like there are code paths that already do call hci_conn_del with that
> lock so the likes of mgmt-tester deadlock, anyway if there are code
> paths already doing with hci_req_sync_lock held then it should be safe
> to already require it when doing hci_conn_del or maybe rename
> hci_conn_del to hci_conn_del_sync and then have hci_conn_del
> performing the hci_req_sync_lock before calling hci_conn_del_sync then
> work out the code paths where hci_req_sync_lock is already held to use
> hci_conn_del_sync.

Taking hci_req_sync() before hci_conn_del() seems to face some issues:

[hdev->req_workqueue] hci_disconnect_sync()
[hdev->req_workqueue]   __hci_cmd_sync_status() -- req_lock still held

[hdev->workqueue] hci_disconn_complete_evt()
[hdev->workqueue]   hci_conn_del() -- blocks on req_lock

since the command may end up waiting for hci_conn_del. The disconnect
may also be spontaneous from controller, so looks like any
__hci_cmd_sync_status() could lock up.

AFAICS, you'd need something like a dedicated hdev->workqueue work that
does the final hci_conn put under req_lock, so that there is no job
running in either workqueue during the final put. And get it also right
in the hdev teardown.

***

For how the other refcount / lock approach would look in practice, iirc
the earlier RFC had all hci_lookup callsites dealt with (aside from
6lowpan)

https://lore.kernel.org/linux-bluetooth/cover.1758481869.git.pav@iki.fi/
 
> > For (ii): one would need to audit the places where these jobs are
> > queued, and make sure they are all done with hdev->lock held, to avoid
> > racing with the code here. Maybe doable with separate queueing function
> > that has lockdep asserts.
> > 
> > I suggested some time ago to always hold either refcount or lock to
> > keep the hci_conn alive everywhere, also in these hci_sync callbacks:
> > 
> > https://lore.kernel.org/linux-bluetooth/cover.1762100290.git.pav@iki.fi/
> > 
> > with similar changes as suggested in this patch. This may be the
> > simpler fix.
> > 
> > >         /* Remove the connection from the list and cleanup its remaining
> > >          * state. This is a separate function since for some cases like
> > >          * BT_CONNECT_SCAN we *only* want the cleanup part without the
> > >          * rest of hci_conn_del.
> > >          */
> > >         hci_conn_cleanup(conn);
> > > -
> > > -       /* Dequeue callbacks using connection pointer as data */
> > > -       hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);
> > >  }
> > >  struct hci_dev *hci_get_route(bdaddr_t *dst, bdaddr_t *src, uint8_t src_type)
> > > 
> > > 
> > > >                 err = -EOPNOTSUPP;
> > > > 
> > > >         return err;
> > > > --
> > > > 2.52.0
> > > > 
> > > 
> > 
> > --
> > Pauli Virtanen
> 
>