Some of our devices crash in tb_cfg_request_dequeue():
general protection fault, probably for non-canonical address 0xdead000000000122
CPU: 6 PID: 91007 Comm: kworker/6:2 Tainted: G U W 6.6.65
RIP: 0010:tb_cfg_request_dequeue+0x2d/0xa0
Call Trace:
<TASK>
? tb_cfg_request_dequeue+0x2d/0xa0
tb_cfg_request_work+0x33/0x80
worker_thread+0x386/0x8f0
kthread+0xed/0x110
ret_from_fork+0x38/0x50
ret_from_fork_asm+0x1b/0x30
The exact circumstances are unclear; the theory is that
tb_cfg_request_work() can be scheduled twice for a request:
first via frame.callback from ring_work() and a second
time from tb_cfg_request(). Both kworkers then execute
tb_cfg_request_dequeue(), which results in a double list_del()
from the ctl->request_queue (the list poison dereference hints
at it: 0xdead000000000122).
Do not dequeue requests that don't have the TB_CFG_REQUEST_ACTIVE
bit set.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: stable@vger.kernel.org
---
v3: tweaked commit message
drivers/thunderbolt/ctl.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index cd15e84c47f4..1db2e951b53f 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -151,6 +151,11 @@ static void tb_cfg_request_dequeue(struct tb_cfg_request *req)
struct tb_ctl *ctl = req->ctl;

mutex_lock(&ctl->request_queue_lock);
+ if (!test_bit(TB_CFG_REQUEST_ACTIVE, &req->flags)) {
+ mutex_unlock(&ctl->request_queue_lock);
+ return;
+ }
+
list_del(&req->list);
clear_bit(TB_CFG_REQUEST_ACTIVE, &req->flags);
if (test_bit(TB_CFG_REQUEST_CANCELED, &req->flags))
--
2.49.0.395.g12beb8f557-goog
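[Editorial aside: the race the commit message describes can be sketched in
plain userspace C. The names below (struct req, REQ_ACTIVE, dequeue) are
hypothetical stand-ins for the kernel's tb_cfg_request machinery, and the
poison constants merely mirror the kernel's LIST_POISON values; this is a
single-threaded illustration of why the flag test makes a second dequeue a
safe no-op, not the driver's actual code.]

#include <assert.h>
#include <stdio.h>

/* Poison values modeled after the kernel's LIST_POISON1/LIST_POISON2. */
#define POISON1 ((struct node *)0xdead000000000100ULL)
#define POISON2 ((struct node *)0xdead000000000122ULL)

struct node {
	struct node *prev, *next;
};

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	/* A second list_del() on n would dereference these poison values. */
	n->prev = POISON2;
	n->next = POISON1;
}

#define REQ_ACTIVE 1u

struct req {
	struct node list;
	unsigned int flags;	/* no locking needed in this single-threaded sketch */
};

/* Returns 1 if the request was dequeued, 0 if it was already inactive. */
static int dequeue(struct req *r)
{
	if (!(r->flags & REQ_ACTIVE))
		return 0;	/* already dequeued by the other path */
	list_del(&r->list);
	r->flags &= ~REQ_ACTIVE;
	return 1;
}

int main(void)
{
	struct node queue = { &queue, &queue };
	struct req r = { { NULL, NULL }, REQ_ACTIVE };

	/* Enqueue the request at the head of the queue. */
	r.list.next = queue.next;
	r.list.prev = &queue;
	queue.next->prev = &r.list;
	queue.next = &r.list;

	assert(dequeue(&r) == 1);	/* first dequeue succeeds */
	assert(dequeue(&r) == 0);	/* second call is a safe no-op */
	assert(queue.next == &queue);	/* queue is empty again */
	printf("double dequeue avoided\n");
	return 0;
}

Without the REQ_ACTIVE test, the second dequeue() would follow the poisoned
prev/next pointers, which is exactly the non-canonical 0xdead000000000122
access in the trace above.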
On (25/03/28 00:03), Sergey Senozhatsky wrote:
> Some of our devices crash in tb_cfg_request_dequeue():
>
> general protection fault, probably for non-canonical address 0xdead000000000122
>
> CPU: 6 PID: 91007 Comm: kworker/6:2 Tainted: G U W 6.6.65
> RIP: 0010:tb_cfg_request_dequeue+0x2d/0xa0
> Call Trace:
> <TASK>
> ? tb_cfg_request_dequeue+0x2d/0xa0
> tb_cfg_request_work+0x33/0x80
> worker_thread+0x386/0x8f0
> kthread+0xed/0x110
> ret_from_fork+0x38/0x50
> ret_from_fork_asm+0x1b/0x30
>
> The exact circumstances are unclear; the theory is that
> tb_cfg_request_work() can be scheduled twice for a request:
> first via frame.callback from ring_work() and a second
> time from tb_cfg_request(). Both kworkers then execute
> tb_cfg_request_dequeue(), which results in a double list_del()
> from the ctl->request_queue (the list poison dereference hints
> at it: 0xdead000000000122).
>
> Do not dequeue requests that don't have the TB_CFG_REQUEST_ACTIVE
> bit set.
Mika, as was discussed in [1] thread we rolled out the fix to
our fleet and we don't see the crashes anymore. So it's tested
and verified.
[1] https://lore.kernel.org/linux-kernel/20250327145543.GC3152277@black.fi.intel.com
> ---
>
> v3: tweaked commit message
>
> drivers/thunderbolt/ctl.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
> index cd15e84c47f4..1db2e951b53f 100644
> --- a/drivers/thunderbolt/ctl.c
> +++ b/drivers/thunderbolt/ctl.c
> @@ -151,6 +151,11 @@ static void tb_cfg_request_dequeue(struct tb_cfg_request *req)
> struct tb_ctl *ctl = req->ctl;
>
> mutex_lock(&ctl->request_queue_lock);
> + if (!test_bit(TB_CFG_REQUEST_ACTIVE, &req->flags)) {
> + mutex_unlock(&ctl->request_queue_lock);
> + return;
> + }
> +
> list_del(&req->list);
> clear_bit(TB_CFG_REQUEST_ACTIVE, &req->flags);
> if (test_bit(TB_CFG_REQUEST_CANCELED, &req->flags))
> --
> 2.49.0.395.g12beb8f557-goog
>
On Thu, May 08, 2025 at 12:47:18PM +0900, Sergey Senozhatsky wrote:
> Mika, as was discussed in [1] thread we rolled out the fix to
> our fleet and we don't see the crashes anymore. So it's tested
> and verified.

Cool, thanks! Applied to thunderbolt.git/fixes.