[PATCH RESEND v2 0/3] hung_task: extend blocking task stacktrace dump to semaphore

Lance Yang posted 3 patches 9 months, 1 week ago
[PATCH RESEND v2 0/3] hung_task: extend blocking task stacktrace dump to semaphore
Posted by Lance Yang 9 months, 1 week ago
Hi all,

Inspired by mutex blocker tracking[1], this patch series extends the
feature to not only dump the blocker task holding a mutex but also to
support semaphores. Unlike mutexes, semaphores lack explicit ownership
tracking, making it challenging to identify the root cause of hangs. To
address this, we introduce a last_holder field to the semaphore structure,
which is updated when a task successfully calls down() and cleared during
up().

The assumption is that if a task is blocked on a semaphore, the holders
must not have released it. While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.
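
As a rough illustration of the idea described above, here is a small
user-space model, not the kernel code from this series. The names
struct task, struct sem_model, sem_model_try_down(), sem_model_up() and
sem_model_blocker_hint() are hypothetical and exist only for this sketch;
the real implementation lives in kernel/locking/semaphore.c and takes the
semaphore's internal lock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a kernel task; hypothetical, for illustration only. */
struct task {
	int pid;
};

/* Toy counting semaphore carrying the extra "last holder" hint. */
struct sem_model {
	unsigned int count;       /* permits still available */
	struct task *last_holder; /* last task whose down() succeeded, or NULL */
};

/*
 * Non-blocking acquire: returns 1 on success and records the caller as
 * the last holder; returns 0 where a real down() would have to sleep.
 */
static int sem_model_try_down(struct sem_model *sem, struct task *curr)
{
	assert(sem != NULL && curr != NULL);
	if (sem->count == 0)
		return 0;
	sem->count--;
	sem->last_holder = curr;
	return 1;
}

/* Release one permit and clear the hint, as the cover letter describes. */
static void sem_model_up(struct sem_model *sem)
{
	assert(sem != NULL);
	sem->count++;
	sem->last_holder = NULL;
}

/*
 * What a hung-task report could consult for a waiter: the hint is only
 * meaningful while no permit is available.
 */
static struct task *sem_model_blocker_hint(const struct sem_model *sem)
{
	assert(sem != NULL);
	return sem->count == 0 ? sem->last_holder : NULL;
}
```

With a count greater than one, the recorded task is only one of possibly
several holders, which is why the hint is reported as "likely last held by"
rather than as a definite owner.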

With this change, the hung task detector can now show the blocker task's
info, as shown below:

[Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked for more than 122 seconds.
[Thu Mar 13 15:18:38 2025]       Tainted: G           OE      6.14.0-rc3+ #14
[Thu Mar 13 15:18:38 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Mar 13 15:18:38 2025] task:cat             state:D stack:0     pid:1803  tgid:1803  ppid:1057   task_flags:0x400000 flags:0x00000004
[Thu Mar 13 15:18:38 2025] Call trace:
[Thu Mar 13 15:18:38 2025]  __switch_to+0x1ec/0x380 (T)
[Thu Mar 13 15:18:38 2025]  __schedule+0xc30/0x44f8
[Thu Mar 13 15:18:38 2025]  schedule+0xb8/0x3b0
[Thu Mar 13 15:18:38 2025]  schedule_timeout+0x1d0/0x208
[Thu Mar 13 15:18:38 2025]  __down_common+0x2d4/0x6f8
[Thu Mar 13 15:18:38 2025]  __down+0x24/0x50
[Thu Mar 13 15:18:38 2025]  down+0xd0/0x140
[Thu Mar 13 15:18:38 2025]  read_dummy+0x3c/0xa0 [hung_task_sem]
[Thu Mar 13 15:18:38 2025]  full_proxy_read+0xfc/0x1d0
[Thu Mar 13 15:18:38 2025]  vfs_read+0x1a0/0x858
[Thu Mar 13 15:18:38 2025]  ksys_read+0x100/0x220
[Thu Mar 13 15:18:38 2025]  __arm64_sys_read+0x78/0xc8
[Thu Mar 13 15:18:38 2025]  invoke_syscall+0xd8/0x278
[Thu Mar 13 15:18:38 2025]  el0_svc_common.constprop.0+0xb8/0x298
[Thu Mar 13 15:18:38 2025]  do_el0_svc+0x4c/0x88
[Thu Mar 13 15:18:38 2025]  el0_svc+0x44/0x108
[Thu Mar 13 15:18:38 2025]  el0t_64_sync_handler+0x134/0x160
[Thu Mar 13 15:18:38 2025]  el0t_64_sync+0x1b8/0x1c0
[Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked on a semaphore likely last held by task cat:1802
[Thu Mar 13 15:18:38 2025] task:cat             state:S stack:0     pid:1802  tgid:1802  ppid:1057   task_flags:0x400000 flags:0x00000004
[Thu Mar 13 15:18:38 2025] Call trace:
[Thu Mar 13 15:18:38 2025]  __switch_to+0x1ec/0x380 (T)
[Thu Mar 13 15:18:38 2025]  __schedule+0xc30/0x44f8
[Thu Mar 13 15:18:38 2025]  schedule+0xb8/0x3b0
[Thu Mar 13 15:18:38 2025]  schedule_timeout+0xf4/0x208
[Thu Mar 13 15:18:38 2025]  msleep_interruptible+0x70/0x130
[Thu Mar 13 15:18:38 2025]  read_dummy+0x48/0xa0 [hung_task_sem]
[Thu Mar 13 15:18:38 2025]  full_proxy_read+0xfc/0x1d0
[Thu Mar 13 15:18:38 2025]  vfs_read+0x1a0/0x858
[Thu Mar 13 15:18:38 2025]  ksys_read+0x100/0x220
[Thu Mar 13 15:18:38 2025]  __arm64_sys_read+0x78/0xc8
[Thu Mar 13 15:18:38 2025]  invoke_syscall+0xd8/0x278
[Thu Mar 13 15:18:38 2025]  el0_svc_common.constprop.0+0xb8/0x298
[Thu Mar 13 15:18:38 2025]  do_el0_svc+0x4c/0x88
[Thu Mar 13 15:18:38 2025]  el0_svc+0x44/0x108
[Thu Mar 13 15:18:38 2025]  el0t_64_sync_handler+0x134/0x160
[Thu Mar 13 15:18:38 2025]  el0t_64_sync+0x1b8/0x1c0

[1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com

Thanks,
Lance

---
v1 -> v2:
 * Use one field to store the blocker as only one is active at a time,
 suggested by Masami
 * Leverage the LSB of the blocker field to reduce memory footprint,
 suggested by Masami
 * Add a hung_task detector semaphore blocking test sample code
 * https://lore.kernel.org/all/20250301055102.88746-1-ioworker0@gmail.com
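
The "one field plus LSB" item above can be sketched roughly as follows.
This is a user-space approximation under stated assumptions: lock objects
are at least 2-byte aligned, so bit 0 of their address is free to carry a
type tag. The constant and function names (BLOCKER_TYPE_*,
blocker_encode(), blocker_type(), blocker_ptr()) are hypothetical here and
may not match what include/linux/hung_task.h in the series defines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag values stored in the least significant bit. */
#define BLOCKER_TYPE_MUTEX ((uintptr_t)0x0)
#define BLOCKER_TYPE_SEM   ((uintptr_t)0x1)
#define BLOCKER_TYPE_MASK  ((uintptr_t)0x1)

/*
 * Pack a lock address and its type into one word. This relies on the
 * address having a zero LSB, which the first assertion checks.
 */
static uintptr_t blocker_encode(const void *lock, uintptr_t type)
{
	uintptr_t addr = (uintptr_t)lock;

	assert((addr & BLOCKER_TYPE_MASK) == 0);
	assert((type & ~BLOCKER_TYPE_MASK) == 0);
	return addr | type;
}

/* Recover the type tag from an encoded blocker word. */
static uintptr_t blocker_type(uintptr_t blocker)
{
	return blocker & BLOCKER_TYPE_MASK;
}

/* Recover the original lock address from an encoded blocker word. */
static void *blocker_ptr(uintptr_t blocker)
{
	return (void *)(blocker & ~BLOCKER_TYPE_MASK);
}
```

Because a task waits on at most one lock at a time, a single tagged word
per task is enough, and the detector can branch on the tag before
interpreting the pointer as a mutex or a semaphore.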

Lance Yang (2):
  hung_task: replace blocker_mutex with encoded blocker
  hung_task: show the blocker task if the task is hung on semaphore

Zi Li (1):
  samples: add hung_task detector semaphore blocking sample

 include/linux/hung_task.h               | 94 +++++++++++++++++++++++++
 include/linux/sched.h                   |  2 +-
 include/linux/semaphore.h               | 15 +++-
 kernel/hung_task.c                      | 52 +++++++++++---
 kernel/locking/mutex.c                  |  8 ++-
 kernel/locking/semaphore.c              | 55 +++++++++++++--
 samples/Kconfig                         | 11 +--
 samples/hung_task/Makefile              |  3 +-
 samples/hung_task/hung_task_mutex.c     | 20 ++++--
 samples/hung_task/hung_task_semaphore.c | 74 +++++++++++++++++++
 10 files changed, 301 insertions(+), 33 deletions(-)
 create mode 100644 include/linux/hung_task.h
 create mode 100644 samples/hung_task/hung_task_semaphore.c

-- 
2.45.2
Re: [PATCH RESEND v2 0/3] hung_task: extend blocking task stacktrace dump to semaphore
Posted by Boqun Feng 9 months, 1 week ago
Hi Lance,

On Fri, Mar 14, 2025 at 10:42:57PM +0800, Lance Yang wrote:
> Hi all,
> 
> Inspired by mutex blocker tracking[1], this patch series extends the
> feature to not only dump the blocker task holding a mutex but also to
> support semaphores. Unlike mutexes, semaphores lack explicit ownership
> tracking, making it challenging to identify the root cause of hangs. To
> address this, we introduce a last_holder field to the semaphore structure,
> which is updated when a task successfully calls down() and cleared during
> up().
> 
> The assumption is that if a task is blocked on a semaphore, the holders
> must not have released it. While this does not guarantee that the last
> holder is one of the current blockers, it likely provides a practical hint
> for diagnosing semaphore-related stalls.
> 

Could you copy John Stultz on future versions? John is working on
proxy execution, which will make a task always track which mutex it is
blocked on:

	https://lore.kernel.org/lkml/20250312221147.1865364-3-jstultz@google.com/

I feel it's better to build the hung task detection with that in mind,
thanks!

Regards,
Boqun

[...]
Re: [PATCH RESEND v2 0/3] hung_task: extend blocking task stacktrace dump to semaphore
Posted by Lance Yang 9 months, 1 week ago
On Sat, Mar 15, 2025 at 1:38 AM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> Hi Lance,
>
> On Fri, Mar 14, 2025 at 10:42:57PM +0800, Lance Yang wrote:
> > Hi all,
> >
> > Inspired by mutex blocker tracking[1], this patch series extends the
> > feature to not only dump the blocker task holding a mutex but also to
> > support semaphores. Unlike mutexes, semaphores lack explicit ownership
> > tracking, making it challenging to identify the root cause of hangs. To
> > address this, we introduce a last_holder field to the semaphore structure,
> > which is updated when a task successfully calls down() and cleared during
> > up().
> >
> > The assumption is that if a task is blocked on a semaphore, the holders
> > must not have released it. While this does not guarantee that the last
> > holder is one of the current blockers, it likely provides a practical hint
> > for diagnosing semaphore-related stalls.
> >
>
> Could you copy John Stultz on future versions? John is working on
> proxy execution, which will make a task always track which mutex it is
> blocked on:
>
>         https://lore.kernel.org/lkml/20250312221147.1865364-3-jstultz@google.com/
>
> I feel it's better to build the hung task detection with that in mind,
> thanks!

Yeah. Thanks for letting me know. I will keep John in the loop ;)

Thanks,
Lance

>
> Regards,
> Boqun
>
> [...]