Hi all,

Inspired by mutex blocker tracking[1], this patch series extends the
feature to not only dump the blocker task holding a mutex but also to
support semaphores. Unlike mutexes, semaphores lack explicit ownership
tracking, making it challenging to identify the root cause of hangs. To
address this, we introduce a last_holder field to the semaphore structure,
which is updated when a task successfully calls down() and cleared during
up().

The assumption is that if a task is blocked on a semaphore, the holders
must not have released it. While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.

With this change, the hung task detector can now show the blocker task's
info like below:

[Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked for more than 122 seconds.
[Thu Mar 13 15:18:38 2025] Tainted: G OE 6.14.0-rc3+ #14
[Thu Mar 13 15:18:38 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Mar 13 15:18:38 2025] task:cat state:D stack:0 pid:1803 tgid:1803 ppid:1057 task_flags:0x400000 flags:0x00000004
[Thu Mar 13 15:18:38 2025] Call trace:
[Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
[Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
[Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
[Thu Mar 13 15:18:38 2025] schedule_timeout+0x1d0/0x208
[Thu Mar 13 15:18:38 2025] __down_common+0x2d4/0x6f8
[Thu Mar 13 15:18:38 2025] __down+0x24/0x50
[Thu Mar 13 15:18:38 2025] down+0xd0/0x140
[Thu Mar 13 15:18:38 2025] read_dummy+0x3c/0xa0 [hung_task_sem]
[Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
[Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
[Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
[Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
[Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
[Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
[Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
[Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
[Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
[Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
[Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked on a semaphore likely last held by task cat:1802
[Thu Mar 13 15:18:38 2025] task:cat state:S stack:0 pid:1802 tgid:1802 ppid:1057 task_flags:0x400000 flags:0x00000004
[Thu Mar 13 15:18:38 2025] Call trace:
[Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
[Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
[Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
[Thu Mar 13 15:18:38 2025] schedule_timeout+0xf4/0x208
[Thu Mar 13 15:18:38 2025] msleep_interruptible+0x70/0x130
[Thu Mar 13 15:18:38 2025] read_dummy+0x48/0xa0 [hung_task_sem]
[Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
[Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
[Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
[Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
[Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
[Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
[Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
[Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
[Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
[Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0

[1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com

Thanks,
Lance

---
v1 -> v2:
 * Use one field to store the blocker as only one is active at a time,
   suggested by Masami
 * Leverage the LSB of the blocker field to reduce memory footprint,
   suggested by Masami
 * Add a hung_task detector semaphore blocking test sample code
 * https://lore.kernel.org/all/20250301055102.88746-1-ioworker0@gmail.com

Lance Yang (2):
  hung_task: replace blocker_mutex with encoded blocker
  hung_task: show the blocker task if the task is hung on semaphore

Zi Li (1):
  samples: add hung_task detector semaphore blocking sample

 include/linux/hung_task.h               | 94 +++++++++++++++++++++++++
 include/linux/sched.h                   |  2 +-
 include/linux/semaphore.h               | 15 +++-
 kernel/hung_task.c                      | 52 +++++++++++---
 kernel/locking/mutex.c                  |  8 ++-
 kernel/locking/semaphore.c              | 55 +++++++++++++--
 samples/Kconfig                         | 11 +--
 samples/hung_task/Makefile              |  3 +-
 samples/hung_task/hung_task_mutex.c     | 20 ++++--
 samples/hung_task/hung_task_semaphore.c | 74 +++++++++++++++++++
 10 files changed, 301 insertions(+), 33 deletions(-)
 create mode 100644 include/linux/hung_task.h
 create mode 100644 samples/hung_task/hung_task_semaphore.c

-- 
2.45.2
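
For readers following the thread, the two ideas in the cover letter above (recording a last holder on down() and clearing it on up(), and using the LSB of a single blocker field to tell lock types apart) can be illustrated with a small user-space sketch. This is not the code from the series: every name below (toy_task, toy_sem, toy_encode_blocker, the TOY_BLOCKER_* tags) is hypothetical, the semaphore is a non-blocking toy without any locking, and the real layout in include/linux/hung_task.h may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct toy_task {
	int pid;
};

struct toy_sem {
	unsigned int count;
	struct toy_task *last_holder;	/* a hint only; NULL when nothing is recorded */
};

/* Hypothetical type tags kept in the least significant bit of the blocker. */
#define TOY_BLOCKER_MUTEX	0x0UL
#define TOY_BLOCKER_SEM		0x1UL
#define TOY_BLOCKER_MASK	0x1UL

/* Pack a lock address and its type tag into a single word. */
static inline uintptr_t toy_encode_blocker(const void *lock, uintptr_t type)
{
	uintptr_t addr = (uintptr_t)lock;

	/* Lock objects are at least 2-byte aligned, so the LSB is free. */
	assert((addr & TOY_BLOCKER_MASK) == 0);
	return addr | (type & TOY_BLOCKER_MASK);
}

/* Recover the type tag from an encoded blocker. */
static inline uintptr_t toy_blocker_type(uintptr_t blocker)
{
	return blocker & TOY_BLOCKER_MASK;
}

/* Recover the lock address from an encoded blocker. */
static inline void *toy_blocker_lock(uintptr_t blocker)
{
	return (void *)(blocker & ~TOY_BLOCKER_MASK);
}

/*
 * Non-blocking acquire: returns 1 and records the caller as last holder
 * on success, returns 0 (leaving last_holder untouched) when the count
 * is exhausted and a real down() would have to sleep.
 */
static inline int toy_down_trylock(struct toy_sem *sem, struct toy_task *me)
{
	if (sem->count == 0)
		return 0;
	sem->count--;
	sem->last_holder = me;
	return 1;
}

/* Release: bump the count and clear the hint, as the cover letter describes. */
static inline void toy_up(struct toy_sem *sem)
{
	sem->count++;
	sem->last_holder = NULL;
}
```

In this sketch, a task that fails to acquire the semaphore would store toy_encode_blocker(&sem, TOY_BLOCKER_SEM) as its blocker. A detector could then check the tag, recover the semaphore pointer, and report sem->last_holder as the likely (not guaranteed) holder, which matches the "likely last held by" wording in the log output.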
Oops, I got the version wrong and will resend the new one right away.

Thanks,
Lance

On Fri, Mar 14, 2025 at 10:29 PM Lance Yang <ioworker0@gmail.com> wrote:
>
> Hi all,
>
> Inspired by mutex blocker tracking[1], this patch series extends the
> feature to not only dump the blocker task holding a mutex but also to
> support semaphores. Unlike mutexes, semaphores lack explicit ownership
> tracking, making it challenging to identify the root cause of hangs. To
> address this, we introduce a last_holder field to the semaphore structure,
> which is updated when a task successfully calls down() and cleared during
> up().
>
> The assumption is that if a task is blocked on a semaphore, the holders
> must not have released it. While this does not guarantee that the last
> holder is one of the current blockers, it likely provides a practical hint
> for diagnosing semaphore-related stalls.
>
> With this change, the hung task detector can now show the blocker task's
> info like below:
>
> [Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked for more than 122 seconds.
> [Thu Mar 13 15:18:38 2025] Tainted: G OE 6.14.0-rc3+ #14
> [Thu Mar 13 15:18:38 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Thu Mar 13 15:18:38 2025] task:cat state:D stack:0 pid:1803 tgid:1803 ppid:1057 task_flags:0x400000 flags:0x00000004
> [Thu Mar 13 15:18:38 2025] Call trace:
> [Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
> [Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
> [Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
> [Thu Mar 13 15:18:38 2025] schedule_timeout+0x1d0/0x208
> [Thu Mar 13 15:18:38 2025] __down_common+0x2d4/0x6f8
> [Thu Mar 13 15:18:38 2025] __down+0x24/0x50
> [Thu Mar 13 15:18:38 2025] down+0xd0/0x140
> [Thu Mar 13 15:18:38 2025] read_dummy+0x3c/0xa0 [hung_task_sem]
> [Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
> [Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
> [Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
> [Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
> [Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
> [Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
> [Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
> [Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
> [Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
> [Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
> [Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked on a semaphore likely last held by task cat:1802
> [Thu Mar 13 15:18:38 2025] task:cat state:S stack:0 pid:1802 tgid:1802 ppid:1057 task_flags:0x400000 flags:0x00000004
> [Thu Mar 13 15:18:38 2025] Call trace:
> [Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
> [Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
> [Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
> [Thu Mar 13 15:18:38 2025] schedule_timeout+0xf4/0x208
> [Thu Mar 13 15:18:38 2025] msleep_interruptible+0x70/0x130
> [Thu Mar 13 15:18:38 2025] read_dummy+0x48/0xa0 [hung_task_sem]
> [Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
> [Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
> [Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
> [Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
> [Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
> [Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
> [Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
> [Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
> [Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
> [Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
>
> [1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com
>
> Thanks,
> Lance
>
> ---
> v1 -> v2:
>  * Use one field to store the blocker as only one is active at a time,
>    suggested by Masami
>  * Leverage the LSB of the blocker field to reduce memory footprint,
>    suggested by Masami
>  * Add a hung_task detector semaphore blocking test sample code
>  * https://lore.kernel.org/all/20250301055102.88746-1-ioworker0@gmail.com
>
> Lance Yang (2):
>   hung_task: replace blocker_mutex with encoded blocker
>   hung_task: show the blocker task if the task is hung on semaphore
>
> Zi Li (1):
>   samples: add hung_task detector semaphore blocking sample
>
>  include/linux/hung_task.h               | 94 +++++++++++++++++++++++++
>  include/linux/sched.h                   |  2 +-
>  include/linux/semaphore.h               | 15 +++-
>  kernel/hung_task.c                      | 52 +++++++++++---
>  kernel/locking/mutex.c                  |  8 ++-
>  kernel/locking/semaphore.c              | 55 +++++++++++++--
>  samples/Kconfig                         | 11 +--
>  samples/hung_task/Makefile              |  3 +-
>  samples/hung_task/hung_task_mutex.c     | 20 ++++--
>  samples/hung_task/hung_task_semaphore.c | 74 +++++++++++++++++++
>  10 files changed, 301 insertions(+), 33 deletions(-)
>  create mode 100644 include/linux/hung_task.h
>  create mode 100644 samples/hung_task/hung_task_semaphore.c
>
> --
> 2.45.2
>