tools/perf/Documentation/perf-lock.txt | 5 +- tools/perf/builtin-lock.c | 56 +++- tools/perf/util/bpf_lock_contention.c | 85 +++++- .../perf/util/bpf_skel/lock_contention.bpf.c | 245 +++++++++++++++++- tools/perf/util/bpf_skel/lock_data.h | 7 + tools/perf/util/lock-contention.h | 7 + 6 files changed, 372 insertions(+), 33 deletions(-)
For perf lock contention, the current owner tracking (-o option) only
works with per-thread mode (-t option). Enabling call stack mode for
owner can be useful for diagnosing why a system running slow in
lock contention.
Example output:
$ sudo ~/linux/tools/perf/perf lock con -abvo -Y mutex -E16 perf bench sched pipe
...
contended total wait max wait avg wait type caller
171 1.55 ms 20.26 us 9.06 us mutex pipe_read+0x57
0xffffffffac6318e7 pipe_read+0x57
0xffffffffac623862 vfs_read+0x332
0xffffffffac62434b ksys_read+0xbb
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
36 193.71 us 15.27 us 5.38 us mutex pipe_write+0x50
0xffffffffac631ee0 pipe_write+0x50
0xffffffffac6241db vfs_write+0x3bb
0xffffffffac6244ab ksys_write+0xbb
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
4 51.22 us 16.47 us 12.80 us mutex do_epoll_wait+0x24d
0xffffffffac691f0d do_epoll_wait+0x24d
0xffffffffac69249b do_epoll_pwait.part.0+0xb
0xffffffffac693ba5 __x64_sys_epoll_pwait+0x95
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
2 20.88 us 11.95 us 10.44 us mutex do_epoll_wait+0x24d
0xffffffffac691f0d do_epoll_wait+0x24d
0xffffffffac693943 __x64_sys_epoll_wait+0x73
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
1 7.33 us 7.33 us 7.33 us mutex do_epoll_ctl+0x6c1
0xffffffffac692e01 do_epoll_ctl+0x6c1
0xffffffffac6937e0 __x64_sys_epoll_ctl+0x70
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
1 6.64 us 6.64 us 6.64 us mutex do_epoll_ctl+0x3d4
0xffffffffac692b14 do_epoll_ctl+0x3d4
0xffffffffac6937e0 __x64_sys_epoll_ctl+0x70
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
=== owner stack trace ===
3 31.24 us 15.27 us 10.41 us mutex pipe_read+0x348
0xffffffffac631bd8 pipe_read+0x348
0xffffffffac623862 vfs_read+0x332
0xffffffffac62434b ksys_read+0xbb
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
...
v8:
Fix compilation error found by Athira Rajeev and Namhyung Kim.
v7: lore.kernel.org/20250224184742.4144931-1-ctshao@google.com
Remove duplicate contention records.
v6: lore.kernel.org/20250219214400.3317548-1-ctshao@google.com
Free allocated memory in error patch.
Add description in man page.
v5: lore.kernel.org/20250212222859.2086080-1-ctshao@google.com
Move duplicated code into function.
Remove code to retrieve undesired callstack at the end of `contention_end()`.
Other minor fix based on Namhyung's review.
v4: lore.kernel.org/20250130052510.860318-1-ctshao@google.com
Use `__sync_fetch_and_add()` to generate owner stackid automatically.
Use `__sync_fetch_and_add(..., -1)` to workaround compiler error from
`__sync_fetch_and_sub()`
Remove unnecessary include headers.
Dedicate on C-style comment.
Other minor fix based on Namhyung's review.
v3: lore.kernel.org/20250129001905.619859-1-ctshao@google.com
Rename/shorten multiple variables.
Implement owner stackid.
Add description for lock function return code in `contention_end()`.
Other minor fix based on Namhyung's review.
v2: lore.kernel.org/20250113052220.2105645-1-ctshao@google.com
Fix logic deficit in v1 patch 2/4.
v1: lore.kernel.org/20250110051346.1507178-1-ctshao@google.com
Chun-Tse Shao (4):
perf lock: Add bpf maps for owner stack tracing
perf lock: Retrieve owner callstack in bpf program
perf lock: Make rb_tree helper functions generic
perf lock: Report owner stack in usermode
tools/perf/Documentation/perf-lock.txt | 5 +-
tools/perf/builtin-lock.c | 56 +++-
tools/perf/util/bpf_lock_contention.c | 85 +++++-
.../perf/util/bpf_skel/lock_contention.bpf.c | 245 +++++++++++++++++-
tools/perf/util/bpf_skel/lock_data.h | 7 +
tools/perf/util/lock-contention.h | 7 +
6 files changed, 372 insertions(+), 33 deletions(-)
--
2.48.1.658.g4767266eb4-goog
On Wed, 26 Feb 2025 16:28:52 -0800, Chun-Tse Shao wrote: > For perf lock contention, the current owner tracking (-o option) only > works with per-thread mode (-t option). Enabling call stack mode for > owner can be useful for diagnosing why a system running slow in > lock contention. > > Example output: > $ sudo ~/linux/tools/perf/perf lock con -abvo -Y mutex -E16 perf bench sched pipe > ... > contended total wait max wait avg wait type caller > > [...] Applied to perf-tools-next, thanks! Best regards, Namhyung
> On 27 Feb 2025, at 5:58 AM, Chun-Tse Shao <ctshao@google.com> wrote: > > For perf lock contention, the current owner tracking (-o option) only > works with per-thread mode (-t option). Enabling call stack mode for > owner can be useful for diagnosing why a system running slow in > lock contention. > > Example output: > $ sudo ~/linux/tools/perf/perf lock con -abvo -Y mutex -E16 perf bench sched pipe > ... > contended total wait max wait avg wait type caller > > 171 1.55 ms 20.26 us 9.06 us mutex pipe_read+0x57 > 0xffffffffac6318e7 pipe_read+0x57 > 0xffffffffac623862 vfs_read+0x332 > 0xffffffffac62434b ksys_read+0xbb > 0xfffffffface604b2 do_syscall_64+0x82 > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > 36 193.71 us 15.27 us 5.38 us mutex pipe_write+0x50 > 0xffffffffac631ee0 pipe_write+0x50 > 0xffffffffac6241db vfs_write+0x3bb > 0xffffffffac6244ab ksys_write+0xbb > 0xfffffffface604b2 do_syscall_64+0x82 > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > 4 51.22 us 16.47 us 12.80 us mutex do_epoll_wait+0x24d > 0xffffffffac691f0d do_epoll_wait+0x24d > 0xffffffffac69249b do_epoll_pwait.part.0+0xb > 0xffffffffac693ba5 __x64_sys_epoll_pwait+0x95 > 0xfffffffface604b2 do_syscall_64+0x82 > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > 2 20.88 us 11.95 us 10.44 us mutex do_epoll_wait+0x24d > 0xffffffffac691f0d do_epoll_wait+0x24d > 0xffffffffac693943 __x64_sys_epoll_wait+0x73 > 0xfffffffface604b2 do_syscall_64+0x82 > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > 1 7.33 us 7.33 us 7.33 us mutex do_epoll_ctl+0x6c1 > 0xffffffffac692e01 do_epoll_ctl+0x6c1 > 0xffffffffac6937e0 __x64_sys_epoll_ctl+0x70 > 0xfffffffface604b2 do_syscall_64+0x82 > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > 1 6.64 us 6.64 us 6.64 us mutex do_epoll_ctl+0x3d4 > 0xffffffffac692b14 do_epoll_ctl+0x3d4 > 0xffffffffac6937e0 __x64_sys_epoll_ctl+0x70 > 0xfffffffface604b2 do_syscall_64+0x82 > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > > === owner stack trace === > > 3 31.24 us 15.27 us 10.41 us mutex pipe_read+0x348 > 0xffffffffac631bd8 pipe_read+0x348 > 0xffffffffac623862 vfs_read+0x332 > 0xffffffffac62434b ksys_read+0xbb > 0xfffffffface604b2 do_syscall_64+0x82 > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > ... > > v8: > Fix compilation error found by Athira Rajeev and Namhyung Kim. Tested with v8 and compiles fine, thanks for addressing the issue. Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> > > v7: lore.kernel.org/20250224184742.4144931-1-ctshao@google.com > Remove duplicate contention records. > > v6: lore.kernel.org/20250219214400.3317548-1-ctshao@google.com > Free allocated memory in error patch. > Add description in man page. > > v5: lore.kernel.org/20250212222859.2086080-1-ctshao@google.com > Move duplicated code into function. > Remove code to retrieve undesired callstack at the end of `contention_end()`. > Other minor fix based on Namhyung's review. > > v4: lore.kernel.org/20250130052510.860318-1-ctshao@google.com > Use `__sync_fetch_and_add()` to generate owner stackid automatically. > Use `__sync_fetch_and_add(..., -1)` to workaround compiler error from > `__sync_fetch_and_sub()` > Remove unnecessary include headers. > Dedicate on C-style comment. > Other minor fix based on Namhyung's review. > > v3: lore.kernel.org/20250129001905.619859-1-ctshao@google.com > Rename/shorten multiple variables. > Implement owner stackid. > Add description for lock function return code in `contention_end()`. > Other minor fix based on Namhyung's review. > > v2: lore.kernel.org/20250113052220.2105645-1-ctshao@google.com > Fix logic deficit in v1 patch 2/4. > > v1: lore.kernel.org/20250110051346.1507178-1-ctshao@google.com > > Chun-Tse Shao (4): > perf lock: Add bpf maps for owner stack tracing > perf lock: Retrieve owner callstack in bpf program > perf lock: Make rb_tree helper functions generic > perf lock: Report owner stack in usermode > > tools/perf/Documentation/perf-lock.txt | 5 +- > tools/perf/builtin-lock.c | 56 +++- > tools/perf/util/bpf_lock_contention.c | 85 +++++- > .../perf/util/bpf_skel/lock_contention.bpf.c | 245 +++++++++++++++++- > tools/perf/util/bpf_skel/lock_data.h | 7 + > tools/perf/util/lock-contention.h | 7 + > 6 files changed, 372 insertions(+), 33 deletions(-) > > -- > 2.48.1.658.g4767266eb4-goog > >
Thank you Athira! On Thu, Feb 27, 2025 at 3:11 AM Athira Rajeev <atrajeev@linux.ibm.com> wrote: > > > > > On 27 Feb 2025, at 5:58 AM, Chun-Tse Shao <ctshao@google.com> wrote: > > > > For perf lock contention, the current owner tracking (-o option) only > > works with per-thread mode (-t option). Enabling call stack mode for > > owner can be useful for diagnosing why a system running slow in > > lock contention. > > > > Example output: > > $ sudo ~/linux/tools/perf/perf lock con -abvo -Y mutex -E16 perf bench sched pipe > > ... > > contended total wait max wait avg wait type caller > > > > 171 1.55 ms 20.26 us 9.06 us mutex pipe_read+0x57 > > 0xffffffffac6318e7 pipe_read+0x57 > > 0xffffffffac623862 vfs_read+0x332 > > 0xffffffffac62434b ksys_read+0xbb > > 0xfffffffface604b2 do_syscall_64+0x82 > > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > > 36 193.71 us 15.27 us 5.38 us mutex pipe_write+0x50 > > 0xffffffffac631ee0 pipe_write+0x50 > > 0xffffffffac6241db vfs_write+0x3bb > > 0xffffffffac6244ab ksys_write+0xbb > > 0xfffffffface604b2 do_syscall_64+0x82 > > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > > 4 51.22 us 16.47 us 12.80 us mutex do_epoll_wait+0x24d > > 0xffffffffac691f0d do_epoll_wait+0x24d > > 0xffffffffac69249b do_epoll_pwait.part.0+0xb > > 0xffffffffac693ba5 __x64_sys_epoll_pwait+0x95 > > 0xfffffffface604b2 do_syscall_64+0x82 > > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > > 2 20.88 us 11.95 us 10.44 us mutex do_epoll_wait+0x24d > > 0xffffffffac691f0d do_epoll_wait+0x24d > > 0xffffffffac693943 __x64_sys_epoll_wait+0x73 > > 0xfffffffface604b2 do_syscall_64+0x82 > > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > > 1 7.33 us 7.33 us 7.33 us mutex do_epoll_ctl+0x6c1 > > 0xffffffffac692e01 do_epoll_ctl+0x6c1 > > 0xffffffffac6937e0 __x64_sys_epoll_ctl+0x70 > > 0xfffffffface604b2 do_syscall_64+0x82 > > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > > 1 6.64 us 6.64 us 6.64 us mutex do_epoll_ctl+0x3d4 > > 0xffffffffac692b14 do_epoll_ctl+0x3d4 > > 0xffffffffac6937e0 __x64_sys_epoll_ctl+0x70 > > 0xfffffffface604b2 do_syscall_64+0x82 > > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > > > > === owner stack trace === > > > > 3 31.24 us 15.27 us 10.41 us mutex pipe_read+0x348 > > 0xffffffffac631bd8 pipe_read+0x348 > > 0xffffffffac623862 vfs_read+0x332 > > 0xffffffffac62434b ksys_read+0xbb > > 0xfffffffface604b2 do_syscall_64+0x82 > > 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 > > ... > > > > v8: > > Fix compilation error found by Athira Rajeev and Namhyung Kim. > > Tested with v8 and compiles fine, thanks for addressing the issue. > > Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> > > > > > v7: lore.kernel.org/20250224184742.4144931-1-ctshao@google.com > > Remove duplicate contention records. > > > > v6: lore.kernel.org/20250219214400.3317548-1-ctshao@google.com > > Free allocated memory in error patch. > > Add description in man page. > > > > v5: lore.kernel.org/20250212222859.2086080-1-ctshao@google.com > > Move duplicated code into function. > > Remove code to retrieve undesired callstack at the end of `contention_end()`. > > Other minor fix based on Namhyung's review. > > > > v4: lore.kernel.org/20250130052510.860318-1-ctshao@google.com > > Use `__sync_fetch_and_add()` to generate owner stackid automatically. > > Use `__sync_fetch_and_add(..., -1)` to workaround compiler error from > > `__sync_fetch_and_sub()` > > Remove unnecessary include headers. > > Dedicate on C-style comment. > > Other minor fix based on Namhyung's review. > > > > v3: lore.kernel.org/20250129001905.619859-1-ctshao@google.com > > Rename/shorten multiple variables. > > Implement owner stackid. > > Add description for lock function return code in `contention_end()`. > > Other minor fix based on Namhyung's review. > > > > v2: lore.kernel.org/20250113052220.2105645-1-ctshao@google.com > > Fix logic deficit in v1 patch 2/4. > > > > v1: lore.kernel.org/20250110051346.1507178-1-ctshao@google.com > > > > Chun-Tse Shao (4): > > perf lock: Add bpf maps for owner stack tracing > > perf lock: Retrieve owner callstack in bpf program > > perf lock: Make rb_tree helper functions generic > > perf lock: Report owner stack in usermode > > > > tools/perf/Documentation/perf-lock.txt | 5 +- > > tools/perf/builtin-lock.c | 56 +++- > > tools/perf/util/bpf_lock_contention.c | 85 +++++- > > .../perf/util/bpf_skel/lock_contention.bpf.c | 245 +++++++++++++++++- > > tools/perf/util/bpf_skel/lock_data.h | 7 + > > tools/perf/util/lock-contention.h | 7 + > > 6 files changed, 372 insertions(+), 33 deletions(-) > > > > -- > > 2.48.1.658.g4767266eb4-goog > > > > >
© 2016 - 2025 Red Hat, Inc.