[PATCH v14 00/10] perf record --off-cpu: Dump off-cpu samples directly

Howard Chu posted 10 patches 1 year ago
There is a newer version of this series
tools/perf/Documentation/perf-record.txt |   9 ++
tools/perf/builtin-record.c              |  26 +++++
tools/perf/builtin-script.c              |   4 +-
tools/perf/tests/builtin-test.c          |   1 +
tools/perf/tests/shell/record_offcpu.sh  |  35 ++++++-
tools/perf/tests/tests.h                 |   1 +
tools/perf/tests/workloads/Build         |   1 +
tools/perf/tests/workloads/offcpu.c      |  16 +++
tools/perf/util/bpf_off_cpu.c            | 118 ++++++++++++++---------
tools/perf/util/bpf_skel/off_cpu.bpf.c   |  97 ++++++++++++++++++-
tools/perf/util/evsel.c                  |  41 +++++++-
tools/perf/util/evsel.h                  |   2 +
tools/perf/util/off_cpu.h                |   3 +-
tools/perf/util/record.h                 |   1 +
14 files changed, 297 insertions(+), 58 deletions(-)
create mode 100644 tools/perf/tests/workloads/offcpu.c
[PATCH v14 00/10] perf record --off-cpu: Dump off-cpu samples directly
Posted by Howard Chu 1 year ago
Changes in v14:
 - Change the internal off_cpu_thresh_us to off_cpu_thresh_ns, i.e. use
   nsec instead of usec

Changes in v13:
 - Move the definition of 'off_cpu_thresh_ns' to the same commit as
   dumping off-cpu samples in BPF, and give off_cpu_thresh_ns a default
   value before the --off-cpu-thresh option is parsed.

Changes in v12:
 - Restore patches' bisectability, because the ordering of patches has
   changed.
 - Change 'us = ms * 1000' to 'us = ms * USEC_PER_MSEC'

Changes in v11:
 - Modify the options used in the off-cpu tests, as I changed the unit
   of the off-cpu threshold to milliseconds.

Changes in v10:
 - Move the commit "perf record --off-cpu: Add --off-cpu-thresh option"
   to where the direct sample feature is completed.
 - Make --off-cpu-thresh use milliseconds as the unit.

Changes in v9:
 - Add documentation for the new option '--off-cpu-thresh', and include
   an example of its usage in the commit message
 - Set inherit in evsel__config() to prevent future modifications
 - Support off-cpu sample data collected by perf before this patch series

Changes in v8:
 - Make this series bisectable
 - Rename off_cpu_thresh to off_cpu_thresh_us and offcpu_thresh (in BPF)
   to offcpu_thresh_ns for clarity
 - Add commit messages to 'perf evsel: Expose evsel__is_offcpu_event()
   for future use' commit
 - Correct spelling mistakes in the commit message (s/is should be/should be/)
 - Add kernel-doc comments to off_cpu_dump(), and comments to the empty
   if block
 - Add some comments to off-cpu test
 - Delete an unused variable 'timestamp' in off_cpu_dump()

Changes in v7:
 - Make off-cpu event system-wide
 - Use strtoull instead of strtoul
 - Delete unused variable such as sample_id, and sample_type
 - Use i as index to update BPF perf_event map
 - MAX_OFFCPU_LEN 128 is too big, make it smaller.
 - Delete some bound check as it's always guaranteed
 - Do not set ip_pos in BPF
 - Add a new field for storing stack traces in the tstamp map
 - Dump the off-cpu sample directly or save it in the off_cpu map, not both
 - Delete the sample_type_off_cpu check
 - Use __set_off_cpu_sample() to parse samples instead of a two-pass parsing

Changes in v6:
 - Make patches bisectable

Changes in v5:
 - Delete unnecessary copy in BPF program
 - Remove sample_embed from perf header, hard code off-cpu stuff instead
 - Move evsel__is_offcpu_event() to evsel.h
 - Minor changes to the test
 - Edit some comments

Changes in v4:
 - Minimize the size of data output by perf_event_output()
 - Keep only one off-cpu event
 - Change off-cpu threshold's unit to microseconds
 - Set a default off-cpu threshold
 - Print the correct error message for the field 'embed' in perf data header

Changes in v3:
 - Add off-cpu-thresh argument
 - Process direct off-cpu samples in post

Changes in v2:
 - Remove unnecessary comments.
 - Rename function off_cpu_change_type to off_cpu_prepare_parse

v1:

As mentioned in: https://bugzilla.kernel.org/show_bug.cgi?id=207323

Currently, off-cpu samples are dumped when perf record is exiting. This
results in off-cpu samples being after the regular samples. This patch
series makes possible dumping off-cpu samples on-the-fly, directly into
perf ring buffer. And it dispatches those samples to the correct format
for perf.data consumers.

Before:
```
     migration/0      21 [000] 27981.041319: 2944637851    cycles:P:  ffffffff90d2e8aa record_times+0xa ([kernel.kallsyms])
            perf  770116 [001] 27981.041375:          1    cycles:P:  ffffffff90ee4960 event_function+0xf0 ([kernel.kallsyms])
            perf  770116 [001] 27981.041377:          1    cycles:P:  ffffffff90c184b1 intel_bts_enable_local+0x31 ([kernel.kallsyms])
            perf  770116 [001] 27981.041379:      51611    cycles:P:  ffffffff91a160b0 native_sched_clock+0x30 ([kernel.kallsyms])
     migration/1      26 [001] 27981.041400: 4227682775    cycles:P:  ffffffff90d06a74 wakeup_preempt+0x44 ([kernel.kallsyms])
     migration/2      32 [002] 27981.041477: 4159401534    cycles:P:  ffffffff90d11993 update_load_avg+0x63 ([kernel.kallsyms])

sshd  708098 [000] 18446744069.414584:     286392 offcpu-time: 
	    79a864f1c8bb ppoll+0x4b (/usr/lib/libc.so.6)
	    585690935cca [unknown] (/usr/bin/sshd)
```

After:
```
            perf  774767 [003] 28178.033444:        497           cycles:P:  ffffffff91a160c3 native_sched_clock+0x43 ([kernel.kallsyms])
            perf  774767 [003] 28178.033445:     399440           cycles:P:  ffffffff91c01f8d nmi_restore+0x25 ([kernel.kallsyms])
         swapper       0 [001] 28178.036639:  376650973           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
         swapper       0 [003] 28178.182921:  348779378           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
    blueman-tray    1355 [000] 28178.627906:  100184571 offcpu-time: 
	    7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
	    7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
	    7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
	    7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
	    7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
	    7fff24e862d8 [unknown] ([unknown])


    blueman-tray    1355 [000] 28178.728137:  100187539 offcpu-time: 
	    7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
	    7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
	    7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
	    7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
	    7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
	    7fff24e862d8 [unknown] ([unknown])


         swapper       0 [000] 28178.463253:  195945410           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
     dbus-broker     412 [002] 28178.464855:  376737008           cycles:P:  ffffffff91c000a0 entry_SYSCALL_64+0x20 ([kernel.kallsyms])
```

Howard Chu (10):
  perf evsel: Expose evsel__is_offcpu_event() for future use
  perf record --off-cpu: Parse off-cpu event
  perf record --off-cpu: Preparation of off-cpu BPF program
  perf record --off-cpu: Dump off-cpu samples in BPF
  perf evsel: Assemble offcpu samples
  perf record --off-cpu: Disable perf_event's callchain collection
  perf script: Display off-cpu samples correctly
  perf record --off-cpu: Dump the remaining samples in BPF's stack trace
    map
  perf record --off-cpu: Add --off-cpu-thresh option
  perf test: Add direct off-cpu test

 tools/perf/Documentation/perf-record.txt |   9 ++
 tools/perf/builtin-record.c              |  26 +++++
 tools/perf/builtin-script.c              |   4 +-
 tools/perf/tests/builtin-test.c          |   1 +
 tools/perf/tests/shell/record_offcpu.sh  |  35 ++++++-
 tools/perf/tests/tests.h                 |   1 +
 tools/perf/tests/workloads/Build         |   1 +
 tools/perf/tests/workloads/offcpu.c      |  16 +++
 tools/perf/util/bpf_off_cpu.c            | 118 ++++++++++++++---------
 tools/perf/util/bpf_skel/off_cpu.bpf.c   |  97 ++++++++++++++++++-
 tools/perf/util/evsel.c                  |  41 +++++++-
 tools/perf/util/evsel.h                  |   2 +
 tools/perf/util/off_cpu.h                |   3 +-
 tools/perf/util/record.h                 |   1 +
 14 files changed, 297 insertions(+), 58 deletions(-)
 create mode 100644 tools/perf/tests/workloads/offcpu.c

-- 
2.43.0
Re: [PATCH v14 00/10] perf record --off-cpu: Dump off-cpu samples directly
Posted by Namhyung Kim 1 year ago
On Sun, Dec 15, 2024 at 10:12:10AM -0800, Howard Chu wrote:
> Changes in v14:
>  - Change the internal off_cpu_thresh_us to off_cpu_thresh_ns, i.e. use
>    nsec instead of usec
> 
> Changes in v13:
>  - Move the definition of 'off_cpu_thresh_ns' to the same commit as
>    dumping off-cpu samples in BPF, and give off_cpu_thresh_ns a default
>    value before the --off-cpu-thresh option is parsed.
> 
> Changes in v12:
>  - Restore patches' bisectability, because the ordering of patches has
>    changed.
>  - Change 'us = ms * 1000' to 'us = ms * USEC_PER_MSEC'
> 
> Changes in v11:
>  - Modify the options used in the off-cpu tests, as I changed the unit
>    of the off-cpu threshold to milliseconds.
> 
> Changes in v10:
>  - Move the commit "perf record --off-cpu: Add --off-cpu-thresh option"
>    to where the direct sample feature is completed.
>  - Make --off-cpu-thresh use milliseconds as the unit.
> 
> Changes in v9:
>  - Add documentation for the new option '--off-cpu-thresh', and include
>    an example of its usage in the commit message
>  - Set inherit in evsel__config() to prevent future modifications
>  - Support off-cpu sample data collected by perf before this patch series
> 
> Changes in v8:
>  - Make this series bisectable
>  - Rename off_cpu_thresh to off_cpu_thresh_us and offcpu_thresh (in BPF)
>    to offcpu_thresh_ns for clarity
>  - Add commit messages to 'perf evsel: Expose evsel__is_offcpu_event()
>    for future use' commit
>  - Correct spelling mistakes in the commit message (s/is should be/should be/)
>  - Add kernel-doc comments to off_cpu_dump(), and comments to the empty
>    if block
>  - Add some comments to off-cpu test
>  - Delete an unused variable 'timestamp' in off_cpu_dump()
> 
> Changes in v7:
>  - Make off-cpu event system-wide
>  - Use strtoull instead of strtoul
>  - Delete unused variable such as sample_id, and sample_type
>  - Use i as index to update BPF perf_event map
>  - MAX_OFFCPU_LEN 128 is too big, make it smaller.
>  - Delete some bound check as it's always guaranteed
>  - Do not set ip_pos in BPF
>  - Add a new field for storing stack traces in the tstamp map
>  - Dump the off-cpu sample directly or save it in the off_cpu map, not both
>  - Delete the sample_type_off_cpu check
>  - Use __set_off_cpu_sample() to parse samples instead of a two-pass parsing
> 
> Changes in v6:
>  - Make patches bisectable
> 
> Changes in v5:
>  - Delete unnecessary copy in BPF program
>  - Remove sample_embed from perf header, hard code off-cpu stuff instead
>  - Move evsel__is_offcpu_event() to evsel.h
>  - Minor changes to the test
>  - Edit some comments
> 
> Changes in v4:
>  - Minimize the size of data output by perf_event_output()
>  - Keep only one off-cpu event
>  - Change off-cpu threshold's unit to microseconds
>  - Set a default off-cpu threshold
>  - Print the correct error message for the field 'embed' in perf data header
> 
> Changes in v3:
>  - Add off-cpu-thresh argument
>  - Process direct off-cpu samples in post
> 
> Changes in v2:
>  - Remove unnecessary comments.
>  - Rename function off_cpu_change_type to off_cpu_prepare_parse
> 
> v1:
> 
> As mentioned in: https://bugzilla.kernel.org/show_bug.cgi?id=207323
> 
> Currently, off-cpu samples are dumped when perf record is exiting. This
> results in off-cpu samples being after the regular samples. This patch
> series makes possible dumping off-cpu samples on-the-fly, directly into
> perf ring buffer. And it dispatches those samples to the correct format
> for perf.data consumers.
> 
> Before:
> ```
>      migration/0      21 [000] 27981.041319: 2944637851    cycles:P:  ffffffff90d2e8aa record_times+0xa ([kernel.kallsyms])
>             perf  770116 [001] 27981.041375:          1    cycles:P:  ffffffff90ee4960 event_function+0xf0 ([kernel.kallsyms])
>             perf  770116 [001] 27981.041377:          1    cycles:P:  ffffffff90c184b1 intel_bts_enable_local+0x31 ([kernel.kallsyms])
>             perf  770116 [001] 27981.041379:      51611    cycles:P:  ffffffff91a160b0 native_sched_clock+0x30 ([kernel.kallsyms])
>      migration/1      26 [001] 27981.041400: 4227682775    cycles:P:  ffffffff90d06a74 wakeup_preempt+0x44 ([kernel.kallsyms])
>      migration/2      32 [002] 27981.041477: 4159401534    cycles:P:  ffffffff90d11993 update_load_avg+0x63 ([kernel.kallsyms])
> 
> sshd  708098 [000] 18446744069.414584:     286392 offcpu-time: 
> 	    79a864f1c8bb ppoll+0x4b (/usr/lib/libc.so.6)
> 	    585690935cca [unknown] (/usr/bin/sshd)
> ```
> 
> After:
> ```
>             perf  774767 [003] 28178.033444:        497           cycles:P:  ffffffff91a160c3 native_sched_clock+0x43 ([kernel.kallsyms])
>             perf  774767 [003] 28178.033445:     399440           cycles:P:  ffffffff91c01f8d nmi_restore+0x25 ([kernel.kallsyms])
>          swapper       0 [001] 28178.036639:  376650973           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
>          swapper       0 [003] 28178.182921:  348779378           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
>     blueman-tray    1355 [000] 28178.627906:  100184571 offcpu-time: 
> 	    7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
> 	    7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
> 	    7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
> 	    7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
> 	    7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
> 	    7fff24e862d8 [unknown] ([unknown])
> 
> 
>     blueman-tray    1355 [000] 28178.728137:  100187539 offcpu-time: 
> 	    7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
> 	    7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
> 	    7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
> 	    7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
> 	    7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
> 	    7fff24e862d8 [unknown] ([unknown])
> 
> 
>          swapper       0 [000] 28178.463253:  195945410           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
>      dbus-broker     412 [002] 28178.464855:  376737008           cycles:P:  ffffffff91c000a0 entry_SYSCALL_64+0x20 ([kernel.kallsyms])
> ```
> 
> Howard Chu (10):
>   perf evsel: Expose evsel__is_offcpu_event() for future use
>   perf record --off-cpu: Parse off-cpu event
>   perf record --off-cpu: Preparation of off-cpu BPF program
>   perf record --off-cpu: Dump off-cpu samples in BPF
>   perf evsel: Assemble offcpu samples
>   perf record --off-cpu: Disable perf_event's callchain collection
>   perf script: Display off-cpu samples correctly
>   perf record --off-cpu: Dump the remaining samples in BPF's stack trace
>     map
>   perf record --off-cpu: Add --off-cpu-thresh option
>   perf test: Add direct off-cpu test

Thanks for your work and the persistence.

Reviewed-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung

> 
>  tools/perf/Documentation/perf-record.txt |   9 ++
>  tools/perf/builtin-record.c              |  26 +++++
>  tools/perf/builtin-script.c              |   4 +-
>  tools/perf/tests/builtin-test.c          |   1 +
>  tools/perf/tests/shell/record_offcpu.sh  |  35 ++++++-
>  tools/perf/tests/tests.h                 |   1 +
>  tools/perf/tests/workloads/Build         |   1 +
>  tools/perf/tests/workloads/offcpu.c      |  16 +++
>  tools/perf/util/bpf_off_cpu.c            | 118 ++++++++++++++---------
>  tools/perf/util/bpf_skel/off_cpu.bpf.c   |  97 ++++++++++++++++++-
>  tools/perf/util/evsel.c                  |  41 +++++++-
>  tools/perf/util/evsel.h                  |   2 +
>  tools/perf/util/off_cpu.h                |   3 +-
>  tools/perf/util/record.h                 |   1 +
>  14 files changed, 297 insertions(+), 58 deletions(-)
>  create mode 100644 tools/perf/tests/workloads/offcpu.c
> 
> -- 
> 2.43.0
>
Re: [PATCH v14 00/10] perf record --off-cpu: Dump off-cpu samples directly
Posted by Arnaldo Carvalho de Melo 10 months, 1 week ago
On Mon, Dec 16, 2024 at 12:11:07PM -0800, Namhyung Kim wrote:
> On Sun, Dec 15, 2024 at 10:12:10AM -0800, Howard Chu wrote:
> > As mentioned in: https://bugzilla.kernel.org/show_bug.cgi?id=207323
> > 
> > Currently, off-cpu samples are dumped when perf record is exiting. This
> > results in off-cpu samples being after the regular samples. This patch
> > series makes possible dumping off-cpu samples on-the-fly, directly into
> > perf ring buffer. And it dispatches those samples to the correct format
> > for perf.data consumers.
> > 
> > Before:
> > ```
> >      migration/0      21 [000] 27981.041319: 2944637851    cycles:P:  ffffffff90d2e8aa record_times+0xa ([kernel.kallsyms])
> >             perf  770116 [001] 27981.041375:          1    cycles:P:  ffffffff90ee4960 event_function+0xf0 ([kernel.kallsyms])
> >             perf  770116 [001] 27981.041377:          1    cycles:P:  ffffffff90c184b1 intel_bts_enable_local+0x31 ([kernel.kallsyms])
> >             perf  770116 [001] 27981.041379:      51611    cycles:P:  ffffffff91a160b0 native_sched_clock+0x30 ([kernel.kallsyms])
> >      migration/1      26 [001] 27981.041400: 4227682775    cycles:P:  ffffffff90d06a74 wakeup_preempt+0x44 ([kernel.kallsyms])
> >      migration/2      32 [002] 27981.041477: 4159401534    cycles:P:  ffffffff90d11993 update_load_avg+0x63 ([kernel.kallsyms])
> > 
> > sshd  708098 [000] 18446744069.414584:     286392 offcpu-time: 
> > 	    79a864f1c8bb ppoll+0x4b (/usr/lib/libc.so.6)
> > 	    585690935cca [unknown] (/usr/bin/sshd)
> > ```
> > 
> > After:
> > ```
> >             perf  774767 [003] 28178.033444:        497           cycles:P:  ffffffff91a160c3 native_sched_clock+0x43 ([kernel.kallsyms])
> >             perf  774767 [003] 28178.033445:     399440           cycles:P:  ffffffff91c01f8d nmi_restore+0x25 ([kernel.kallsyms])
> >          swapper       0 [001] 28178.036639:  376650973           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> >          swapper       0 [003] 28178.182921:  348779378           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> >     blueman-tray    1355 [000] 28178.627906:  100184571 offcpu-time: 
> > 	    7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
> > 	    7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
> > 	    7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
> > 	    7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
> > 	    7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
> > 	    7fff24e862d8 [unknown] ([unknown])
> > 
> > 
> >     blueman-tray    1355 [000] 28178.728137:  100187539 offcpu-time: 
> > 	    7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
> > 	    7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
> > 	    7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
> > 	    7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
> > 	    7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
> > 	    7fff24e862d8 [unknown] ([unknown])
> > 
> > 
> >          swapper       0 [000] 28178.463253:  195945410           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> >      dbus-broker     412 [002] 28178.464855:  376737008           cycles:P:  ffffffff91c000a0 entry_SYSCALL_64+0x20 ([kernel.kallsyms])
> > ```

So I'm trying to reproduce your results, with:

root@number:~# perf record --off-cpu sleep 0.6
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
root@number:~# perf trace -e *sleep sleep 0.6
     0.000 (600.137 ms): sleep/466769 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 600000000 }, rmtp: 0x7ffe024dea30) = 0
root@number:~#

So a simple test that crosses that 500ms threshold and thus should
produce an offcpu-time sample, right?

root@number:~# perf record --off-cpu sleep 0.6
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
root@number:~# perf evlist
cpu_atom/cycles/P
cpu_core/cycles/P
offcpu-time
dummy:u
root@number:~#

root@number:~# perf script | grep offcpu
root@number:~#

What I have:

⬢ [acme@toolbox perf-tools-next]$ git log --oneline -11
cd59081880e89df8 (HEAD -> perf-tools-next) perf test: Add direct off-cpu test
56cbd794c0c46ba9 perf record --off-cpu: Add --off-cpu-thresh option
28d9b19c5455556f perf record --off-cpu: Dump the remaining samples in BPF's stack trace map
2bc05b02743b50a7 perf script: Display off-cpu samples correctly
bfa457a621596947 perf record --off-cpu: Disable perf_event's callchain collection
eca732cc42d20266 perf evsel: Assemble offcpu samples
74ce50e40c569e90 perf record --off-cpu: Dump off-cpu samples in BPF
e75f8ce63bfa6cb9 perf record --off-cpu: Preparation of off-cpu BPF program
0ffab9d26971c91c perf record --off-cpu: Parse off-cpu event
efc3fe2070853b7d perf evsel: Expose evsel__is_offcpu_event() for future use
357b965deba9fb71 (perf-tools-next/perf-tools-next) perf stat: Changes to event name uniquification
⬢ [acme@toolbox perf-tools-next]$

The only change was the one I mentioned about 'struct __stack'
replacing 'struct stack' to avoid the clash with the vmlinux.h generated
using GEN_VMLINUX_H=1.

Perhaps that is because my test is not being system wide, if I use '-a'
we get:

root@number:~# perf record --off-cpu -a sleep 0.6
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.868 MB perf.data (443 samples) ]
root@number:~# perf script | grep offcpu
         firefox    4010 [004] 16746.571035:  518930922       offcpu-time: 
           sleep  466955 [019] 16746.629875:  600279888       offcpu-time: 
            perf  466953 [027] 16746.630541:  601731496       offcpu-time: 
  IPC I/O Parent    4039 [000] 18446744069.414584:      20781       offcpu-time: 
 TaskCon~ller #3    4454 [000] 18446744069.414584:     117724       offcpu-time: 
 Isolated Servic    4253 [000] 18446744069.414584:  387311029       offcpu-time: 
  IPC I/O Parent    4039 [000] 18446744069.414584:  520075456       offcpu-time: 
           Timer    4040 [000] 18446744069.414584:  581156755       offcpu-time: 
           Timer    4380 [000] 18446744069.414584:   22520305       offcpu-time: 
 sqldb:c~lite #6    4327 [000] 18446744069.414584:    2073583       offcpu-time: 
 Isolated Web Co  156033 [000] 18446744069.414584:      23277       offcpu-time: 
 sqldb:c~lite #6    4327 [000] 18446744069.414584:      24974       offcpu-time: 
     dbus-broker    2526 [000] 18446744069.414584:    1175447       offcpu-time: 
 IPDL Background    4043 [000] 18446744069.414584:      12055       offcpu-time: 
 glean.dispatche    4037 [000] 18446744069.414584:      62321       offcpu-time: 
          podman   12139 [000] 18446744069.414584:  502656763       offcpu-time: 
     gnome-shell    2709 [000] 18446744069.414584:  579190477       offcpu-time: 
          podman    3936 [000] 18446744069.414584:   60777801       offcpu-time: 
      KMS thread    2739 [000] 18446744069.414584:   14578641       offcpu-time: 
   IPC I/O Child    7113 [000] 18446744069.414584:  386343655       offcpu-time: 
    WaylandProxy    4027 [000] 18446744069.414584:   46989879       offcpu-time: 
           Timer  156048 [000] 18446744069.414584:  404662799       offcpu-time: 
  URL Classifier    4234 [000] 18446744069.414584:      55337       offcpu-time: 
         firefox    4010 [000] 18446744069.414584:      57229       offcpu-time: 
 Isolated Web Co    4261 [000] 18446744069.414584:      26671       offcpu-time: 
          podman   12130 [000] 18446744069.414584:  450853233       offcpu-time: 
         firefox    4010 [000] 18446744069.414584:      56800       offcpu-time: 
          podman    3951 [000] 18446744069.414584:  502100761       offcpu-time: 
          podman   12131 [000] 18446744069.414584:  501954466       offcpu-time: 
           Timer    4306 [000] 18446744069.414584:  360125199       offcpu-time: 
 RemoteLzyStream    4230 [000] 18446744069.414584:     172647       offcpu-time: 
 TaskCon~ller #1    4452 [000] 18446744069.414584:      38083       offcpu-time: 
 Isolated Web Co  156033 [000] 18446744069.414584:  496574141       offcpu-time: 
   Socket Thread    4042 [000] 18446744069.414584:  295030041       offcpu-time: 
 IPDL Background    4043 [000] 18446744069.414584:      13990       offcpu-time: 
 Inotify~tThread    7430 [000] 18446744069.414584:  300841349       offcpu-time: 
           Timer    7186 [000] 18446744069.414584:     117536       offcpu-time: 
 Backgro~ol #163  466653 [000] 18446744069.414584:      29312       offcpu-time: 
 Isolated Web Co    7107 [000] 18446744069.414584:  418466228       offcpu-time: 
 Isolated Web Co  156033 [000] 18446744069.414584:      22049       offcpu-time: 
   Socket Thread    4295 [000] 18446744069.414584:  309685524       offcpu-time: 
          podman   12130 [000] 18446744069.414584:   60960566       offcpu-time: 
   Socket Thread    4042 [000] 18446744069.414584:       9726       offcpu-time: 
 glean.dispatche    4037 [000] 18446744069.414584:      30560       offcpu-time: 
 TaskCon~ller #0    4451 [000] 18446744069.414584:    3661284       offcpu-time: 
  URL Classifier    4234 [000] 18446744069.414584:      18112       offcpu-time: 
 Isolated Web Co    4261 [000] 18446744069.414584:  380745777       offcpu-time: 
   IPC I/O Child    7039 [000] 18446744069.414584:   54365677       offcpu-time: 
   IPC I/O Child    4292 [000] 18446744069.414584:  386687065       offcpu-time: 
 glean.dispatche    4037 [000] 18446744069.414584:      48091       offcpu-time: 
         firefox    4010 [000] 18446744069.414584:  570509826       offcpu-time: 
          podman    3936 [000] 18446744069.414584:  451176929       offcpu-time: 
 Isolated Web Co    7034 [000] 18446744069.414584:   57005742       offcpu-time: 
 gnome-terminal-    3583 [000] 18446744069.414584:   97027967       offcpu-time: 
 sqldb:c~lite #6    4327 [000] 18446744069.414584:      27118       offcpu-time: 
 TaskCon~ller #1    4452 [000] 18446744069.414584:      20008       offcpu-time: 
  IPC I/O Parent    4039 [000] 18446744069.414584:      58205       offcpu-time: 
           Timer  157669 [000] 18446744069.414584:     180649       offcpu-time: 
   IPC I/O Child    4265 [000] 18446744069.414584:  387442919       offcpu-time: 
         firefox    4010 [000] 18446744069.414584:     215006       offcpu-time: 
          podman    3949 [000] 18446744069.414584:  502715978       offcpu-time: 
         firefox    4010 [000] 18446744069.414584:      94719       offcpu-time: 
           gdbus    4032 [000] 18446744069.414584:    1330625       offcpu-time: 
 Worker Launcher    4348 [000] 18446744069.414584:   83393104       offcpu-time: 
   Socket Thread    4042 [000] 18446744069.414584:      48713       offcpu-time: 
 IPDL Background    4043 [000] 18446744069.414584:  386560332       offcpu-time: 
           Timer    7123 [000] 18446744069.414584:  422634111       offcpu-time: 
 Isolated Web Co    4358 [000] 18446744069.414584:   18063623       offcpu-time: 
 glean.dispatche    4037 [000] 18446744069.414584:  302249112       offcpu-time: 
  IPC I/O Parent    4039 [000] 18446744069.414584:      16204       offcpu-time: 
 Inotify~tThread    7430 [000] 18446744069.414584:  300780100       offcpu-time: 
 RemoteLzyStream    4230 [000] 18446744069.414584:      14778       offcpu-time: 
      DOM Worker   10133 [000] 18446744069.414584:     331946       offcpu-time: 
  URL Classifier    4234 [000] 18446744069.414584:    5003953       offcpu-time: 
           gdbus    2729 [000] 18446744069.414584:     331119       offcpu-time: 
 TaskCon~ller #2    4453 [000] 18446744069.414584:     125448       offcpu-time: 
         firefox    4010 [000] 18446744069.414584:      40894       offcpu-time: 
   IPC I/O Child  156038 [000] 18446744069.414584:    4079439       offcpu-time: 
      DOM Worker   10228 [000] 18446744069.414584:    3397046       offcpu-time: 
root@number:~# 

With 'offcpu-time' samples intermixed with the other cycles events:

<SNIP>
 Isolated Web Co    4261 [010] 16746.570827:     287731 cpu_core/cycles/P:  ffffffff8b145b24 __get_user_nocheck_8+0x4 ([kernel.kallsyms])
 Isolated Web Co    7107 [006] 16746.570834:     235654 cpu_core/cycles/P:      7f08bf336592 pthread_mutex_trylock@@GLIBC_2.34+0xe2 (/usr/lib64/libc.so.6)
         firefox    4010 [004] 16746.571035:  518930922       offcpu-time: 
            7f2dab89eda9 dlsym@@GLIBC_2.34+0x59 (/usr/lib64/libc.so.6)
            7f2dab8a17f9 pthread_cond_broadcast@@GLIBC_2.3.2+0x399 (/usr/lib64/libc.so.6)
            556309ec5bca mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&)+0xa (/usr/lib64/firefox/firefox)

         swapper       0 [019] 16746.571151:     315218 cpu_atom/cycles/P:  ffffffff8adf4ad0 menu_select+0x1b0 ([kernel.kallsyms])
<SNIP>
 Isolated Web Co    7107 [010] 16746.607631:     288723 cpu_core/cycles/P:      34d114ccce4a [unknown] (/tmp/perf-7107.map)
         swapper       0 [008] 16746.622828:     136175 cpu_core/cycles/P:  ffffffff8b213d7a intel_idle+0x6a ([kernel.kallsyms])
           Timer    4306 [008] 16746.622977:     129681 cpu_core/cycles/P:  ffffffff8b134e46 rb_erase+0x46 ([kernel.kallsyms])
         swapper       0 [000] 16746.623018:     298081 cpu_core/cycles/P:  ffffffff8a13d3ea update_rq_clock+0x7a ([kernel.kallsyms])
         swapper       0 [019] 16746.629839:     256701 cpu_atom/cycles/P:  ffffffff8a148455 __dequeue_entity+0x5 ([kernel.kallsyms])
           sleep  466955 [019] 16746.629875:  600279888       offcpu-time: 
            7f6feb8bf497 clock_nanosleep@GLIBC_2.2.5+0x27 (/usr/lib64/libc.so.6)
            5608d1601a9f [unknown] (/usr/bin/sleep)
            7f6feb805088 __libc_start_call_main+0x78 (/usr/lib64/libc.so.6)
            7f6feb80514b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
            5608d1601c05 [unknown] (/usr/bin/sleep)

           sleep  466955 [006] 16746.630223:     225700 cpu_core/cycles/P:  ffffffff8a40ec5a folio_remove_rmap_ptes+0x1a ([kernel.kallsyms])
         :466955  466955 [006] 16746.630442:     225700 cpu_core/cycles/P:  ffffffff8b1387eb timerqueue_add+0x3b ([kernel.kallsyms])
            perf  466953 [027] 16746.630541:  601731496       offcpu-time: 
            7f818451c1fd __poll+0x4d (/usr/lib64/libc.so.6)
                  49606e fdarray__poll+0x2e (/home/acme/bin/perf)
                  429eed __cmd_record.constprop.0+0x18ad (/home/acme/bin/perf)
                  42c77e cmd_record+0xd9e (/home/acme/bin/perf)
                  495700 run_builtin+0x70 (/home/acme/bin/perf)
                  495a1b handle_internal_command+0xab (/home/acme/bin/perf)
                  413473 main+0x2f3 (/home/acme/bin/perf)
            7f8184438088 __libc_start_call_main+0x78 (/usr/lib64/libc.so.6)
            7f818443814b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
                  413ac5 _start+0x25 (/home/acme/bin/perf)

            perf  466953 [000] 16746.630831:     298081 cpu_core/cycles/P:  ffffffff8b40012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
         swapper       0 [019] 16746.630985:     250096 cpu_atom/cycles/P:  ffffffff8b213d7a intel_idle+0x6a ([kernel.kallsyms])
            perf  466953 [010] 16746.632045:     286467 cpu_core/cycles/P:  ffffffff8a1c3f21 __rcu_read_lock+0x11 ([kernel.kallsyms])
  IPC I/O Parent    4039 [000] 18446744069.414584:      20781       offcpu-time: 
            7f2dab89f000 __GI___pthread_enable_asynccancel+0x20 (/usr/lib64/libc.so.6)
            556309ebede4 moz_xmalloc+0x104 (/usr/lib64/firefox/firefox)
           3bbdf3d834830 [unknown] ([unknown])


And if I install the 'coreutils' debuginfo I get the sleep backtraces
resolved, nice, but again only when I'm running in system wide (-a)
mode.

 Isolated Web Co    7167 [008] 16915.115432:     188682 cpu_core/cycles/P:      7fe85f558260 nsBaseHashtable<nsStringHashKey, mozilla::UniquePtr<nsBaseHashtable<nsIntegralHashKey<unsigned lo>
         swapper       0 [013] 16915.126739:     120783 cpu_core/cycles/P:  ffffffff8a15221d enqueue_task_fair+0x22d ([kernel.kallsyms])
           sleep  467147 [013] 16915.126780:  600336127       offcpu-time: 
            7ff010611497 clock_nanosleep@GLIBC_2.2.5+0x27 (/usr/lib64/libc.so.6)
            563f045a3a9f main+0x4af (/usr/bin/sleep)
            7ff010557088 __libc_start_call_main+0x78 (/usr/lib64/libc.so.6)
            7ff01055714b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
            563f045a3c05 _start+0x25 (/usr/bin/sleep)

           sleep  467147 [013] 16915.126861:     120783 cpu_core/cycles/P:  ffffffff8a3ef391 unmap_page_range+0x661 ([kernel.kallsyms])
           sleep  467147 [013] 16915.126973:     139106 cpu_core/cycles/P:  ffffffff8a3a5b4d folios_put_refs+0x8d ([kernel.kallsyms])
         swapper       0 [013] 16915.127102:     160383 cpu_core/cycles/P:  ffffffff8a14b5c4 sched_balance_update_blocked_averages+0xb4 ([kernel.kallsyms])

When running just one workload:

root@number:~# for a in `seq 10` ; do perf record --off-cpu sleep 1s ; perf script | grep offcpu-time ; done
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (13 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (16 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (14 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (12 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
root@number:~#

Have you tested it like that? Is this a known issue/current limitation?

I'm reading the patches to see I understand this better.

- Arnaldo
Re: [PATCH v14 00/10] perf record --off-cpu: Dump off-cpu samples directly
Posted by Howard Chu 10 months, 1 week ago
Hello Arnaldo,

On Tue, Feb 11, 2025 at 10:02 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Mon, Dec 16, 2024 at 12:11:07PM -0800, Namhyung Kim wrote:
> > On Sun, Dec 15, 2024 at 10:12:10AM -0800, Howard Chu wrote:
> > > As mentioned in: https://bugzilla.kernel.org/show_bug.cgi?id=207323
> > >
> > > Currently, off-cpu samples are dumped when perf record is exiting. This
> > > results in off-cpu samples being after the regular samples. This patch
> > > series makes possible dumping off-cpu samples on-the-fly, directly into
> > > perf ring buffer. And it dispatches those samples to the correct format
> > > for perf.data consumers.
> > >
> > > Before:
> > > ```
> > >      migration/0      21 [000] 27981.041319: 2944637851    cycles:P:  ffffffff90d2e8aa record_times+0xa ([kernel.kallsyms])
> > >             perf  770116 [001] 27981.041375:          1    cycles:P:  ffffffff90ee4960 event_function+0xf0 ([kernel.kallsyms])
> > >             perf  770116 [001] 27981.041377:          1    cycles:P:  ffffffff90c184b1 intel_bts_enable_local+0x31 ([kernel.kallsyms])
> > >             perf  770116 [001] 27981.041379:      51611    cycles:P:  ffffffff91a160b0 native_sched_clock+0x30 ([kernel.kallsyms])
> > >      migration/1      26 [001] 27981.041400: 4227682775    cycles:P:  ffffffff90d06a74 wakeup_preempt+0x44 ([kernel.kallsyms])
> > >      migration/2      32 [002] 27981.041477: 4159401534    cycles:P:  ffffffff90d11993 update_load_avg+0x63 ([kernel.kallsyms])
> > >
> > > sshd  708098 [000] 18446744069.414584:     286392 offcpu-time:
> > >         79a864f1c8bb ppoll+0x4b (/usr/lib/libc.so.6)
> > >         585690935cca [unknown] (/usr/bin/sshd)
> > > ```
> > >
> > > After:
> > > ```
> > >             perf  774767 [003] 28178.033444:        497           cycles:P:  ffffffff91a160c3 native_sched_clock+0x43 ([kernel.kallsyms])
> > >             perf  774767 [003] 28178.033445:     399440           cycles:P:  ffffffff91c01f8d nmi_restore+0x25 ([kernel.kallsyms])
> > >          swapper       0 [001] 28178.036639:  376650973           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> > >          swapper       0 [003] 28178.182921:  348779378           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> > >     blueman-tray    1355 [000] 28178.627906:  100184571 offcpu-time:
> > >         7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
> > >         7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
> > >         7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
> > >         7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
> > >         7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
> > >         7fff24e862d8 [unknown] ([unknown])
> > >
> > >
> > >     blueman-tray    1355 [000] 28178.728137:  100187539 offcpu-time:
> > >         7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
> > >         7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
> > >         7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
> > >         7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
> > >         7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
> > >         7fff24e862d8 [unknown] ([unknown])
> > >
> > >
> > >          swapper       0 [000] 28178.463253:  195945410           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> > >      dbus-broker     412 [002] 28178.464855:  376737008           cycles:P:  ffffffff91c000a0 entry_SYSCALL_64+0x20 ([kernel.kallsyms])
> > > ```
>
> So I'm trying to reproduce your results, with:
>
> root@number:~# perf record --off-cpu sleep 0.6
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> root@number:~# perf trace -e *sleep sleep 0.6
>      0.000 (600.137 ms): sleep/466769 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 600000000 }, rmtp: 0x7ffe024dea30) = 0
> root@number:~#
>
> So a simple test that crosses that 500ms threshold and thus should
> produce an offcpu-time sample, right?
>
> root@number:~# perf record --off-cpu sleep 0.6
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> root@number:~# perf evlist
> cpu_atom/cycles/P
> cpu_core/cycles/P
> offcpu-time
> dummy:u
> root@number:~#
>
> root@number:~# perf script | grep offcpu
> root@number:~#
>
> What I have:
>
> ⬢ [acme@toolbox perf-tools-next]$ git log --oneline -11
> cd59081880e89df8 (HEAD -> perf-tools-next) perf test: Add direct off-cpu test
> 56cbd794c0c46ba9 perf record --off-cpu: Add --off-cpu-thresh option
> 28d9b19c5455556f perf record --off-cpu: Dump the remaining samples in BPF's stack trace map
> 2bc05b02743b50a7 perf script: Display off-cpu samples correctly
> bfa457a621596947 perf record --off-cpu: Disable perf_event's callchain collection
> eca732cc42d20266 perf evsel: Assemble offcpu samples
> 74ce50e40c569e90 perf record --off-cpu: Dump off-cpu samples in BPF
> e75f8ce63bfa6cb9 perf record --off-cpu: Preparation of off-cpu BPF program
> 0ffab9d26971c91c perf record --off-cpu: Parse off-cpu event
> efc3fe2070853b7d perf evsel: Expose evsel__is_offcpu_event() for future use
> 357b965deba9fb71 (perf-tools-next/perf-tools-next) perf stat: Changes to event name uniquification
> ⬢ [acme@toolbox perf-tools-next]$
>
> The only change was the one I mentioned about 'struct __stack'
> replacing 'struct stack' to avoid the clash with the vmlinux.h generated
> using GEN_VMLINUX_H=1.
>
> Perhaps that is because my test is not being system wide, if I use '-a'
> we get:
>
> root@number:~# perf record --off-cpu -a sleep 0.6
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 1.868 MB perf.data (443 samples) ]
> root@number:~# perf script | grep offcpu
>          firefox    4010 [004] 16746.571035:  518930922       offcpu-time:
>            sleep  466955 [019] 16746.629875:  600279888       offcpu-time:
>             perf  466953 [027] 16746.630541:  601731496       offcpu-time:
>   IPC I/O Parent    4039 [000] 18446744069.414584:      20781       offcpu-time:
>  TaskCon~ller #3    4454 [000] 18446744069.414584:     117724       offcpu-time:
>  Isolated Servic    4253 [000] 18446744069.414584:  387311029       offcpu-time:
>   IPC I/O Parent    4039 [000] 18446744069.414584:  520075456       offcpu-time:
>            Timer    4040 [000] 18446744069.414584:  581156755       offcpu-time:
>            Timer    4380 [000] 18446744069.414584:   22520305       offcpu-time:
>  sqldb:c~lite #6    4327 [000] 18446744069.414584:    2073583       offcpu-time:
>  Isolated Web Co  156033 [000] 18446744069.414584:      23277       offcpu-time:
>  sqldb:c~lite #6    4327 [000] 18446744069.414584:      24974       offcpu-time:
>      dbus-broker    2526 [000] 18446744069.414584:    1175447       offcpu-time:
>  IPDL Background    4043 [000] 18446744069.414584:      12055       offcpu-time:
>  glean.dispatche    4037 [000] 18446744069.414584:      62321       offcpu-time:
>           podman   12139 [000] 18446744069.414584:  502656763       offcpu-time:
>      gnome-shell    2709 [000] 18446744069.414584:  579190477       offcpu-time:
>           podman    3936 [000] 18446744069.414584:   60777801       offcpu-time:
>       KMS thread    2739 [000] 18446744069.414584:   14578641       offcpu-time:
>    IPC I/O Child    7113 [000] 18446744069.414584:  386343655       offcpu-time:
>     WaylandProxy    4027 [000] 18446744069.414584:   46989879       offcpu-time:
>            Timer  156048 [000] 18446744069.414584:  404662799       offcpu-time:
>   URL Classifier    4234 [000] 18446744069.414584:      55337       offcpu-time:
>          firefox    4010 [000] 18446744069.414584:      57229       offcpu-time:
>  Isolated Web Co    4261 [000] 18446744069.414584:      26671       offcpu-time:
>           podman   12130 [000] 18446744069.414584:  450853233       offcpu-time:
>          firefox    4010 [000] 18446744069.414584:      56800       offcpu-time:
>           podman    3951 [000] 18446744069.414584:  502100761       offcpu-time:
>           podman   12131 [000] 18446744069.414584:  501954466       offcpu-time:
>            Timer    4306 [000] 18446744069.414584:  360125199       offcpu-time:
>  RemoteLzyStream    4230 [000] 18446744069.414584:     172647       offcpu-time:
>  TaskCon~ller #1    4452 [000] 18446744069.414584:      38083       offcpu-time:
>  Isolated Web Co  156033 [000] 18446744069.414584:  496574141       offcpu-time:
>    Socket Thread    4042 [000] 18446744069.414584:  295030041       offcpu-time:
>  IPDL Background    4043 [000] 18446744069.414584:      13990       offcpu-time:
>  Inotify~tThread    7430 [000] 18446744069.414584:  300841349       offcpu-time:
>            Timer    7186 [000] 18446744069.414584:     117536       offcpu-time:
>  Backgro~ol #163  466653 [000] 18446744069.414584:      29312       offcpu-time:
>  Isolated Web Co    7107 [000] 18446744069.414584:  418466228       offcpu-time:
>  Isolated Web Co  156033 [000] 18446744069.414584:      22049       offcpu-time:
>    Socket Thread    4295 [000] 18446744069.414584:  309685524       offcpu-time:
>           podman   12130 [000] 18446744069.414584:   60960566       offcpu-time:
>    Socket Thread    4042 [000] 18446744069.414584:       9726       offcpu-time:
>  glean.dispatche    4037 [000] 18446744069.414584:      30560       offcpu-time:
>  TaskCon~ller #0    4451 [000] 18446744069.414584:    3661284       offcpu-time:
>   URL Classifier    4234 [000] 18446744069.414584:      18112       offcpu-time:
>  Isolated Web Co    4261 [000] 18446744069.414584:  380745777       offcpu-time:
>    IPC I/O Child    7039 [000] 18446744069.414584:   54365677       offcpu-time:
>    IPC I/O Child    4292 [000] 18446744069.414584:  386687065       offcpu-time:
>  glean.dispatche    4037 [000] 18446744069.414584:      48091       offcpu-time:
>          firefox    4010 [000] 18446744069.414584:  570509826       offcpu-time:
>           podman    3936 [000] 18446744069.414584:  451176929       offcpu-time:
>  Isolated Web Co    7034 [000] 18446744069.414584:   57005742       offcpu-time:
>  gnome-terminal-    3583 [000] 18446744069.414584:   97027967       offcpu-time:
>  sqldb:c~lite #6    4327 [000] 18446744069.414584:      27118       offcpu-time:
>  TaskCon~ller #1    4452 [000] 18446744069.414584:      20008       offcpu-time:
>   IPC I/O Parent    4039 [000] 18446744069.414584:      58205       offcpu-time:
>            Timer  157669 [000] 18446744069.414584:     180649       offcpu-time:
>    IPC I/O Child    4265 [000] 18446744069.414584:  387442919       offcpu-time:
>          firefox    4010 [000] 18446744069.414584:     215006       offcpu-time:
>           podman    3949 [000] 18446744069.414584:  502715978       offcpu-time:
>          firefox    4010 [000] 18446744069.414584:      94719       offcpu-time:
>            gdbus    4032 [000] 18446744069.414584:    1330625       offcpu-time:
>  Worker Launcher    4348 [000] 18446744069.414584:   83393104       offcpu-time:
>    Socket Thread    4042 [000] 18446744069.414584:      48713       offcpu-time:
>  IPDL Background    4043 [000] 18446744069.414584:  386560332       offcpu-time:
>            Timer    7123 [000] 18446744069.414584:  422634111       offcpu-time:
>  Isolated Web Co    4358 [000] 18446744069.414584:   18063623       offcpu-time:
>  glean.dispatche    4037 [000] 18446744069.414584:  302249112       offcpu-time:
>   IPC I/O Parent    4039 [000] 18446744069.414584:      16204       offcpu-time:
>  Inotify~tThread    7430 [000] 18446744069.414584:  300780100       offcpu-time:
>  RemoteLzyStream    4230 [000] 18446744069.414584:      14778       offcpu-time:
>       DOM Worker   10133 [000] 18446744069.414584:     331946       offcpu-time:
>   URL Classifier    4234 [000] 18446744069.414584:    5003953       offcpu-time:
>            gdbus    2729 [000] 18446744069.414584:     331119       offcpu-time:
>  TaskCon~ller #2    4453 [000] 18446744069.414584:     125448       offcpu-time:
>          firefox    4010 [000] 18446744069.414584:      40894       offcpu-time:
>    IPC I/O Child  156038 [000] 18446744069.414584:    4079439       offcpu-time:
>       DOM Worker   10228 [000] 18446744069.414584:    3397046       offcpu-time:
> root@number:~#
>
> With 'offcpu-time' samples intermixed with the other cycles events:
>
> <SNIP>
>  Isolated Web Co    4261 [010] 16746.570827:     287731 cpu_core/cycles/P:  ffffffff8b145b24 __get_user_nocheck_8+0x4 ([kernel.kallsyms])
>  Isolated Web Co    7107 [006] 16746.570834:     235654 cpu_core/cycles/P:      7f08bf336592 pthread_mutex_trylock@@GLIBC_2.34+0xe2 (/usr/lib64/libc.so.6)
>          firefox    4010 [004] 16746.571035:  518930922       offcpu-time:
>             7f2dab89eda9 dlsym@@GLIBC_2.34+0x59 (/usr/lib64/libc.so.6)
>             7f2dab8a17f9 pthread_cond_broadcast@@GLIBC_2.3.2+0x399 (/usr/lib64/libc.so.6)
>             556309ec5bca mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&)+0xa (/usr/lib64/firefox/firefox)
>
>          swapper       0 [019] 16746.571151:     315218 cpu_atom/cycles/P:  ffffffff8adf4ad0 menu_select+0x1b0 ([kernel.kallsyms])
> <SNIP>
>  Isolated Web Co    7107 [010] 16746.607631:     288723 cpu_core/cycles/P:      34d114ccce4a [unknown] (/tmp/perf-7107.map)
>          swapper       0 [008] 16746.622828:     136175 cpu_core/cycles/P:  ffffffff8b213d7a intel_idle+0x6a ([kernel.kallsyms])
>            Timer    4306 [008] 16746.622977:     129681 cpu_core/cycles/P:  ffffffff8b134e46 rb_erase+0x46 ([kernel.kallsyms])
>          swapper       0 [000] 16746.623018:     298081 cpu_core/cycles/P:  ffffffff8a13d3ea update_rq_clock+0x7a ([kernel.kallsyms])
>          swapper       0 [019] 16746.629839:     256701 cpu_atom/cycles/P:  ffffffff8a148455 __dequeue_entity+0x5 ([kernel.kallsyms])
>            sleep  466955 [019] 16746.629875:  600279888       offcpu-time:
>             7f6feb8bf497 clock_nanosleep@GLIBC_2.2.5+0x27 (/usr/lib64/libc.so.6)
>             5608d1601a9f [unknown] (/usr/bin/sleep)
>             7f6feb805088 __libc_start_call_main+0x78 (/usr/lib64/libc.so.6)
>             7f6feb80514b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
>             5608d1601c05 [unknown] (/usr/bin/sleep)
>
>            sleep  466955 [006] 16746.630223:     225700 cpu_core/cycles/P:  ffffffff8a40ec5a folio_remove_rmap_ptes+0x1a ([kernel.kallsyms])
>          :466955  466955 [006] 16746.630442:     225700 cpu_core/cycles/P:  ffffffff8b1387eb timerqueue_add+0x3b ([kernel.kallsyms])
>             perf  466953 [027] 16746.630541:  601731496       offcpu-time:
>             7f818451c1fd __poll+0x4d (/usr/lib64/libc.so.6)
>                   49606e fdarray__poll+0x2e (/home/acme/bin/perf)
>                   429eed __cmd_record.constprop.0+0x18ad (/home/acme/bin/perf)
>                   42c77e cmd_record+0xd9e (/home/acme/bin/perf)
>                   495700 run_builtin+0x70 (/home/acme/bin/perf)
>                   495a1b handle_internal_command+0xab (/home/acme/bin/perf)
>                   413473 main+0x2f3 (/home/acme/bin/perf)
>             7f8184438088 __libc_start_call_main+0x78 (/usr/lib64/libc.so.6)
>             7f818443814b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
>                   413ac5 _start+0x25 (/home/acme/bin/perf)
>
>             perf  466953 [000] 16746.630831:     298081 cpu_core/cycles/P:  ffffffff8b40012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
>          swapper       0 [019] 16746.630985:     250096 cpu_atom/cycles/P:  ffffffff8b213d7a intel_idle+0x6a ([kernel.kallsyms])
>             perf  466953 [010] 16746.632045:     286467 cpu_core/cycles/P:  ffffffff8a1c3f21 __rcu_read_lock+0x11 ([kernel.kallsyms])
>   IPC I/O Parent    4039 [000] 18446744069.414584:      20781       offcpu-time:
>             7f2dab89f000 __GI___pthread_enable_asynccancel+0x20 (/usr/lib64/libc.so.6)
>             556309ebede4 moz_xmalloc+0x104 (/usr/lib64/firefox/firefox)
>            3bbdf3d834830 [unknown] ([unknown])
>
>
> And if I install the 'coreutils' debuginfo I get the sleep backtraces
> resolved, nice, but again only when I'm running in system wide (-a)
> mode.
>
>  Isolated Web Co    7167 [008] 16915.115432:     188682 cpu_core/cycles/P:      7fe85f558260 nsBaseHashtable<nsStringHashKey, mozilla::UniquePtr<nsBaseHashtable<nsIntegralHashKey<unsigned lo>
>          swapper       0 [013] 16915.126739:     120783 cpu_core/cycles/P:  ffffffff8a15221d enqueue_task_fair+0x22d ([kernel.kallsyms])
>            sleep  467147 [013] 16915.126780:  600336127       offcpu-time:
>             7ff010611497 clock_nanosleep@GLIBC_2.2.5+0x27 (/usr/lib64/libc.so.6)
>             563f045a3a9f main+0x4af (/usr/bin/sleep)
>             7ff010557088 __libc_start_call_main+0x78 (/usr/lib64/libc.so.6)
>             7ff01055714b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
>             563f045a3c05 _start+0x25 (/usr/bin/sleep)
>
>            sleep  467147 [013] 16915.126861:     120783 cpu_core/cycles/P:  ffffffff8a3ef391 unmap_page_range+0x661 ([kernel.kallsyms])
>            sleep  467147 [013] 16915.126973:     139106 cpu_core/cycles/P:  ffffffff8a3a5b4d folios_put_refs+0x8d ([kernel.kallsyms])
>          swapper       0 [013] 16915.127102:     160383 cpu_core/cycles/P:  ffffffff8a14b5c4 sched_balance_update_blocked_averages+0xb4 ([kernel.kallsyms])
>
> When running just one workload:
>
> root@number:~# for a in `seq 10` ; do perf record --off-cpu sleep 1s ; perf script | grep offcpu-time ; done
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (13 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (16 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (14 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (12 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> root@number:~#
>
> Have you tested it like that? Is this a known issue/current limitation?

With your command:
Tuel $ for a in `seq 10` ; do sudo perf record --off-cpu sleep 1s ;
sudo perf script | grep offcpu-time ; done
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
:56645   56645 18446744069.414584: 1000127411 offcpu-time:
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
:56723   56723 18446744069.414584: 1000058038 offcpu-time:
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
:56778   56778 18446744069.414584: 1000230651 offcpu-time:
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (9 samples) ]
:56810   56810 18446744069.414584: 1000140263 offcpu-time:
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (17 samples) ]
:56877   56877 18446744069.414584: 1001209864 offcpu-time:
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (9 samples) ]
:56919   56919 18446744069.414584: 1000230734 offcpu-time:
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (9 samples) ]
:56975   56975 18446744069.414584: 1000080484 offcpu-time:
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (9 samples) ]
:57008   57008 18446744069.414584: 1000057996 offcpu-time:
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (9 samples) ]
:57040   57040 18446744069.414584: 1000084270 offcpu-time:
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
:57072   57072 18446744069.414584: 1000081663 offcpu-time:

It works for me... I'm trying to think of a reason. I have no perf config btw.

Thanks,
Howard
>
> I'm reading the patches to see I understand this better.
>
> - Arnaldo
Re: [PATCH v14 00/10] perf record --off-cpu: Dump off-cpu samples directly
Posted by Arnaldo Carvalho de Melo 10 months, 1 week ago
On Tue, Feb 11, 2025 at 10:30:16AM -0800, Howard Chu wrote:
> On Tue, Feb 11, 2025 at 10:02 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > When running just one workload:

> > root@number:~# for a in `seq 10` ; do perf record --off-cpu sleep 1s ; perf script | grep offcpu-time ; done
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.037 MB perf.data (13 samples) ]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.037 MB perf.data (16 samples) ]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.037 MB perf.data (14 samples) ]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.037 MB perf.data (12 samples) ]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> > root@number:~#

> > Have you tested it like that? Is this a known issue/current limitation?

> With your command:
> Tuel $ for a in `seq 10` ; do sudo perf record --off-cpu sleep 1s ;
> sudo perf script | grep offcpu-time ; done
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
> :56645   56645 18446744069.414584: 1000127411 offcpu-time:
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
> :56723   56723 18446744069.414584: 1000058038 offcpu-time:
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
> :56778   56778 18446744069.414584: 1000230651 offcpu-time:
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.017 MB perf.data (9 samples) ]
> :56810   56810 18446744069.414584: 1000140263 offcpu-time:
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.017 MB perf.data (17 samples) ]
> :56877   56877 18446744069.414584: 1001209864 offcpu-time:
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.017 MB perf.data (9 samples) ]
> :56919   56919 18446744069.414584: 1000230734 offcpu-time:
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.017 MB perf.data (9 samples) ]
> :56975   56975 18446744069.414584: 1000080484 offcpu-time:
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.017 MB perf.data (9 samples) ]
> :57008   57008 18446744069.414584: 1000057996 offcpu-time:
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.017 MB perf.data (9 samples) ]
> :57040   57040 18446744069.414584: 1000084270 offcpu-time:
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
> :57072   57072 18446744069.414584: 1000081663 offcpu-time:
> 
> It works for me... I'm trying to think of a reason. I have no perf config btw.

root@number:~# uname -a
Linux number 6.12.11-100.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jan 23 22:07:15 UTC 2025 x86_64 GNU/Linux
root@number:~# cat ~/.perfconfig 
# this file is auto-generated.
[report]
	children = no
root@number:~# rm -f ~/.perfconfig 
root@number:~# for a in `seq 10` ; do perf record --off-cpu sleep 1s ; perf script | grep offcpu-time ; done
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (14 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (13 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
root@number:~#

Some more data:

root@number:~# perf record --off-cpu sleep 1s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
root@number:~# perf evlist
cpu_atom/cycles/P
cpu_core/cycles/P
offcpu-time
dummy:u
root@number:~# perf evlist -v
cpu_atom/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000000, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
cpu_core/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000000, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
offcpu-time: type: 1 (software), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID|LOST, disabled: 1, enable_on_exec: 1, sample_id_all: 1
dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|IDENTIFIER, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
root@number:~# perf report -D | grep PERF_RECORD_SAMPLE
4294967295 20129389592917 0x9240 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 474261/474261: 0xffffffff8a376ff1 period: 1 addr: 0
4294967295 20129389596787 0x9270 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 474261/474261: 0xffffffff8a01e8b6 period: 1 addr: 0
4294967295 20129389598447 0x92a0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 474261/474261: 0xffffffff8a053e64 period: 13 addr: 0
4294967295 20129389599368 0x92d0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 474261/474261: 0xffffffff8a00f052 period: 398 addr: 0
4294967295 20129389600270 0x9300 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 474261/474261: 0xffffffff8b401fad period: 15226 addr: 0
4294967295 20129389604095 0x9360 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 474261/474261: 0xffffffff8a405930 period: 607346 addr: 0
4294967295 20129389739536 0x94e8 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4002): 474261/474261: 0x7f9bf499fe7c period: 5436975 addr: 0
root@number:~#

root@number:~# perf report -D | grep PERF_RECORD_SAMPLE -A2
4294967295 20129389592917 0x9240 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 474261/474261: 0xffffffff8a376ff1 period: 1 addr: 0
 ... thread: perf-exec:474261
 ...... dso: /proc/kcore
--
4294967295 20129389596787 0x9270 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 474261/474261: 0xffffffff8a01e8b6 period: 1 addr: 0
 ... thread: perf-exec:474261
 ...... dso: /proc/kcore
--
4294967295 20129389598447 0x92a0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 474261/474261: 0xffffffff8a053e64 period: 13 addr: 0
 ... thread: perf-exec:474261
 ...... dso: /proc/kcore
--
4294967295 20129389599368 0x92d0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 474261/474261: 0xffffffff8a00f052 period: 398 addr: 0
 ... thread: perf-exec:474261
 ...... dso: /proc/kcore
--
4294967295 20129389600270 0x9300 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 474261/474261: 0xffffffff8b401fad period: 15226 addr: 0
 ... thread: perf-exec:474261
 ...... dso: /proc/kcore
--
4294967295 20129389604095 0x9360 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 474261/474261: 0xffffffff8a405930 period: 607346 addr: 0
 ... thread: sleep:474261
 ...... dso: /proc/kcore
--
4294967295 20129389739536 0x94e8 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4002): 474261/474261: 0x7f9bf499fe7c period: 5436975 addr: 0
 ... thread: sleep:474261
 ...... dso: /usr/lib64/ld-linux-x86-64.so.2
root@number:~# 

- Arnaldo
Re: [PATCH v14 00/10] perf record --off-cpu: Dump off-cpu samples directly
Posted by Arnaldo Carvalho de Melo 10 months, 1 week ago
On Tue, Feb 11, 2025 at 07:02:04PM +0100, Arnaldo Carvalho de Melo wrote:
> When running just one workload:
> 
> root@number:~# for a in `seq 10` ; do perf record --off-cpu sleep 1s ; perf script | grep offcpu-time ; done
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (13 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (16 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (14 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (12 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> root@number:~#
> 
> Have you tested it like that? Is this a known issue/current limitation?
> 
> I'm reading the patches to see I understand this better.

Ok, if I go back and use the example Namhyung added to the commit
message introducing off-cpu profiling:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=edc41a1099c2d08ccfd4ed7d59688501e3749015

We get offcpu-time samples:

root@number:~# perf record --off-cpu perf bench sched messaging -l 1000
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.244 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.182 MB perf.data (43600 samples) ]
root@number:~# perf script | grep offcpu-time | wc -l
867
root@number:~# 

But they are all at the end:

sched-messaging  469373 17707.960698:    1055439 cpu_atom/cycles/P:  ffffffff8a49a453 mod_objcg_state+0xb3 ([kernel.kallsyms])
 sched-messaging  469373 17707.960944:    1057379 cpu_atom/cycles/P:  ffffffff8b222cf9 native_queued_spin_lock_slowpath+0x209 ([kernel.kallsyms])
 sched-messaging  469274 17707.961299:    6311728 cpu_core/cycles/P:      7fa751a510f1 __cxa_finalize+0x151 (/usr/lib64/libc.so.6)
         :469374  469374 [000] 18446744069.414584:   43388310       offcpu-time: 
            7fa751b1c1c4 [unknown] ([unknown])
                  4fb0e8 [unknown] ([unknown])
                  4faa29 [unknown] ([unknown])
                  495700 [unknown] ([unknown])
                  495a1b [unknown] ([unknown])
                  413473 [unknown] ([unknown])
            7fa751a38088 [unknown] ([unknown])
            7fa751a3814b [unknown] ([unknown])
                  413ac5 [unknown] ([unknown])

         :469518  469518 [000] 18446744069.414584:   22436206       offcpu-time: 
            7fa751b1c1c4 [unknown] ([unknown])
                  4fb0e8 [unknown] ([unknown])
                  4faa29 [unknown] ([unknown])
                  495700 [unknown] ([unknown])
                  495a1b [unknown] ([unknown])
                  413473 [unknown] ([unknown])
            7fa751a38088 [unknown] ([unknown])
            7fa751a3814b [unknown] ([unknown])
                  413ac5 [unknown] ([unknown])

         :469663  469663 [000] 18446744069.414584:  180198700       offcpu-time: 
            7fa751b1d304 [unknown] ([unknown])
                  4faa29 [unknown] ([unknown])
                  495700 [unknown] ([unknown])
                  495a1b [unknown] ([unknown])
                  413473 [unknown] ([unknown])
            7fa751a38088 [unknown] ([unknown])
            7fa751a3814b [unknown] ([unknown])
                  413ac5 [unknown] ([unknown])

possibly the threshold?

Tried:

root@number:~# perf record --off-cpu --off-cpu-thresh 10 perf bench sched messaging -l 1000

Didn't help.

When doing a system wide session with that 10 threshold I get the
offcpu-time samples intermixed, i.e. not just at the end:

root@number:~# perf record -a --off-cpu --off-cpu-thresh 10 perf bench sched messaging -l 1000

sched-messaging  472324 [007] 18587.393976:    1371787 cpu_core/cycles/P:  ffffffff8a7864d2 avc_has_perm+0x92 ([kernel.kallsyms])
         swapper       0 [023] 18587.394111:     286156 cpu_atom/cycles/P:  ffffffff8a16aacb do_idle+0x14b ([kernel.kallsyms])
 sched-messaging  472532 [016] 18587.394210:     294672 cpu_atom/cycles/P:  ffffffff8b4000ab entry_SYSCALL_64_safe_stack+0x0 ([kernel.kallsyms])
 sched-messaging  472324 [007] 18587.394226:    1371903 cpu_core/cycles/P:  ffffffff8a3f6308 copy_page_range+0x12a8 ([kernel.kallsyms])
          podman    3935 [012] 18587.394310:   10106482       offcpu-time: 
            5630f6bf2457 [unknown] (/usr/bin/podman)
            5630f6bc49e5 [unknown] (/usr/bin/podman)
            5630f6bbb1f5 [unknown] (/usr/bin/podman)
            5630f6bbb136 [unknown] (/usr/bin/podman)
            5630f6beeb65 [unknown] (/usr/bin/podman)
            7f4474032088 start_thread+0x3b8 (/usr/lib64/libc.so.6)
            7f44740b5f8c clone3+0x2c (/usr/lib64/libc.so.6)

 sched-messaging  472534 [016] 18587.394434:     279374 cpu_atom/cycles/P:  ffffffff8a3826c9 get_callchain_entry+0x9 ([kernel.kallsyms])
 sched-messaging  472324 [007] 18587.394476:    1371988 cpu_core/cycles/P:  ffffffff8a43d264 __kmalloc_node_noprof+0xd4 ([kernel.kallsyms])
         swapper       0 [016] 18587.394717:     276534 cpu_atom/cycles/P:  ffffffff8a1d2ef4 rcu_preempt_deferred_qs+0x4 ([kernel.kallsyms])
<SNIP>
 sched-messaging  472674 [013] 18587.414755:    6576247 cpu_core/cycles/P:  ffffffff8b222d7d native_queued_spin_lock_slowpath+0x28d ([kernel.kallsyms])
 sched-messaging  472625 [022] 18587.414760:   10699110       offcpu-time: 
            7f6b1a71d304 __GI___libc_write+0x14 (/usr/lib64/libc.so.6)
                  4fb0e8 bench_sched_messaging+0x218 (/home/acme/bin/perf)
                  4faa29 cmd_bench+0x2f9 (/home/acme/bin/perf)
                  495700 run_builtin+0x70 (/home/acme/bin/perf)
                  495a1b handle_internal_command+0xab (/home/acme/bin/perf)
                  413473 main+0x2f3 (/home/acme/bin/perf)
            7f6b1a638088 __libc_start_call_main+0x78 (/usr/lib64/libc.so.6)
            7f6b1a63814b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
                  413ac5 _start+0x25 (/home/acme/bin/perf)

 sched-messaging  472623 [023] 18587.414763:   10911814       offcpu-time: 
            7f6b1a71d304 __GI___libc_write+0x14 (/usr/lib64/libc.so.6)
                  4fb0e8 bench_sched_messaging+0x218 (/home/acme/bin/perf)
                  4faa29 cmd_bench+0x2f9 (/home/acme/bin/perf)
                  495700 run_builtin+0x70 (/home/acme/bin/perf)
                  495a1b handle_internal_command+0xab (/home/acme/bin/perf)
                  413473 main+0x2f3 (/home/acme/bin/perf)
            7f6b1a638088 __libc_start_call_main+0x78 (/usr/lib64/libc.so.6)
            7f6b1a63814b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
                  413ac5 _start+0x25 (/home/acme/bin/perf)
<SNIP>

So I'm guessing that running just the 'sleep' workload we are missing
the sample at some point while with the 'perf bench' we have way more
samples, etc.

Also the intermixed entries have backtraces resolved to symbols, while
the ones at the end don't have, something else to investigate.

- Arnaldo
Re: [PATCH v14 00/10] perf record --off-cpu: Dump off-cpu samples directly
Posted by Howard Chu 1 year ago
Hello Namhyung,

On Mon, Dec 16, 2024 at 12:11 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Sun, Dec 15, 2024 at 10:12:10AM -0800, Howard Chu wrote:
> > Changes in v14:
> >  - Change the internal off_cpu_thresh_us to off_cpu_thresh_ns, i.e. use
> >    nsec instead of usec
> >
> > Changes in v13:
> >  - Move the definition of 'off_cpu_thresh_ns' to the same commit as
> >    dumping off-cpu samples in BPF, and give off_cpu_thresh_ns a default
> >    value before the --off-cpu-thresh option is parsed.
> >
> > Changes in v12:
> >  - Restore patches' bisectability, because the ordering of patches has
> >    changed.
> >  - Change 'us = ms * 1000' to 'us = ms * USEC_PER_MSEC'
> >
> > Changes in v11:
> >  - Modify the options used in the off-cpu tests, as I changed the unit
> >    of the off-cpu threshold to milliseconds.
> >
> > Changes in v10:
> >  - Move the commit "perf record --off-cpu: Add --off-cpu-thresh option"
> >    to where the direct sample feature is completed.
> >  - Make --off-cpu-thresh use milliseconds as the unit.
> >
> > Changes in v9:
> >  - Add documentation for the new option '--off-cpu-thresh', and include
> >    an example of its usage in the commit message
> >  - Set inherit in evsel__config() to prevent future modifications
> >  - Support off-cpu sample data collected by perf before this patch series
> >
> > Changes in v8:
> >  - Make this series bisectable
> >  - Rename off_cpu_thresh to off_cpu_thresh_us and offcpu_thresh (in BPF)
> >    to offcpu_thresh_ns for clarity
> >  - Add commit messages to 'perf evsel: Expose evsel__is_offcpu_event()
> >    for future use' commit
> >  - Correct spelling mistakes in the commit message (s/is should be/should be/)
> >  - Add kernel-doc comments to off_cpu_dump(), and comments to the empty
> >    if block
> >  - Add some comments to off-cpu test
> >  - Delete an unused variable 'timestamp' in off_cpu_dump()
> >
> > Changes in v7:
> >  - Make off-cpu event system-wide
> >  - Use strtoull instead of strtoul
> >  - Delete unused variable such as sample_id, and sample_type
> >  - Use i as index to update BPF perf_event map
> >  - MAX_OFFCPU_LEN 128 is too big, make it smaller.
> >  - Delete some bound check as it's always guaranteed
> >  - Do not set ip_pos in BPF
> >  - Add a new field for storing stack traces in the tstamp map
> >  - Dump the off-cpu sample directly or save it in the off_cpu map, not both
> >  - Delete the sample_type_off_cpu check
> >  - Use __set_off_cpu_sample() to parse samples instead of a two-pass parsing
> >
> > Changes in v6:
> >  - Make patches bisectable
> >
> > Changes in v5:
> >  - Delete unnecessary copy in BPF program
> >  - Remove sample_embed from perf header, hard code off-cpu stuff instead
> >  - Move evsel__is_offcpu_event() to evsel.h
> >  - Minor changes to the test
> >  - Edit some comments
> >
> > Changes in v4:
> >  - Minimize the size of data output by perf_event_output()
> >  - Keep only one off-cpu event
> >  - Change off-cpu threshold's unit to microseconds
> >  - Set a default off-cpu threshold
> >  - Print the correct error message for the field 'embed' in perf data header
> >
> > Changes in v3:
> >  - Add off-cpu-thresh argument
> >  - Process direct off-cpu samples in post
> >
> > Changes in v2:
> >  - Remove unnecessary comments.
> >  - Rename function off_cpu_change_type to off_cpu_prepare_parse
> >
> > v1:
> >
> > As mentioned in: https://bugzilla.kernel.org/show_bug.cgi?id=207323
> >
> > Currently, off-cpu samples are dumped when perf record is exiting. This
> > results in off-cpu samples being after the regular samples. This patch
> > series makes possible dumping off-cpu samples on-the-fly, directly into
> > perf ring buffer. And it dispatches those samples to the correct format
> > for perf.data consumers.
> >
> > Before:
> > ```
> >      migration/0      21 [000] 27981.041319: 2944637851    cycles:P:  ffffffff90d2e8aa record_times+0xa ([kernel.kallsyms])
> >             perf  770116 [001] 27981.041375:          1    cycles:P:  ffffffff90ee4960 event_function+0xf0 ([kernel.kallsyms])
> >             perf  770116 [001] 27981.041377:          1    cycles:P:  ffffffff90c184b1 intel_bts_enable_local+0x31 ([kernel.kallsyms])
> >             perf  770116 [001] 27981.041379:      51611    cycles:P:  ffffffff91a160b0 native_sched_clock+0x30 ([kernel.kallsyms])
> >      migration/1      26 [001] 27981.041400: 4227682775    cycles:P:  ffffffff90d06a74 wakeup_preempt+0x44 ([kernel.kallsyms])
> >      migration/2      32 [002] 27981.041477: 4159401534    cycles:P:  ffffffff90d11993 update_load_avg+0x63 ([kernel.kallsyms])
> >
> > sshd  708098 [000] 18446744069.414584:     286392 offcpu-time:
> >           79a864f1c8bb ppoll+0x4b (/usr/lib/libc.so.6)
> >           585690935cca [unknown] (/usr/bin/sshd)
> > ```
> >
> > After:
> > ```
> >             perf  774767 [003] 28178.033444:        497           cycles:P:  ffffffff91a160c3 native_sched_clock+0x43 ([kernel.kallsyms])
> >             perf  774767 [003] 28178.033445:     399440           cycles:P:  ffffffff91c01f8d nmi_restore+0x25 ([kernel.kallsyms])
> >          swapper       0 [001] 28178.036639:  376650973           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> >          swapper       0 [003] 28178.182921:  348779378           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> >     blueman-tray    1355 [000] 28178.627906:  100184571 offcpu-time:
> >           7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
> >           7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
> >           7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
> >           7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
> >           7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
> >           7fff24e862d8 [unknown] ([unknown])
> >
> >
> >     blueman-tray    1355 [000] 28178.728137:  100187539 offcpu-time:
> >           7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
> >           7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
> >           7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
> >           7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
> >           7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
> >           7fff24e862d8 [unknown] ([unknown])
> >
> >
> >          swapper       0 [000] 28178.463253:  195945410           cycles:P:  ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> >      dbus-broker     412 [002] 28178.464855:  376737008           cycles:P:  ffffffff91c000a0 entry_SYSCALL_64+0x20 ([kernel.kallsyms])
> > ```
> >
> > Howard Chu (10):
> >   perf evsel: Expose evsel__is_offcpu_event() for future use
> >   perf record --off-cpu: Parse off-cpu event
> >   perf record --off-cpu: Preparation of off-cpu BPF program
> >   perf record --off-cpu: Dump off-cpu samples in BPF
> >   perf evsel: Assemble offcpu samples
> >   perf record --off-cpu: Disable perf_event's callchain collection
> >   perf script: Display off-cpu samples correctly
> >   perf record --off-cpu: Dump the remaining samples in BPF's stack trace
> >     map
> >   perf record --off-cpu: Add --off-cpu-thresh option
> >   perf test: Add direct off-cpu test
>
> Thanks for your work and the persistence.

I think you misspelled 'procrastination' :). All jokes aside, thank
you for the reviews—it’s much harder on your side.

Thanks,
Howard

>
> Reviewed-by: Namhyung Kim <namhyung@kernel.org>
>
> Thanks,
> Namhyung