[RFC/PATCHSET 0/5] perf tools: Support deferred user callchains (v2)

Namhyung Kim posted 5 patches 2 months, 1 week ago
tools/include/uapi/linux/perf_event.h     | 21 +++++-
tools/lib/perf/include/perf/event.h       |  7 ++
tools/perf/Documentation/perf-script.txt  |  5 ++
tools/perf/builtin-script.c               | 92 +++++++++++++++++++++++
tools/perf/util/callchain.c               | 24 ++++++
tools/perf/util/callchain.h               |  3 +
tools/perf/util/event.c                   |  1 +
tools/perf/util/evlist.c                  |  1 +
tools/perf/util/evlist.h                  |  1 +
tools/perf/util/evsel.c                   | 32 +++++++-
tools/perf/util/evsel.h                   |  1 +
tools/perf/util/machine.c                 |  1 +
tools/perf/util/perf_event_attr_fprintf.c |  1 +
tools/perf/util/sample.h                  |  3 +-
tools/perf/util/session.c                 | 78 +++++++++++++++++++
tools/perf/util/tool.c                    |  2 +
tools/perf/util/tool.h                    |  4 +-
17 files changed, 273 insertions(+), 4 deletions(-)
[RFC/PATCHSET 0/5] perf tools: Support deferred user callchains (v2)
Posted by Namhyung Kim 2 months, 1 week ago
Hello,

This is a counterpart for Josh's kernel change v2 [1] to support deferred
user callchains.  The change is transparent and users should not notice
anything with the deferred callchains.

  $ perf record -g sleep 1

I added --[no-]merge-callchains option to control output of perf script.
You can verify it has the deferred callchains like this:

  $ perf script --no-merge-callchains
  perf     801 [000]    18.031793:          1 cycles:P:
          ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
          ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
          ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
          ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
          ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
          ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
          ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
          ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
          ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
          ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
          ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
          ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
          ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])

  perf     801 [000]    18.031814: DEFERRED CALLCHAIN
                  7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)

  ...

When the callchain is merged (it's the default) it'd look like below:

  $ perf script
  perf     801 [000]    18.031793:          1 cycles:P:
          ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
          ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
          ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
          ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
          ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
          ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
          ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
          ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
          ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
          ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
          ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
          ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
          ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
                  7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)

  ...

Notice that the last line and it has the __GI___ioctl in the same
callchain.  It should work with other tools like perf report.

The code is available at 'perf/defer-callchain-v2' branch in
https://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung

[1] https://lore.kernel.org/lkml/cover.1726268190.git.jpoimboe@kernel.org


Namhyung Kim (5):
  perf tools: Sync UAPI perf_event.h header
  perf tools: Minimal DEFERRED_CALLCHAIN support
  perf record: Enable defer_callchain for user callchains
  perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED
  perf tools: Merge deferred user callchains

 tools/include/uapi/linux/perf_event.h     | 21 +++++-
 tools/lib/perf/include/perf/event.h       |  7 ++
 tools/perf/Documentation/perf-script.txt  |  5 ++
 tools/perf/builtin-script.c               | 92 +++++++++++++++++++++++
 tools/perf/util/callchain.c               | 24 ++++++
 tools/perf/util/callchain.h               |  3 +
 tools/perf/util/event.c                   |  1 +
 tools/perf/util/evlist.c                  |  1 +
 tools/perf/util/evlist.h                  |  1 +
 tools/perf/util/evsel.c                   | 32 +++++++-
 tools/perf/util/evsel.h                   |  1 +
 tools/perf/util/machine.c                 |  1 +
 tools/perf/util/perf_event_attr_fprintf.c |  1 +
 tools/perf/util/sample.h                  |  3 +-
 tools/perf/util/session.c                 | 78 +++++++++++++++++++
 tools/perf/util/tool.c                    |  2 +
 tools/perf/util/tool.h                    |  4 +-
 17 files changed, 273 insertions(+), 4 deletions(-)

-- 
2.46.0.792.g87dc391469-goog
Re: [RFC/PATCHSET 0/5] perf tools: Support deferred user callchains (v2)
Posted by Liang, Kan 2 months, 1 week ago

On 2024-09-17 6:28 p.m., Namhyung Kim wrote:
> Hello,
> 
> This is a counterpart for Josh's kernel change v2 [1] to support deferred
> user callchains.  The change is transparent and users should not notice
> anything with the deferred callchains.
> 
>   $ perf record -g sleep 1
> 
> I added --[no-]merge-callchains option to control output of perf script.
> You can verify it has the deferred callchains like this:
> 
>   $ perf script --no-merge-callchains
>   perf     801 [000]    18.031793:          1 cycles:P:
>           ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
>           ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
>           ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
>           ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
>           ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
>           ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
>           ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
>           ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
>           ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
>           ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
>           ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
>           ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
>           ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
> 
>   perf     801 [000]    18.031814: DEFERRED CALLCHAIN
>                   7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
> 
>   ...
> 
> When the callchain is merged (it's the default) it'd look like below:
> 
>   $ perf script
>   perf     801 [000]    18.031793:          1 cycles:P:
>           ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
>           ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
>           ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
>           ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
>           ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
>           ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
>           ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
>           ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
>           ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
>           ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
>           ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
>           ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
>           ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
>                   7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
> 
>   ...
> 
> Notice that the last line and it has the __GI___ioctl in the same
> callchain.  It should work with other tools like perf report.


It seems it only works with perf report -D, when I test it on a
non-hybrid machine.
$perf record -e branches -g -c 3000000 ~/tchain_edit
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.397 MB perf.data ]
$ perf report -D | tail -n 17

Aggregated stats:
               TOTAL events:       8235
                MMAP events:         78  ( 0.9%)
                COMM events:          2  ( 0.0%)
                EXIT events:          1  ( 0.0%)
              SAMPLE events:       4060  (49.3%)
               MMAP2 events:          2  ( 0.0%)
             KSYMBOL events:         12  ( 0.1%)
           BPF_EVENT events:         12  ( 0.1%)
  CALLCHAIN_DEFERRED events:       4060  (49.3%)
      FINISHED_ROUND events:          3  ( 0.0%)
            ID_INDEX events:          1  ( 0.0%)
          THREAD_MAP events:          1  ( 0.0%)
             CPU_MAP events:          1  ( 0.0%)
           TIME_CONV events:          1  ( 0.0%)
       FINISHED_INIT events:          1  ( 0.0%)
$ perf report
Error:
The perf.data data has no samples!
# To display the perf.data header info, please use
--header/--header-only options.
#


On a hybrid machine, perf record errors out.

$perf record -g true
[ perf record: Woken up 1 times to write data ]
0x58a8 [0x38]: failed to process type: 22 [Bad address]
[ perf record: Captured and wrote 0.022 MB perf.data ]

Thanks,
Kan
> 
> The code is available at 'perf/defer-callchain-v2' branch in
> https://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> 
> Thanks,
> Namhyung
> 
> [1] https://lore.kernel.org/lkml/cover.1726268190.git.jpoimboe@kernel.org
> 
> 
> Namhyung Kim (5):
>   perf tools: Sync UAPI perf_event.h header
>   perf tools: Minimal DEFERRED_CALLCHAIN support
>   perf record: Enable defer_callchain for user callchains
>   perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED
>   perf tools: Merge deferred user callchains
> 
>  tools/include/uapi/linux/perf_event.h     | 21 +++++-
>  tools/lib/perf/include/perf/event.h       |  7 ++
>  tools/perf/Documentation/perf-script.txt  |  5 ++
>  tools/perf/builtin-script.c               | 92 +++++++++++++++++++++++
>  tools/perf/util/callchain.c               | 24 ++++++
>  tools/perf/util/callchain.h               |  3 +
>  tools/perf/util/event.c                   |  1 +
>  tools/perf/util/evlist.c                  |  1 +
>  tools/perf/util/evlist.h                  |  1 +
>  tools/perf/util/evsel.c                   | 32 +++++++-
>  tools/perf/util/evsel.h                   |  1 +
>  tools/perf/util/machine.c                 |  1 +
>  tools/perf/util/perf_event_attr_fprintf.c |  1 +
>  tools/perf/util/sample.h                  |  3 +-
>  tools/perf/util/session.c                 | 78 +++++++++++++++++++
>  tools/perf/util/tool.c                    |  2 +
>  tools/perf/util/tool.h                    |  4 +-
>  17 files changed, 273 insertions(+), 4 deletions(-)
>
Re: [RFC/PATCHSET 0/5] perf tools: Support deferred user callchains (v2)
Posted by Namhyung Kim 2 months ago
Hi Kan,

On Wed, Sep 18, 2024 at 04:26:56PM -0400, Liang, Kan wrote:
> 
> 
> On 2024-09-17 6:28 p.m., Namhyung Kim wrote:
> > Hello,
> > 
> > This is a counterpart for Josh's kernel change v2 [1] to support deferred
> > user callchains.  The change is transparent and users should not notice
> > anything with the deferred callchains.
> > 
> >   $ perf record -g sleep 1
> > 
> > I added --[no-]merge-callchains option to control output of perf script.
> > You can verify it has the deferred callchains like this:
> > 
> >   $ perf script --no-merge-callchains
> >   perf     801 [000]    18.031793:          1 cycles:P:
> >           ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
> >           ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
> >           ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
> >           ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
> >           ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
> >           ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
> >           ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
> >           ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
> >           ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
> >           ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
> >           ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
> >           ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
> >           ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
> > 
> >   perf     801 [000]    18.031814: DEFERRED CALLCHAIN
> >                   7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
> > 
> >   ...
> > 
> > When the callchain is merged (it's the default) it'd look like below:
> > 
> >   $ perf script
> >   perf     801 [000]    18.031793:          1 cycles:P:
> >           ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
> >           ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
> >           ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
> >           ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
> >           ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
> >           ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
> >           ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
> >           ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
> >           ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
> >           ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
> >           ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
> >           ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
> >           ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
> >                   7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
> > 
> >   ...
> > 
> > Notice that the last line and it has the __GI___ioctl in the same
> > callchain.  It should work with other tools like perf report.
> 
> 
> It seems it only works with perf report -D, when I test it on a
> non-hybrid machine.
> $perf record -e branches -g -c 3000000 ~/tchain_edit
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.397 MB perf.data ]
> $ perf report -D | tail -n 17
> 
> Aggregated stats:
>                TOTAL events:       8235
>                 MMAP events:         78  ( 0.9%)
>                 COMM events:          2  ( 0.0%)
>                 EXIT events:          1  ( 0.0%)
>               SAMPLE events:       4060  (49.3%)
>                MMAP2 events:          2  ( 0.0%)
>              KSYMBOL events:         12  ( 0.1%)
>            BPF_EVENT events:         12  ( 0.1%)
>   CALLCHAIN_DEFERRED events:       4060  (49.3%)
>       FINISHED_ROUND events:          3  ( 0.0%)
>             ID_INDEX events:          1  ( 0.0%)
>           THREAD_MAP events:          1  ( 0.0%)
>              CPU_MAP events:          1  ( 0.0%)
>            TIME_CONV events:          1  ( 0.0%)
>        FINISHED_INIT events:          1  ( 0.0%)
> $ perf report
> Error:
> The perf.data data has no samples!
> # To display the perf.data header info, please use
> --header/--header-only options.
> #
> 
> 
> On a hybrid machine, perf record errors out.
> 
> $perf record -g true
> [ perf record: Woken up 1 times to write data ]
> 0x58a8 [0x38]: failed to process type: 22 [Bad address]
> [ perf record: Captured and wrote 0.022 MB perf.data ]

Thanks for the test, I'll take a look what I missed.

Thanks,
Namhyung
Re: [RFC/PATCHSET 0/5] perf tools: Support deferred user callchains (v2)
Posted by Ian Rogers 2 months, 1 week ago
On Wed, Sep 18, 2024 at 12:28 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> This is a counterpart for Josh's kernel change v2 [1] to support deferred
> user callchains.  The change is transparent and users should not notice
> anything with the deferred callchains.
>
>   $ perf record -g sleep 1
>
> I added --[no-]merge-callchains option to control output of perf script.
> You can verify it has the deferred callchains like this:
>
>   $ perf script --no-merge-callchains
>   perf     801 [000]    18.031793:          1 cycles:P:
>           ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
>           ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
>           ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
>           ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
>           ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
>           ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
>           ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
>           ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
>           ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
>           ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
>           ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
>           ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
>           ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
>
>   perf     801 [000]    18.031814: DEFERRED CALLCHAIN
>                   7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
>
>   ...
>
> When the callchain is merged (it's the default) it'd look like below:
>
>   $ perf script
>   perf     801 [000]    18.031793:          1 cycles:P:
>           ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
>           ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
>           ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
>           ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
>           ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
>           ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
>           ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
>           ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
>           ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
>           ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
>           ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
>           ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
>           ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
>                   7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
>
>   ...
>
> Notice that the last line and it has the __GI___ioctl in the same
> callchain.  It should work with other tools like perf report.

Hi Namhyung, I think this is interesting work!

The issue feels similar to leader sampling and some of the unpicking
of that we've been dealing with. With leader sampling it was added and
then the dispatch of events modified so that tools wouldn't see leader
samples, instead new events would be synthesized based on the leader
sample data. However, the leader sample event wasn't changed and so
now we have multiple repeated events and perf inject wouldn't just
pass through a perf data file.

What I'm expecting based on this description is that a deferred call
chain will be merged with a regular one, however, perf inject isn't
updated to drop the deferred callchain so now we have the deferred
callchain event twice.

My feeling is that making the dispatch of events to tools "smart" is a
false economy. Tools can add handlers for these events easily enough.
What's harder is undoing the smartness when it does things that lead
to duplicated events and the like. I'm not a fan of how leader
sampling was implemented and I still think it odd that with perf
script we see invented events when trying to just dump the contents of
a perf.data file.

Thanks,
Ian

> The code is available at 'perf/defer-callchain-v2' branch in
> https://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Thanks,
> Namhyung
>
> [1] https://lore.kernel.org/lkml/cover.1726268190.git.jpoimboe@kernel.org
>
>
> Namhyung Kim (5):
>   perf tools: Sync UAPI perf_event.h header
>   perf tools: Minimal DEFERRED_CALLCHAIN support
>   perf record: Enable defer_callchain for user callchains
>   perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED
>   perf tools: Merge deferred user callchains
>
>  tools/include/uapi/linux/perf_event.h     | 21 +++++-
>  tools/lib/perf/include/perf/event.h       |  7 ++
>  tools/perf/Documentation/perf-script.txt  |  5 ++
>  tools/perf/builtin-script.c               | 92 +++++++++++++++++++++++
>  tools/perf/util/callchain.c               | 24 ++++++
>  tools/perf/util/callchain.h               |  3 +
>  tools/perf/util/event.c                   |  1 +
>  tools/perf/util/evlist.c                  |  1 +
>  tools/perf/util/evlist.h                  |  1 +
>  tools/perf/util/evsel.c                   | 32 +++++++-
>  tools/perf/util/evsel.h                   |  1 +
>  tools/perf/util/machine.c                 |  1 +
>  tools/perf/util/perf_event_attr_fprintf.c |  1 +
>  tools/perf/util/sample.h                  |  3 +-
>  tools/perf/util/session.c                 | 78 +++++++++++++++++++
>  tools/perf/util/tool.c                    |  2 +
>  tools/perf/util/tool.h                    |  4 +-
>  17 files changed, 273 insertions(+), 4 deletions(-)
>
> --
> 2.46.0.792.g87dc391469-goog
>
Re: [RFC/PATCHSET 0/5] perf tools: Support deferred user callchains (v2)
Posted by Namhyung Kim 2 months, 1 week ago
Hi Ian,

On Wed, Sep 18, 2024 at 08:38:22AM +0200, Ian Rogers wrote:
> On Wed, Sep 18, 2024 at 12:28 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hello,
> >
> > This is a counterpart for Josh's kernel change v2 [1] to support deferred
> > user callchains.  The change is transparent and users should not notice
> > anything with the deferred callchains.
> >
> >   $ perf record -g sleep 1
> >
> > I added --[no-]merge-callchains option to control output of perf script.
> > You can verify it has the deferred callchains like this:
> >
> >   $ perf script --no-merge-callchains
> >   perf     801 [000]    18.031793:          1 cycles:P:
> >           ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
> >           ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
> >           ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
> >           ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
> >           ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
> >           ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
> >           ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
> >           ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
> >           ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
> >           ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
> >           ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
> >           ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
> >           ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
> >
> >   perf     801 [000]    18.031814: DEFERRED CALLCHAIN
> >                   7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
> >
> >   ...
> >
> > When the callchain is merged (it's the default) it'd look like below:
> >
> >   $ perf script
> >   perf     801 [000]    18.031793:          1 cycles:P:
> >           ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
> >           ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
> >           ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
> >           ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
> >           ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
> >           ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
> >           ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
> >           ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
> >           ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
> >           ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
> >           ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
> >           ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
> >           ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
> >                   7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
> >
> >   ...
> >
> > Notice that the last line and it has the __GI___ioctl in the same
> > callchain.  It should work with other tools like perf report.
> 
> Hi Namhyung, I think this is interesting work!
> 
> The issue feels similar to leader sampling and some of the unpicking
> of that we've been dealing with. With leader sampling it was added and
> then the dispatch of events modified so that tools wouldn't see leader
> samples, instead new events would be synthesized based on the leader
> sample data. However, the leader sample event wasn't changed and so
> now we have multiple repeated events and perf inject wouldn't just
> pass through a perf data file.
> 
> What I'm expecting based on this description is that a deferred call
> chain will be merged with a regular one, however, perf inject isn't
> updated to drop the deferred callchain so now we have the deferred
> callchain event twice.
> 
> My feeling is that making the dispatch of events to tools "smart" is a
> false economy. Tools can add handlers for these events easily enough.
> What's harder is undoing the smartness when it does things that lead
> to duplicated events and the like. I'm not a fan of how leader
> sampling was implemented and I still think it odd that with perf
> script we see invented events when trying to just dump the contents of
> a perf.data file.

That's why I added perf_tool.merge_deferred_callchains flag to control
the behavior.  I haven't implemented it to perf inject because it covers
a couple of different use cases.  I believe the default behavior is to
not invoke the callback for deferred callchains during perf inject and
each sample will get the full callchains.  But you can add a new
callback and set perf_tool.merge_deferred_callchains to false.

Thanks,
Namhyung
Re: [RFC/PATCHSET 0/5] perf tools: Support deferred user callchains (v2)
Posted by Ian Rogers 2 months, 1 week ago
On Wed, Sep 18, 2024 at 11:38 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Ian,
>
> On Wed, Sep 18, 2024 at 08:38:22AM +0200, Ian Rogers wrote:
> > On Wed, Sep 18, 2024 at 12:28 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Hello,
> > >
> > > This is a counterpart for Josh's kernel change v2 [1] to support deferred
> > > user callchains.  The change is transparent and users should not notice
> > > anything with the deferred callchains.
> > >
> > >   $ perf record -g sleep 1
> > >
> > > I added --[no-]merge-callchains option to control output of perf script.
> > > You can verify it has the deferred callchains like this:
> > >
> > >   $ perf script --no-merge-callchains
> > >   perf     801 [000]    18.031793:          1 cycles:P:
> > >           ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
> > >           ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
> > >           ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
> > >           ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
> > >           ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
> > >           ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
> > >           ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
> > >           ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
> > >           ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
> > >           ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
> > >           ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
> > >           ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
> > >           ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
> > >
> > >   perf     801 [000]    18.031814: DEFERRED CALLCHAIN
> > >                   7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
> > >
> > >   ...
> > >
> > > When the callchain is merged (it's the default) it'd look like below:
> > >
> > >   $ perf script
> > >   perf     801 [000]    18.031793:          1 cycles:P:
> > >           ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
> > >           ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
> > >           ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
> > >           ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
> > >           ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
> > >           ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
> > >           ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
> > >           ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
> > >           ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
> > >           ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
> > >           ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
> > >           ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
> > >           ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
> > >                   7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
> > >
> > >   ...
> > >
> > > Notice that the last line and it has the __GI___ioctl in the same
> > > callchain.  It should work with other tools like perf report.
> >
> > Hi Namhyung, I think this is interesting work!
> >
> > The issue feels similar to leader sampling and some of the unpicking
> > of that we've been dealing with. With leader sampling it was added and
> > then the dispatch of events modified so that tools wouldn't see leader
> > samples, instead new events would be synthesized based on the leader
> > sample data. However, the leader sample event wasn't changed and so
> > now we have multiple repeated events and perf inject wouldn't just
> > pass through a perf data file.
> >
> > What I'm expecting based on this description is that a deferred call
> > chain will be merged with a regular one, however, perf inject isn't
> > updated to drop the deferred callchain so now we have the deferred
> > callchain event twice.
> >
> > My feeling is that making the dispatch of events to tools "smart" is a
> > false economy. Tools can add handlers for these events easily enough.
> > What's harder is undoing the smartness when it does things that lead
> > to duplicated events and the like. I'm not a fan of how leader
> > sampling was implemented and I still think it odd that with perf
> > script we see invented events when trying to just dump the contents of
> > a perf.data file.
>
> That's why I added perf_tool.merge_deferred_callchains flag to control
> the behavior.  I haven't implemented it to perf inject because it covers
> a couple of different use cases.  I believe the default behavior is to
> not invoke the callback for deferred callchains during perf inject and
> each sample will get the full callchains.  But you can add a new
> callback and set perf_tool.merge_deferred_callchains to false.

I wonder if there is a different strategy for handling this. Normally
with a visitor pattern you fail when you call an unimplemented
visitor, this is then a signal the (in our case) tool needs to handle
the new case. This avoids naively doing things like making perf inject
duplicate events. The equivalent in the perf code would be to
initialize the callbacks in the tool constructor to be to stubs that
abort, then explicitly initialize and use things like callchain
merging as appropriate. The whole booleans next to the callbacks feels
like a kludge and likely to hide bugs. It is also marginally less
efficient.

Thanks,
Ian
Re: [RFC/PATCHSET 0/5] perf tools: Support deferred user callchains (v2)
Posted by Namhyung Kim 2 months ago
Hi Ian,

On Wed, Sep 18, 2024 at 03:39:31PM +0200, Ian Rogers wrote:
> On Wed, Sep 18, 2024 at 11:38 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Ian,
> >
> > On Wed, Sep 18, 2024 at 08:38:22AM +0200, Ian Rogers wrote:
> > > On Wed, Sep 18, 2024 at 12:28 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > Hello,
> > > >
> > > > This is a counterpart for Josh's kernel change v2 [1] to support deferred
> > > > user callchains.  The change is transparent and users should not notice
> > > > anything with the deferred callchains.
> > > >
> > > >   $ perf record -g sleep 1
> > > >
> > > > I added --[no-]merge-callchains option to control output of perf script.
> > > > You can verify it has the deferred callchains like this:
> > > >
> > > >   $ perf script --no-merge-callchains
> > > >   perf     801 [000]    18.031793:          1 cycles:P:
> > > >           ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
> > > >           ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
> > > >           ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
> > > >           ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
> > > >           ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
> > > >           ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
> > > >           ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
> > > >           ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
> > > >           ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
> > > >           ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
> > > >           ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
> > > >           ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
> > > >           ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
> > > >
> > > >   perf     801 [000]    18.031814: DEFERRED CALLCHAIN
> > > >                   7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
> > > >
> > > >   ...
> > > >
> > > > When the callchain is merged (it's the default) it'd look like below:
> > > >
> > > >   $ perf script
> > > >   perf     801 [000]    18.031793:          1 cycles:P:
> > > >           ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
> > > >           ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
> > > >           ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
> > > >           ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
> > > >           ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
> > > >           ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
> > > >           ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
> > > >           ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
> > > >           ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
> > > >           ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
> > > >           ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
> > > >           ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
> > > >           ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
> > > >                   7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
> > > >
> > > >   ...
> > > >
> > > > Notice that the last line and it has the __GI___ioctl in the same
> > > > callchain.  It should work with other tools like perf report.
> > >
> > > Hi Namhyung, I think this is interesting work!
> > >
> > > The issue feels similar to leader sampling and some of the unpicking
> > > of that we've been dealing with. With leader sampling it was added and
> > > then the dispatch of events modified so that tools wouldn't see leader
> > > samples, instead new events would be synthesized based on the leader
> > > sample data. However, the leader sample event wasn't changed and so
> > > now we have multiple repeated events and perf inject wouldn't just
> > > pass through a perf data file.
> > >
> > > What I'm expecting based on this description is that a deferred call
> > > chain will be merged with a regular one, however, perf inject isn't
> > > updated to drop the deferred callchain so now we have the deferred
> > > callchain event twice.
> > >
> > > My feeling is that making the dispatch of events to tools "smart" is a
> > > false economy. Tools can add handlers for these events easily enough.
> > > What's harder is undoing the smartness when it does things that lead
> > > to duplicated events and the like. I'm not a fan of how leader
> > > sampling was implemented and I still think it odd that with perf
> > > script we see invented events when trying to just dump the contents of
> > > a perf.data file.
> >
> > That's why I added perf_tool.merge_deferred_callchains flag to control
> > the behavior.  I haven't implemented it to perf inject because it covers
> > a couple of different use cases.  I believe the default behavior is to
> > not invoke the callback for deferred callchains during perf inject and
> > each sample will get the full callchains.  But you can add a new
> > callback and set perf_tool.merge_deferred_callchains to false.
> 
> I wonder if there is a different strategy for handling this. Normally
> with a visitor pattern you fail when you call an unimplemented
> visitor, this is then a signal the (in our case) tool needs to handle
> the new case. This avoids naively doing things like making perf inject
> duplicate events. The equivalent in the perf code would be to
> initialize the callbacks in the tool constructor to be to stubs that
> abort, then explicitly initialize and use things like callchain
> merging as appropriate. The whole booleans next to the callbacks feels
> like a kludge and likely to hide bugs. It is also marginally less
> efficient.

Well.. we might change it that way later, but I just wanted to test the
deferred callchains quickly in this series.

Thanks,
Namhyung