[PATCH v3 0/7] perf: Add a libdw addr2line implementation

Ian Rogers posted 7 patches 3 weeks, 6 days ago
[PATCH v3 0/7] perf: Add a libdw addr2line implementation
Posted by Ian Rogers 3 weeks, 6 days ago
addr2line is a performance bottleneck in perf. Add a libdw-based
implementation that avoids forking addr2line and caches the decoded
debug information.
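
The caching idea can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct dso`, `struct debug_handle`, `parse_debug_info()` and `dso_debug_handle()` are hypothetical stand-ins, not the names or layout used in the series, and the "parse" here is a stub that merely counts how often it runs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: debug_handle plays the role of a parsed debug-info
 * session, dso the per-binary object that the handle is cached on. */
struct debug_handle { int id; };

struct dso {
	const char *path;
	struct debug_handle *dbg;	/* NULL until the first lookup */
};

int parse_count;	/* counts expensive parses, for demonstration only */

/* Stub for the expensive step (opening the file and decoding DWARF). */
static struct debug_handle *parse_debug_info(const char *path)
{
	struct debug_handle *h = malloc(sizeof(*h));

	(void)path;
	if (h)
		h->id = ++parse_count;
	return h;
}

/* Return the cached handle, parsing debug info only on the first call. */
struct debug_handle *dso_debug_handle(struct dso *dso)
{
	if (!dso->dbg)
		dso->dbg = parse_debug_info(dso->path);
	return dso->dbg;
}

/* Drop the cached handle when the dso goes away. */
void dso_release(struct dso *dso)
{
	free(dso->dbg);
	dso->dbg = NULL;
}
```

Two consecutive `dso_debug_handle()` calls on the same `struct dso` return the same pointer and leave `parse_count` at 1, whereas a forked addr2line pays process start-up and debug-info parsing for every new process.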

Allow the addr2line implementation to be picked via the configuration
file or --addr2line-style with `perf report`.

Test/fix that inline callchains are properly displayed by perf script.

An example:
```
$ perf record --call-graph dwarf -e cycles:u -- perf test -w inlineloop 1
[ perf record: Woken up 132 times to write data ]
[ perf record: Captured and wrote 32.814 MB perf.data (4074 samples) ]
$ perf script --fields +srcline
...
perf-inlineloop 1814670 293100.228871:     640004 cpu_core/cycles/u: 
            55a11d6e61ee leaf+0x2e
  inlineloop.c:21 (inlined)
            55a11d6e61ee middle+0x2e
  inlineloop.c:27 (inlined)
            55a11d6e61ee parent+0x2e (perf)
  inlineloop.c:32
            55a11d6e629b inlineloop+0x8b (perf)
  inlineloop.c:47
            55a11d69a3bc run_workload+0x5a (perf)
  builtin-test.c:715
            55a11d69aa9f cmd_test+0x417 (perf)
  builtin-test.c:825
            55a11d6155f5 run_builtin+0xd4 (perf)
  perf.c:349
            55a11d61588d handle_internal_command+0xdd (perf)
  perf.c:401
            55a11d6159e6 run_argv+0x35 (perf)
  perf.c:445
            55a11d615d2f main+0x2cb (perf)
  perf.c:553
            7fae3d233ca7 __libc_start_call_main+0x77 (libc.so.6)
  libc_start_call_main.h:58
            7fae3d233d64 __libc_start_main_impl+0x84
  libc-start.c:360 (inlined)
            55a11d565f80 _start+0x20 (perf)
  ??:0
...
```
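
For readers who want to see why one address yields three frames in the output above, here is a minimal sketch of a workload with the same shape. It is not the `inlineloop.c` from the series: the function bodies, the `sink` variable and the use of `always_inline`/`noinline` attributes are assumptions chosen for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a workload like inlineloop.c. */
static volatile int sink;

/* always_inline folds leaf and middle into parent, so a sample taken in
 * leaf's loop has a single PC but three logical frames. */
static inline __attribute__((always_inline)) int leaf(int iters)
{
	int sum = 0;

	for (int i = 0; i < iters; i++) {
		sink = i;	/* volatile store keeps the loop from being optimized away */
		sum += i;
	}
	return sum;
}

static inline __attribute__((always_inline)) int middle(int iters)
{
	return leaf(iters) + 1;
}

/* noinline keeps parent as a real function with its own symbol. */
__attribute__((noinline)) int parent(int iters)
{
	return middle(iters) + 1;
}
```

With this shape, only `parent` has a symbol and a return address of its own; `leaf` and `middle` exist solely as inlined-subroutine records in the debug information, which is what the `(inlined)` annotations reflect.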

v3: Make the caller inline file and line number accurate in the libdw
    addr2line, rather than using the function's declared location.
    Fix reference counts in unwind-libdw. Add fixes tag for srcline
    inline printing.

v2: Fix bias issue with libdwfl functions. Use cu_walk_functions_at
    from perf's dwarf-aux to fully walk inline functions. Add testing
    that inlined functions are shown in the perf script srcline
    callchain information. Add configurability as to which addr2line
    style to use.
    https://lore.kernel.org/lkml/20260110082647.1487574-1-irogers@google.com/

v1: https://lore.kernel.org/lkml/20251122093934.94971-1-irogers@google.com/

Ian Rogers (7):
  perf unwind-libdw: Fix invalid reference counts
  perf addr2line: Add a libdw implementation
  perf addr2line.c: Rename a2l_style to cmd_a2l_style
  perf srcline: Add configuration support for the addr2line style
  perf callchain: Fix srcline printing with inlines
  perf test workload: Add inlineloop test workload
  perf test: Test addr2line unwinding works with inline functions

 tools/perf/builtin-report.c                 |  10 ++
 tools/perf/tests/builtin-test.c             |   1 +
 tools/perf/tests/shell/addr2line_inlines.sh |  47 ++++++
 tools/perf/tests/tests.h                    |   1 +
 tools/perf/tests/workloads/Build            |   2 +
 tools/perf/tests/workloads/inlineloop.c     |  52 +++++++
 tools/perf/util/Build                       |   1 +
 tools/perf/util/addr2line.c                 |  20 +--
 tools/perf/util/config.c                    |   4 +
 tools/perf/util/dso.c                       |   2 +
 tools/perf/util/dso.h                       |  11 ++
 tools/perf/util/evsel_fprintf.c             |   8 +-
 tools/perf/util/libdw.c                     | 153 ++++++++++++++++++++
 tools/perf/util/libdw.h                     |  60 ++++++++
 tools/perf/util/srcline.c                   | 116 ++++++++++++++-
 tools/perf/util/srcline.h                   |   3 +
 tools/perf/util/symbol_conf.h               |  10 ++
 tools/perf/util/unwind-libdw.c              |   7 +-
 18 files changed, 486 insertions(+), 22 deletions(-)
 create mode 100755 tools/perf/tests/shell/addr2line_inlines.sh
 create mode 100644 tools/perf/tests/workloads/inlineloop.c
 create mode 100644 tools/perf/util/libdw.c
 create mode 100644 tools/perf/util/libdw.h

-- 
2.52.0.457.g6b5491de43-goog
Re: [PATCH v3 0/7] perf: Add a libdw addr2line implementation
Posted by James Clark 3 weeks, 5 days ago

On 11/01/2026 4:13 am, Ian Rogers wrote:
> addr2line is a performance bottleneck in perf, add a libdw based
> implementation that avoids forking addr2line and caches the decoded
> debug information.

I don't see the differences to the other addr2line implementations 
anymore, but only because it falls through to the old ones when libdw 
fails now.

For example, when building Perf with LLVM it can't get the line in the
inlineloop workload, and there are still a few things in libc and other
system libraries it fails on.

But I think it's fine because it doesn't give the wrong line anymore, it 
just falls through to another working addr2line implementation.
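
The fall-through behaviour described here can be sketched as an ordered list of backends. Everything named below is hypothetical (`a2l_resolve`, the `A2L_*` values, the stub backends); perf's real enum, ordering and function signatures differ. The `pinned` argument models selecting a single style in the configuration, which disables the fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; perf's real enum and functions differ. */
enum a2l_style { A2L_LIBDW, A2L_LLVM, A2L_CMD, A2L_MAX, A2L_ANY = -1 };

struct srcline { const char *file; int line; };

/* A backend returns 0 on success and fills *out, nonzero on failure. */
typedef int (*a2l_fn)(unsigned long addr, struct srcline *out);

/*
 * Try backends in priority order. If one style is pinned, only that backend
 * is tried, which exposes failures that the fallback would otherwise hide.
 * Returns the style that answered, or A2L_MAX if none did.
 */
enum a2l_style a2l_resolve(const a2l_fn backends[A2L_MAX], enum a2l_style pinned,
			   unsigned long addr, struct srcline *out)
{
	for (int s = 0; s < A2L_MAX; s++) {
		if (pinned != A2L_ANY && pinned != s)
			continue;
		if (backends[s] && backends[s](addr, out) == 0)
			return (enum a2l_style)s;
	}
	return A2L_MAX;
}

/* Stub backends for illustration: one always fails, one always succeeds. */
int stub_fail(unsigned long addr, struct srcline *out)
{
	(void)addr;
	(void)out;
	return -1;
}

int stub_ok(unsigned long addr, struct srcline *out)
{
	(void)addr;
	out->file = "inlineloop.c";
	out->line = 20;
	return 0;
}
```

With `stub_fail` in the first slot and `stub_ok` in the second, an unpinned lookup is answered by the second backend, while pinning the first style returns `A2L_MAX`. That is the difference between "no visible regression" and "libdw resolved the address".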


Reviewed-by: James Clark <james.clark@linaro.org>
Re: [PATCH v3 0/7] perf: Add a libdw addr2line implementation
Posted by Ian Rogers 3 weeks, 5 days ago
On Mon, Jan 12, 2026 at 3:18 AM James Clark <james.clark@linaro.org> wrote:
>
> I don't see the differences to the other addr2line implementations
> anymore, but only because it falls through to the old ones when libdw
> fails now.
>
> For example when building Perf with LLVM it can't get the line in the
> inlineloop workload, and there's still a few things in libc and other
> system libraries it fails on.

Hmm.. I wonder what the issue is. I was looking at the dwarf output
from my gcc builds with llvm-dwarfdump. I wonder if LLVM builds are
doing something to confuse libdw? I'll try to investigate. There are
quite a few levels of libdw: there's the raw libdw, libdwfl (frontend
to libdw) that does the parsing and tries to give things like nested
debug scopes (libdwfl is the one needing addresses with a module bias
rather than raw file offsets), and then there is the dwarf-aux.c that
is in perf and is used by things like probe finding (I believe this
doesn't need biased addresses). Anyway, with the biases there are
things I can screw up (like in the v1 patch) but maybe the LLVM issue
is just a libdw and dwarf-5 kind of thing. Maybe it is ARM specific
:-/
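
To make the address forms concrete, here is a small arithmetic sketch. The helper names and `struct mapping` are hypothetical, the mapping start and page offset are made-up values, and the sketch assumes the simple case where the ELF virtual address equals the file offset. Which libdw or libdwfl entry point expects which form is as described above, not something this sketch verifies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapping description; field names are illustrative only. */
struct mapping {
	uint64_t start;	/* runtime address where the mapping begins */
	uint64_t pgoff;	/* file offset backing the start of the mapping */
};

/* Runtime IP -> address relative to the file (the "perf[...]" form). */
uint64_t ip_to_file_addr(const struct mapping *m, uint64_t ip)
{
	return ip - m->start + m->pgoff;
}

/* File-relative address -> address within a module loaded at some bias. */
uint64_t file_addr_to_biased(uint64_t file_addr, uint64_t bias)
{
	return file_addr + bias;
}
```

For instance, with an assumed mapping start of 0x6012b5780000 and page offset 0x20000, the runtime address 0x6012b5957b70 maps to 0x1f7b70. Getting the direction of the bias wrong shifts every lookup by a constant, which tends to produce plausible but incorrect lines rather than outright failures.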

> But I think it's fine because it doesn't give the wrong line anymore, it
> just falls through to another working addr2line implementation.

Just to confirm: with gcc builds it isn't failing now? I.e., it isn't
just an addr2line implementation that falls through all the time? In my
testing on x86 with gcc, things were working.

> Reviewed-by: James Clark <james.clark@linaro.org>

Thanks,
Ian
Re: [PATCH v3 0/7] perf: Add a libdw addr2line implementation
Posted by Ian Rogers 3 weeks, 4 days ago
On Mon, Jan 12, 2026 at 6:49 AM Ian Rogers <irogers@google.com> wrote:
>
> On Mon, Jan 12, 2026 at 3:18 AM James Clark <james.clark@linaro.org> wrote:
> >
> > I don't see the differences to the other addr2line implementations
> > anymore, but only because it falls through to the old ones when libdw
> > fails now.
> >
> > For example when building Perf with LLVM it can't get the line in the
> > inlineloop workload, and there's still a few things in libc and other
> > system libraries it fails on.
>
> Hmm.. I wonder what the issue is. I was looking at the dwarf output
> from my gcc builds with llvm-dwarfdump. I wonder if LLVM builds are
> doing something to confuse libdw? I'll try to investigate. There are
> quite a few levels of libdw: there's the raw libdw, libdwfl (frontend
> to libdw) that does the parsing and tries to give things like nested
> debug scopes (libdwfl is the one needing addresses with a module bias
> rather than raw file offsets), and then there is the dwarf-aux.c that
> is in perf and is used by things like probe finding (I believe this
> doesn't need biased addresses). Anyway, with the biases there are
> things I can screw up (like in the v1 patch) but maybe the LLVM issue
> is just a libdw and dwarf-5 kind of thing. Maybe it is ARM specific
> :-/

Testing with clang/llvm on x86-64 (dwarf5):
```
$ make -C tools/perf O=/tmp/perf DEBUG=1 CC=clang CXX=clang++ HOSTCC=clang clean all
...
$ llvm-dwarfdump /tmp/perf/perf
...
0x0014f852: Compile Unit: length = 0x00000294, format = DWARF32,
version = 0x0005, unit_type = DW_UT_compile,
abbr_offset = 0x1879a, addr_size = 0x08 (next unit at 0x0014faea)

0x0014f85e: DW_TAG_compile_unit
             DW_AT_producer    ("Debian clang version 19.1.7 (3+build5)")
             DW_AT_language    (DW_LANG_C11)
             DW_AT_name        ("tests/workloads/inlineloop.c")
             DW_AT_str_offsets_base    (0x0004a550)
             DW_AT_stmt_list   (0x0008c3f2)
             DW_AT_comp_dir    ("linux/tools/perf")
             DW_AT_low_pc      (0x00000000001e61c0)
             DW_AT_high_pc     (0x00000000001e62e9)
             DW_AT_addr_base   (0x00022248)
             DW_AT_loclists_base       (0x0000018a)
...
$ sudo /tmp/perf/perf record --call-graph dwarf -e cycles:u -- /tmp/perf/perf test -w inlineloop 1
...
$  sudo /tmp/perf/perf script --fields +srcline
...
perf-inlineloop 2284167 423038.015394:     569917 cpu_core/cycles/u:
           56390020d2c6 leaf+0x26
 inlineloop.c:21 (inlined)
           56390020d2c6 middle+0x26
 inlineloop.c:27 (inlined)
           56390020d2c6 parent+0x26 (/tmp/perf/perf)
...
```
I ran inside of gdb and confirmed that the libdw code is creating the
inlined information (breakpoint on libdw_a2l_cb, etc.). So I'm not
able to reproduce the LLVM issue for now on x86-64.

Thanks,
Ian

Re: [PATCH v3 0/7] perf: Add a libdw addr2line implementation
Posted by James Clark 3 weeks, 4 days ago

On 12/01/2026 6:29 pm, Ian Rogers wrote:
> On Mon, Jan 12, 2026 at 6:49 AM Ian Rogers <irogers@google.com> wrote:
>>
>> On Mon, Jan 12, 2026 at 3:18 AM James Clark <james.clark@linaro.org> wrote:
>>>
>>> I don't see the differences to the other addr2line implementations
>>> anymore, but only because it falls through to the old ones when libdw
>>> fails now.
>>>
>>> For example when building Perf with LLVM it can't get the line in the
>>> inlineloop workload, and there's still a few things in libc and other
>>> system libraries it fails on.
>>
>> Hmm.. I wonder what the issue is. I was looking at the dwarf output
>> from my gcc builds with llvm-dwarfdump. I wonder if LLVM builds are

I see some issues in libc on Ubuntu though, which I assume is compiled 
with GCC, although there's no .comment section in it so I can't be sure. 
So it's not exclusively LLVM but it does seem like LLVM builds cause a 
lot more failures.

>> doing something to confuse libdw? I'll try to investigate. There are
>> quite a few levels of libdw: there's the raw libdw, libdwfl (frontend
>> to libdw) that does the parsing and tries to give things like nested
>> debug scopes (libdwfl is the one needing addresses with a module bias
>> rather than raw file offsets), and then there is the dwarf-aux.c that
>> is in perf and is used by things like probe finding (I believe this
>> doesn't need biased addresses). Anyway, with the biases there are
>> things I can screw up (like in the v1 patch) but maybe the LLVM issue
>> is just a libdw and dwarf-5 kind of thing. Maybe it is ARM specific
>> :-/

Actually I get the same behavior on Arm and x86.

> 
> Testing with clang/llvm on x86-64 (dwarf5):
> ```
> $ make -C tools/perf O=/tmp/perf DEBUG=1 CC=clang CXX=clang++
> HOSTCC=clang clean all
> ...
> $ llvm-dwarfdump /tmp/perf/perf
> ...
> 0x0014f852: Compile Unit: length = 0x00000294, format = DWARF32,
> version = 0x0005, unit_type = DW_UT_compile,
> abbr_offset = 0x1879a, addr_size = 0x08 (next unit at 0x0014faea)
> 
> 0x0014f85e: DW_TAG_compile_unit
>               DW_AT_producer    ("Debian clang version 19.1.7 (3+build5)")
>               DW_AT_language    (DW_LANG_C11)
>               DW_AT_name        ("tests/workloads/inlineloop.c")
>               DW_AT_str_offsets_base    (0x0004a550)
>               DW_AT_stmt_list   (0x0008c3f2)
>               DW_AT_comp_dir    ("linux/tools/perf")
>               DW_AT_low_pc      (0x00000000001e61c0)
>               DW_AT_high_pc     (0x00000000001e62e9)
>               DW_AT_addr_base   (0x00022248)
>               DW_AT_loclists_base       (0x0000018a)
> ...
> $  sudo /tmp/perf/perf record --call-graph dwarf -e cycles:u --
> /tmp/perf/perf test -w inlineloop 1
> ...
> $  sudo /tmp/perf/perf script --fields +srcline
> ...
> perf-inlineloop 2284167 423038.015394:     569917 cpu_core/cycles/u:
>             56390020d2c6 leaf+0x26
>   inlineloop.c:21 (inlined)
>             56390020d2c6 middle+0x26
>   inlineloop.c:27 (inlined)
>             56390020d2c6 parent+0x26 (/tmp/perf/perf)
> ...
> ```
> I ran inside of gdb and confirmed that the libdw code is creating the
> inlined information (breakpoint on libdw_a2l_cb, etc.). So I'm not
> able to reproduce the LLVM issue for now on x86-64.
> 
> Thanks,
> Ian
> 

If I set this in ~/.perfconfig so the fallback is disabled:

  [addr2line]
	style = libdw

Then:

  $ make LLVM=1 -C tools/perf DEBUG=1 clean all
  $ perf record --delay 1000 -- perf test -w inlineloop 2
  $ perf script --fields ip,srcline
      6012b5957b70
   perf[1f7b70]
      6012b5957b70
   perf[1f7b70]
   ...


x86:

  $ clang -v
  Ubuntu clang version 15.0.7

Arm:

  $ clang -v
  Ubuntu clang version 18.1.8 (11~20.04.2)

Disabling the ~/.perfconfig to re-enable the LLVM fallback works:

(x86)
$ perf script --fields ip,srcline
      6012b5957b70
   inlineloop.c:20
      6012b5957b70
   inlineloop.c:20

Interestingly, on Arm this results in zeros for line numbers. This is a 
completely different issue though which I didn't notice before because I 
built with GCC. It falls all the way back to A2L_STYLE_CMD:

(Arm)
$ perf script --fields ip,srcline
      aaaad0a7828c
   inlineloop.c:0
      aaaad0a7828c
   inlineloop.c:0

$ addr2line -e `which perf` -a -i -f aaaad0a7828c
0x0000aaaad0a7828c
??
??:0

Probably shouldn't get sidetracked by that here though. It's at least
working when compiled with GCC, and neither LLVM nor libdw works, so it's
no worse.

>>> But I think it's fine because it doesn't give the wrong line anymore, it
>>> just falls through to another working addr2line implementation.
>>
>> Just to confirm that with gcc builds it isn't failing now? ie it isn't
>> just an addr2line implementation that falls through all the time? I
>> was seeing things working/testing on x86 with gcc.
>>

No, the GCC Perf build always works with libdw as far as I can see. Just 
the occasional fall through to LLVM with some libc addresses.

>>> Reviewed-by: James Clark <james.clark@linaro.org>
>>
>> Thanks,
>> Ian

Re: [PATCH v3 0/7] perf: Add a libdw addr2line implementation
Posted by Ian Rogers 3 weeks, 3 days ago
Thanks Arnaldo for landing the series in perf-tools-next!

James mentioned frame pointer unwinding lacking inlines. I had a look
at it and I think this patch may suffice (although in some quick testing
I wasn't able to get inlines other than at the leaf):
```
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2090,6 +2090,8 @@ struct iterations {
       u64 cycles;
};

+static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
+
static int add_callchain_ip(struct thread *thread,
                           struct callchain_cursor *cursor,
                           struct symbol **parent,
@@ -2170,6 +2172,10 @@ static int add_callchain_ip(struct thread *thread,
       ms.maps = maps__get(al.maps);
       ms.map = map__get(al.map);
       ms.sym = al.sym;
+
+       if (append_inlines(cursor, &ms, ip) == 0)
+               goto out;
+
       srcline = callchain_srcline(&ms, al.addr);
       err = callchain_cursor_append(cursor, ip, &ms,
                                     branch, flags, nr_loop_iter,
```

Having inline information seems like a good thing regardless of the
stack trace format, so it'd be nice to move a patch like this forward.
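
The effect of expanding inlines per callchain IP can be sketched as follows. This is a simplified model, not perf code: `struct cursor`, `struct inline_entry` and `add_ip()` are hypothetical, and unlike the patch above (where a successful `append_inlines()` replaces the normal append) this sketch always appends the enclosing symbol after any inlined entries.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16

/* Hypothetical, simplified cursor; perf's callchain_cursor is richer. */
struct cursor {
	const char *names[MAX_FRAMES];
	size_t nr;
};

/* Hypothetical inline table entry: functions inlined at addr, innermost first. */
struct inline_entry {
	unsigned long addr;
	const char *funcs[4];	/* NULL-terminated */
};

static int cursor_append(struct cursor *c, const char *name)
{
	if (c->nr >= MAX_FRAMES)
		return -1;
	c->names[c->nr++] = name;
	return 0;
}

/*
 * Append one callchain IP. If the table lists inlined functions at ip, emit
 * each of them (innermost first) and then the enclosing symbol; otherwise
 * emit only the symbol.
 */
int add_ip(struct cursor *c, const struct inline_entry *tbl, size_t n,
	   unsigned long ip, const char *sym)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].addr != ip)
			continue;
		for (size_t j = 0; j < 4 && tbl[i].funcs[j]; j++) {
			if (cursor_append(c, tbl[i].funcs[j]))
				return -1;
		}
		break;
	}
	return cursor_append(c, sym);
}
```

Because the expansion depends only on the IP and the debug information, the same step applies whether the IPs came from DWARF unwinding or from frame pointers, which is the point made above.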

Thanks,
Ian