[PATCH v5 0/7] dso/dsos memory savings and clean up

Ian Rogers posted 7 patches 1 year, 7 months ago
There is a newer version of this series
[PATCH v5 0/7] dso/dsos memory savings and clean up
Posted by Ian Rogers 1 year, 7 months ago
7 more patches from:
https://lore.kernel.org/lkml/20240202061532.1939474-1-irogers@google.com/
a nearly half-year-old adventure in trying to lower perf's dynamic
memory use. Work on things like the memory overhead of opendir is on
the sidelines for now, as there was too much fighting over how
distributions and C libraries present getdents. These changes are more
good old-fashioned work: replace an rb-tree with a sorted array and add
reference count tracking.

The changes migrate more of the dsos code (dsos being the collection of
dso structs) into the dsos.c/dsos.h files. As with maps and threads,
this is done so the internals can be changed: a linked list (for fast
iteration) and an rb-tree (for fast finds) are replaced with a lazily
sorted array. The complexity of operations remains roughly the same,
although iterating an array is likely faster than iterating a linked
list, and memory usage is reduced by at least half.
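A minimal sketch of the lazily sorted array idea, assuming hypothetical `struct item` and `struct item_array` types in place of perf's real dso/dsos structs (this is not the code from the patches). Each element costs one pointer slot instead of a list node plus an rb-tree node; adds append and mark the array unsorted, and the next find sorts once with qsort() before using bsearch():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct dso: just a name used as the sort key. */
struct item {
	const char *name;
};

/*
 * Hypothetical stand-in for struct dsos: one array of pointers replaces
 * the linked list (iteration) and rb-tree (lookup) pair.
 */
struct item_array {
	struct item **entries;
	size_t cnt;
	size_t allocated;
	bool sorted;	/* false once an add may have broken the order */
};

/* qsort comparator: elements are struct item pointers, ordered by name. */
static int item_cmp(const void *a, const void *b)
{
	const struct item *ia = *(const struct item *const *)a;
	const struct item *ib = *(const struct item *const *)b;

	return strcmp(ia->name, ib->name);
}

/* bsearch comparator: the key is a plain string, the element an item pointer. */
static int item_key_cmp(const void *key, const void *elem)
{
	const struct item *it = *(const struct item *const *)elem;

	return strcmp(key, it->name);
}

/* Append in amortized O(1) time and defer sorting until the next find. */
static int item_array_add(struct item_array *arr, struct item *it)
{
	assert(it != NULL);
	if (arr->cnt == arr->allocated) {
		size_t n = arr->allocated ? arr->allocated * 2 : 16;
		struct item **tmp = realloc(arr->entries, n * sizeof(*tmp));

		if (!tmp)
			return -1;
		arr->entries = tmp;
		arr->allocated = n;
	}
	arr->entries[arr->cnt++] = it;
	arr->sorted = false;
	return 0;
}

/* Sort only if something was added since the last sort, then bsearch. */
static struct item *item_array_find(struct item_array *arr, const char *name)
{
	struct item **res;

	if (arr->cnt == 0)
		return NULL;
	if (!arr->sorted) {
		qsort(arr->entries, arr->cnt, sizeof(*arr->entries), item_cmp);
		arr->sorted = true;
	}
	res = bsearch(name, arr->entries, arr->cnt, sizeof(*arr->entries),
		      item_key_cmp);
	return res ? *res : NULL;
}
```

The trade-off is that a find following a run of adds pays for one qsort(), so this shape works best when adds arrive in bursts between lookups; iteration is a plain loop over `entries`.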

As fixing the memory usage necessitates changing operations like find,
these operations are also modified to increment the reference count,
avoiding races such as a find in dsos running concurrently with a
remove. Similarly, lock usage is tightened so that operations working
on dsos state hold the appropriate lock. Note that, since this series
is currently only partially applied in the perf-tools-next tree, some
memory leaks have been introduced there.

v5. Rebase, adding use of accessors to dso where necessary. Previous
    versions were all rebases or dropped already-merged patches.

Ian Rogers (7):
  perf dsos: Switch backing storage to array from rbtree/list
  perf dsos: Remove __dsos__addnew
  perf dsos: Remove __dsos__findnew_link_by_longname_id
  perf dsos: Switch hand code to bsearch
  perf dso: Add reference count checking and accessor functions
  perf dso: Reference counting related fixes
  perf dso: Use container_of to avoid a pointer in dso_data
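The last patch title refers to container_of. As a generic illustration (not the patch itself), the sketch below uses a simplified form of the kernel's container_of macro, without its type check, to recover an enclosing struct from a pointer to an embedded member. The struct names are hypothetical; the point is that the embedded member needs no back-pointer to its owner, saving one pointer (8 bytes on a 64-bit build) per object:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from the member to the start of its owner. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical embedded member: note there is no "owner" back-pointer field. */
struct data_part {
	int fd;
};

/* Hypothetical owner that embeds the member by value. */
struct owner {
	const char *name;
	struct data_part data;
};

/* Only valid for a data_part that really is embedded in a struct owner. */
static struct owner *data_part__owner(struct data_part *d)
{
	assert(d != NULL);
	return container_of(d, struct owner, data);
}
```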

 tools/perf/builtin-annotate.c                 |   6 +-
 tools/perf/builtin-buildid-cache.c            |   2 +-
 tools/perf/builtin-buildid-list.c             |  18 +-
 tools/perf/builtin-inject.c                   |  71 ++-
 tools/perf/builtin-kallsyms.c                 |   2 +-
 tools/perf/builtin-mem.c                      |   4 +-
 tools/perf/builtin-report.c                   |   6 +-
 tools/perf/builtin-script.c                   |   8 +-
 tools/perf/builtin-top.c                      |   4 +-
 tools/perf/builtin-trace.c                    |   2 +-
 tools/perf/tests/code-reading.c               |   8 +-
 tools/perf/tests/dso-data.c                   |  67 ++-
 tools/perf/tests/hists_common.c               |   6 +-
 tools/perf/tests/hists_cumulate.c             |   4 +-
 tools/perf/tests/hists_output.c               |   2 +-
 tools/perf/tests/maps.c                       |   4 +-
 tools/perf/tests/symbols.c                    |   8 +-
 tools/perf/tests/vmlinux-kallsyms.c           |   6 +-
 tools/perf/ui/browsers/annotate.c             |   6 +-
 tools/perf/ui/browsers/hists.c                |   8 +-
 tools/perf/ui/browsers/map.c                  |   4 +-
 tools/perf/util/annotate-data.c               |  16 +-
 tools/perf/util/annotate.c                    |  17 +-
 tools/perf/util/auxtrace.c                    |   2 +-
 tools/perf/util/block-info.c                  |   2 +-
 tools/perf/util/bpf-event.c                   |   8 +-
 tools/perf/util/build-id.c                    |  38 +-
 tools/perf/util/callchain.c                   |   2 +-
 tools/perf/util/data-convert-json.c           |   2 +-
 tools/perf/util/db-export.c                   |   6 +-
 tools/perf/util/disasm.c                      |  40 +-
 tools/perf/util/dlfilter.c                    |  12 +-
 tools/perf/util/dso.c                         | 429 ++++++++-------
 tools/perf/util/dso.h                         | 500 ++++++++++++++++--
 tools/perf/util/dsos.c                        | 286 +++++-----
 tools/perf/util/dsos.h                        |  18 +-
 tools/perf/util/event.c                       |   8 +-
 tools/perf/util/header.c                      |   8 +-
 tools/perf/util/hist.c                        |   4 +-
 tools/perf/util/intel-pt.c                    |  22 +-
 tools/perf/util/machine.c                     |  50 +-
 tools/perf/util/map.c                         |  78 +--
 tools/perf/util/maps.c                        |  14 +-
 tools/perf/util/print_insn.c                  |   2 +-
 tools/perf/util/probe-event.c                 |  25 +-
 .../util/scripting-engines/trace-event-perl.c |   6 +-
 .../scripting-engines/trace-event-python.c    |  21 +-
 tools/perf/util/sort.c                        |  19 +-
 tools/perf/util/srcline.c                     |  65 +--
 tools/perf/util/symbol-elf.c                  | 145 +++--
 tools/perf/util/symbol-minimal.c              |   4 +-
 tools/perf/util/symbol.c                      | 186 +++----
 tools/perf/util/symbol_fprintf.c              |   4 +-
 tools/perf/util/synthetic-events.c            |  24 +-
 tools/perf/util/thread.c                      |   4 +-
 tools/perf/util/unwind-libunwind-local.c      |  18 +-
 tools/perf/util/unwind-libunwind.c            |   2 +-
 tools/perf/util/vdso.c                        |   8 +-
 58 files changed, 1409 insertions(+), 932 deletions(-)

-- 
2.44.0.769.g3c40516874-goog
Re: [PATCH v5 0/7] dso/dsos memory savings and clean up
Posted by Arnaldo Carvalho de Melo 1 year, 7 months ago
On Mon, Apr 29, 2024 at 11:46:07AM -0700, Ian Rogers wrote:
> 7 more patches from:
> https://lore.kernel.org/lkml/20240202061532.1939474-1-irogers@google.com/
> [...]

So, on an Intel machine:

⬢[acme@toolbox perf-tools-next]$ git log --oneline -8
fb401385575211e6 (HEAD -> perf-tools-next) perf dso: Use container_of to avoid a pointer in 'struct dso_data'
0fe118d129ab1c77 perf dso: Reference counting related fixes
35e44adf6103a407 perf dso: Add reference count checking and accessor functions
35673675ebbbac5d perf dsos: Switch hand code to bsearch()
654d60f2f5c737cd perf dsos: Remove __dsos__findnew_link_by_longname_id()
94b0ba802e090b66 perf dsos: Remove __dsos__addnew()
47692286dd856469 perf dsos: Switch backing storage to array from rbtree/list
8c618b58c89ce4c2 (perf-tools-next.korg/perf-tools-next, acme.korg/perf-tools-next) perf test: Reintroduce -p/--parallel and make -S/--sequential the default
⬢[acme@toolbox perf-tools-next]$

When I'm at:

8c618b58c89ce4c2 (perf-tools-next.korg/perf-tools-next, acme.korg/perf-tools-next) perf test: Reintroduce -p/--parallel and make -S/--sequential the default

root@x1:~# perf -v
perf version 6.9.rc5.g8c618b58c89c
root@x1:~# perf test "lock contention"
 87: kernel lock contention analysis test                            : Ok
root@x1:~# perf test "lock contention"
 87: kernel lock contention analysis test                            : Ok
root@x1:~# perf test "lock contention"
 87: kernel lock contention analysis test                            : Ok
root@x1:~# time perf test "lock contention"
 87: kernel lock contention analysis test                            : Ok

real	0m9.143s
user	0m5.201s
sys	0m4.812s
root@x1:~#

Moving to the first patch in this series:

⬢[acme@toolbox perf-tools-next]$ git log --oneline -2
47692286dd856469 (HEAD) perf dsos: Switch backing storage to array from rbtree/list
8c618b58c89ce4c2 (perf-tools-next.korg/perf-tools-next, acme.korg/perf-tools-next) perf test: Reintroduce -p/--parallel and make -S/--sequential the default
⬢[acme@toolbox perf-tools-next]$ alias m
alias m='rm -rf ~/libexec/perf-core/ ; make -k CORESIGHT=1 O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin && perf test pythond'
⬢[acme@toolbox perf-tools-next]$ rm -rf /tmp/build/$(basename $PWD)/ ; mkdir -p /tmp/build/$(basename $PWD)/ ; m
<SNIP>

root@x1:~# perf -v
perf version 6.9.rc5.g47692286dd85
root@x1:~# perf test "lock contention"
 87: kernel lock contention analysis test                            : FAILED!
root@x1:~# perf test "lock contention"
 87: kernel lock contention analysis test                            : FAILED!
root@x1:~# perf test "lock contention"
 87: kernel lock contention analysis test                            : FAILED!
root@x1:~# perf test "lock contention"
 87: kernel lock contention analysis test                            : FAILED!
root@x1:~# perf test -vvv "lock contention"
 87: kernel lock contention analysis test:
--- start ---
test child forked, pid 2279518
Testing perf lock record and perf lock contention
[Fail] Recorded result count is not 1: 2
---- end(-1) ----
 87: kernel lock contention analysis test                            : FAILED!
root@x1:~#

This breaks bisectability, but then let's see if it works at the end of
the series:

⬢[acme@toolbox perf-tools-next]$ git log --oneline -8
fb401385575211e6 (HEAD -> perf-tools-next) perf dso: Use container_of to avoid a pointer in 'struct dso_data'
0fe118d129ab1c77 perf dso: Reference counting related fixes
35e44adf6103a407 perf dso: Add reference count checking and accessor functions
35673675ebbbac5d perf dsos: Switch hand code to bsearch()
654d60f2f5c737cd perf dsos: Remove __dsos__findnew_link_by_longname_id()
94b0ba802e090b66 perf dsos: Remove __dsos__addnew()
47692286dd856469 perf dsos: Switch backing storage to array from rbtree/list
8c618b58c89ce4c2 (perf-tools-next.korg/perf-tools-next, acme.korg/perf-tools-next) perf test: Reintroduce -p/--parallel and make -S/--sequential the default
⬢[acme@toolbox perf-tools-next]$ rm -rf /tmp/build/$(basename $PWD)/ ; mkdir -p /tmp/build/$(basename $PWD)/ ; m

root@x1:~# perf -v
perf version 6.9.rc5.gfb4013855752
root@x1:~# perf test "lock contention"
 87: kernel lock contention analysis test                            : FAILED!
root@x1:~# perf test "lock contention"
 87: kernel lock contention analysis test                            : FAILED!
root@x1:~# perf test "lock contention"
 87: kernel lock contention analysis test                            : FAILED!
root@x1:~# perf test "lock contention"
 87: kernel lock contention analysis test                            : FAILED!
root@x1:~# perf test -vvv "lock contention"
 87: kernel lock contention analysis test:
--- start ---
test child forked, pid 2289060
Testing perf lock record and perf lock contention
Testing perf lock contention --use-bpf
Testing perf lock record and perf lock contention at the same time
[Fail] Recorded result count is not 1: 2
---- end(-1) ----
 87: kernel lock contention analysis test                            : FAILED!
root@x1:~#

⬢[acme@toolbox perf-tools-next]$ gcc --version | head -1
gcc (GCC) 13.2.1 20240316 (Red Hat 13.2.1-7)
⬢[acme@toolbox perf-tools-next]$ rpm -q gcc
gcc-13.2.1-7.fc39.x86_64
⬢[acme@toolbox perf-tools-next]$

root@x1:~# grep -m1 'model name' /proc/cpuinfo 
model name	: 13th Gen Intel(R) Core(TM) i7-1365U
root@x1:~# free
               total        used        free      shared  buff/cache   available
Mem:        32507912     9081644     3531432     1987616    22554868    23426268
Swap:        8388604      314112     8074492
root@x1:~#

- Arnaldo
Re: [PATCH v5 0/7] dso/dsos memory savings and clean up
Posted by Ian Rogers 1 year, 7 months ago
On Mon, Apr 29, 2024 at 12:17 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> So, on an Intel machine:
> [...]
> [Fail] Recorded result count is not 1: 2
> [...]

Thanks, will check. Will likely need a v6 to fix. v5 was just
addressing the rebase issues.

Ian