[PATCH v6 00/47] maps/threads/dsos memory improvements and fixes

Ian Rogers posted 47 patches 2 years ago
There is a newer version of this series
[PATCH v6 00/47] maps/threads/dsos memory improvements and fixes
Posted by Ian Rogers 2 years ago
Modify the implementation of maps to use a sorted array, rather than
an rbtree, as the container for maps. Fix locking and reference
counting issues.
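
As a rough illustration of the sorted-array idea (later patch titles call it a "lazily sorted array"), here is a minimal sketch. It is not the perf code: `struct range`, `struct range_table` and the `range_table__*` helpers are hypothetical names, and the sketch assumes the ranges do not overlap. Adds append in amortized O(1) and only record whether the order was broken; the sort is deferred to the next lookup, which then uses a binary search.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the real perf structs. */
struct range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

struct range_table {
	struct range *entries;
	size_t nr;
	size_t allocated;
	bool sorted;	/* cleared by an out-of-order append */
};

/* qsort comparator: order ranges by start address. */
static int range_cmp_start(const void *a, const void *b)
{
	const struct range *ra = a, *rb = b;

	if (ra->start < rb->start)
		return -1;
	return ra->start > rb->start;
}

/* bsearch comparator: the key is an address, the element is a range. */
static int range_cmp_addr(const void *key, const void *elem)
{
	uint64_t addr = *(const uint64_t *)key;
	const struct range *r = elem;

	if (addr < r->start)
		return -1;
	return addr >= r->end;
}

/* Append without sorting; just remember if the order was broken. */
static int range_table__add(struct range_table *t, uint64_t start, uint64_t end)
{
	if (t->nr == t->allocated) {
		size_t n = t->allocated ? t->allocated * 2 : 8;
		struct range *tmp = realloc(t->entries, n * sizeof(*tmp));

		if (!tmp)
			return -1;
		t->entries = tmp;
		t->allocated = n;
	}
	if (t->nr && t->entries[t->nr - 1].start > start)
		t->sorted = false;
	t->entries[t->nr++] = (struct range){ .start = start, .end = end };
	return 0;
}

/* Sort on demand, then binary search for the range containing addr. */
static struct range *range_table__find(struct range_table *t, uint64_t addr)
{
	if (!t->nr)
		return NULL;
	if (!t->sorted) {
		qsort(t->entries, t->nr, sizeof(*t->entries), range_cmp_start);
		t->sorted = true;
	}
	return bsearch(&addr, t->entries, t->nr, sizeof(*t->entries),
		       range_cmp_addr);
}
```

An empty table in this sketch should start with `.sorted = true`. Deferring the sort keeps bulk insertion cheap when entries arrive mostly in order, at the cost of an occasional O(n log n) sort on lookup.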

Similar to maps, separate out and reimplement threads to use a hashmap
for lower memory consumption and faster lookup. This fixes a
regression in memory usage where reference count checking switched to
using non-invasive tree nodes. Reduce the default table size by 32
times and improve locking discipline. Also, fix regressions where tids
had become unordered, to make `perf report --tasks` and
`perf trace --summary` output easier to read.
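
To make the ordering point concrete, here is a small sketch of a tid table split into 8 shards (the patch titles mention going from 256 to 8), with a helper that gathers and sorts the tids before output. Everything here is hypothetical: `struct shard`, `struct thread_table` and the `table__*` helpers are invented names, and a growable array stands in for the real hashmap.

```c
#include <stddef.h>
#include <stdlib.h>

#define TABLE_SIZE 8	/* shard count, assumed from the "256 to 8" patch title */

/* Hypothetical shard: a growable array stands in for a real hashmap. */
struct shard {
	int *tids;
	size_t nr, allocated;
};

struct thread_table {
	struct shard shards[TABLE_SIZE];
};

/* Pick the shard for a tid. */
static struct shard *table__shard(struct thread_table *t, int tid)
{
	return &t->shards[(unsigned int)tid % TABLE_SIZE];
}

static int table__add(struct thread_table *t, int tid)
{
	struct shard *s = table__shard(t, tid);

	if (s->nr == s->allocated) {
		size_t n = s->allocated ? s->allocated * 2 : 4;
		int *tmp = realloc(s->tids, n * sizeof(*tmp));

		if (!tmp)
			return -1;
		s->tids = tmp;
		s->allocated = n;
	}
	s->tids[s->nr++] = tid;
	return 0;
}

static int tid_cmp(const void *a, const void *b)
{
	int ta = *(const int *)a, tb = *(const int *)b;

	return (ta > tb) - (ta < tb);
}

/*
 * Shard iteration order is unrelated to tid order, so collect every
 * tid and sort before printing. The caller frees the result; NULL is
 * returned for an empty table or on allocation failure.
 */
static int *table__sorted_tids(struct thread_table *t, size_t *nr_out)
{
	size_t total = 0, pos = 0;
	int *out;

	for (size_t i = 0; i < TABLE_SIZE; i++)
		total += t->shards[i].nr;
	*nr_out = total;
	if (!total)
		return NULL;
	out = malloc(total * sizeof(*out));
	if (!out)
		return NULL;
	for (size_t i = 0; i < TABLE_SIZE; i++)
		for (size_t j = 0; j < t->shards[i].nr; j++)
			out[pos++] = t->shards[i].tids[j];
	qsort(out, total, sizeof(*out), tid_cmp);
	return out;
}
```

Sorting at output time keeps the lookup structure free of ordering constraints while still giving readable, tid-ordered reports.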

Better encapsulate the dsos abstraction. Replace the linked list and
rbtree, used for faster iteration and log(n) lookup, with a sorted
array that gives similar performance but half the memory usage per
dso. Improve reference counting and locking discipline, adding
reference count checking to dso.
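
For the name lookup side, a sorted array of pointers can be searched with the standard `bsearch()` (one patch title mentions switching hand-written code to bsearch). The following is only a sketch under assumptions: `struct obj` and the `objs__*` helpers are hypothetical and much simpler than the real dso code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object with a name; a stand-in for illustration only. */
struct obj {
	const char *long_name;
};

/* qsort comparator: both arguments point at array slots (struct obj *). */
static int obj_ptr_cmp(const void *a, const void *b)
{
	const struct obj *oa = *(const struct obj * const *)a;
	const struct obj *ob = *(const struct obj * const *)b;

	return strcmp(oa->long_name, ob->long_name);
}

/* bsearch comparator: the key is a C string, the element an array slot. */
static int obj_key_cmp(const void *key, const void *elem)
{
	const struct obj *o = *(const struct obj * const *)elem;

	return strcmp(key, o->long_name);
}

/* Keep the array ordered by name so lookups can be O(log n). */
static void objs__sort(struct obj **objs, size_t nr)
{
	qsort(objs, nr, sizeof(*objs), obj_ptr_cmp);
}

/* Returns the matching object, or NULL if no name matches. */
static struct obj *objs__find(struct obj **objs, size_t nr, const char *name)
{
	struct obj **res = bsearch(name, objs, nr, sizeof(*objs), obj_key_cmp);

	return res ? *res : NULL;
}
```

The array costs one pointer slot per entry and iterates linearly, so it can serve both roles that the list and the tree covered separately.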

v6:
 - Patch 1 is a parameter name fix requested by Namhyung.
 - Patches 2 to 13 split apart a macro to function callback refactor
   requested by Arnaldo.
 - Add fixes and acked-by to later patches from Namhyung.

v5 series is here:
https://lore.kernel.org/lkml/20231127220902.1315692-1-irogers@google.com/

Ian Rogers (47):
  perf map: Improve map/unmap parameter names
  perf maps: Add maps__for_each_map to iterate maps holding the lock
  perf events x86: Use function to add missing lock
  perf report: Use function to add missing maps lock
  perf tests: Use function to add missing maps lock
  perf machine: Use function to add missing maps lock
  perf probe-event: Use function to add missing maps lock
  perf symbol: Use function to add missing maps lock
  perf synthetic-events: Use function to add missing maps lock
  perf thread: Use function to add missing maps lock
  perf unwind: Use function to add missing maps lock
  perf vdso: Use function to add missing maps lock
  perf maps: Reduce scope of maps__for_each_entry
  perf maps: Add remove maps function to remove a map based on callback
  perf debug: Expose debug file
  perf maps: Refactor maps__fixup_overlappings
  perf maps: Do simple merge if given map doesn't overlap
  perf maps: Rename clone to copy from
  perf maps: Add maps__load_first
  perf maps: Add find next entry to give entry after the given map
  perf maps: Reduce scope of map_rb_node and maps internals
  perf maps: Fix up overlaps during fixup_end
  perf maps: Switch from rbtree to lazily sorted array for addresses
  perf maps: Get map before returning in maps__find
  perf maps: Get map before returning in maps__find_by_name
  perf maps: Get map before returning in maps__find_next_entry
  perf maps: Hide maps internals
  perf maps: Locking tidy up of nr_maps
  perf dso: Reorder variables to save space in struct dso
  perf report: Sort child tasks by tid
  perf trace: Ignore thread hashing in summary
  perf machine: Move fprintf to for_each loop and a callback
  perf threads: Move threads to its own files
  perf threads: Switch from rbtree to hashmap
  perf threads: Reduce table size from 256 to 8
  perf dsos: Attempt to better abstract dsos internals
  perf dsos: Tidy reference counting and locking
  perf dsos: Add dsos__for_each_dso
  perf dso: Move dso functions out of dsos
  perf dsos: Switch more loops to dsos__for_each_dso
  perf dsos: Switch backing storage to array from rbtree/list
  perf dsos: Remove __dsos__addnew
  perf dsos: Remove __dsos__findnew_link_by_longname_id
  perf dsos: Switch hand code to bsearch
  perf dso: Add reference count checking and accessor functions
  perf dso: Reference counting related fixes
  perf dso: Use container_of to avoid a pointer in dso_data

 tools/perf/arch/x86/tests/dwarf-unwind.c      |    1 +
 tools/perf/arch/x86/util/event.c              |  103 +-
 tools/perf/builtin-annotate.c                 |    6 +-
 tools/perf/builtin-buildid-cache.c            |    2 +-
 tools/perf/builtin-buildid-list.c             |   18 +-
 tools/perf/builtin-inject.c                   |   96 +-
 tools/perf/builtin-kallsyms.c                 |    2 +-
 tools/perf/builtin-mem.c                      |    4 +-
 tools/perf/builtin-record.c                   |    2 +-
 tools/perf/builtin-report.c                   |  243 +--
 tools/perf/builtin-script.c                   |    8 +-
 tools/perf/builtin-top.c                      |    4 +-
 tools/perf/builtin-trace.c                    |   41 +-
 tools/perf/tests/code-reading.c               |    8 +-
 tools/perf/tests/dso-data.c                   |   67 +-
 tools/perf/tests/hists_common.c               |    6 +-
 tools/perf/tests/hists_cumulate.c             |    4 +-
 tools/perf/tests/hists_output.c               |    2 +-
 tools/perf/tests/maps.c                       |   64 +-
 tools/perf/tests/symbols.c                    |    2 +-
 tools/perf/tests/thread-maps-share.c          |    8 +-
 tools/perf/tests/vmlinux-kallsyms.c           |  181 ++-
 tools/perf/ui/browsers/annotate.c             |    6 +-
 tools/perf/ui/browsers/hists.c                |    8 +-
 tools/perf/ui/browsers/map.c                  |    4 +-
 tools/perf/util/Build                         |    1 +
 tools/perf/util/annotate.c                    |   44 +-
 tools/perf/util/auxtrace.c                    |    2 +-
 tools/perf/util/block-info.c                  |    2 +-
 tools/perf/util/bpf-event.c                   |    9 +-
 tools/perf/util/bpf_lock_contention.c         |   10 +-
 tools/perf/util/build-id.c                    |  136 +-
 tools/perf/util/build-id.h                    |    2 -
 tools/perf/util/callchain.c                   |    4 +-
 tools/perf/util/data-convert-json.c           |    2 +-
 tools/perf/util/db-export.c                   |    6 +-
 tools/perf/util/debug.c                       |   22 +-
 tools/perf/util/debug.h                       |    1 +
 tools/perf/util/dlfilter.c                    |   12 +-
 tools/perf/util/dso.c                         |  468 +++---
 tools/perf/util/dso.h                         |  544 ++++++-
 tools/perf/util/dsos.c                        |  529 ++++---
 tools/perf/util/dsos.h                        |   40 +-
 tools/perf/util/event.c                       |   12 +-
 tools/perf/util/header.c                      |    8 +-
 tools/perf/util/hist.c                        |    4 +-
 tools/perf/util/intel-pt.c                    |   22 +-
 tools/perf/util/machine.c                     |  630 +++-----
 tools/perf/util/machine.h                     |   32 +-
 tools/perf/util/map.c                         |   73 +-
 tools/perf/util/map.h                         |   16 +-
 tools/perf/util/maps.c                        | 1398 +++++++++++------
 tools/perf/util/maps.h                        |  105 +-
 tools/perf/util/probe-event.c                 |   62 +-
 tools/perf/util/rb_resort.h                   |    5 -
 .../scripting-engines/trace-event-python.c    |   21 +-
 tools/perf/util/session.c                     |   21 +
 tools/perf/util/session.h                     |    2 +
 tools/perf/util/sort.c                        |   19 +-
 tools/perf/util/srcline.c                     |   65 +-
 tools/perf/util/symbol-elf.c                  |  132 +-
 tools/perf/util/symbol.c                      |  275 ++--
 tools/perf/util/symbol_fprintf.c              |    4 +-
 tools/perf/util/synthetic-events.c            |  134 +-
 tools/perf/util/thread.c                      |   48 +-
 tools/perf/util/thread.h                      |    6 -
 tools/perf/util/threads.c                     |  186 +++
 tools/perf/util/threads.h                     |   35 +
 tools/perf/util/unwind-libunwind-local.c      |   50 +-
 tools/perf/util/unwind-libunwind.c            |    9 +-
 tools/perf/util/vdso.c                        |   89 +-
 71 files changed, 3691 insertions(+), 2496 deletions(-)
 create mode 100644 tools/perf/util/threads.c
 create mode 100644 tools/perf/util/threads.h

-- 
2.43.0.rc2.451.g8631bc7472-goog
Re: [PATCH v6 00/47] maps/threads/dsos memory improvements and fixes
Posted by Arnaldo Carvalho de Melo 2 years ago
On Wed, Dec 06, 2023 at 05:16:34PM -0800, Ian Rogers wrote:
> Modify the implementation of maps to use a sorted array, rather than
> an rbtree, as the container for maps. Fix locking and reference
> counting issues.
> 
> Similar to maps, separate out and reimplement threads to use a hashmap
> for lower memory consumption and faster lookup. This fixes a
> regression in memory usage where reference count checking switched to
> using non-invasive tree nodes. Reduce the default table size by 32
> times and improve locking discipline. Also, fix regressions where tids
> had become unordered, to make `perf report --tasks` and
> `perf trace --summary` output easier to read.
> 
> Better encapsulate the dsos abstraction. Replace the linked list and
> rbtree, used for faster iteration and log(n) lookup, with a sorted
> array that gives similar performance but half the memory usage per
> dso. Improve reference counting and locking discipline, adding
> reference count checking to dso.
> 
> v6:
>  - Patch 1 is a parameter name fix requested by Namhyung.
>  - Patches 2 to 13 split apart a macro to function callback refactor
>    requested by Arnaldo.
>  - Add fixes and acked-by to later patches from Namhyung.

Applied 1-10, 11 is failing, I'll try to resolve if you don't do it
first.

This should be in tmp.perf-tools-next soon

- Arnaldo
Re: [PATCH v6 00/47] maps/threads/dsos memory improvements and fixes
Posted by Arnaldo Carvalho de Melo 2 years ago
On Mon, Dec 18, 2023 at 05:53:37PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Dec 06, 2023 at 05:16:34PM -0800, Ian Rogers wrote:
> > v6:
> >  - Patch 1 is a parameter name fix requested by Namhyung.
> >  - Patches 2 to 13 split apart a macro to function callback refactor
> >    requested by Arnaldo.
> >  - Add fixes and acked-by to later patches from Namhyung.
> 
> Applied 1-10, 11 is failing, I'll try to resolve if you don't do it
> first.
> 
> This should be in tmp.perf-tools-next soon

This is all in perf-tools-next/perf-tools-next, merged with
torvalds/master, which made some of the tests (e.g. "perf list" IIRC)
stop failing, due to something that went via perf-tools for v6.7.

Please refresh this series and any other that you may have outstanding
on top of that.

I'll continue processing patches and will try to help with refreshing as
soon as I can.

- Arnaldo
Re: [PATCH v6 00/47] maps/threads/dsos memory improvements and fixes
Posted by Arnaldo Carvalho de Melo 2 years ago
On Mon, Dec 18, 2023 at 10:27:16PM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, Dec 18, 2023 at 05:53:37PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Wed, Dec 06, 2023 at 05:16:34PM -0800, Ian Rogers wrote:
> > > v6:
> > >  - Patch 1 is a parameter name fix requested by Namhyung.
> > >  - Patches 2 to 13 split apart a macro to function callback refactor
> > >    requested by Arnaldo.
> > >  - Add fixes and acked-by to later patches from Namhyung.
> > 
> > Applied 1-10, 11 is failing, I'll try to resolve if you don't do it
> > first.
> > 
> > This should be in tmp.perf-tools-next soon
> 
> This is all in perf-tools-next/perf-tools-next, merged with
> torvalds/master, which made some of the tests (e.g. "perf list" IIRC)
> stop failing, due to something that went via perf-tools for v6.7.
> 
> Please refresh this series and any other that you may have outstanding
> on top of that.
> 
> I'll continue processing patches and will try to help with refreshing as
> soon as I can.



11/47 was failing due to a trivial conflict with:

4fb54994b2360ab5 ("perf unwind-libunwind: Fix base address for .eh_frame")

Fixed up, continuing...

- Arnaldo
Re: [PATCH v6 00/47] maps/threads/dsos memory improvements and fixes
Posted by Arnaldo Carvalho de Melo 2 years ago
On Wed, Dec 20, 2023 at 02:46:54PM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, Dec 18, 2023 at 10:27:16PM -0300, Arnaldo Carvalho de Melo wrote:
> > This is all in perf-tools-next/perf-tools-next, merged with
> > torvalds/master, which made some of the tests (e.g. "perf list" IIRC)
> > stop failing, due to something that went via perf-tools for v6.7.

> > Please refresh this series and any other that you may have outstanding
> > on top of that.

> > I'll continue processing patches and will try to help with refreshing as
> > soon as I can.

> 11/47 was failing due to a trivial conflict with:

> 4fb54994b2360ab5 ("perf unwind-libunwind: Fix base address for .eh_frame")

> Fixed up, continuing...

Applied up to 22/47. I will review from there on later, after what has
been merged so far has had some time in linux-next.

- Arnaldo