[PATCH v4 0/7] Thread memory improvements and fixes

Ian Rogers posted 7 patches 1 year, 11 months ago
These are the next 6 patches (now 7) from:
https://lore.kernel.org/lkml/20240202061532.1939474-1-irogers@google.com/
now that the initial maps fixes have landed:
https://lore.kernel.org/all/20240210031746.4057262-1-irogers@google.com/

Separate out threads and reimplement them to use a hashmap for lower
memory consumption and faster lookup. This fixes a regression in memory
usage introduced when reference count checking switched to using
non-invasive tree nodes. Reduce the default threads table size by 32
times (from 256 to 8) and improve locking discipline. Also, fix
regressions where tids had become unordered, to make
`perf report --tasks` and `perf trace --summary` output easier to read.
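
For readers who want a picture of the idea, the following is a minimal
sketch, not the code in this series. It assumes a table of 8 shards
selected by tid modulo the table size, and it shows why output has to be
sorted explicitly once tids live in hashed shards. All toy_* names are
hypothetical, and plain arrays stand in for perf's hashmap.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_TABLE_SIZE 8 /* hypothetical; echoes the "256 to 8" reduction */

/* One shard: a growable array of tids (stand-in for a real hashmap). */
struct toy_shard {
	int *tids;
	size_t nr, cap;
};

struct toy_threads {
	struct toy_shard shards[TOY_TABLE_SIZE];
};

/* Pick the shard for a tid (assumed scheme: tid modulo table size). */
static size_t toy_shard_index(int tid)
{
	return (unsigned int)tid % TOY_TABLE_SIZE;
}

/* Add a tid; returns 0 on success, -1 on allocation failure. */
static int toy_threads_add(struct toy_threads *t, int tid)
{
	struct toy_shard *s;

	assert(t != NULL);
	s = &t->shards[toy_shard_index(tid)];
	if (s->nr == s->cap) {
		size_t cap = s->cap ? s->cap * 2 : 4;
		int *tmp = realloc(s->tids, cap * sizeof(*tmp));

		if (!tmp)
			return -1;
		s->tids = tmp;
		s->cap = cap;
	}
	s->tids[s->nr++] = tid;
	return 0;
}

static int toy_cmp_tid(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Walking the shards yields tids in hash order, so copy them out and
 * sort before printing. Returns the number of tids copied (at most max).
 */
static size_t toy_threads_sorted(const struct toy_threads *t, int *out,
				 size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
		for (size_t j = 0; j < t->shards[i].nr && n < max; j++)
			out[n++] = t->shards[i].tids[j];
	qsort(out, n, sizeof(*out), toy_cmp_tid);
	return n;
}

/* Release all shard storage. */
static void toy_threads_exit(struct toy_threads *t)
{
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
		free(t->shards[i].tids);
		t->shards[i].tids = NULL;
		t->shards[i].nr = t->shards[i].cap = 0;
	}
}
```

A caller would zero-initialize a struct toy_threads, add tids in any
order, and call toy_threads_sorted() just before printing, so the
storage order never leaks into the report.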

v4: Add read lock to threads__for_each_thread, as suggested by Namhyung.
v3: Factor threads out of machine in one patch, then move the threads
    functions in a second.
v2: Improve comments and a commit message.

Ian Rogers (7):
  perf report: Sort child tasks by tid
  perf trace: Ignore thread hashing in summary
  perf machine: Move fprintf to for_each loop and a callback
  perf machine: Move machine's threads into its own abstraction
  perf threads: Move threads to its own files
  perf threads: Switch from rbtree to hashmap
  perf threads: Reduce table size from 256 to 8

 tools/perf/builtin-report.c           | 217 +++++++++-------
 tools/perf/builtin-trace.c            |  41 ++--
 tools/perf/util/Build                 |   1 +
 tools/perf/util/bpf_lock_contention.c |   4 +-
 tools/perf/util/machine.c             | 341 +++++++-------------------
 tools/perf/util/machine.h             |  30 +--
 tools/perf/util/rb_resort.h           |   5 -
 tools/perf/util/thread.c              |   2 +-
 tools/perf/util/thread.h              |   6 -
 tools/perf/util/threads.c             | 190 ++++++++++++++
 tools/perf/util/threads.h             |  35 +++
 11 files changed, 478 insertions(+), 394 deletions(-)
 create mode 100644 tools/perf/util/threads.c
 create mode 100644 tools/perf/util/threads.h

-- 
2.44.0.278.ge034bb2e1d-goog
Re: [PATCH v4 0/7] Thread memory improvements and fixes
Posted by Namhyung Kim 1 year, 11 months ago
On Thu, 29 Feb 2024 21:36:38 -0800, Ian Rogers wrote:
> The next 6 patches (now 7) from:
> https://lore.kernel.org/lkml/20240202061532.1939474-1-irogers@google.com/
> now the initial maps fixes have landed:
> https://lore.kernel.org/all/20240210031746.4057262-1-irogers@google.com/
> 
> Separate out and reimplement threads to use a hashmap for lower memory
> consumption and faster look up. This fixes a regression in memory usage
> where reference count checking switched to using non-invasive tree
> nodes.  Reduce threads default size by 32 times and improve locking
> discipline. Also, fix regressions where tids had become unordered to
> make `perf report --tasks` and `perf trace --summary` output easier to
> read.
> 
> [...]

Applied to perf-tools-next, thanks!

Best regards,
-- 
Namhyung Kim <namhyung@kernel.org>
Re: [PATCH v4 0/7] Thread memory improvements and fixes
Posted by Namhyung Kim 1 year, 11 months ago
Hi Ian,

On Thu, Feb 29, 2024 at 9:36 PM Ian Rogers <irogers@google.com> wrote:
>
> The next 6 patches (now 7) from:
> https://lore.kernel.org/lkml/20240202061532.1939474-1-irogers@google.com/
> now the initial maps fixes have landed:
> https://lore.kernel.org/all/20240210031746.4057262-1-irogers@google.com/
>
> Separate out and reimplement threads to use a hashmap for lower memory
> consumption and faster look up. This fixes a regression in memory usage
> where reference count checking switched to using non-invasive tree
> nodes.  Reduce threads default size by 32 times and improve locking
> discipline. Also, fix regressions where tids had become unordered to
> make `perf report --tasks` and `perf trace --summary` output easier to
> read.
>
> v4. Add read lock to threads__for_each_thread, Namhyung.
> v3. Factor threads out of machine in 1 patch, then move threads
>     functions in a second.
> v2: improve comments and a commit message.
>
> Ian Rogers (7):
>   perf report: Sort child tasks by tid
>   perf trace: Ignore thread hashing in summary
>   perf machine: Move fprintf to for_each loop and a callback
>   perf machine: Move machine's threads into its own abstraction
>   perf threads: Move threads to its own files
>   perf threads: Switch from rbtree to hashmap
>   perf threads: Reduce table size from 256 to 8

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung

>
>  tools/perf/builtin-report.c           | 217 +++++++++-------
>  tools/perf/builtin-trace.c            |  41 ++--
>  tools/perf/util/Build                 |   1 +
>  tools/perf/util/bpf_lock_contention.c |   4 +-
>  tools/perf/util/machine.c             | 341 +++++++-------------------
>  tools/perf/util/machine.h             |  30 +--
>  tools/perf/util/rb_resort.h           |   5 -
>  tools/perf/util/thread.c              |   2 +-
>  tools/perf/util/thread.h              |   6 -
>  tools/perf/util/threads.c             | 190 ++++++++++++++
>  tools/perf/util/threads.h             |  35 +++
>  11 files changed, 478 insertions(+), 394 deletions(-)
>  create mode 100644 tools/perf/util/threads.c
>  create mode 100644 tools/perf/util/threads.h
>
> --
> 2.44.0.278.ge034bb2e1d-goog
>