glibc's opendir allocates a minimum of 32kb; when called recursively
for a directory tree the memory consumption can add up - nearly 300kb
during perf start-up when processing modules. Add a stack allocated
variant of readdir sized a little more than 1kb.
v3: Rebase on top of Krzysztof Łopatowski's work. Add additional
defines for SYS_getdents64 on all other architectures if its
definition is missing. Add a patch to further reduce the
stack/memory usage in machine__set_modules_path_dir by appending
to a buffer rather than creating a copy.
v2: Remove the feature test and always use a perf supplied getdents64
to work around an Alpine Linux issue in v1:
https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
As suggested by Krzysztof Łopatowski
<krzysztof.m.lopatowski@gmail.com> who also pointed to the perf
trace performance improvements in start-up time eliminating stat
calls can achieve:
https://lore.kernel.org/lkml/20250206113314.335376-2-krzysztof.m.lopatowski@gmail.com/
Convert parse-events and hwmon_pmu to use io_dir.
v1: This was previously part of the memory saving change set:
https://lore.kernel.org/lkml/20231127220902.1315692-1-irogers@google.com/
It is separated here and a feature check and syscall workaround
for missing getdents64 added.
Ian Rogers (8):
tools lib api: Add io_dir an allocation free readdir alternative
perf maps: Switch modules tree walk to io_dir__readdir
perf pmu: Switch to io_dir__readdir
perf header: Switch mem topology to io_dir__readdir
perf events: Remove scandir in thread synthesis
perf parse-events: Switch tracepoints to io_dir__readdir
perf hwmon_pmu: Switch event discovery to io_dir__readdir
perf machine: Reuse module path buffer
tools/lib/api/Makefile | 2 +-
tools/lib/api/io_dir.h | 104 +++++++++++++++++++++++++++++
tools/perf/util/header.c | 31 ++++-----
tools/perf/util/hwmon_pmu.c | 42 +++++-------
tools/perf/util/machine.c | 57 ++++++++--------
tools/perf/util/parse-events.c | 32 +++++----
tools/perf/util/pmu.c | 46 ++++++-------
tools/perf/util/pmus.c | 30 +++------
tools/perf/util/synthetic-events.c | 22 +++---
9 files changed, 229 insertions(+), 137 deletions(-)
create mode 100644 tools/lib/api/io_dir.h
--
2.48.1.658.g4767266eb4-goog
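
For readers unfamiliar with the approach, here is a minimal sketch of what a "stack allocated variant of readdir" built on getdents64 can look like. It is not the code in tools/lib/api/io_dir.h; the names sketch_dir, sketch_dirent, sketch_dir_read and count_entries are hypothetical, and the 1 KiB buffer size is an assumption chosen to match the "a little more than 1kb" description. The record layout follows getdents(2).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() under strict -std= modes */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* One record as written by the getdents64 syscall (see getdents(2)). */
struct sketch_dirent {
	uint64_t d_ino;
	int64_t d_off;
	unsigned short d_reclen; /* total size of this record in bytes */
	unsigned char d_type;
	char d_name[];           /* NUL-terminated file name */
};

/* Hypothetical iterator; lives on the caller's stack, no malloc involved. */
struct sketch_dir {
	int fd;
	ssize_t avail; /* bytes of valid data currently in buf */
	ssize_t pos;   /* offset of the next unread record in buf */
	_Alignas(uint64_t) char buf[1024];
};

static void sketch_dir_init(struct sketch_dir *d, int fd)
{
	d->fd = fd;
	d->avail = 0;
	d->pos = 0;
}

/* Return the next entry, or NULL at end of directory or on error. */
static struct sketch_dirent *sketch_dir_read(struct sketch_dir *d)
{
	struct sketch_dirent *ent;

	if (d->pos >= d->avail) {
		/* Buffer exhausted: ask the kernel for the next batch. */
		long n = syscall(SYS_getdents64, d->fd, d->buf, sizeof(d->buf));

		if (n <= 0)
			return NULL;
		d->avail = n;
		d->pos = 0;
	}
	ent = (struct sketch_dirent *)(d->buf + d->pos);
	d->pos += ent->d_reclen;
	return ent;
}

/* Example user: count entries other than "." and "..", -1 on open failure. */
static int count_entries(const char *path)
{
	struct sketch_dir d;
	struct sketch_dirent *ent;
	int count = 0;
	int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);

	if (fd < 0)
		return -1;
	sketch_dir_init(&d, fd);
	while ((ent = sketch_dir_read(&d)) != NULL) {
		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		count++;
	}
	close(fd);
	return count;
}
```

A recursive tree walk with this shape keeps roughly one such struct per directory level on the stack, instead of one heap buffer per open directory. Because each record carries d_type, a caller can often tell directories from files without a separate stat call, although some filesystems report DT_UNKNOWN.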
On Fri, 21 Feb 2025 22:10:05 -0800, Ian Rogers wrote:
> glibc's opendir allocates a minimum of 32kb, when called recursively
> for a directory tree the memory consumption can add up - nearly 300kb
> during perf start-up when processing modules. Add a stack allocated
> variant of readdir sized a little more than 1kb
>
> v3: Rebase on top of Krzysztof Łopatowski's work. Add additional
> defines for SYS_getdents64 on all other architectures if its
> definition is missing. Add a patch to further reduce the
> stack/memory usage in machine__set_modules_path_dir by appending
> to a buffer rather than creating a copy.
> v2: Remove the feature test and always use a perf supplied getdents64
> to workaround an Alpine Linux issue in v1:
> https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
> As suggested by Krzysztof Łopatowski
> <krzysztof.m.lopatowski@gmail.com> who also pointed to the perf
> trace performance improvements in start-up time eliminating stat
> calls can achieve:
> https://lore.kernel.org/lkml/20250206113314.335376-2-krzysztof.m.lopatowski@gmail.com/
> Convert parse-events and hwmon_pmu to use io_dir.
> v1: This was previously part of the memory saving change set:
> https://lore.kernel.org/lkml/20231127220902.1315692-1-irogers@google.com/
> It is separated here and a feature check and syscall workaround
> for missing getdents64 added.
>
> [...]

Applied to perf-tools-next, thanks!

Best regards,
Namhyung

Hi Ian,

On Fri, Feb 21, 2025 at 10:10:05PM -0800, Ian Rogers wrote:
> glibc's opendir allocates a minimum of 32kb, when called recursively
> for a directory tree the memory consumption can add up - nearly 300kb
> during perf start-up when processing modules. Add a stack allocated
> variant of readdir sized a little more than 1kb

It's still small and hard to verify. I've run the following command
before and after the change but didn't see a difference.

  $ sudo time -f %Mk ./perf record -a true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.757 MB perf.data (563 samples) ]
  74724k

According to man time(1), %M is for max RSS.

Thanks,
Namhyung

>
> v3: Rebase on top of Krzysztof Łopatowski's work. Add additional
> defines for SYS_getdents64 on all other architectures if its
> definition is missing. Add a patch to further reduce the
> stack/memory usage in machine__set_modules_path_dir by appending
> to a buffer rather than creating a copy.
> v2: Remove the feature test and always use a perf supplied getdents64
> to workaround an Alpine Linux issue in v1:
> https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
> As suggested by Krzysztof Łopatowski
> <krzysztof.m.lopatowski@gmail.com> who also pointed to the perf
> trace performance improvements in start-up time eliminating stat
> calls can achieve:
> https://lore.kernel.org/lkml/20250206113314.335376-2-krzysztof.m.lopatowski@gmail.com/
> Convert parse-events and hwmon_pmu to use io_dir.
> v1: This was previously part of the memory saving change set:
> https://lore.kernel.org/lkml/20231127220902.1315692-1-irogers@google.com/
> It is separated here and a feature check and syscall workaround
> for missing getdents64 added.
>
> Ian Rogers (8):
> tools lib api: Add io_dir an allocation free readdir alternative
> perf maps: Switch modules tree walk to io_dir__readdir
> perf pmu: Switch to io_dir__readdir
> perf header: Switch mem topology to io_dir__readdir
> perf events: Remove scandir in thread synthesis
> perf parse-events: Switch tracepoints to io_dir__readdir
> perf hwmon_pmu: Switch event discovery to io_dir__readdir
> perf machine: Reuse module path buffer
>
> tools/lib/api/Makefile | 2 +-
> tools/lib/api/io_dir.h | 104 +++++++++++++++++++++++++++++
> tools/perf/util/header.c | 31 ++++-----
> tools/perf/util/hwmon_pmu.c | 42 +++++-------
> tools/perf/util/machine.c | 57 ++++++++--------
> tools/perf/util/parse-events.c | 32 +++++----
> tools/perf/util/pmu.c | 46 ++++++-------
> tools/perf/util/pmus.c | 30 +++------
> tools/perf/util/synthetic-events.c | 22 +++---
> 9 files changed, 229 insertions(+), 137 deletions(-)
> create mode 100644 tools/lib/api/io_dir.h
>
> --
> 2.48.1.658.g4767266eb4-goog
>

On Mon, Feb 24, 2025 at 04:28:24PM -0800, Namhyung Kim wrote:
> Hi Ian,
>
> On Fri, Feb 21, 2025 at 10:10:05PM -0800, Ian Rogers wrote:
> > glibc's opendir allocates a minimum of 32kb, when called recursively
> > for a directory tree the memory consumption can add up - nearly 300kb
> > during perf start-up when processing modules. Add a stack allocated
> > variant of readdir sized a little more than 1kb
>
> It's still small and hard to verify. I've run the following command
> before and after the change but didn't see a difference.
>
>   $ sudo time -f %Mk ./perf record -a true
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 1.757 MB perf.data (563 samples) ]
>   74724k
>
> According to man time(1), %M is for max RSS.

But anyway, it looks ok and build is fine now.

Thanks,
Namhyung

On Mon, Feb 24, 2025 at 4:30 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Mon, Feb 24, 2025 at 04:28:24PM -0800, Namhyung Kim wrote:
> > Hi Ian,
> >
> > On Fri, Feb 21, 2025 at 10:10:05PM -0800, Ian Rogers wrote:
> > > glibc's opendir allocates a minimum of 32kb, when called recursively
> > > for a directory tree the memory consumption can add up - nearly 300kb
> > > during perf start-up when processing modules. Add a stack allocated
> > > variant of readdir sized a little more than 1kb
> >
> > It's still small and hard to verify. I've run the following command
> > before and after the change but didn't see a difference.
> >
> >   $ sudo time -f %Mk ./perf record -a true
> >   [ perf record: Woken up 1 times to write data ]
> >   [ perf record: Captured and wrote 1.757 MB perf.data (563 samples) ]
> >   74724k
> >
> > According to man time(1), %M is for max RSS.
>
> But anyway, it looks ok and build is fine now.

Thanks for the testing! So doing a regular build I could repeat what
you saw - basically the opendir isn't contributing to maxrss as much
as the BPF handling and so gets lost. Doing a minimal static build,
that loses BPF support, things were clearer but not as good as I'd
originally measured, 10880k reduced to 10696k - a 184k saving (raw
data below). Perhaps opendir got better or perhaps there are fewer
kernel modules. I tried heaptrack but unfortunately it wasn't able to
instrument the allocations in glibc's allocdir function (it reported
0). Originally, heaptrack showing that opendir allocations were
significant for `perf record` was what led me to this code. At the
moment BPF event synthesis and topdown event checking look
particularly expensive.

Thanks,
Ian

Before:
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.489 MB perf.data (107 samples) ]
10880k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.489 MB perf.data (106 samples) ]
10880k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.489 MB perf.data (109 samples) ]
10880k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.489 MB perf.data (102 samples) ]
10880k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.489 MB perf.data (106 samples) ]
11008k

After:
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.473 MB perf.data (106 samples) ]
10820k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.473 MB perf.data (109 samples) ]
10696k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.473 MB perf.data (98 samples) ]
10696k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.489 MB perf.data (98 samples) ]
10696k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.490 MB perf.data (110 samples) ]
10696k
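
The 184k figure can be checked against the raw data. The mail does not say how the two representative values were picked; the sketch below assumes the median of each set of five runs, which reproduces the quoted 10880k and 10696k. The helper names (median_kb, saving_kb) are made up for this illustration.

```c
#include <stdlib.h>

/* qsort comparator for long values, ascending. */
static int cmp_long(const void *a, const void *b)
{
	long x = *(const long *)a, y = *(const long *)b;

	return (x > y) - (x < y);
}

/* Median of n samples (n odd); sorts the array in place. */
static long median_kb(long *v, size_t n)
{
	qsort(v, n, sizeof(*v), cmp_long);
	return v[n / 2];
}

/* Max RSS values in kB, copied from the "Before" and "After" runs. */
static long before_kb[] = { 10880, 10880, 10880, 10880, 11008 };
static long after_kb[] = { 10820, 10696, 10696, 10696, 10696 };

/* Difference of the medians: 10880 - 10696 = 184 kB. */
static long saving_kb(void)
{
	return median_kb(before_kb, 5) - median_kb(after_kb, 5);
}
```

The median discards the single outlier in each set (11008k before, 10820k after), which is why it lands on the values quoted in the mail.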