[PATCH v3 0/8] Add io_dir to avoid memory overhead from opendir

Ian Rogers posted 8 patches 9 months, 4 weeks ago
[PATCH v3 0/8] Add io_dir to avoid memory overhead from opendir
Posted by Ian Rogers 9 months, 4 weeks ago
glibc's opendir allocates a minimum of 32kB. When called recursively
for a directory tree, the memory consumption can add up: nearly 300kB
during perf start-up when processing modules. Add a stack-allocated
variant of readdir sized a little more than 1kB.
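
To make the idea concrete, here is a minimal sketch of a directory
iterator whose read buffer lives inside a caller-owned struct, so it can
sit on the stack and needs no malloc. It is not the code from
tools/lib/api/io_dir.h: the sketch_* names and the four-record buffer are
assumptions made for illustration, and it assumes Linux, where the raw
getdents64 syscall is available.

```c
#define _GNU_SOURCE
#include <dirent.h>       /* DT_DIR */
#include <fcntl.h>        /* open, O_DIRECTORY */
#include <limits.h>       /* NAME_MAX */
#include <stdint.h>
#include <sys/syscall.h>  /* SYS_getdents64 */
#include <unistd.h>       /* syscall, close */

/* Layout of one record as filled in by getdents64(2). */
struct sketch_dirent64 {
	uint64_t       d_ino;
	int64_t        d_off;
	unsigned short d_reclen;
	unsigned char  d_type;
	char           d_name[NAME_MAX + 1];
};

/* Hypothetical iterator: all state, including the buffer, is in here. */
struct sketch_dir {
	int fd;                         /* directory fd, owned by the caller */
	ssize_t avail;                  /* unread bytes left in buf */
	struct sketch_dirent64 *next;   /* next record to hand out */
	struct sketch_dirent64 buf[4];  /* 4 * 280 bytes, a little over 1kB */
};

static inline void sketch_dir__init(struct sketch_dir *d, int fd)
{
	d->fd = fd;
	d->avail = 0;
	d->next = NULL;
}

/* Returns the next entry, or NULL at end of directory or on error. */
static inline struct sketch_dirent64 *sketch_dir__read(struct sketch_dir *d)
{
	struct sketch_dirent64 *ent;

	if (d->avail <= 0) {
		/* Refill: the kernel packs variable-length records into buf. */
		ssize_t n = syscall(SYS_getdents64, d->fd, d->buf, sizeof(d->buf));

		if (n <= 0)
			return NULL;
		d->avail = n;
		d->next = d->buf;
	}
	ent = d->next;
	/* Records are d_reclen bytes apart, not sizeof(*ent) apart. */
	d->next = (struct sketch_dirent64 *)((char *)ent + ent->d_reclen);
	d->avail -= ent->d_reclen;
	return ent;
}

/* d_type may be DT_UNKNOWN on some filesystems; a real user needs a stat fallback. */
static inline int sketch_dirent__is_dir(const struct sketch_dirent64 *ent)
{
	return ent->d_type == DT_DIR;
}
```

A caller would open the directory with open(path, O_RDONLY | O_DIRECTORY),
pass the fd to sketch_dir__init(), loop on sketch_dir__read() until it
returns NULL, and then close the fd itself.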

v3: Rebase on top of Krzysztof Łopatowski's work. Add additional
    defines for SYS_getdents64 on all other architectures if its
    definition is missing. Add a patch to further reduce the
    stack/memory usage in machine__set_modules_path_dir by appending
    to a buffer rather than creating a copy.
v2: Remove the feature test and always use a perf-supplied getdents64
    to work around an Alpine Linux issue in v1:
    https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
    As suggested by Krzysztof Łopatowski
    <krzysztof.m.lopatowski@gmail.com>, who also pointed to the perf
    trace start-up time improvements that eliminating stat calls can
    achieve:
    https://lore.kernel.org/lkml/20250206113314.335376-2-krzysztof.m.lopatowski@gmail.com/
    Convert parse-events and hwmon_pmu to use io_dir.
v1: This was previously part of the memory saving change set:
    https://lore.kernel.org/lkml/20231127220902.1315692-1-irogers@google.com/
    It is separated here, with a feature check and syscall workaround
    added for a missing getdents64.

Ian Rogers (8):
  tools lib api: Add io_dir an allocation free readdir alternative
  perf maps: Switch modules tree walk to io_dir__readdir
  perf pmu: Switch to io_dir__readdir
  perf header: Switch mem topology to io_dir__readdir
  perf events: Remove scandir in thread synthesis
  perf parse-events: Switch tracepoints to io_dir__readdir
  perf hwmon_pmu: Switch event discovery to io_dir__readdir
  perf machine: Reuse module path buffer

 tools/lib/api/Makefile             |   2 +-
 tools/lib/api/io_dir.h             | 104 +++++++++++++++++++++++++++++
 tools/perf/util/header.c           |  31 ++++-----
 tools/perf/util/hwmon_pmu.c        |  42 +++++-------
 tools/perf/util/machine.c          |  57 ++++++++--------
 tools/perf/util/parse-events.c     |  32 +++++----
 tools/perf/util/pmu.c              |  46 ++++++-------
 tools/perf/util/pmus.c             |  30 +++------
 tools/perf/util/synthetic-events.c |  22 +++---
 9 files changed, 229 insertions(+), 137 deletions(-)
 create mode 100644 tools/lib/api/io_dir.h

-- 
2.48.1.658.g4767266eb4-goog
Re: [PATCH v3 0/8] Add io_dir to avoid memory overhead from opendir
Posted by Namhyung Kim 9 months, 3 weeks ago
On Fri, 21 Feb 2025 22:10:05 -0800, Ian Rogers wrote:
> glibc's opendir allocates a minimum of 32kb, when called recursively
> for a directory tree the memory consumption can add up - nearly 300kb
> during perf start-up when processing modules. Add a stack allocated
> variant of readdir sized a little more than 1kb
> 
> v3: Rebase on top of Krzysztof Łopatowski's work. Add additional
>     defines for SYS_getdents64 on all other architectures if its
>     definition is missing. Add a patch to further reduce the
>     stack/memory usage in machine__set_modules_path_dir by appending
>     to a buffer rather than creating a copy.
> v2: Remove the feature test and always use a perf supplied getdents64
>     to workaround an Alpine Linux issue in v1:
>     https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
>     As suggested by Krzysztof Łopatowski
>     <krzysztof.m.lopatowski@gmail.com> who also pointed to the perf
>     trace performance improvements in start-up time eliminating stat
>     calls can achieve:
>     https://lore.kernel.org/lkml/20250206113314.335376-2-krzysztof.m.lopatowski@gmail.com/
>     Convert parse-events and hwmon_pmu to use io_dir.
> v1: This was previously part of the memory saving change set:
>     https://lore.kernel.org/lkml/20231127220902.1315692-1-irogers@google.com/
>     It is separated here and a feature check and syscall workaround
>     for missing getdents64 added.
> 
> [...]
Applied to perf-tools-next, thanks!

Best regards,
Namhyung


Re: [PATCH v3 0/8] Add io_dir to avoid memory overhead from opendir
Posted by Namhyung Kim 9 months, 3 weeks ago
Hi Ian,

On Fri, Feb 21, 2025 at 10:10:05PM -0800, Ian Rogers wrote:
> glibc's opendir allocates a minimum of 32kb, when called recursively
> for a directory tree the memory consumption can add up - nearly 300kb
> during perf start-up when processing modules. Add a stack allocated
> variant of readdir sized a little more than 1kb

It's still small and hard to verify.  I've run the following command
before and after the change but didn't see a difference.

  $ sudo time -f %Mk ./perf record -a true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.757 MB perf.data (563 samples) ]
  74724k

According to man time(1), %M is for max RSS.
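
For reference, the same max RSS figure can be read programmatically. The
sketch below is an assumption about one way to do it, not something from
this series: child_max_rss_kb is a hypothetical helper that runs a
command and reads ru_maxrss via getrusage(RUSAGE_CHILDREN), which Linux
reports in kilobytes.

```c
#include <sys/resource.h>  /* getrusage, struct rusage */
#include <sys/types.h>
#include <sys/wait.h>      /* waitpid */
#include <unistd.h>        /* fork, execvp, _exit */

/*
 * Run argv[0] with arguments argv and return the peak resident set size
 * of waited-for children in kilobytes, or -1 on error. RUSAGE_CHILDREN
 * covers every child reaped so far, so call this from a process that has
 * not run larger children before.
 */
static inline long child_max_rss_kb(char *const argv[])
{
	struct rusage ru;
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);            /* only reached if exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (getrusage(RUSAGE_CHILDREN, &ru) < 0)
		return -1;
	return ru.ru_maxrss;
}
```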

Thanks,
Namhyung

Re: [PATCH v3 0/8] Add io_dir to avoid memory overhead from opendir
Posted by Namhyung Kim 9 months, 3 weeks ago
On Mon, Feb 24, 2025 at 04:28:24PM -0800, Namhyung Kim wrote:
> Hi Ian,
> 
> On Fri, Feb 21, 2025 at 10:10:05PM -0800, Ian Rogers wrote:
> > glibc's opendir allocates a minimum of 32kb, when called recursively
> > for a directory tree the memory consumption can add up - nearly 300kb
> > during perf start-up when processing modules. Add a stack allocated
> > variant of readdir sized a little more than 1kb
> 
> It's still small and hard to verify.  I've run the following command
> before and after the change but didn't see a difference.
> 
>   $ sudo time -f %Mk ./perf record -a true
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 1.757 MB perf.data (563 samples) ]
>   74724k
> 
> According to man time(1), %M is for max RSS.

But anyway, it looks ok and build is fine now.

Thanks,
Namhyung
Re: [PATCH v3 0/8] Add io_dir to avoid memory overhead from opendir
Posted by Ian Rogers 9 months, 3 weeks ago
On Mon, Feb 24, 2025 at 4:30 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Mon, Feb 24, 2025 at 04:28:24PM -0800, Namhyung Kim wrote:
> > Hi Ian,
> >
> > On Fri, Feb 21, 2025 at 10:10:05PM -0800, Ian Rogers wrote:
> > > glibc's opendir allocates a minimum of 32kb, when called recursively
> > > for a directory tree the memory consumption can add up - nearly 300kb
> > > during perf start-up when processing modules. Add a stack allocated
> > > variant of readdir sized a little more than 1kb
> >
> > It's still small and hard to verify.  I've run the following command
> > before and after the change but didn't see a difference.
> >
> >   $ sudo time -f %Mk ./perf record -a true
> >   [ perf record: Woken up 1 times to write data ]
> >   [ perf record: Captured and wrote 1.757 MB perf.data (563 samples) ]
> >   74724k
> >
> > According to man time(1), %M is for max RSS.
>
> But anyway, it looks ok and build is fine now.

Thanks for the testing! With a regular build I could reproduce what
you saw: opendir contributes far less to max RSS than the BPF handling,
so the saving gets lost. With a minimal static build, which loses BPF
support, things were clearer but not as good as I'd originally
measured: 10880k reduced to 10696k, a 184k saving (raw data below).
Perhaps opendir got better, or perhaps there are fewer kernel modules.
I tried heaptrack, but unfortunately it wasn't able to instrument the
allocations in glibc's allocdir function (it reported 0). Originally,
heaptrack showing that opendir allocations were significant for
`perf record` was what led me to this code. At the moment, BPF event
synthesis and topdown event checking look particularly expensive.

Thanks,
Ian

Before:
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.489 MB perf.data (107 samples) ]
10880k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.489 MB perf.data (106 samples) ]
10880k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.489 MB perf.data (109 samples) ]
10880k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.489 MB perf.data (102 samples) ]
10880k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.489 MB perf.data (106 samples) ]
11008k

After:
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.473 MB perf.data (106 samples) ]
10820k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.473 MB perf.data (109 samples) ]
10696k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.473 MB perf.data (98 samples) ]
10696k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.489 MB perf.data (98 samples) ]
10696k
$ sudo /bin/time -f %Mk /tmp/perf/perf record -a true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.490 MB perf.data (110 samples) ]
10696k
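
The raw runs above can be reduced to the quoted 10880k and 10696k
figures by taking the median of each set. That the median is the
intended summary is an assumption; the sketch below only checks that it
reproduces the 184k difference. The array and function names are
illustrative.

```c
#include <stdlib.h>  /* qsort */

/* Max RSS in kB, copied from the five "Before" and five "After" runs. */
static long before_kb[] = { 10880, 10880, 10880, 10880, 11008 };
static long after_kb[]  = { 10820, 10696, 10696, 10696, 10696 };

static int cmp_long(const void *a, const void *b)
{
	long x = *(const long *)a, y = *(const long *)b;

	return (x > y) - (x < y);
}

/* Sorts runs[] in place and returns the median; n must be odd and > 0. */
static long median_kb(long *runs, size_t n)
{
	qsort(runs, n, sizeof(*runs), cmp_long);
	return runs[n / 2];
}
```

The median ignores the single high run in each set (11008k before,
10820k after), giving 10880k - 10696k = 184k.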