[PATCH v1 00/14] perf build: Reduce build time by one third

Ian Rogers posted 14 patches 1 week, 6 days ago
Posted by Ian Rogers 1 week, 6 days ago
This patch series refactors many aspects of the perf build, aiming to
better encapsulate BPF code generation, remove serial build code and
gain build parallelism. The prepare step that blocks the parallel
build is reduced to a core of 6 smaller dependencies. BPF skeletons are
made regular dependencies of the targets that use them. Feature tests
and dependencies are reorganized. The jevents.py script processes JSON
files in parallel and allows the big_c_string to be compiled
separately.

On a 28-core build workstation (make -j28 all from scratch), clean build
latency improves by over 36%:

  Before:
    real    0m29.006s
    user    2m46.019s
    sys     0m30.610s

  After:
    real    0m18.498s
    user    2m32.922s
    sys     0m27.623s
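
As a quick check of the figures above, the sketch below converts the `time` output to seconds and derives the percentage change and the average number of busy cores, (user + sys) / real. The helper names are illustrative only and are not part of the series.

```python
import re

def seconds(stamp: str) -> float:
    """Convert a `time`-style value such as '2m46.019s' to seconds."""
    minutes, secs = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(secs)

def reduction(before: str, after: str) -> float:
    """Percentage by which `after` is lower than `before`."""
    b, a = seconds(before), seconds(after)
    return 100.0 * (b - a) / b

def busy_cores(real: str, user: str, sys: str) -> float:
    """Average number of cores kept busy: CPU time divided by wall-clock time."""
    return (seconds(user) + seconds(sys)) / seconds(real)

# Timings copied from the cover letter.
before = {"real": "0m29.006s", "user": "2m46.019s", "sys": "0m30.610s"}
after = {"real": "0m18.498s", "user": "2m32.922s", "sys": "0m27.623s"}

print(f"real: {reduction(before['real'], after['real']):.1f}% lower")
print(f"user: {reduction(before['user'], after['user']):.1f}% lower")
print(f"busy cores: {busy_cores(**before):.1f} -> {busy_cores(**after):.1f}")
```

Wall-clock time falls by about 36% while user CPU time falls by only about 8%, which is consistent with most of the gain coming from better parallelism rather than from doing less work.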

Summary of Patches:

1: bpftool Bootstrap Optimization
  - Exempts bpftool bootstrap from non-essential feature tests (LLVM, libbfd,
    libcap), saving 1.1s of sub-make fork overhead during Kbuild startup.

2-4: Flattening Umbrella Prepare Barriers
  - builtin-trace embedded inclusions and pmu-events generation are completely
    decoupled from the sequential "prepare" umbrella target, eliminating Make
    AST double-parsing overhead and removing barriers to parallel compilation.

5-8: Decoupling & Pre-generating BPF Skeletons
  - BPF skeleton rules are extracted out of Makefile.perf into bpf_skel.mak.
  - Decouples bpftool bootstrap from top-level static libbpf dependencies,
    attaching bpf-skel-prepare directly to the umbrella prepare target. This
    allows Make to pre-compile bpftool and dump vmlinux.h in the background at
    build startup, removing the 7-second serialization bottleneck before BPF
    object compilation.
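
To illustrate why removing one dependency edge can shorten the build, here is a toy critical-path model. It assumes enough cores to run every ready task at once. The task names and all durations are hypothetical, apart from the 7 seconds for bpftool plus vmlinux.h, which is the figure quoted above. It is a sketch of the idea, not a model of the real Makefile.

```python
from graphlib import TopologicalSorter

def makespan(durations, deps):
    """Finish time of the longest dependency chain (the critical path)."""
    finish = {}
    # static_order() yields each task after all of its prerequisites.
    for task in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps.get(task, ())), default=0.0)
        finish[task] = start + durations[task]
    return max(finish.values())

# Hypothetical durations in seconds; only the 7.0 comes from the cover letter.
durations = {"libbpf": 4.0, "bpftool_vmlinux": 7.0, "bpf_objs": 3.0, "perf_objs": 5.0}

# Before: bpftool bootstrap waits for the static libbpf build.
before = {
    "bpftool_vmlinux": {"libbpf"},
    "bpf_objs": {"bpftool_vmlinux"},
    "perf_objs": {"libbpf"},
}
# After: bpftool bootstrap starts at build startup, alongside libbpf.
after = {
    "bpftool_vmlinux": set(),
    "bpf_objs": {"bpftool_vmlinux"},
    "perf_objs": {"libbpf"},
}

print(makespan(durations, before))  # 14.0
print(makespan(durations, after))   # 10.0
```

In this toy graph the saving equals the duration of the task that no longer sits in front of the bpftool chain. The real saving depends on the actual dependency graph and core count.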

9-11: Foundational Linkage & Fast-Path Feature Detection
  - Eliminates redundant libbpf sub-make feature checks during static builds.
  - Integrates libdebuginfod directly into test-all.c, allowing Make to skip
    individual feature check sub-make forks during AST parsing on fully
    configured workstations.
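
The fast path can be pictured as follows: one combined test build covers many features at once, and individual probes are only needed for features outside the combined test or when the combined build fails. This is a simplified sketch of that idea. The `try_build` callback, the `detect` helper and the choice of feature names are assumptions made for illustration, not the code in tools/build.

```python
from typing import Callable, Dict, Sequence

def detect(fast_path: Sequence[str],
           separate: Sequence[str],
           try_build: Callable[[Sequence[str]], bool]) -> Dict[str, bool]:
    """Probe `fast_path` features with one combined build, the rest one by one."""
    if try_build(fast_path):
        result = {name: True for name in fast_path}
    else:
        # Combined build failed: fall back to probing each feature on its own.
        result = {name: try_build([name]) for name in fast_path}
    for name in separate:
        result[name] = try_build([name])
    return result

def make_builder(installed):
    """Fake compiler: a build succeeds if every requested library is installed."""
    calls = []
    def try_build(names):
        calls.append(tuple(names))
        return all(n in installed for n in names)
    return try_build, calls

installed = {"libelf", "libbpf", "libdebuginfod"}

# libdebuginfod probed separately: two test builds on a fully configured machine.
builder, calls = make_builder(installed)
detect(["libelf", "libbpf"], ["libdebuginfod"], builder)
print(len(calls))  # 2

# libdebuginfod folded into the combined test: a single test build.
builder, calls = make_builder(installed)
detect(["libelf", "libbpf", "libdebuginfod"], [], builder)
print(len(calls))  # 1
```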

12-13: jevents.py Concurrency & Deduplication
  - Splits the massive 2.8 MB big_c_string literal out of pmu-events.c into a
    dedicated pmu-events-string.c compilation unit. This slices C compilation
    latency in half by compiling string and struct tables simultaneously across
    separate CPU cores while preserving zero dynamic ELF relocations.
  - Pre-populates jevents.py JSON ASTs and metric formulas in parallel across
    all available CPU cores using ProcessPoolExecutor (accelerating Python
    execution by 11x, from 3.3s down to ~290ms).
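
For readers unfamiliar with ProcessPoolExecutor, the parallel pre-population step can be sketched roughly as below. This is a minimal illustration and not the jevents.py change itself: `parse_events`, `parse_all` and `main` are hypothetical names, and the real script does considerably more per file than `json.load`.

```python
import concurrent.futures as cf
import json
from pathlib import Path

def parse_events(path):
    """Parse one JSON file. Top-level so a process pool can pickle it."""
    with open(path, encoding="utf-8") as f:
        return str(path), json.load(f)

def parse_all(paths, executor):
    """Parse every file with the given executor and return {path: parsed JSON}."""
    # Executor.map() returns results in input order, so the output is deterministic.
    return dict(executor.map(parse_events, sorted(map(str, paths))))

def main(topdir):
    """Parse all JSON files below `topdir` using one worker process per CPU."""
    with cf.ProcessPoolExecutor() as pool:
        return parse_all(Path(topdir).rglob("*.json"), pool)
```

When run as a script, `main()` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Passing the executor in as a parameter also lets a thread pool be substituted for testing.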

14: Out-of-Tree Incremental Rebuild Fix
  - Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) to prevent
    Make from re-executing the script installation rules on every
    invocation of an already built out-of-tree build.
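
The likely mechanism is that a rule whose target name lacks the output-directory prefix never finds its target on disk in an out-of-tree build, so Make treats it as out of date every time. The sketch below models that with a simplified staleness check. The helper names and the single-prerequisite rule are assumptions for illustration, not the actual Makefile.perf rule.

```python
import os
import tempfile

def out_of_date(target, prereq):
    """Simplified Make check: rebuild if the target is missing or older."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(target) < os.path.getmtime(prereq)

def install(src, dst):
    """Copy the script into place, standing in for the install recipe."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())

def build(srcdir, outdir, prefixed):
    """Run the rule once; return True if the recipe had to execute."""
    src = os.path.join(srcdir, "perf-archive.sh")
    installed = os.path.join(outdir, "perf-archive")
    # Without the prefix, Make looks for the target next to the source file.
    target = installed if prefixed else os.path.join(srcdir, "perf-archive")
    if out_of_date(target, src):
        install(src, installed)
        return True
    return False

with tempfile.TemporaryDirectory() as srcdir, tempfile.TemporaryDirectory() as outdir:
    with open(os.path.join(srcdir, "perf-archive.sh"), "w") as f:
        f.write("#!/bin/sh\n")
    # Unprefixed target: the recipe runs on every invocation.
    print([build(srcdir, outdir, prefixed=False) for _ in range(3)])

with tempfile.TemporaryDirectory() as srcdir, tempfile.TemporaryDirectory() as outdir:
    with open(os.path.join(srcdir, "perf-archive.sh"), "w") as f:
        f.write("#!/bin/sh\n")
    # Prefixed target: the recipe runs once, then the target is up to date.
    print([build(srcdir, outdir, prefixed=True) for _ in range(3)])
```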

Ian Rogers (14):
  bpftool build: Restrict feature tests during bootstrap compilation
  perf trace beauty: Make beauty generated C code standalone .o files
  perf build: Decouple pmu-events from prepare umbrella target
  perf build: Remove empty archheaders target
  perf build: Move BPF skeleton generation out of Makefile.perf
  perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
  perf build: Move static libbpf dependency out of prepare step
  perf build: Pre-generate BPF skeletons during umbrella prepare phase
  perf build: Move libsymbol dependency out of prepare step
  perf build: Remove redundant libbpf feature check for static builds
  tools build: Integrate libdebuginfod into test-all fast path
  perf pmu-events: Split big_c_string storage into standalone
    compilation unit
  perf pmu-events: Parallelize JSON and metric pre-computation in
    jevents.py
  perf build: Prefix SCRIPTS with output directory to fix continuous
    rebuilds

 tools/bpf/bpftool/Makefile                    |   5 +
 tools/build/Makefile.feature                  |   6 +-
 tools/build/feature/Makefile                  |   2 +-
 tools/build/feature/test-all.c                |   5 +
 tools/perf/Build                              |   2 +
 tools/perf/Makefile.config                    |   6 +-
 tools/perf/Makefile.perf                      | 427 +-----------------
 tools/perf/bench/Build                        |   6 +
 .../bpf_skel/bench_uprobe.bpf.c               |   0
 tools/perf/bench/uprobe.c                     |   2 +-
 tools/perf/bpf_skel.mak                       | 110 +++++
 tools/perf/builtin-trace.c                    |  30 +-
 tools/perf/pmu-events/Build                   |  15 +-
 tools/perf/pmu-events/jevents.py              |  56 ++-
 tools/perf/trace/beauty/Build                 | 280 ++++++++++++
 tools/perf/trace/beauty/arch_errno_names.c    |   2 +
 tools/perf/trace/beauty/arch_errno_names.sh   |   2 +-
 tools/perf/trace/beauty/beauty.h              |  60 +++
 tools/perf/trace/beauty/eventfd.c             |   6 +-
 tools/perf/trace/beauty/fsconfig.c            |   5 +
 tools/perf/trace/beauty/futex_op.c            |   6 +-
 tools/perf/trace/beauty/futex_val3.c          |   6 +-
 tools/perf/trace/beauty/mmap.c                |  24 +-
 tools/perf/trace/beauty/mode_t.c              |   6 +-
 tools/perf/trace/beauty/msg_flags.c           |   8 +-
 tools/perf/trace/beauty/open_flags.c          |   1 +
 tools/perf/trace/beauty/perf_event_open.c     |  22 +-
 tools/perf/trace/beauty/pid.c                 |   5 +-
 tools/perf/trace/beauty/sched_policy.c        |   8 +-
 tools/perf/trace/beauty/seccomp.c             |  12 +-
 tools/perf/trace/beauty/signum.c              |   6 +-
 tools/perf/trace/beauty/socket_type.c         |   6 +-
 .../perf/{util => trace/beauty}/syscalltbl.c  |   0
 .../perf/{util => trace/beauty}/syscalltbl.h  |   0
 tools/perf/trace/beauty/tracepoints/Build     |  22 +
 tools/perf/trace/beauty/waitid_options.c      |   8 +-
 tools/perf/util/Build                         |  17 +-
 tools/perf/util/bpf-trace-summary.c           |   2 +-
 tools/perf/util/env.c                         |   4 +-
 tools/perf/util/env.h                         |   1 +
 40 files changed, 700 insertions(+), 491 deletions(-)
 rename tools/perf/{util => bench}/bpf_skel/bench_uprobe.bpf.c (100%)
 create mode 100644 tools/perf/bpf_skel.mak
 create mode 100644 tools/perf/trace/beauty/fsconfig.c
 rename tools/perf/{util => trace/beauty}/syscalltbl.c (100%)
 rename tools/perf/{util => trace/beauty}/syscalltbl.h (100%)

-- 
2.54.0.563.g4f69b47b94-goog
Re: [PATCH v1 00/14] perf build: Reduce build time by one third
Posted by James Clark 1 week, 6 days ago

On 12/05/2026 6:35 am, Ian Rogers wrote:
> This patch series refactors many aspects of the perf build, aiming to
> better encapsulate BPF code generation, remove serial build code and
> gain build parallelism. The prepare step that blocks the parallel
> build is reduced to a core of 6 smaller dependencies. BPF skeletons are
> made regular dependencies of the targets that use them. Feature tests
> and dependencies are reorganized. The jevents.py script processes JSON
> files in parallel and allows the big_c_string to be compiled
> separately.
> 
> On a 28-core build workstation (make -j28 all from scratch), clean build
> latency improves by over 36%:
> 
>    Before:
>      real    0m29.006s
>      user    2m46.019s
>      sys     0m30.610s
> 
>    After:
>      real    0m18.498s
>      user    2m32.922s
>      sys     0m27.623s
> 

I also get similar numbers, even when using ccache:

Before:

   real    0m29.584s
   user    0m48.993s
   sys     0m20.466s

After:

   real    0m18.322s
   user    0m49.995s
   sys     0m17.077s

Tested-by: James Clark <james.clark@linaro.org>
