On 12/05/2026 6:35 am, Ian Rogers wrote:
> This patch series refactors many aspects of the perf build aiming to
> better encapsulate BPF code generation, remove serial build code and
> gain build parallelism. The prepare step that blocks the parallel
> build is reduced to a core of 6 smaller dependencies. BPF skeletons are
> made regular dependencies on the targets that use them. Feature tests
> and dependencies are reorganized. The jevents.py script processes JSON
> files in parallel and allows the big_c_string to be compiled
> separately.
>
> On a 28-core build workstation (make -j28 all from scratch), clean build
> latency improves by over 36%:
>
> Before:
> real 0m29.006s
> user 2m46.019s
> sys 0m30.610s
>
> After:
> real 0m18.498s
> user 2m32.922s
> sys 0m27.623s
>
I also get similar numbers, even when using ccache:
Before:
real 0m29.584s
user 0m48.993s
sys 0m20.466s
After:
real 0m18.322s
user 0m49.995s
sys 0m17.077s
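For reference, the relative improvement in wall-clock ("real") time for both runs can be reproduced with a throwaway script. It only divides the numbers quoted above and makes no other assumptions:

```python
# Wall-clock ("real") times in seconds, copied from the two runs above.
RUNS = {
    "make -j28 all": (29.006, 18.498),
    "with ccache": (29.584, 18.322),
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction in wall-clock time."""
    return (before - after) / before * 100.0


def speedup(before: float, after: float) -> float:
    """How many times faster the new build is."""
    return before / after


if __name__ == "__main__":
    for name, (before, after) in RUNS.items():
        print(f"{name}: {reduction_pct(before, after):.1f}% less time, "
              f"{speedup(before, after):.2f}x speedup")
```

That works out to about 36.2% (1.57x) for the -j28 run and 38.1% (1.61x) with ccache.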
Tested-by: James Clark <james.clark@linaro.org>
> Summary of Patches:
>
> 1: bpftool Bootstrap Optimization
> - Exempts bpftool bootstrap from non-essential feature tests (LLVM, libbfd,
> libcap), saving 1.1s of sub-make fork overhead during Kbuild startup.
>
> 2-4: Flattening Umbrella Prepare Barriers
> - builtin-trace embedded inclusions and pmu-events generation are fully
> decoupled from the sequential "prepare" umbrella target, eliminating Make
> AST double-parsing overhead and removing barriers to parallel compilation.
>
> 5-8: Decoupling & Pre-generating BPF Skeletons
> - BPF skeleton rules are extracted out of Makefile.perf into bpf_skel.mak.
> - Decouples bpftool bootstrap from top-level static libbpf dependencies,
> attaching bpf-skel-prepare directly to the umbrella prepare target. This
> allows Make to pre-compile bpftool and dump vmlinux.h in the background at
> build startup, removing the 7-second serialization bottleneck before BPF
> object compilation.
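A toy way to see why moving bpftool and vmlinux.h generation into the background helps: with enough cores, wall-clock time is bounded by the longest dependency chain (the critical path). The task names and durations below are invented for illustration, apart from the 7 s bpftool/vmlinux.h step mentioned above. The "before" graph simply assumes that step ran serially ahead of compilation, and core-count limits are ignored:

```python
# Toy model: durations in seconds are hypothetical, except the 7 s bpftool /
# vmlinux.h step quoted in the cover letter. Unlimited cores are assumed.
DURATIONS = {
    "prepare": 1.0,          # remaining core prepare dependencies
    "bpftool_vmlinux": 7.0,  # bootstrap bpftool + dump vmlinux.h
    "bpf_objs": 3.0,         # compile BPF programs + generate skeletons
    "c_objs": 9.0,           # compile ordinary perf C sources
    "link": 1.0,             # final link of the perf binary
}

# Assumed "before": the 7 s step runs serially, so compilation waits on it.
BEFORE = {
    "bpftool_vmlinux": ["prepare"],
    "bpf_objs": ["bpftool_vmlinux"],
    "c_objs": ["bpftool_vmlinux"],
    "link": ["bpf_objs", "c_objs"],
}

# "After": bpftool starts at build startup; only skeleton users depend on it.
AFTER = {
    "bpf_objs": ["bpftool_vmlinux"],
    "c_objs": ["prepare"],
    "link": ["bpf_objs", "c_objs"],
}


def makespan(durations, deps):
    """Length of the critical (longest) path through the dependency DAG."""
    finish = {}

    def finish_time(task):
        if task not in finish:
            start = max((finish_time(d) for d in deps.get(task, [])), default=0.0)
            finish[task] = start + durations[task]
        return finish[task]

    return max(finish_time(task) for task in durations)


if __name__ == "__main__":
    print("before:", makespan(DURATIONS, BEFORE))  # 18.0
    print("after: ", makespan(DURATIONS, AFTER))   # 11.0
```

In this model the critical path drops from 18 s to 11 s because ordinary C compilation no longer waits on bpftool. The real build is also bounded by the number of cores, so the numbers are illustrative only.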
>
> 9-11: Foundational Linkage & Fast-Path Feature Detection
> - Eliminates redundant libbpf sub-make feature checks during static builds.
> - Integrates libdebuginfod directly into test-all.c, allowing Make to skip
> individual feature check sub-make forks during AST parsing on fully
> configured workstations.
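The fast path can be modelled roughly as below. `detect_features`, `try_compile` and `FakeToolchain` are hypothetical stand-ins, not the logic in tools/build/Makefile.feature; the point is that one combined probe replaces N individual probes when everything is installed:

```python
from typing import Callable, Dict, Iterable, List


def detect_features(features: Iterable[str],
                    try_compile: Callable[[List[str]], bool]) -> Dict[str, bool]:
    """Return {feature: available}, trying one combined probe first."""
    features = list(features)
    # Fast path (the test-all.c idea): one program exercising every feature.
    # If it builds and links, all features are present and no per-feature
    # probes are needed.
    if try_compile(features):
        return {name: True for name in features}
    # Slow path: probe each feature separately (one sub-make per feature in
    # the real build) to find out which ones are missing.
    return {name: try_compile([name]) for name in features}


class FakeToolchain:
    """Hypothetical stand-in for the compiler: knows which libs are installed."""

    def __init__(self, installed):
        self.installed = set(installed)
        self.probes = 0

    def __call__(self, names):
        self.probes += 1
        return all(name in self.installed for name in names)


if __name__ == "__main__":
    wanted = ["libdebuginfod", "libelf", "libdw", "zlib"]  # example names only

    full = FakeToolchain(wanted)
    print(detect_features(wanted, full), full.probes)        # 1 probe

    partial = FakeToolchain(["libelf", "libdw", "zlib"])
    print(detect_features(wanted, partial), partial.probes)  # 5 probes
```

Any feature left out of the combined probe still costs its own probe, which is why folding libdebuginfod into test-all.c saves a sub-make on systems where everything is installed.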
>
> 12-13: jevents.py Concurrency & Deduplication
> - Splits the massive 2.8 MB big_c_string literal out of pmu-events.c into a
> dedicated pmu-events-string.c compilation unit. This halves C compilation
> latency by compiling the string and struct tables concurrently on separate
> CPU cores, while still requiring no dynamic ELF relocations.
> - Pre-populates jevents.py JSON ASTs and metric formulas in parallel across
> all available CPU cores using ProcessPoolExecutor (accelerating Python
> execution by 11x, from 3.3s down to ~290ms).
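Both jevents.py changes can be sketched together in a few lines of Python. This is not the code in the patch: `load_events`, `parse_all`, `StringTable`, `table_unit`, the `event_name_offsets` array and the output file names are all hypothetical, and only "EventName" is handled. It shows `ProcessPoolExecutor.map()` parsing the JSON files across cores, and a deduplicated string blob whose entries are referenced by integer offsets, which is what lets the string data live in its own compilation unit:

```python
import json
from concurrent.futures import ProcessPoolExecutor


def load_events(path):
    """Parse one JSON event file. Runs in a worker process."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def parse_all(paths, workers=None):
    """Parse every JSON file across a process pool (one per CPU by default).

    pool.map() returns results in input order, so the generated C stays
    deterministic regardless of which worker finishes first.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_events, paths))


class StringTable:
    """Deduplicated, NUL-separated string blob; entries are referenced by offset."""

    def __init__(self):
        self.offsets = {}
        self.strings = []
        self.size = 0

    def add(self, s):
        if s not in self.offsets:
            self.offsets[s] = self.size
            self.strings.append(s)
            self.size += len(s.encode("utf-8")) + 1  # +1 for the trailing NUL
        return self.offsets[s]

    def string_unit(self):
        """C source for the string-only compilation unit."""
        # json.dumps() quoting is close enough to C for plain ASCII text. Each
        # piece is its own literal, so "\0" is never followed by a digit.
        pieces = "\n".join("\t" + json.dumps(s)[:-1] + '\\0"' for s in self.strings)
        return "const char big_c_string[] =\n" + (pieces or '\t""') + ";\n"


def table_unit(offsets):
    """C source for the table unit: plain integers, so nothing to relocate."""
    body = "".join(f"\t{off},\n" for off in offsets)
    return ("extern const char big_c_string[];\n"
            "const unsigned int event_name_offsets[] = {\n" + body + "};\n")


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python3 sketch.py path/to/*.json
    docs = parse_all(sys.argv[1:])  # perf event JSON files are arrays of objects
    table = StringTable()
    offsets = [table.add(ev["EventName"])
               for doc in docs for ev in doc if "EventName" in ev]
    with open("pmu-events-string.c", "w") as f:
        f.write(table.string_unit())
    with open("pmu-events-table.c", "w") as f:
        f.write(table_unit(offsets))
```

Using offsets rather than `const char *` pointers is the key design choice: an array of integers needs no dynamic relocations in a PIE binary, whereas an array of pointers does.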
>
> 14: Out-of-Tree Incremental Rebuild Fix
> - Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) to prevent
> Make from repeatedly re-executing script installation rules in out-of-tree
> builds that are already up to date.
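Why the missing prefix causes endless rebuilds can be shown with a simplified version of Make's up-to-date check. This is a sketch: `needs_rebuild` and `install_script` are hypothetical helpers, and the file names only loosely mirror perf's layout. The recipe writes into the output directory, but an unprefixed target name is looked up relative to the source tree, where that file never appears:

```python
import os
import tempfile


def needs_rebuild(target, prereqs):
    """Simplified Make rule: rebuild if the target is missing or older than
    any prerequisite."""
    if not os.path.exists(target):
        return True
    built = os.path.getmtime(target)
    return any(os.path.getmtime(p) > built for p in prereqs)


def install_script(src, dest):
    """Hypothetical recipe: copy the script into the output directory."""
    with open(src) as f_in, open(dest, "w") as f_out:
        f_out.write(f_in.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as srcdir, \
         tempfile.TemporaryDirectory() as outdir:
        src = os.path.join(srcdir, "perf-archive.sh")
        with open(src, "w") as f:
            f.write("#!/bin/sh\n")

        # The recipe always writes into the output directory...
        dest = os.path.join(outdir, "perf-archive")
        install_script(src, dest)

        # ...but the unprefixed target is checked in the source tree, where
        # the file never exists, so it is considered out of date forever.
        print(needs_rebuild(os.path.join(srcdir, "perf-archive"), [src]))  # True
        # With the $(OUTPUT) prefix, Make checks the file the recipe wrote.
        print(needs_rebuild(dest, [src]))  # False
```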
>
> Ian Rogers (14):
> bpftool build: Restrict feature tests during bootstrap compilation
> perf trace beauty: Make beauty generated C code standalone .o files
> perf build: Decouple pmu-events from prepare umbrella target
> perf build: Remove empty archheaders target
> perf build: Move BPF skeleton generation out of Makefile.perf
> perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
> perf build: Move static libbpf dependency out of prepare step
> perf build: Pre-generate BPF skeletons during umbrella prepare phase
> perf build: Move libsymbol dependency out of prepare step
> perf build: Remove redundant libbpf feature check for static builds
> tools build: Integrate libdebuginfod into test-all fast path
> perf pmu-events: Split big_c_string storage into standalone
> compilation unit
> perf pmu-events: Parallelize JSON and metric pre-computation in
> jevents.py
> perf build: Prefix SCRIPTS with output directory to fix continuous
> rebuilds
>
> tools/bpf/bpftool/Makefile | 5 +
> tools/build/Makefile.feature | 6 +-
> tools/build/feature/Makefile | 2 +-
> tools/build/feature/test-all.c | 5 +
> tools/perf/Build | 2 +
> tools/perf/Makefile.config | 6 +-
> tools/perf/Makefile.perf | 427 +-----------------
> tools/perf/bench/Build | 6 +
> .../bpf_skel/bench_uprobe.bpf.c | 0
> tools/perf/bench/uprobe.c | 2 +-
> tools/perf/bpf_skel.mak | 110 +++++
> tools/perf/builtin-trace.c | 30 +-
> tools/perf/pmu-events/Build | 15 +-
> tools/perf/pmu-events/jevents.py | 56 ++-
> tools/perf/trace/beauty/Build | 280 ++++++++++++
> tools/perf/trace/beauty/arch_errno_names.c | 2 +
> tools/perf/trace/beauty/arch_errno_names.sh | 2 +-
> tools/perf/trace/beauty/beauty.h | 60 +++
> tools/perf/trace/beauty/eventfd.c | 6 +-
> tools/perf/trace/beauty/fsconfig.c | 5 +
> tools/perf/trace/beauty/futex_op.c | 6 +-
> tools/perf/trace/beauty/futex_val3.c | 6 +-
> tools/perf/trace/beauty/mmap.c | 24 +-
> tools/perf/trace/beauty/mode_t.c | 6 +-
> tools/perf/trace/beauty/msg_flags.c | 8 +-
> tools/perf/trace/beauty/open_flags.c | 1 +
> tools/perf/trace/beauty/perf_event_open.c | 22 +-
> tools/perf/trace/beauty/pid.c | 5 +-
> tools/perf/trace/beauty/sched_policy.c | 8 +-
> tools/perf/trace/beauty/seccomp.c | 12 +-
> tools/perf/trace/beauty/signum.c | 6 +-
> tools/perf/trace/beauty/socket_type.c | 6 +-
> .../perf/{util => trace/beauty}/syscalltbl.c | 0
> .../perf/{util => trace/beauty}/syscalltbl.h | 0
> tools/perf/trace/beauty/tracepoints/Build | 22 +
> tools/perf/trace/beauty/waitid_options.c | 8 +-
> tools/perf/util/Build | 17 +-
> tools/perf/util/bpf-trace-summary.c | 2 +-
> tools/perf/util/env.c | 4 +-
> tools/perf/util/env.h | 1 +
> 40 files changed, 700 insertions(+), 491 deletions(-)
> rename tools/perf/{util => bench}/bpf_skel/bench_uprobe.bpf.c (100%)
> create mode 100644 tools/perf/bpf_skel.mak
> create mode 100644 tools/perf/trace/beauty/fsconfig.c
> rename tools/perf/{util => trace/beauty}/syscalltbl.c (100%)
> rename tools/perf/{util => trace/beauty}/syscalltbl.h (100%)
>