tools/build/feature/Makefile | 13 +-
tools/perf/.gitignore | 1 +
tools/perf/Build | 2 +
tools/perf/Makefile.config | 19 +-
tools/perf/Makefile.perf | 423 ++----------------
tools/perf/bench/Build | 6 +
.../bpf_skel/bench_uprobe.bpf.c | 0
tools/perf/bench/uprobe.c | 2 +-
tools/perf/bpf_skel.mak | 109 +++++
tools/perf/builtin-trace.c | 32 +-
tools/perf/pmu-events/Build | 26 +-
tools/perf/pmu-events/jevents.py | 59 ++-
tools/perf/trace/beauty/Build | 276 ++++++++++++
tools/perf/trace/beauty/arch_errno_names.c | 2 +
tools/perf/trace/beauty/arch_errno_names.sh | 2 +-
tools/perf/trace/beauty/beauty.h | 60 +++
tools/perf/trace/beauty/eventfd.c | 6 +-
tools/perf/trace/beauty/fsconfig.c | 5 +
tools/perf/trace/beauty/futex_op.c | 5 +-
tools/perf/trace/beauty/futex_val3.c | 5 +-
tools/perf/trace/beauty/mmap.c | 24 +-
tools/perf/trace/beauty/mode_t.c | 6 +-
tools/perf/trace/beauty/msg_flags.c | 8 +-
tools/perf/trace/beauty/open_flags.c | 2 +
tools/perf/trace/beauty/perf_event_open.c | 21 +-
tools/perf/trace/beauty/pid.c | 5 +-
tools/perf/trace/beauty/sched_policy.c | 8 +-
tools/perf/trace/beauty/seccomp.c | 12 +-
tools/perf/trace/beauty/signum.c | 6 +-
tools/perf/trace/beauty/socket_type.c | 6 +-
.../perf/{util => trace/beauty}/syscalltbl.c | 0
.../perf/{util => trace/beauty}/syscalltbl.h | 0
tools/perf/trace/beauty/tracepoints/Build | 21 +
tools/perf/trace/beauty/waitid_options.c | 8 +-
tools/perf/util/Build | 17 +-
tools/perf/util/bpf-trace-summary.c | 2 +-
tools/perf/util/env.c | 4 -
tools/perf/util/env.h | 1 +
38 files changed, 688 insertions(+), 516 deletions(-)
rename tools/perf/{util => bench}/bpf_skel/bench_uprobe.bpf.c (100%)
create mode 100644 tools/perf/bpf_skel.mak
create mode 100644 tools/perf/trace/beauty/fsconfig.c
rename tools/perf/{util => trace/beauty}/syscalltbl.c (100%)
rename tools/perf/{util => trace/beauty}/syscalltbl.h (100%)
This patch series refactors Kbuild internals, BPF skeleton generation,
Python AST pre-computation, and foundational tooling dependencies across
the perf tool build system. By eliminating umbrella-target
synchronization barriers, decoupling static library prerequisites,
parallelizing single-threaded script generators, and removing redundant
feature checks, this series substantially increases multi-core
concurrency during Kbuild startup.

On a 28-core build workstation (make -j28 all from scratch), clean build
latency improves by over 44%:

Before:
real 0m29.006s
user 2m46.019s
sys 0m30.610s
After:
real 0m16.091s
user 2m40.135s
sys 0m25.740s

This saves 12.9 seconds per clean build. Furthermore, no-op ("nothing
to build") incremental builds are about 6.7x faster:

Before:
real 0m11.528s
user 0m9.633s
sys 0m6.965s
After:
real 0m1.717s
user 0m1.682s
sys 0m0.960s

Summary of Patches:

1: Fast-Path Feature Detection
- Refactors the test-clang-bpf-co-re.bin and test-bpftool-skeletons.bin
feature checks to group their shell pipelines within curly braces,
redirect both stdout and stderr to .make.output, and touch $@ only on
success (> $(@:.bin=.make.output) 2>&1 && touch $@). Grouping the
pipeline ({ cmd1 | cmd2; }) ensures that compiler stderr is captured in
.make.output instead of escaping to the parent shell. This matches the
convention used by the other feature checks and ensures that the target
files are touched only when a check succeeds, allowing Kbuild to cache
positive detections and avoid repeated sub-make re-evaluations during
incremental builds. Also adds test-bpftool-skeletons.bin to the clean
FILES list and adds test-clang-bpf-co-re.c as an explicit source
prerequisite.

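The effect of the grouping can be reproduced outside Make. The Python
sketch below (an illustration, not code from the series) runs a
feature-check style recipe through /bin/sh: a subshell that prints a
warning on stderr stands in for the compiler stage, and cat or false
stands in for the second stage. The file names are hypothetical.

```python
import os
import subprocess
import tempfile
from typing import Tuple


def run_check(grouped: bool, fail: bool, workdir: str) -> Tuple[str, str, bool]:
    """Run a feature-check style recipe; return (log, leaked_stderr, touched)."""
    log_path = os.path.join(workdir, "test.make.output")  # hypothetical names
    target = os.path.join(workdir, "test.bin")
    # Stand-in for the first stage: it writes a diagnostic to stderr.
    producer = "( echo 'producer: warning' >&2; echo data )"
    # Stand-in for the second stage; 'false' simulates a failing check.
    consumer = "false" if fail else "cat"
    pipeline = f"{producer} | {consumer}"
    if grouped:
        # With { ...; } the redirection below covers the whole pipeline.
        pipeline = f"{{ {pipeline}; }}"
    # Without grouping, '> log 2>&1' binds only to the last stage.
    cmd = f"{pipeline} > '{log_path}' 2>&1 && touch '{target}'"
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    log = ""
    if os.path.exists(log_path):
        with open(log_path) as f:
            log = f.read()
    return log, proc.stderr, os.path.exists(target)


if __name__ == "__main__":
    for grouped in (False, True):
        with tempfile.TemporaryDirectory() as d:
            log, leaked, touched = run_check(grouped, fail=False, workdir=d)
            print(f"grouped={grouped}: log={log!r} stderr={leaked!r} touched={touched}")
```

Without the braces the warning reaches the caller's stderr rather than
the log. In both forms the target is touched only when the last pipeline
stage succeeds, since no pipefail is assumed.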
2-4: Flattening Umbrella Prepare Barriers
- The generated trace beauty C code is built as standalone object files
instead of being embedded into builtin-trace, and pmu-events generation
is decoupled from the sequential "prepare" umbrella target. This removes
Make's double-parsing overhead and the barriers that held back parallel
compilation.

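Why a shared prepare barrier hurts can be seen with a simple model: with
enough make jobs, build time is bounded below by the longest chain of
dependent steps. The Python sketch below computes that chain for two
hypothetical dependency graphs, one where every object waits on an
umbrella prepare target and one where each object depends only on the
generator it needs. Step names and durations are made up for
illustration and are not measurements from the series.

```python
from typing import Dict, List

# Hypothetical step durations in seconds (illustrative only).
COST: Dict[str, int] = {
    "pmu-events.c": 3,   # generate pmu-events.c
    "beauty-tables": 1,  # generate trace beauty tables
    "prepare": 0,        # umbrella target: no work of its own
    "util.o": 7,         # needs neither generated file
    "trace.o": 2,        # needs the beauty tables
    "pmu-events.o": 5,   # needs pmu-events.c
    "perf": 1,           # final link
}


def critical_path(deps: Dict[str, List[str]], cost: Dict[str, int]) -> int:
    """Length of the longest dependency chain, i.e. the best possible
    wall-clock time with unlimited parallel jobs."""
    finish: Dict[str, int] = {}

    def finish_time(node: str) -> int:
        if node not in finish:
            start = max((finish_time(d) for d in deps.get(node, [])), default=0)
            finish[node] = start + cost[node]
        return finish[node]

    return max(finish_time(node) for node in cost)


OBJECTS = ["util.o", "trace.o", "pmu-events.o"]

# Every object waits for the umbrella target, which waits for all generators.
BARRIER = {"prepare": ["pmu-events.c", "beauty-tables"], "perf": OBJECTS}
BARRIER.update({obj: ["prepare"] for obj in OBJECTS})

# Each object depends only on what it actually uses.
FLAT = {
    "trace.o": ["beauty-tables"],
    "pmu-events.o": ["pmu-events.c"],
    "perf": OBJECTS,
}

if __name__ == "__main__":
    print("with prepare barrier:", critical_path(BARRIER, COST))  # 11
    print("without barrier:     ", critical_path(FLAT, COST))     # 9
```

In this model the barrier adds the slowest generator's time to every
object, including objects that never use the generated output.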
5-7: Decoupling & Pre-generating BPF Skeletons
- Extracts the BPF skeleton rules from Makefile.perf into bpf_skel.mak.
- Decouples the bpftool bootstrap from the top-level static libbpf
dependencies and attaches bpf-skel-prepare directly to the umbrella
prepare target. This lets Make build bpftool and dump vmlinux.h in the
background at build startup, removing a 7-second serialization
bottleneck before BPF object compilation.
- Ensures that the intermediate .bpf.o files of the benchmark skeletons
are removed by make clean, and adds bpf-skel-prepare to .PHONY.

8-9: Foundational Linkage Optimization
- Moves static libsymbol library prerequisites out of the prepare step.
- Eliminates redundant libbpf sub-make feature checks during static builds.

10-11: jevents.py Concurrency & Deduplication
- Splits the 2.8 MB big_c_string literal out of pmu-events.c into a
dedicated pmu-events-string.c compilation unit. Compiling the string
table and the struct tables simultaneously on separate CPU cores roughly
halves C compilation latency, while the tables still need no dynamic ELF
relocations. The patch also adds pmu-events-string.c to .gitignore,
declares extern const char big_c_string[]; locally inside
output_string_file and output_file when the output is split (to avoid
linkage conflicts with empty-pmu-events.c), defers closing the output
files so that they receive identical timestamps, and uses the canonical
Make 4.0 @: dependency chaining.
- Pre-computes the jevents.py JSON ASTs and metric formulas in parallel
across all available CPU cores using ProcessPoolExecutor, speeding up
the Python step by about 11x (from 3.3 s to ~290 ms). Moves _init_worker
to top-level scope so that it pickles cleanly under the spawn
multiprocessing start method.

12: Out-of-Tree Incremental Rebuild Fix
- Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) so that,
in an already-built out-of-tree build, Make no longer re-runs the script
installation rules on every invocation.

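The failure mode follows from Make's basic staleness rule: a target is
rebuilt whenever the file named by the target does not exist or is older
than a prerequisite. If a rule's target is a bare perf-archive while its
recipe writes $(OUTPUT)perf-archive, the file Make checks never appears,
so the recipe runs on every invocation. The Python model below mimics
that rule rather than running make; the helper names and directory
layout are hypothetical.

```python
import os
import shutil
import tempfile
from typing import List


def out_of_date(target: str, prereqs: List[str]) -> bool:
    """Make's basic rule: rebuild if the target is missing or older than a prerequisite."""
    if not os.path.exists(target):
        return True
    mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > mtime for p in prereqs)


def simulate(prefix_target: bool, root: str, runs: int = 2) -> List[bool]:
    """Return, for each simulated make run, whether the recipe executed."""
    src = os.path.join(root, "perf-archive.sh")
    outdir = os.path.join(root, "out")  # stands in for $(OUTPUT)
    os.makedirs(outdir, exist_ok=True)
    with open(src, "w") as f:
        f.write("#!/bin/sh\n")
    written = os.path.join(outdir, "perf-archive")  # where the recipe writes
    # The file Make checks: prefixed with $(OUTPUT), or a bare name.
    target = written if prefix_target else os.path.join(root, "perf-archive")
    executed = []
    for _ in range(runs):
        stale = out_of_date(target, [src])
        if stale:
            shutil.copyfile(src, written)  # the recipe
        executed.append(stale)
    return executed


if __name__ == "__main__":
    for prefix in (False, True):
        with tempfile.TemporaryDirectory() as d:
            print(f"prefixed={prefix}: recipe ran {simulate(prefix, d)}")
```

With the bare target the recipe runs on both simulated builds; with the
prefixed target only the first build runs it.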
13-14: AST Parsing Optimization & Shell Fork Eradication
- Converts ZENS, ARMS, and INTELS in pmu-events/Build from recursive
assignment (=) to simply expanded assignment (:=) and replaces the
shell-based model_name/vendor_name macros with GNU Make built-in string
functions. The directory-probing shell calls now run exactly once, when
Make parses the file, and the path macros are evaluated entirely in
memory, eliminating over 7,800 redundant sub-processes during
out-of-tree build evaluation.
- Converts the llvm-config shell queries in Makefile.config from
recursive assignment (=) to simply expanded assignment (:=). This
eliminates ~185 redundant sub-processes that were previously spawned
during object compilation dependency checks.

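The difference between the two assignment flavors comes down to when the
right-hand side is expanded: VAR = $(shell ...) re-runs the command
every time $(VAR) is referenced, while VAR := $(shell ...) runs it once,
when the assignment is read. The Python model below counts simulated
shell invocations for each flavor. It is an analogy to GNU Make's
semantics, not Make itself; the class names, the placeholder command
output, and the reference count of 100 are arbitrary.

```python
from typing import Callable


class ShellCounter:
    """Stands in for $(shell ...): counts how many sub-processes would be forked."""

    def __init__(self) -> None:
        self.forks = 0

    def run(self, output: str) -> str:
        self.forks += 1
        return output


class RecursiveVar:
    """VAR = $(shell ...): the value is re-expanded at every reference."""

    def __init__(self, expand: Callable[[], str]) -> None:
        self._expand = expand

    def ref(self) -> str:
        return self._expand()


class SimpleVar:
    """VAR := $(shell ...): the value is expanded once, at assignment."""

    def __init__(self, expand: Callable[[], str]) -> None:
        self._value = expand()

    def ref(self) -> str:
        return self._value


def count_forks(var_type: type, references: int) -> int:
    shell = ShellCounter()
    # Placeholder output standing in for an llvm-config query.
    var = var_type(lambda: shell.run("-I/opt/llvm/include"))
    for _ in range(references):
        var.ref()  # e.g. one use per compiled object
    return shell.forks


if __name__ == "__main__":
    print("=  :", count_forks(RecursiveVar, 100))  # 100
    print(":= :", count_forks(SimpleVar, 100))     # 1
```

The trade-off is that a simply expanded variable runs its command even
if it is never referenced, which is cheap compared with one fork per use.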
Changes since v6:
- Rebased and resent, as Sashiko failed to apply the previous version.

Ian Rogers (14):
tools build: Fix feature checks to touch target files on success
perf trace beauty: Make beauty generated C code standalone .o files
perf build: Decouple pmu-events from prepare umbrella target
perf build: Remove empty archheaders target
perf build: Move BPF skeleton generation out of Makefile.perf
perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
perf build: Pre-generate BPF skeleton tooling during umbrella prepare phase
perf build: Move libsymbol dependency out of prepare step
perf build: Remove redundant libbpf feature check for static builds
perf pmu-events: Split big_c_string storage into standalone compilation unit
perf pmu-events: Parallelize JSON and metric pre-computation in jevents.py
perf build: Prefix SCRIPTS with output directory to fix continuous rebuilds
perf pmu-events: Convert recursive shell assignments and macros to Make built-ins
perf build: Convert llvm-config shell queries to simply expanded variables

--
2.54.0.563.g4f69b47b94-goog
On Mon, May 18, 2026 at 08:46:24AM -0700, Ian Rogers wrote:
> [...]
>
> 1: Fast-Path Feature Detection
> - Refactors test-clang-bpf-co-re.bin and test-bpftool-skeletons.bin feature
> checks to group shell pipelines within curly braces and redirect both stdout
> and stderr to .make.output before touching $@ purely upon success
> (> $(@:.bin=.make.output) 2>&1 && touch $@). Grouping the pipeline ({ cmd1 | cmd2; })
> ensures that compiler stderr is successfully captured in .make.output rather
> than escaping to the parent shell. This perfectly matches standard Kbuild
> feature check conventions and ensures the target files are touched on disk
> purely upon success, allowing Kbuild to cache positive detections and avoid
> continuous sub-make re-evaluations during incremental builds. Adds
> test-bpftool-skeletons.bin to the clean FILES list and explicit source
> prerequisite test-clang-bpf-co-re.c.
I think patch 1 can be separated and needs Ack/Review from BPF folks.
>
> [...]

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Thanks,
Namhyung
On Tue, May 19, 2026 at 11:27:05AM -0700, Namhyung Kim wrote:
> On Mon, May 18, 2026 at 08:46:24AM -0700, Ian Rogers wrote:
> > [...]
> >
> > 1: Fast-Path Feature Detection
> > - Refactors test-clang-bpf-co-re.bin and test-bpftool-skeletons.bin feature
> > checks to group shell pipelines within curly braces and redirect both stdout
> > and stderr to .make.output before touching $@ purely upon success
> > (> $(@:.bin=.make.output) 2>&1 && touch $@). Grouping the pipeline ({ cmd1 | cmd2; })
> > ensures that compiler stderr is successfully captured in .make.output rather
> > than escaping to the parent shell. This perfectly matches standard Kbuild
> > feature check conventions and ensures the target files are touched on disk
> > purely upon success, allowing Kbuild to cache positive detections and avoid
> > continuous sub-make re-evaluations during incremental builds. Adds
> > test-bpftool-skeletons.bin to the clean FILES list and explicit source
> > prerequisite test-clang-bpf-co-re.c.
>
> I think patch 1 can be separated and needs Ack/Review from BPF folks.
>
> > [...]
>
> Reviewed-by: Namhyung Kim <namhyung@kernel.org>
So this is for 2-14? I haven't checked if 1 can be left out of an
initial merge by me.
- Arnaldo
On Tue, May 19, 2026 at 11:49 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Tue, May 19, 2026 at 11:27:05AM -0700, Namhyung Kim wrote:
> > On Mon, May 18, 2026 at 08:46:24AM -0700, Ian Rogers wrote:
> > > [...]
> > >
> > > 1: Fast-Path Feature Detection
> > > - Refactors test-clang-bpf-co-re.bin and test-bpftool-skeletons.bin feature
> > > checks to group shell pipelines within curly braces and redirect both stdout
> > > and stderr to .make.output before touching $@ purely upon success
> > > (> $(@:.bin=.make.output) 2>&1 && touch $@). Grouping the pipeline ({ cmd1 | cmd2; })
> > > ensures that compiler stderr is successfully captured in .make.output rather
> > > than escaping to the parent shell. This perfectly matches standard Kbuild
> > > feature check conventions and ensures the target files are touched on disk
> > > purely upon success, allowing Kbuild to cache positive detections and avoid
> > > continuous sub-make re-evaluations during incremental builds. Adds
> > > test-bpftool-skeletons.bin to the clean FILES list and explicit source
> > > prerequisite test-clang-bpf-co-re.c.
> >
> > I think patch 1 can be separated and needs Ack/Review from BPF folks.
> >
> > > [...]
> >
> > Reviewed-by: Namhyung Kim <namhyung@kernel.org>
>
> So this is for 2-14? I haven't checked if 1 can be left out of an
> initial merge by me.
I believe you are correct. Patch 1 is completely independent because
it is the only change in tools/build; everything else is in
tools/perf.
Thanks,
Ian
> - Arnaldo
On Tue, May 19, 2026 at 11:53:08AM -0700, Ian Rogers wrote:
> On Tue, May 19, 2026 at 11:49 AM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > On Tue, May 19, 2026 at 11:27:05AM -0700, Namhyung Kim wrote:
> > > On Mon, May 18, 2026 at 08:46:24AM -0700, Ian Rogers wrote:
> > > > [...]
> > > >
> > > > 1: Fast-Path Feature Detection
> > > > - Refactors test-clang-bpf-co-re.bin and test-bpftool-skeletons.bin feature
> > > > checks to group shell pipelines within curly braces and redirect both stdout
> > > > and stderr to .make.output before touching $@ purely upon success
> > > > (> $(@:.bin=.make.output) 2>&1 && touch $@). Grouping the pipeline ({ cmd1 | cmd2; })
> > > > ensures that compiler stderr is successfully captured in .make.output rather
> > > > than escaping to the parent shell. This perfectly matches standard Kbuild
> > > > feature check conventions and ensures the target files are touched on disk
> > > > purely upon success, allowing Kbuild to cache positive detections and avoid
> > > > continuous sub-make re-evaluations during incremental builds. Adds
> > > > test-bpftool-skeletons.bin to the clean FILES list and explicit source
> > > > prerequisite test-clang-bpf-co-re.c.
> > >
> > > I think patch 1 can be separated and needs Ack/Review from BPF folks.
> > >
> > > >
> > > > 2-4: Flattening Umbrella Prepare Barriers
> > > > - builtin-trace embedded inclusions and pmu-events generation are completely
> > > > decoupled from the sequential "prepare" umbrella target, eliminating Make
> > > > AST double-parsing overhead and unchoking parallel compilation barriers.
> > > >
> > > > 5-7: Decoupling & Pre-generating BPF Skeletons
> > > > - BPF skeleton rules are extracted out of Makefile.perf into bpf_skel.mak.
> > > > - Decouples bpftool bootstrap from top-level static libbpf dependencies,
> > > > attaching bpf-skel-prepare directly to the umbrella prepare target. This
> > > > allows Make to pre-compile bpftool and dump vmlinux.h in the background at
> > > > build startup, removing the 7-second serialization bottleneck before BPF
> > > > object compilation.
> > > > - Ensures benchmark skeleton intermediate .bpf.o files are cleanly removed
> > > > during make clean, and adds bpf-skel-prepare to .PHONY.
> > > >
> > > > 8-9: Foundational Linkage Optimization
> > > > - Moves static libsymbol library prerequisites out of the prepare step.
> > > > - Eliminates redundant libbpf sub-make feature checks during static builds.
> > > >
> > > > 10-11: jevents.py Concurrency & Deduplication
> > > > - Splits the massive 2.8 MB big_c_string literal out of pmu-events.c into a
> > > > dedicated pmu-events-string.c compilation unit. This slices C compilation
> > > > latency in half by compiling string and struct tables simultaneously across
> > > > separate CPU cores while preserving zero dynamic ELF relocations. Adds
> > > > pmu-events-string.c to .gitignore, declares extern const char big_c_string[];
> > > > locally inside output_string_file and output_file when split to prevent linkage
> > > > conflicts with empty-pmu-events.c, defers file closures to ensure identical
> > > > timestamps, and uses canonical Make 4.0 @: dependency chaining.
> > > > - Pre-populates jevents.py JSON ASTs and metric formulas in parallel across
> > > > all available CPU cores using ProcessPoolExecutor (accelerating Python
> > > > execution by 11x, from 3.3s down to ~290ms). Moves _init_worker to top-level
> > > > scope to ensure clean pickling under spawn multiprocessing start methods.
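The string split can be pictured with a toy generator that emits two translation units. This is a hedged sketch, not jevents.py. It assumes, as the "zero dynamic ELF relocations" remark suggests, that the tables refer to strings by integer offset into `big_c_string` rather than by pointer. The function names and the `compact_entry` layout are hypothetical. Only offsets cross the unit boundary, so the table unit needs nothing beyond the `extern` declaration, and the two files can be compiled independently.

```python
def build_string_table(strings: list[str]) -> tuple[str, dict[str, int]]:
    """Concatenate unique strings, each NUL-terminated, recording their offsets."""
    blob = ""
    offsets: dict[str, int] = {}
    for s in strings:
        if s not in offsets:  # store each distinct string only once
            offsets[s] = len(blob)
            blob += s + "\0"
    return blob, offsets


def c_literal(text: str) -> str:
    """Escape text for use inside a C string literal (each NUL becomes \\000)."""
    text = text.replace("\\", "\\\\").replace('"', '\\"')
    return text.replace("\0", "\\000")


def emit_units(strings: list[str]) -> tuple[str, str]:
    """Return (string unit, table unit) as two independent C sources."""
    blob, offsets = build_string_table(strings)
    string_unit = (
        "/* Only the large literal lives in this unit. */\n"
        'const char big_c_string[] = "' + c_literal(blob) + '";\n'
    )
    # Integer offsets need no load-time relocation, unlike pointers would.
    rows = "".join(f"\t{{ {offsets[s]} }},\n" for s in dict.fromkeys(strings))
    table_unit = (
        "extern const char big_c_string[];\n"
        "struct compact_entry { int offset; };  /* hypothetical layout */\n"
        "const struct compact_entry entries[] = {\n" + rows + "};\n"
    )
    return string_unit, table_unit


string_unit, table_unit = emit_units(["cycles", "instructions", "cycles"])
print(string_unit, end="")
print(table_unit, end="")
```

Each emitted source compiles to its own object, and the linker resolves the single `big_c_string` definition.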
> > > >
> > > > 12: Out-of-Tree Incremental Rebuild Fix
> > > > - Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) to prevent
> > > > Make from re-executing the script installation rules on every invocation
> > > > in an already-built out-of-tree build.
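An unprefixed target keeps re-running because Make decides whether a rule is stale by looking for the target file at exactly the path the rule names. The minimal model below is plain Python standing in for Make's timestamp check. The `perf-archive.sh` source name and the directory layout are assumptions for illustration. The recipe always writes into the output directory, as an out-of-tree build does. The model then compares checking the bare name against checking the `$(OUTPUT)`-prefixed path.

```python
import os
import shutil
import tempfile


def is_stale(target: str, prereqs: list[str]) -> bool:
    """Make-style check: stale if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    built = os.path.getmtime(target)
    return any(os.path.getmtime(p) > built for p in prereqs)


def recipe_runs(prefixed: bool, builds: int = 3) -> int:
    """Count how many of `builds` consecutive builds re-run the script rule."""
    with tempfile.TemporaryDirectory() as srcdir, tempfile.TemporaryDirectory() as outdir:
        src = os.path.join(srcdir, "perf-archive.sh")
        with open(src, "w") as f:
            f.write("#!/bin/sh\n")
        # The recipe always writes into the output directory ...
        written = os.path.join(outdir, "perf-archive")
        # ... but Make checks whatever path the rule's target names.
        checked = written if prefixed else os.path.join(srcdir, "perf-archive")
        runs = 0
        for _ in range(builds):
            if is_stale(checked, [src]):
                shutil.copyfile(src, written)  # stand-in for the install recipe
                runs += 1
        return runs


print("target 'perf-archive':         ", recipe_runs(prefixed=False), "runs")
print("target '$(OUTPUT)perf-archive':", recipe_runs(prefixed=True), "runs")
```

The unprefixed target is rebuilt on all 3 builds because the checked file never appears. The prefixed target is built once and then considered up to date.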
> > > >
> > > > 13-14: AST Parsing Optimization & Shell Fork Eradication
> > > > - Converts ZENS, ARMS, and INTELS in pmu-events/Build from recursive assignment
> > > > (=) to simply expanded assignment (:=) and replaces model_name/vendor_name
> > > > with pure GNU Make string functions. This guarantees Make executes directory
> > > > probing shell forks exactly once during AST parsing and evaluates path macros
> > > > purely in memory, completely eradicating over 7,800 redundant sub-processes
> > > > during out-of-tree build evaluation.
> > > > - Converts llvm-config shell queries in Makefile.config from recursive assignment
> > > > (=) to simply expanded assignment (:=). This eliminates ~185 redundant sub-processes
> > > > that were previously executed across object compilation dependency checks.
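The difference between the two assignment flavors comes down to when the right-hand side is expanded. A recursively expanded variable (`=`) is re-expanded every time it is referenced, so any `$(shell ...)` inside it runs again each time. A simply expanded variable (`:=`) is expanded once, at assignment. The Python model below is only an analogy. The class names and the fake `llvm-config` output are invented for illustration. It counts how many shell forks each flavor would cost for a given number of references.

```python
from typing import Callable


class FakeShell:
    """Stands in for $(shell llvm-config ...); counts simulated forks."""

    def __init__(self, output: str) -> None:
        self.output = output
        self.forks = 0

    def __call__(self) -> str:
        self.forks += 1
        return self.output


class RecursiveVar:
    """VAR = $(shell ...): the right-hand side is re-expanded on every reference."""

    def __init__(self, rhs: Callable[[], str]) -> None:
        self.rhs = rhs

    def expand(self) -> str:
        return self.rhs()


class SimpleVar:
    """VAR := $(shell ...): the right-hand side is expanded once, at assignment."""

    def __init__(self, rhs: Callable[[], str]) -> None:
        self.value = rhs()

    def expand(self) -> str:
        return self.value


def forks_for(var_type, references: int) -> int:
    """Count simulated forks after `references` expansions of one variable."""
    shell = FakeShell("-I/usr/lib/llvm/include")  # made-up llvm-config output
    var = var_type(shell)
    for _ in range(references):
        var.expand()
    return shell.forks


print("recursive (=): ", forks_for(RecursiveVar, 100), "forks")  # 100
print("simple (:=):   ", forks_for(SimpleVar, 100), "forks")     # 1
```

The trade-off is that `:=` runs its command once even if the variable is never referenced. For queries that are used many times during a build, that cost is negligible.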
> > > >
> > > > Changes since v6:
> > > > - Rebased and resent because Sashiko failed to apply the last series.
> > > >
> > > > Ian Rogers (14):
> > > > tools build: Fix feature checks to touch target files on success
> > > > perf trace beauty: Make beauty generated C code standalone .o files
> > > > perf build: Decouple pmu-events from prepare umbrella target
> > > > perf build: Remove empty archheaders target
> > > > perf build: Move BPF skeleton generation out of Makefile.perf
> > > > perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
> > > > perf build: Pre-generate BPF skeleton tooling during umbrella prepare
> > > > phase
> > > > perf build: Move libsymbol dependency out of prepare step
> > > > perf build: Remove redundant libbpf feature check for static builds
> > > > perf pmu-events: Split big_c_string storage into standalone
> > > > compilation unit
> > > > perf pmu-events: Parallelize JSON and metric pre-computation in
> > > > jevents.py
> > > > perf build: Prefix SCRIPTS with output directory to fix continuous
> > > > rebuilds
> > > > perf pmu-events: Convert recursive shell assignments and macros to
> > > > Make built-ins
> > > > perf build: Convert llvm-config shell queries to simply expanded
> > > > variables
> > >
> > > Reviewed-by: Namhyung Kim <namhyung@kernel.org>
> >
> > So this is for 2-14? I haven't checked if 1 can be left out of an
> > initial merge by me.
>
> I believe you are correct. Patch 1 is completely independent because
> it is the only change in tools/build; everything else is in
> tools/perf.
Actually, it covers patch 1 as well. But we can take 2-14 in the
perf tree first.
Thanks,
Namhyung
On Tue, May 19, 2026 at 05:18:31PM -0700, Namhyung Kim wrote:
> On Tue, May 19, 2026 at 11:53:08AM -0700, Ian Rogers wrote:
> > On Tue, May 19, 2026 at 11:49 AM Arnaldo Carvalho de Melo
> > <acme@kernel.org> wrote:
> > >
> > > On Tue, May 19, 2026 at 11:27:05AM -0700, Namhyung Kim wrote:
> > > > On Mon, May 18, 2026 at 08:46:24AM -0700, Ian Rogers wrote:
> > > > > This patch series refactors Kbuild internals, BPF skeleton generation,
> > > > > Python AST pre-computation, and foundational tooling dependencies across
> > > > > the perf tool build system. By eliminating umbrella target synchronization
> > > > > barriers, decoupling static library prerequisites, parallelizing single-core
> > > > > script generators, and eradicating redundant feature checks, this series
> > > > > substantially improves multi-core concurrency during Kbuild startup.
> > > > >
> > > > > On a 28-core build workstation (make -j28 all from scratch), clean build
> > > > > latency improves by over 44%:
> > > > >
> > > > > Before:
> > > > > real 0m29.006s
> > > > > user 2m46.019s
> > > > > sys 0m30.610s
> > > > >
> > > > > After:
> > > > > real 0m16.091s
> > > > > user 2m40.135s
> > > > > sys 0m25.740s
> > > > >
> > > > > This saves 12.9 seconds per clean build. Furthermore, no-op ("nothing
> > > > > to build") incremental builds are nearly 7x faster:
> > > > >
> > > > > Before:
> > > > > real 0m11.528s
> > > > > user 0m9.633s
> > > > > sys 0m6.965s
> > > > >
> > > > > After:
> > > > > real 0m1.717s
> > > > > user 0m1.682s
> > > > > sys 0m0.960s
> > > > >
> > > > > Summary of Patches:
> > > > >
> > > > > 1: Fast-Path Feature Detection
> > > > > - Refactors the test-clang-bpf-co-re.bin and test-bpftool-skeletons.bin
> > > > > feature checks to group each shell pipeline within curly braces, redirect
> > > > > both stdout and stderr to .make.output, and touch $@ only on success
> > > > > (> $(@:.bin=.make.output) 2>&1 && touch $@). Grouping the pipeline
> > > > > ({ cmd1 | cmd2; }) ensures that compiler stderr is captured in .make.output
> > > > > rather than escaping to the parent shell. This matches the standard Kbuild
> > > > > feature check conventions: because the target files are created on disk
> > > > > only on success, Kbuild can cache positive detections and avoid continuous
> > > > > sub-make re-evaluations during incremental builds. Also adds
> > > > > test-bpftool-skeletons.bin to the clean FILES list and an explicit source
> > > > > prerequisite, test-clang-bpf-co-re.c.
> > > >
> > > > I think patch 1 can be separated and needs Ack/Review from BPF folks.
> > > >
> > > > >
> > > > > 2-4: Flattening Umbrella Prepare Barriers
> > > > > - builtin-trace embedded inclusions and pmu-events generation are completely
> > > > > decoupled from the sequential "prepare" umbrella target, eliminating Make
> > > > > AST double-parsing overhead and removing barriers to parallel compilation.
> > > > >
> > > > > 5-7: Decoupling & Pre-generating BPF Skeletons
> > > > > - BPF skeleton rules are extracted out of Makefile.perf into bpf_skel.mak.
> > > > > - Decouples bpftool bootstrap from top-level static libbpf dependencies,
> > > > > attaching bpf-skel-prepare directly to the umbrella prepare target. This
> > > > > allows Make to pre-compile bpftool and dump vmlinux.h in the background at
> > > > > build startup, removing the 7-second serialization bottleneck before BPF
> > > > > object compilation.
> > > > > - Ensures benchmark skeleton intermediate .bpf.o files are cleanly removed
> > > > > during make clean, and adds bpf-skel-prepare to .PHONY.
> > > > >
> > > > > 8-9: Foundational Linkage Optimization
> > > > > - Moves static libsymbol library prerequisites out of the prepare step.
> > > > > - Eliminates redundant libbpf sub-make feature checks during static builds.
> > > > >
> > > > > 10-11: jevents.py Concurrency & Deduplication
> > > > > - Splits the massive 2.8 MB big_c_string literal out of pmu-events.c into a
> > > > > dedicated pmu-events-string.c compilation unit. This slices C compilation
> > > > > latency in half by compiling string and struct tables simultaneously across
> > > > > separate CPU cores while preserving zero dynamic ELF relocations. Adds
> > > > > pmu-events-string.c to .gitignore, declares extern const char big_c_string[];
> > > > > locally inside output_string_file and output_file when split to prevent linkage
> > > > > conflicts with empty-pmu-events.c, defers file closures to ensure identical
> > > > > timestamps, and uses canonical Make 4.0 @: dependency chaining.
> > > > > - Pre-populates jevents.py JSON ASTs and metric formulas in parallel across
> > > > > all available CPU cores using ProcessPoolExecutor (accelerating Python
> > > > > execution by 11x, from 3.3s down to ~290ms). Moves _init_worker to top-level
> > > > > scope to ensure clean pickling under spawn multiprocessing start methods.
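The pickling point about `_init_worker` can be illustrated in isolation. Under the `spawn` start method, a `ProcessPoolExecutor` passes its initializer and task function to fresh interpreters by reference (module plus qualified name). They must therefore be importable module-level functions; a nested function or a lambda fails to pickle. The sketch below is not jevents.py: `_init_worker`, `_precompute` and the toy `metrics` dictionary are hypothetical. It uses the `fork` context (Linux/macOS only) so that it also runs when pasted into an interactive session. Run from a script file under an `if __name__ == "__main__":` guard, the same top-level functions also work with `spawn`.

```python
import multiprocessing
import pickle
from concurrent.futures import ProcessPoolExecutor

_SHARED: dict = {}  # per-worker state, filled in by the initializer


def _init_worker(shared: dict) -> None:
    """Top-level (hence picklable by name) initializer run once per worker."""
    _SHARED.update(shared)


def _precompute(name: str) -> tuple[str, float]:
    """Hypothetical per-item work: evaluate a toy ratio 'formula'."""
    numerator, denominator = _SHARED["metrics"][name]
    return name, numerator / denominator


def precompute_all(metrics: dict, start_method: str = "fork") -> dict:
    """Evaluate every metric in a process pool and collect the results."""
    ctx = multiprocessing.get_context(start_method)
    with ProcessPoolExecutor(mp_context=ctx, initializer=_init_worker,
                             initargs=({"metrics": metrics},)) as pool:
        return dict(pool.map(_precompute, sorted(metrics)))


def picklable(obj) -> bool:
    """True if pickle can serialize obj (functions are pickled by reference)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def _make_nested():
    def nested_init(shared):  # local function: not importable by name
        _SHARED.update(shared)
    return nested_init


metrics = {"ipc": (300, 100), "miss_ratio": (5, 50)}
print(precompute_all(metrics))                              # {'ipc': 3.0, 'miss_ratio': 0.1}
print(picklable(_init_worker), picklable(_make_nested()))   # True False
```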
> > > > >
> > > > > 12: Out-of-Tree Incremental Rebuild Fix
> > > > > - Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) to prevent
> > > > > Make from re-executing the script installation rules on every invocation
> > > > > in an already-built out-of-tree build.
> > > > >
> > > > > 13-14: AST Parsing Optimization & Shell Fork Eradication
> > > > > - Converts ZENS, ARMS, and INTELS in pmu-events/Build from recursive assignment
> > > > > (=) to simply expanded assignment (:=) and replaces model_name/vendor_name
> > > > > with pure GNU Make string functions. This guarantees Make executes directory
> > > > > probing shell forks exactly once during AST parsing and evaluates path macros
> > > > > purely in memory, completely eradicating over 7,800 redundant sub-processes
> > > > > during out-of-tree build evaluation.
> > > > > - Converts llvm-config shell queries in Makefile.config from recursive assignment
> > > > > (=) to simply expanded assignment (:=). This eliminates ~185 redundant sub-processes
> > > > > that were previously executed across object compilation dependency checks.
> > > > >
> > > > > Changes since v6:
> > > > > - Rebased and resent because Sashiko failed to apply the last series.
> > > > >
> > > > > Ian Rogers (14):
> > > > > tools build: Fix feature checks to touch target files on success
> > > > > perf trace beauty: Make beauty generated C code standalone .o files
> > > > > perf build: Decouple pmu-events from prepare umbrella target
> > > > > perf build: Remove empty archheaders target
> > > > > perf build: Move BPF skeleton generation out of Makefile.perf
> > > > > perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
> > > > > perf build: Pre-generate BPF skeleton tooling during umbrella prepare
> > > > > phase
> > > > > perf build: Move libsymbol dependency out of prepare step
> > > > > perf build: Remove redundant libbpf feature check for static builds
> > > > > perf pmu-events: Split big_c_string storage into standalone
> > > > > compilation unit
> > > > > perf pmu-events: Parallelize JSON and metric pre-computation in
> > > > > jevents.py
> > > > > perf build: Prefix SCRIPTS with output directory to fix continuous
> > > > > rebuilds
> > > > > perf pmu-events: Convert recursive shell assignments and macros to
> > > > > Make built-ins
> > > > > perf build: Convert llvm-config shell queries to simply expanded
> > > > > variables
> > > >
> > > > Reviewed-by: Namhyung Kim <namhyung@kernel.org>
> > >
> > > So this is for 2-14? I haven't checked if 1 can be left out of an
> > > initial merge by me.
> >
> > I believe you are correct. Patch 1 is completely independent because
> > it is the only change in tools/build; everything else is in
> > tools/perf.
>
> Actually, it covers patch 1 as well. But we can take 2-14 in the
> perf tree first.
Ok, let's go with 2-14; we can look at 1 later.
- Arnaldo
On Tue, May 19, 2026 at 11:27 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Mon, May 18, 2026 at 08:46:24AM -0700, Ian Rogers wrote:
> > This patch series refactors Kbuild internals, BPF skeleton generation,
> > Python AST pre-computation, and foundational tooling dependencies across
> > the perf tool build system. By eliminating umbrella target synchronization
> > barriers, decoupling static library prerequisites, parallelizing single-core
> > script generators, and eradicating redundant feature checks, this series
> > substantially improves multi-core concurrency during Kbuild startup.
> >
> > On a 28-core build workstation (make -j28 all from scratch), clean build
> > latency improves by over 44%:
> >
> > Before:
> > real 0m29.006s
> > user 2m46.019s
> > sys 0m30.610s
> >
> > After:
> > real 0m16.091s
> > user 2m40.135s
> > sys 0m25.740s
> >
> > This saves 12.9 seconds per clean build. Furthermore, no-op ("nothing
> > to build") incremental builds are nearly 7x faster:
> >
> > Before:
> > real 0m11.528s
> > user 0m9.633s
> > sys 0m6.965s
> >
> > After:
> > real 0m1.717s
> > user 0m1.682s
> > sys 0m0.960s
> >
> > Summary of Patches:
> >
> > 1: Fast-Path Feature Detection
> > - Refactors the test-clang-bpf-co-re.bin and test-bpftool-skeletons.bin
> > feature checks to group each shell pipeline within curly braces, redirect
> > both stdout and stderr to .make.output, and touch $@ only on success
> > (> $(@:.bin=.make.output) 2>&1 && touch $@). Grouping the pipeline
> > ({ cmd1 | cmd2; }) ensures that compiler stderr is captured in .make.output
> > rather than escaping to the parent shell. This matches the standard Kbuild
> > feature check conventions: because the target files are created on disk
> > only on success, Kbuild can cache positive detections and avoid continuous
> > sub-make re-evaluations during incremental builds. Also adds
> > test-bpftool-skeletons.bin to the clean FILES list and an explicit source
> > prerequisite, test-clang-bpf-co-re.c.
>
> I think patch 1 can be separated and needs Ack/Review from BPF folks.
So I dropped the patch restricting the features tested when making
bpftool in bootstrap mode as perf does:
https://lore.kernel.org/linux-perf-users/20260514163409.927816-2-irogers@google.com/
We don't need to test for libbfd disassembly support for bpftool if we
just need the bootstrap version. But anyway, I dropped that patch and
hopefully someone on the BPF side can take it or something. I find the
BPF mailing list has a difficult user interface.
The only patch remaining is:
https://lore.kernel.org/linux-perf-users/20260518154638.2798789-2-irogers@google.com/
```
...
-$(OUTPUT)test-clang-bpf-co-re.bin:
- $(CLANG) -S -g --target=bpf -o - $(patsubst %.bin,%.c,$(@F)) | \
- grep BTF_KIND_VAR
+$(OUTPUT)test-clang-bpf-co-re.bin: test-clang-bpf-co-re.c
+ { $(CLANG) -S -g --target=bpf -o - $< | \
+ grep BTF_KIND_VAR; } > $(@:.bin=.make.output) 2>&1 && touch $@
$(OUTPUT)test-file-handle.bin:
$(BUILD)
@@ -393,8 +394,8 @@ $(OUTPUT)test-libopenssl.bin:
$(BUILD) $(shell $(PKG_CONFIG) --libs --cflags openssl 2>/dev/null)
$(OUTPUT)test-bpftool-skeletons.bin:
- $(SYSTEM_BPFTOOL) version | grep '^features:.*skeletons' \
- > $(@:.bin=.make.output) 2>&1
+ { $(SYSTEM_BPFTOOL) version | grep '^features:.*skeletons'; } \
+ > $(@:.bin=.make.output) 2>&1 && touch $@
...
```
which just adds `&& touch $@` so that every incremental build doesn't
re-detect these features. None of the logic changes, so I was hoping
that a BPF sign off wouldn't be a requirement. But anyway, let's hope
a BPF person responds.
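The value of `&& touch $@` follows from how Make decides staleness: a target is rebuilt when its file is missing or older than one of its prerequisites. The sketch below is a plain-Python model of that rule, not part of tools/build; the helper names are illustrative. Without the touch, the `.bin` file never appears, so the check reruns on every build. With it, the check runs once and reruns only when the now-explicit prerequisite `test-clang-bpf-co-re.c` becomes newer.

```python
import os
import tempfile


def is_stale(target: str, prereqs: list[str]) -> bool:
    """Make-style check: missing target, or a prerequisite newer than it."""
    if not os.path.exists(target):
        return True
    built = os.path.getmtime(target)
    return any(os.path.getmtime(p) > built for p in prereqs)


def touch(path: str) -> None:
    """Create path if needed and set its mtime to now, like touch(1)."""
    open(path, "a").close()
    os.utime(path, None)


def run_builds(workdir: str, touch_on_success: bool, builds: int = 3) -> int:
    """Return how many of `builds` consecutive builds re-ran the feature check."""
    src = os.path.join(workdir, "test-clang-bpf-co-re.c")
    target = os.path.join(workdir, "test-clang-bpf-co-re.bin")
    with open(src, "w") as f:
        f.write("/* feature test source */\n")
    runs = 0
    for _ in range(builds):
        if is_stale(target, [src]):
            runs += 1              # the (successful) check runs here
            if touch_on_success:
                touch(target)      # models "&& touch $@"
    return runs


with tempfile.TemporaryDirectory() as d:
    print("without touch:", run_builds(d, touch_on_success=False), "checks")  # 3
with tempfile.TemporaryDirectory() as d:
    print("with touch:   ", run_builds(d, touch_on_success=True), "checks")   # 1
    # Editing the prerequisite (simulated by bumping its mtime) makes it stale again.
    src = os.path.join(d, "test-clang-bpf-co-re.c")
    target = os.path.join(d, "test-clang-bpf-co-re.bin")
    later = os.path.getmtime(target) + 10
    os.utime(src, (later, later))
    print("stale after edit:", is_stale(target, [src]))  # True
```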
> >
> > 2-4: Flattening Umbrella Prepare Barriers
> > - builtin-trace embedded inclusions and pmu-events generation are completely
> > decoupled from the sequential "prepare" umbrella target, eliminating Make
> > AST double-parsing overhead and removing barriers to parallel compilation.
> >
> > 5-7: Decoupling & Pre-generating BPF Skeletons
> > - BPF skeleton rules are extracted out of Makefile.perf into bpf_skel.mak.
> > - Decouples bpftool bootstrap from top-level static libbpf dependencies,
> > attaching bpf-skel-prepare directly to the umbrella prepare target. This
> > allows Make to pre-compile bpftool and dump vmlinux.h in the background at
> > build startup, removing the 7-second serialization bottleneck before BPF
> > object compilation.
> > - Ensures benchmark skeleton intermediate .bpf.o files are cleanly removed
> > during make clean, and adds bpf-skel-prepare to .PHONY.
> >
> > 8-9: Foundational Linkage Optimization
> > - Moves static libsymbol library prerequisites out of the prepare step.
> > - Eliminates redundant libbpf sub-make feature checks during static builds.
> >
> > 10-11: jevents.py Concurrency & Deduplication
> > - Splits the massive 2.8 MB big_c_string literal out of pmu-events.c into a
> > dedicated pmu-events-string.c compilation unit. This slices C compilation
> > latency in half by compiling string and struct tables simultaneously across
> > separate CPU cores while preserving zero dynamic ELF relocations. Adds
> > pmu-events-string.c to .gitignore, declares extern const char big_c_string[];
> > locally inside output_string_file and output_file when split to prevent linkage
> > conflicts with empty-pmu-events.c, defers file closures to ensure identical
> > timestamps, and uses canonical Make 4.0 @: dependency chaining.
> > - Pre-populates jevents.py JSON ASTs and metric formulas in parallel across
> > all available CPU cores using ProcessPoolExecutor (accelerating Python
> > execution by 11x, from 3.3s down to ~290ms). Moves _init_worker to top-level
> > scope to ensure clean pickling under spawn multiprocessing start methods.
> >
> > 12: Out-of-Tree Incremental Rebuild Fix
> > - Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) to prevent
> > Make from re-executing the script installation rules on every invocation
> > in an already-built out-of-tree build.
> >
> > 13-14: AST Parsing Optimization & Shell Fork Eradication
> > - Converts ZENS, ARMS, and INTELS in pmu-events/Build from recursive assignment
> > (=) to simply expanded assignment (:=) and replaces model_name/vendor_name
> > with pure GNU Make string functions. This guarantees Make executes directory
> > probing shell forks exactly once during AST parsing and evaluates path macros
> > purely in memory, completely eradicating over 7,800 redundant sub-processes
> > during out-of-tree build evaluation.
> > - Converts llvm-config shell queries in Makefile.config from recursive assignment
> > (=) to simply expanded assignment (:=). This eliminates ~185 redundant sub-processes
> > that were previously executed across object compilation dependency checks.
> >
> > Changes since v6:
> > - Rebased and resent because Sashiko failed to apply the last series.
> >
> > Ian Rogers (14):
> > tools build: Fix feature checks to touch target files on success
> > perf trace beauty: Make beauty generated C code standalone .o files
> > perf build: Decouple pmu-events from prepare umbrella target
> > perf build: Remove empty archheaders target
> > perf build: Move BPF skeleton generation out of Makefile.perf
> > perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
> > perf build: Pre-generate BPF skeleton tooling during umbrella prepare
> > phase
> > perf build: Move libsymbol dependency out of prepare step
> > perf build: Remove redundant libbpf feature check for static builds
> > perf pmu-events: Split big_c_string storage into standalone
> > compilation unit
> > perf pmu-events: Parallelize JSON and metric pre-computation in
> > jevents.py
> > perf build: Prefix SCRIPTS with output directory to fix continuous
> > rebuilds
> > perf pmu-events: Convert recursive shell assignments and macros to
> > Make built-ins
> > perf build: Convert llvm-config shell queries to simply expanded
> > variables
>
> Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Thanks!
Ian
> Thanks,
> Namhyung
>