Use readelf instead of nm to check whether <file>.o contains a
__export_symbol_* symbol.
readelf is faster than nm for this check, which shortens the build
when CONFIG_MODVERSIONS is enabled.
Build of x86_64_defconfig on a 2C4T cloud server with CONFIG_MODVERSIONS=y:
With patch:
real 17m21.019s
user 61m48.388s
sys 4m27.709s
Without patch:
real 17m39.435s
user 62m24.686s
sys 5m3.200s
Link: https://lore.kernel.org/all/tencent_2FA16E0A18D6D0C0703F5D49@qq.com/
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
---
scripts/Makefile.build | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 3498d25b15e85..54a91bc144cce 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -233,7 +233,7 @@ ifdef CONFIG_MODVERSIONS
# be compiled and linked to the kernel and/or modules.
gen_symversions = \
- if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
+ if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
$(cmd_gensymtypes_$1) >> $(dot-target).cmd; \
fi
--
2.30.2
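For readers unfamiliar with the check being changed, here is a minimal shell sketch of it in isolation. The helper name has_export_symbol and the two sample listing lines are assumptions made for illustration (the lines are not captured from a real build); only the grep pattern comes from the patch.

```shell
# Succeeds if the symbol listing on stdin names an __export_symbol_* symbol.
# The leading space in the pattern ties the match to the start of the
# symbol-name column, so a name merely containing the string does not match.
has_export_symbol() {
    grep -q ' __export_symbol_'
}

# Illustrative listing lines (assumed shapes, not taken from a real object):
nm_line='0000000000000000 r __export_symbol_foo'
readelf_line='     5: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    7 __export_symbol_foo'

for line in "$nm_line" "$readelf_line"; do
    if printf '%s\n' "$line" | has_export_symbol; then
        echo "export found"
    fi
done
```

Against a real object file the input would come from `nm foo.o` or `readelf -sW foo.o`, which is the only part the patch changes.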
On Thu, Jun 04, 2026 at 12:17:32AM +0800, Wentao Guan wrote:
> Use readelf to dig out if <file>.o contain a __export_symbol_*.
>
> Instead of nm, readelf is more faster, and significantly improve speed
> when enable CONFIG_MODVERSIONS.
>
> Build x86_64_defconfigs in 2C4T cloud server with CONFIG_MODVERSIONS=y:
> With patch:
> real 17m21.019s
> user 61m48.388s
> sys 4m27.709s
> Without patch:
> real 17m39.435s
> user 62m24.686s
> sys 5m3.200s
>
> Link: https://lore.kernel.org/all/tencent_2FA16E0A18D6D0C0703F5D49@qq.com/
> Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
> ---
> scripts/Makefile.build | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index 3498d25b15e85..54a91bc144cce 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -233,7 +233,7 @@ ifdef CONFIG_MODVERSIONS
> # be compiled and linked to the kernel and/or modules.
>
> gen_symversions = \
> - if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
> + if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
This breaks modversioning for Clang LTO builds, as llvm-nm can read LLVM
bitcode but llvm-readelf cannot, it expects strictly ELF.
Is there any performance gain with adding '-m1' to the grep command so
that it stops looking for a match after the first export symbol is
found?
> $(cmd_gensymtypes_$1) >> $(dot-target).cmd; \
> fi
>
> --
> 2.30.2
>
--
Cheers,
Nathan
Hello,
> On Thu, Jun 04, 2026 at 12:17:32AM +0800, Wentao Guan wrote:
> > Use readelf to dig out if <file>.o contain a __export_symbol_*.
> >
> > Instead of nm, readelf is more faster, and significantly improve speed
> > when enable CONFIG_MODVERSIONS.
> >
> > Build x86_64_defconfigs in 2C4T cloud server with CONFIG_MODVERSIONS=y:
> > With patch:
> > real 17m21.019s
> > user 61m48.388s
> > sys 4m27.709s
> > Without patch:
> > real 17m39.435s
> > user 62m24.686s
> > sys 5m3.200s
> >
> > Link: https://lore.kernel.org/all/tencent_2FA16E0A18D6D0C0703F5D49@qq.com/
> > Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
> > ---
> > scripts/Makefile.build | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> > index 3498d25b15e85..54a91bc144cce 100644
> > --- a/scripts/Makefile.build
> > +++ b/scripts/Makefile.build
> > @@ -233,7 +233,7 @@ ifdef CONFIG_MODVERSIONS
> > # be compiled and linked to the kernel and/or modules.
> >
> > gen_symversions = \
> > - if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
> > + if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
>
> This breaks modversioning for Clang LTO builds, as llvm-nm can read LLVM
> bitcode but llvm-readelf cannot, it expects strictly ELF.
Oh, would it be worth using the following logic to detect LLVM (or LLVM LTO)?
+ifeq ($(LLVM),)
+ SYM_CHECK = $(READELF) -sW
+else
+ SYM_CHECK = $(NM)
+endif
gen_symversions = \
- if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
+ if $(SYM_CHECK) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
> Is there any performance gain with adding '-m1' to the grep command so
> that it stops looking for a match after the first export symbol is
> found?
Only a small one. Here are my test results with make x86_64_defconfig + CONFIG_MODVERSIONS enabled:
1. readelf
if $(READELF) $@ 2>/dev/null | grep -q ' __export_symbol_';
real 10m44.359s
user 37m43.596s
sys 3m2.424s
2. nm
if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_';
real 11m8.008s
user 38m51.644s
sys 3m29.798s
3. nm + grep -m1 -q
if $(NM) $@ 2>/dev/null | grep -m1 -q ' __export_symbol_';
real 10m56.891s
user 38m8.136s
sys 3m28.096s
These tests are based on the default gcc toolchain in Ubuntu noble.
I will do more tests using llvm-nm and llvm-readelf.
BRs
Wentao Guan
On Thu, Jun 04, 2026 at 11:44:29AM +0800, Wentao Guan wrote:
> Hello,
>
> > On Thu, Jun 04, 2026 at 12:17:32AM +0800, Wentao Guan wrote:
> > > Use readelf to dig out if <file>.o contain a __export_symbol_*.
> > >
> > > Instead of nm, readelf is more faster, and significantly improve speed
> > > when enable CONFIG_MODVERSIONS.
> > >
> > > Build x86_64_defconfigs in 2C4T cloud server with CONFIG_MODVERSIONS=y:
> > > With patch:
> > > real 17m21.019s
> > > user 61m48.388s
> > > sys 4m27.709s
> > > Without patch:
> > > real 17m39.435s
> > > user 62m24.686s
> > > sys 5m3.200s
> > >
> > > Link: https://lore.kernel.org/all/tencent_2FA16E0A18D6D0C0703F5D49@qq.com/
> > > Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
> > > ---
> > > scripts/Makefile.build | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> > > index 3498d25b15e85..54a91bc144cce 100644
> > > --- a/scripts/Makefile.build
> > > +++ b/scripts/Makefile.build
> > > @@ -233,7 +233,7 @@ ifdef CONFIG_MODVERSIONS
> > > # be compiled and linked to the kernel and/or modules.
> > >
> > > gen_symversions = \
> > > - if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
> > > + if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
> >
> > This breaks modversioning for Clang LTO builds, as llvm-nm can read LLVM
> > bitcode but llvm-readelf cannot, it expects strictly ELF.
> Oh, is it worth to use the following logic to detect LLVM or LLVM-LTO or not ?
> +ifeq ($(LLVM),)
This should probably be CONFIG_LTO_CLANG with flipped branches but...
> + SYM_CHECK = $(READELF) -sW
> +else
> + SYM_CHECK = $(NM)
> +endif
> gen_symversions = \
> - if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
> + if $(SYM_CHECK) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
>
> > Is there any performance gain with adding '-m1' to the grep command so
> > that it stops looking for a match after the first export symbol is
> > found?
> Small, there are my test result in make x86_64_defconfig + enable CONFIG_MODVERSIONS:
> 1. readelf
> if $(READELF) $@ 2>/dev/null | grep -q ' __export_symbol_';
> real 10m44.359s
> user 37m43.596s
> sys 3m2.424s
> 2. nm
> if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_';
> real 11m8.008s
> user 38m51.644s
> sys 3m29.798s
> 3. nm + grep -m1 -q
> if $(NM) $@ 2>/dev/null | grep -m1 -q ' __export_symbol_';
> real 10m56.891s
> user 38m8.136s
> sys 3m28.096s
'-m1' appears to get us 50% (12s) of the speed up of 'readelf' (24s) in
your environment while sticking with 'nm'. I would be more inclined to
take that change since it is small and correct, rather than switching on
NM or READELF, as I don't think it is worth the additional complexity.
FWIW, on one of my test machines with 8 cores and 16 threads, the
difference is much less noticeable. I think that is going to be in line
with most developer and build farm hardware, rather than a 2C/4T machine
like you mention in the initial commit message.
GCC 16.1.0 + binutils 2.46:
Benchmark 1: $(NM)
Time (mean ± σ): 75.203 s ± 0.283 s [User: 659.465 s, System: 185.605 s]
Range (min … max): 74.898 s … 75.457 s 3 runs
Benchmark 2: $(READELF) -sW
Time (mean ± σ): 73.055 s ± 0.465 s [User: 642.365 s, System: 175.908 s]
Range (min … max): 72.523 s … 73.385 s 3 runs
Summary
$(READELF) -sW ran
1.03 ± 0.01 times faster than $(NM)
LLVM 22:
Benchmark 1: $(NM)
Time (mean ± σ): 75.030 s ± 0.736 s [User: 659.603 s, System: 185.257 s]
Range (min … max): 74.207 s … 75.623 s 3 runs
Benchmark 2: $(READELF) -sW
Time (mean ± σ): 73.405 s ± 0.457 s [User: 642.512 s, System: 176.440 s]
Range (min … max): 72.878 s … 73.679 s 3 runs
Summary
$(READELF) -sW ran
1.02 ± 0.01 times faster than $(NM)
--
Cheers,
Nathan
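The '-m1' estimate in the reply above (roughly 12s out of readelf's 24s) can be re-derived from the three quoted wall-clock times. This is a minimal shell sketch, assuming bash-style `time` values such as `11m8.008s`; the helper name to_seconds is hypothetical and not part of any posted script.

```shell
# to_seconds: convert a bash `time` value such as "11m8.008s" to seconds.
# Splitting on 'm' and 's' leaves minutes in $1 and seconds in $2.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# "real" times quoted from the 2C/4T measurements.
nm=$(to_seconds 11m8.008s)          # plain nm
readelf=$(to_seconds 10m44.359s)    # readelf
nm_m1=$(to_seconds 10m56.891s)      # nm + grep -m1 -q

summary=$(awk -v a="$nm" -v b="$readelf" -v c="$nm_m1" 'BEGIN {
    printf "readelf saves %.1f s, nm + grep -m1 saves %.1f s (%.0f%%)",
           a - b, a - c, 100 * (a - c) / (a - b)
}')
echo "$summary"
# prints: readelf saves 23.6 s, nm + grep -m1 saves 11.1 s (47%)
```

These are single runs on a noisy machine, so the percentage is only a rough indication.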
Hello,
> This should probably be CONFIG_LTO_CLANG with flipped branches but...
Right!
> '-m1' appears to get us 50% (12s) of the speed up of 'readelf' (24s) in
> your environment while sticking with 'nm'. I would be more inclined to
> take that change since it is small and correct, rather than switching on
> NM or READELF, as I don't think it is worth the additional complexity.
> FWIW, on one of my test machines with 8 cores and 16 threads, the
> difference is much less noticeable. I think that is going to be in line
> with most developer and build farm hardware, rather than a 2C/4T machine
> like you mention in the initial commit message.
Sorry, it seems my cloud service provider makes my results fluctuate :(,
and the first compile time may also be unstable, so I tested in a 20-core/28-thread
bare-metal environment. Here is the result:
Intel(R) Core(TM) i7-14700HX + 32GB + NVMe ssd
gcc version 12.3.0 binutils 2.46
clang version 18.1.7
source kernel tag v7.0
summary:
1. there is still a benefit from switching nm to readelf on 20 cores/28 threads
(I think nm has extra overhead from libbfd, which shows up as higher sys time;
I guess it causes a memory access bottleneck that affects the overall compile process),
but there seems to be no such difference when changing llvm-18-nm to llvm-18-readelf
2. -m1 does not seem to have the expected effect...
test scripts:
https://gist.github.com/opsiff/832baa9a6986343dddbe530fbee57f52
Makefile.build-nm-m1 : 'grep -q' -> 'grep -m1 -q'
Makefile.build-orig : orig Makefile.build
Makefile.build-readelf : 'NM' -> 'READELF -sW'
Makefile.build-readelf-m1: 'NM' -> 'READELF -sW' , 'grep -q' -> 'grep -m1 -q'
full result:
1. run x86_64_defconfig + modversions x3(base)
if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
real 2m2.876s real 2m2.578s real 2m2.262s
user 42m15.871s user 42m35.250s user 42m33.679s
sys 5m52.904s sys 5m52.478s sys 5m49.009s
2. if $(READELF) -sW $@ 2>/dev/null | grep -q __export_symbol_; then
real 1m54.931s real 1m55.192s real 1m55.207s
user 41m4.162s user 41m7.754s user 41m5.791s
sys 4m8.422s sys 4m8.431s sys 4m9.219s
3. if $(NM) $@ 2>/dev/null | grep -m1 -q __export_symbol_; then \
real 2m1.865s real 2m1.866s real 2m2.108s
user 42m32.891s user 42m35.047s user 42m33.834s
sys 5m48.045s sys 5m47.700s sys 5m48.200s
4. if $(READELF) -sW $@ 2>/dev/null | grep -m1 -q ' __export_symbol_'; then \
real 1m55.386s real 1m56.528s real 1m55.489s
user 41m6.156s user 41m12.321s user 41m10.545s
sys 4m10.093s sys 4m9.838s sys 4m9.367s
5. LLVM run x86_64_defconfig + modversions x3(base)
if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
real 2m35.758s real 2m32.696s real 2m32.127s
user 58m2.416s user 57m55.030s user 57m54.806s
sys 4m20.735s sys 4m18.473s sys 4m18.090s
6. LLVM if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
real 2m32.448s real 2m32.419s real 2m32.509s
user 57m57.262s user 57m53.001s user 57m48.842s
sys 4m20.508s sys 4m20.693s sys 4m20.490s
7. LLVM if $(NM) $@ 2>/dev/null | grep -m1 -q ' __export_symbol_'; then \
real 2m32.003s real 2m31.900s real 2m32.276s
user 57m45.786s user 57m46.982s user 57m49.907s
sys 4m18.184s sys 4m17.923s sys 4m18.354s
8. LLVM if $(READELF) -sW $@ 2>/dev/null | grep -m1 -q ' __export_symbol_'; then \
real 2m33.365s real 2m32.186s real 2m32.114s
user 57m49.533s user 57m47.865s user 57m46.591s
sys 4m19.809s sys 4m20.652s sys 4m19.954s
9. LLVM LTO_THIN run x86_64_defconfig + modversions x3(base)
if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
real 3m59.411s real 3m55.945s real 3m56.557s
user 59m38.877s user 59m20.007s user 59m19.009s
sys 4m21.582s sys 4m22.313s sys 4m23.793s
10. LLVM LTO_THIN if $(NM) $@ 2>/dev/null | grep -m1 -q ' __export_symbol_'; then \
real 3m55.722s real 3m56.641s real 3m57.979s
user 59m21.865s user 59m25.634s user 59m20.872s
sys 4m21.303s sys 4m24.174s sys 4m22.695s
Full log:
https://gist.github.com/opsiff/1cd7e0a0553c8416dd13a7e92590a440
If you have any other ideas, I will happily test them;
I will also try using llvm-nm instead of nm for the tests.
BRs
Wentao Guan
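To compare the GCC nm and readelf runs above without eyeballing nine numbers, the three repetitions of each can be averaged. This is a minimal shell sketch; mean_seconds is a hypothetical helper written for this illustration and is not part of the linked test scripts.

```shell
# mean_seconds: average bash `time` values such as "2m2.876s", in seconds.
# Each argument becomes one awk record; 'm' and 's' split it into
# minutes ($1) and seconds ($2).
mean_seconds() {
    printf '%s\n' "$@" |
        awk -F'[ms]' '{ total += $1 * 60 + $2 } END { printf "%.3f\n", total / NR }'
}

# Result sets 1 (nm) and 2 (readelf -sW) from the GCC runs.
nm_real=$(mean_seconds 2m2.876s 2m2.578s 2m2.262s)
readelf_real=$(mean_seconds 1m54.931s 1m55.192s 1m55.207s)
nm_sys=$(mean_seconds 5m52.904s 5m52.478s 5m49.009s)
readelf_sys=$(mean_seconds 4m8.422s 4m8.431s 4m9.219s)

echo "real: nm $nm_real s, readelf $readelf_real s"
echo "sys:  nm $nm_sys s, readelf $readelf_sys s"
```

With these inputs the means differ by about 7.5 s of wall-clock time and roughly 103 s of system time, which matches the observation that most of the gain shows up in sys time.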