[PATCH] kbuild: try readelf first in gen_symversions

Wentao Guan posted 1 patch 4 days, 15 hours ago
scripts/Makefile.build | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] kbuild: try readelf first in gen_symversions
Posted by Wentao Guan 4 days, 15 hours ago
Use readelf to check whether <file>.o contains a __export_symbol_* symbol.

readelf is faster than nm here and measurably improves build speed
when CONFIG_MODVERSIONS is enabled.

Build of x86_64_defconfig on a 2C4T cloud server with CONFIG_MODVERSIONS=y:
With patch:
real    17m21.019s
user    61m48.388s
sys     4m27.709s
Without patch:
real    17m39.435s
user    62m24.686s
sys     5m3.200s

Link: https://lore.kernel.org/all/tencent_2FA16E0A18D6D0C0703F5D49@qq.com/
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
---
 scripts/Makefile.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 3498d25b15e85..54a91bc144cce 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -233,7 +233,7 @@ ifdef CONFIG_MODVERSIONS
 #   be compiled and linked to the kernel and/or modules.
 
 gen_symversions =								\
-	if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then		\
+	if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then		\
 		$(cmd_gensymtypes_$1) >> $(dot-target).cmd;			\
 	fi
 
-- 
2.30.2
Re: [PATCH] kbuild: try readelf first in gen_symversions
Posted by Nathan Chancellor 4 days, 5 hours ago
On Thu, Jun 04, 2026 at 12:17:32AM +0800, Wentao Guan wrote:
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index 3498d25b15e85..54a91bc144cce 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -233,7 +233,7 @@ ifdef CONFIG_MODVERSIONS
>  #   be compiled and linked to the kernel and/or modules.
>  
>  gen_symversions =								\
> -	if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then		\
> +	if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then		\

This breaks modversioning for Clang LTO builds, as llvm-nm can read LLVM
bitcode but llvm-readelf cannot, it expects strictly ELF.

Is there any performance gain with adding '-m1' to the grep command so
that it stops looking for a match after the first export symbol is
found?
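
As a rough illustration of what the check in gen_symversions tests, the sketch below pipes hand-written sample listings through the same grep pattern. The sample lines only approximate nm and readelf -sW output (they are assumptions, not captured from a real object file), and the helper names has_export_symbol and report are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the symbol listing on stdin contains an
# __export_symbol_* entry, using the same pattern as gen_symversions.
has_export_symbol() {
    grep -q ' __export_symbol_'
}

# Hand-written lines approximating `nm` output (assumed format).
nm_sample='0000000000000000 T foo
0000000000000000 r __export_symbol_foo'

# Hand-written line approximating `readelf -sW` output (assumed format).
readelf_sample='     5: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    7 __export_symbol_foo'

# A listing without any export marker.
plain_sample='0000000000000000 T bar'

# $1 = label, $2 = listing text
report() {
    if printf '%s\n' "$2" | has_export_symbol; then
        echo "$1: export found"
    else
        echo "$1: no export"
    fi
}

report nm "$nm_sample"
report readelf "$readelf_sample"
report plain "$plain_sample"
```

Because the pattern starts with a space, it matches the symbol name column in both listing styles. GNU grep documents -q as exiting on the first match, so adding -m1 may change little; that is a property of grep, not a measurement.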

>  		$(cmd_gensymtypes_$1) >> $(dot-target).cmd;			\
>  	fi
>  
> -- 
> 2.30.2
> 

-- 
Cheers,
Nathan
Re: [PATCH] kbuild: try readelf first in gen_symversions
Posted by Wentao Guan 4 days, 3 hours ago
Hello,

> On Thu, Jun 04, 2026 at 12:17:32AM +0800, Wentao Guan wrote:
> > diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> > index 3498d25b15e85..54a91bc144cce 100644
> > --- a/scripts/Makefile.build
> > +++ b/scripts/Makefile.build
> > @@ -233,7 +233,7 @@ ifdef CONFIG_MODVERSIONS
> >  #   be compiled and linked to the kernel and/or modules.
> > 
> >  gen_symversions = \
> > - if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
> > + if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
> 
> This breaks modversioning for Clang LTO builds, as llvm-nm can read LLVM
> bitcode but llvm-readelf cannot, it expects strictly ELF.
Oh, would it be worth using the following logic to detect LLVM (or LLVM LTO)?
+ifeq ($(LLVM),)
+  SYM_CHECK = $(READELF) -sW
+else
+  SYM_CHECK = $(NM)
+endif 
 gen_symversions =								\
-	if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then		\
+	if $(SYM_CHECK) $@ 2>/dev/null | grep -q ' __export_symbol_'; then	\

> Is there any performance gain with adding '-m1' to the grep command so
> that it stops looking for a match after the first export symbol is
> found?
Only a small one. Here are my test results for x86_64_defconfig with CONFIG_MODVERSIONS enabled:
1. readelf
if $(READELF) $@ 2>/dev/null | grep -q ' __export_symbol_';
real    10m44.359s
user    37m43.596s
sys     3m2.424s
2. nm
if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_';
real    11m8.008s
user    38m51.644s
sys     3m29.798s
3. nm + grep -m1 -q
if $(NM) $@ 2>/dev/null | grep -m1 -q ' __export_symbol_';
real    10m56.891s
user    38m8.136s
sys     3m28.096s

These tests are based on the default GCC toolchain in Ubuntu noble.
I will run more tests using llvm-nm and llvm-readelf.

BRs
Wentao Guan
Re: [PATCH] kbuild: try readelf first in gen_symversions
Posted by Nathan Chancellor 3 days, 1 hour ago
On Thu, Jun 04, 2026 at 11:44:29AM +0800, Wentao Guan wrote:
> Hello,
> 
> > On Thu, Jun 04, 2026 at 12:17:32AM +0800, Wentao Guan wrote:
> > > diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> > > index 3498d25b15e85..54a91bc144cce 100644
> > > --- a/scripts/Makefile.build
> > > +++ b/scripts/Makefile.build
> > > @@ -233,7 +233,7 @@ ifdef CONFIG_MODVERSIONS
> > >  #   be compiled and linked to the kernel and/or modules.
> > > 
> > >  gen_symversions = \
> > > - if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
> > > + if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
> > 
> > This breaks modversioning for Clang LTO builds, as llvm-nm can read LLVM
> > bitcode but llvm-readelf cannot, it expects strictly ELF.
> Oh, would it be worth using the following logic to detect LLVM (or LLVM LTO)?
> +ifeq ($(LLVM),)

This should probably be CONFIG_LTO_CLANG with flipped branches but...

> +  SYM_CHECK = $(READELF) -sW
> +else
> +  SYM_CHECK = $(NM)
> +endif 
>  gen_symversions =								\
> -	if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then		\
> +	if $(SYM_CHECK) $@ 2>/dev/null | grep -q ' __export_symbol_'; then	\
> 

> > Is there any performance gain with adding '-m1' to the grep command so
> > that it stops looking for a match after the first export symbol is
> > found?
> Only a small one. Here are my test results for x86_64_defconfig with CONFIG_MODVERSIONS enabled:
> 1. readelf
> if $(READELF) $@ 2>/dev/null | grep -q ' __export_symbol_';
> real    10m44.359s
> user    37m43.596s
> sys     3m2.424s
> 2. nm
> if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_';
> real    11m8.008s
> user    38m51.644s
> sys     3m29.798s
> 3. nm + grep -m1 -q
> if $(NM) $@ 2>/dev/null | grep -m1 -q ' __export_symbol_';
> real    10m56.891s
> user    38m8.136s
> sys     3m28.096s

'-m1' appears to get us 50% (12s) of the speed up of 'readelf' (24s) in
your environment while sticking with 'nm'. I would be more inclined to
take that change since it is small and correct, rather than switching on
NM or READELF, as I don't think it is worth the additional complexity.
FWIW, on one of my test machines with 8 cores and 16 threads, the
difference is much less noticeable. I think that is going to be in line
with most developer and build farm hardware, rather than a 2C/4T machine
like you mention in the initial commit message.
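
For reference, the 12s and 24s figures can be re-derived from the quoted 'real' times. The sketch below is a minimal, assumed helper (the names to_seconds and saved are hypothetical) that converts time-style durations to seconds and prints the difference against the plain nm run.

```shell
#!/bin/sh
# Convert a `time`-style duration such as 10m44.359s to seconds.
to_seconds() {
    printf '%s\n' "$1" |
        awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Seconds saved by variant $2 relative to baseline $1, to one decimal place.
saved() {
    awk -v base="$(to_seconds "$1")" -v new="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", base - new }'
}

# 'real' times quoted above from the 2C/4T runs.
nm_real=11m8.008s
readelf_real=10m44.359s
nm_m1_real=10m56.891s

echo "readelf saves $(saved "$nm_real" "$readelf_real") s"
echo "nm + grep -m1 saves $(saved "$nm_real" "$nm_m1_real") s"
```

This prints 23.6 s for readelf and 11.1 s for nm with -m1, so -m1 recovers a little under half of the readelf gain in those single runs.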

GCC 16.1.0 + binutils 2.46:

  Benchmark 1: $(NM)
    Time (mean ± σ):     75.203 s ±  0.283 s    [User: 659.465 s, System: 185.605 s]
    Range (min … max):   74.898 s … 75.457 s    3 runs

  Benchmark 2: $(READELF) -sW
    Time (mean ± σ):     73.055 s ±  0.465 s    [User: 642.365 s, System: 175.908 s]
    Range (min … max):   72.523 s … 73.385 s    3 runs

  Summary
    $(READELF) -sW ran
      1.03 ± 0.01 times faster than $(NM)

LLVM 22:

  Benchmark 1: $(NM)
    Time (mean ± σ):     75.030 s ±  0.736 s    [User: 659.603 s, System: 185.257 s]
    Range (min … max):   74.207 s … 75.623 s    3 runs

  Benchmark 2: $(READELF) -sW
    Time (mean ± σ):     73.405 s ±  0.457 s    [User: 642.512 s, System: 176.440 s]
    Range (min … max):   72.878 s … 73.679 s    3 runs

  Summary
    $(READELF) -sW ran
      1.02 ± 0.01 times faster than $(NM)

-- 
Cheers,
Nathan
Re: [PATCH] kbuild: try readelf first in gen_symversions
Posted by Wentao Guan 2 days, 20 hours ago
Hello,

> This should probably be CONFIG_LTO_CLANG with flipped branches but...
Right!
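
To make the flipped selection concrete, here is a small shell analogue (not Kbuild syntax, and not proposed for merging). It assumes CONFIG_LTO_CLANG is visible as "y" when Clang LTO is enabled; the function name sym_check is hypothetical.

```shell
#!/bin/sh
# Hypothetical shell analogue of the tool selection (not Kbuild syntax).
# Assumes CONFIG_LTO_CLANG is set to "y" when Clang LTO is enabled.
sym_check() {
    if [ "${CONFIG_LTO_CLANG:-}" = "y" ]; then
        # LTO objects are LLVM bitcode, which llvm-nm can read.
        printf '%s\n' "${NM:-nm}"
    else
        # Plain ELF objects: readelf can be used.
        printf '%s\n' "${READELF:-readelf} -sW"
    fi
}

echo "non-LTO: $(CONFIG_LTO_CLANG= READELF=readelf sym_check)"
echo "LTO:     $(CONFIG_LTO_CLANG=y NM=llvm-nm sym_check)"
```

Keying on CONFIG_LTO_CLANG rather than LLVM keeps readelf in use for non-LTO Clang builds, where the objects are still ELF.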

> '-m1' appears to get us 50% (12s) of the speed up of 'readelf' (24s) in
> your environment while sticking with 'nm'. I would be more inclined to
> take that change since it is small and correct, rather than switching on
> NM or READELF, as I don't think it is worth the additional complexity.
> FWIW, on one of my test machines with 8 cores and 16 threads, the
> difference is much less noticeable. I think that is going to be in line
> with most developer and build farm hardware, rather than a 2C/4T machine
> like you mention in the initial commit message.
Sorry, it seems my cloud service provider causes my results to fluctuate :(,
and the first compile time may also be unstable, so I tested in a
20-core/28-thread bare-metal environment. Here are the results:

Intel(R) Core(TM) i7-14700HX + 32GB + NVMe ssd
gcc version 12.3.0 binutils 2.46
clang version 18.1.7
source kernel tag v7.0

Summary:
1. Switching from nm to readelf still helps on 20 cores/28 threads.
(I think nm has more overhead from libbfd, which shows up as a large drop in
sys time with readelf. I guess it creates a memory access bottleneck that
affects the overall compile process.)
However, there seems to be no such difference when changing llvm-18-nm to
llvm-18-readelf.
2. -m1 does not seem to have the expected effect...

test scripts:
https://gist.github.com/opsiff/832baa9a6986343dddbe530fbee57f52

Makefile.build-nm-m1  : 'grep -q' -> 'grep -m1 -q'
Makefile.build-orig : orig Makefile.build
Makefile.build-readelf : 'NM' -> 'READELF -sW'
Makefile.build-readelf-m1: 'NM' -> 'READELF -sW' , 'grep -q' -> 'grep -m1 -q'

Full results (x86_64_defconfig + modversions, three runs each):

1. GCC, baseline:
   if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \

real    2m2.876s     real    2m2.578s     real    2m2.262s
user    42m15.871s   user    42m35.250s   user    42m33.679s
sys     5m52.904s    sys     5m52.478s    sys     5m49.009s

2. GCC:
   if $(READELF) -sW $@ 2>/dev/null | grep -q  __export_symbol_; then

real    1m54.931s    real    1m55.192s    real    1m55.207s
user    41m4.162s    user    41m7.754s    user    41m5.791s
sys     4m8.422s     sys     4m8.431s     sys     4m9.219s

3. GCC:
   if $(NM) $@ 2>/dev/null | grep -m1 -q  __export_symbol_; then \

real    2m1.865s     real    2m1.866s     real    2m2.108s
user    42m32.891s   user    42m35.047s   user    42m33.834s
sys     5m48.045s    sys     5m47.700s    sys     5m48.200s

4. GCC:
   if $(READELF) -sW $@ 2>/dev/null | grep -m1 -q ' __export_symbol_'; then \

real    1m55.386s    real    1m56.528s    real    1m55.489s
user    41m6.156s    user    41m12.321s   user    41m10.545s
sys     4m10.093s    sys     4m9.838s     sys     4m9.367s

5. LLVM, baseline:
   if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \

real    2m35.758s    real    2m32.696s    real    2m32.127s
user    58m2.416s    user    57m55.030s   user    57m54.806s
sys     4m20.735s    sys     4m18.473s    sys     4m18.090s

6. LLVM:
   if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then \

real    2m32.448s    real    2m32.419s    real    2m32.509s
user    57m57.262s   user    57m53.001s   user    57m48.842s
sys     4m20.508s    sys     4m20.693s    sys     4m20.490s

7. LLVM:
   if $(NM) $@ 2>/dev/null | grep -m1 -q ' __export_symbol_'; then \

real    2m32.003s    real    2m31.900s    real    2m32.276s
user    57m45.786s   user    57m46.982s   user    57m49.907s
sys     4m18.184s    sys     4m17.923s    sys     4m18.354s

8. LLVM:
   if $(READELF) -sW $@ 2>/dev/null | grep -m1 -q ' __export_symbol_'; then \

real    2m33.365s    real    2m32.186s    real    2m32.114s
user    57m49.533s   user    57m47.865s   user    57m46.591s
sys     4m19.809s    sys     4m20.652s    sys     4m19.954s

9. LLVM LTO_THIN, baseline:
   if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \

real    3m59.411s    real    3m55.945s    real    3m56.557s
user    59m38.877s   user    59m20.007s   user    59m19.009s
sys     4m21.582s    sys     4m22.313s    sys     4m23.793s

10. LLVM LTO_THIN:
   if $(NM) $@ 2>/dev/null | grep -m1 -q ' __export_symbol_'; then \

real    3m55.722s    real    3m56.641s    real    3m57.979s
user    59m21.865s   user    59m25.634s   user    59m20.872s
sys     4m21.303s    sys     4m24.174s    sys     4m22.695s

Full log:
https://gist.github.com/opsiff/1cd7e0a0553c8416dd13a7e92590a440

If you have any other ideas, I will happily test them.
I will also try testing with llvm-nm instead of nm.

BRs
Wentao Guan