[PATCH bpf-next] scripts/gen-btf.sh: Disable LTO when generating initial .o file

Posted by Nathan Chancellor 1 month ago
After commit 600605853f87 ("scripts/gen-btf.sh: Fix .btf.o generation
when compiling for RISCV"), there is an error from llvm-objcopy when
CONFIG_LTO_CLANG is enabled:

  llvm-objcopy: error: '.tmp_vmlinux1.btf.o': The file was not recognized as a valid object file
  Failed to generate BTF for vmlinux

KBUILD_CFLAGS includes CC_FLAGS_LTO, which makes clang emit an LLVM IR
object, rather than an ELF one as expected by llvm-objcopy.

Most areas of the kernel deal with this by filtering out CC_FLAGS_LTO
from KBUILD_CFLAGS for the particular object or directory but this is
not so easy to do in bash. Just include '-fno-lto' after KBUILD_CFLAGS
to ensure an ELF object is consistently created as the initial .o file.

Fixes: 600605853f87 ("scripts/gen-btf.sh: Fix .btf.o generation when compiling for RISCV")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
 scripts/gen-btf.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/gen-btf.sh b/scripts/gen-btf.sh
index d6457661b9b6..08b46b91c04b 100755
--- a/scripts/gen-btf.sh
+++ b/scripts/gen-btf.sh
@@ -87,7 +87,7 @@ gen_btf_o()
 	# SHF_ALLOC because .BTF will be part of the vmlinux image. --strip-all
 	# deletes all symbols including __start_BTF and __stop_BTF, which will
 	# be redefined in the linker script.
-	echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
+	echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -fno-lto -c -x c -o ${btf_data} -
 	${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
 		--set-section-flags .BTF=alloc,readonly ${btf_data}
 	${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}

---
base-commit: a069190b590e108223cd841a1c2d0bfb92230ecc
change-id: 20260105-fix-gen-btf-sh-lto-007fe4908070

Best regards,
--  
Nathan Chancellor <nathan@kernel.org>
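
The failure mode described in the commit message can be checked by hand by looking at the first four bytes of the generated file. The following is a minimal sketch, not part of the patch: object_kind is a hypothetical helper name, and it only distinguishes the ELF magic (0x7f 'E' 'L' 'F') from the raw LLVM bitcode magic ('B' 'C' 0xC0 0xDE).

```shell
#!/bin/sh
# Hypothetical helper (not in gen-btf.sh): classify a file by its magic bytes.
# Prints "elf", "llvm-bitcode", or "unknown".
object_kind()
{
	# Dump the first four bytes as hex and squeeze out the whitespace.
	magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
	case "${magic}" in
	7f454c46) echo "elf" ;;           # 0x7f 'E' 'L' 'F'
	4243c0de) echo "llvm-bitcode" ;;  # 'B' 'C' 0xC0 0xDE
	*) echo "unknown" ;;
	esac
}
```

Running object_kind on the initial .o file would be expected to report "llvm-bitcode" when CC_FLAGS_LTO is in effect and "elf" once '-fno-lto' is appended, assuming clang writes raw bitcode rather than a wrapped format.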
Re: [PATCH bpf-next] scripts/gen-btf.sh: Disable LTO when generating initial .o file
Posted by Ihor Solodrai 1 month ago
On 1/5/26 1:12 PM, Nathan Chancellor wrote:
> After commit 600605853f87 ("scripts/gen-btf.sh: Fix .btf.o generation
> when compiling for RISCV"), there is an error from llvm-objcopy when
> CONFIG_LTO_CLANG is enabled:
> 
>   llvm-objcopy: error: '.tmp_vmlinux1.btf.o': The file was not recognized as a valid object file
>   Failed to generate BTF for vmlinux
> 
> KBUILD_CFLAGS includes CC_FLAGS_LTO, which makes clang emit an LLVM IR
> object, rather than an ELF one as expected by llvm-objcopy.
> 
> Most areas of the kernel deal with this by filtering out CC_FLAGS_LTO
> from KBUILD_CFLAGS for the particular object or directory but this is
> not so easy to do in bash. Just include '-fno-lto' after KBUILD_CFLAGS
> to ensure an ELF object is consistently created as the initial .o file.
> 
> Fixes: 600605853f87 ("scripts/gen-btf.sh: Fix .btf.o generation when compiling for RISCV")
> Signed-off-by: Nathan Chancellor <nathan@kernel.org>
> ---
>  scripts/gen-btf.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/scripts/gen-btf.sh b/scripts/gen-btf.sh
> index d6457661b9b6..08b46b91c04b 100755
> --- a/scripts/gen-btf.sh
> +++ b/scripts/gen-btf.sh
> @@ -87,7 +87,7 @@ gen_btf_o()
>  	# SHF_ALLOC because .BTF will be part of the vmlinux image. --strip-all
>  	# deletes all symbols including __start_BTF and __stop_BTF, which will
>  	# be redefined in the linker script.
> -	echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
> +	echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -fno-lto -c -x c -o ${btf_data} -
>  	${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
>  		--set-section-flags .BTF=alloc,readonly ${btf_data}
>  	${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}

Hi Nathan, thank you for the patch.

I'm starting to think it wasn't a good idea to do

	echo "" | ${CC} ...

here, given the number of associated bugs.

Before gen-btf.sh was introduced, the .btf.o binary was generated with this [1]:

	${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
		--strip-all ${1} "${btf_data}" 2>/dev/null

I changed to ${CC} on the assumption it's a quicker operation than
stripping entire vmlinux. But maybe it's not worth it and we should
change back to --strip-all? wdyt?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/link-vmlinux.sh?h=v6.18#n110

> 
> ---
> base-commit: a069190b590e108223cd841a1c2d0bfb92230ecc
> change-id: 20260105-fix-gen-btf-sh-lto-007fe4908070
> 
> Best regards,
> --  
> Nathan Chancellor <nathan@kernel.org>
>
Re: [PATCH bpf-next] scripts/gen-btf.sh: Disable LTO when generating initial .o file
Posted by Nathan Chancellor 1 month ago
On Mon, Jan 05, 2026 at 02:01:36PM -0800, Ihor Solodrai wrote:
> Hi Nathan, thank you for the patch.
> 
> I'm starting to think it wasn't a good idea to do
> 
> 	echo "" | ${CC} ...
> 
> here, given the number of associated bugs.

Yeah, I was wondering if a lack of KBUILD_CPPFLAGS would also be a
problem since that contains the endianness flag for some targets. I
cannot imagine any more issues than that but I can understand wanting to
back out of it.

> Before gen-btf.sh was introduced, the .btf.o binary was generated with this [1]:
> 
> 	${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
> 		--strip-all ${1} "${btf_data}" 2>/dev/null
> 
> I changed to ${CC} on the assumption it's a quicker operation than
> stripping entire vmlinux. But maybe it's not worth it and we should
> change back to --strip-all? wdyt?

That certainly seems more robust to me. I see the logic but with
'--only-section' and no glob, I would expect that to be a rather quick
operation but I am running out of time today to test and benchmark such
a change. I will try to do it tomorrow unless someone beats me to it.

Cheers,
Nathan
Re: [PATCH bpf-next] scripts/gen-btf.sh: Disable LTO when generating initial .o file
Posted by Ihor Solodrai 1 month ago
On 1/5/26 3:46 PM, Nathan Chancellor wrote:
> On Mon, Jan 05, 2026 at 02:01:36PM -0800, Ihor Solodrai wrote:
>> Hi Nathan, thank you for the patch.
>>
>> I'm starting to think it wasn't a good idea to do
>>
>> 	echo "" | ${CC} ...
>>
>> here, given the number of associated bugs.
> 
> Yeah, I was wondering if a lack of KBUILD_CPPFLAGS would also be a
> problem since that contains the endianness flag for some targets. I
> cannot imagine any more issues than that but I can understand wanting to
> back out of it.
> 
>> Before gen-btf.sh was introduced, the .btf.o binary was generated with this [1]:
>>
>> 	${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
>> 		--strip-all ${1} "${btf_data}" 2>/dev/null
>>
>> I changed to ${CC} on the assumption it's a quicker operation than
>> stripping entire vmlinux. But maybe it's not worth it and we should
>> change back to --strip-all? wdyt?
> 
> That certainly seems more robust to me. I see the logic but with
> '--only-section' and no glob, I would expect that to be a rather quick
> operation but I am running out of time today to test and benchmark such
> a change. I will try to do it tomorrow unless someone beats me to it.

I got curious and did a little experiment. Basically, I ran perf stat
on this part of gen-btf.sh:

	echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
	${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
		--set-section-flags .BTF=alloc,readonly ${btf_data}
	${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}

Replacing ${CC} command with:

	${OBJCOPY} --strip-all "${ELF_FILE}" ${btf_data} 2>/dev/null

for comparison.

TL;DR is that using ${CC} is:
  * about 1.5x faster than GNU objcopy --strip-all .tmp_vmlinux1
  * about 16x (!) faster than llvm-objcopy --strip-all .tmp_vmlinux1

With obvious caveats that this is a particular machine (Threadripper
PRO 3975WX), toolchain etc:
  * clang version 21.1.7
  * gcc (GCC) 15.2.1 20251211

This is bpf-next (a069190b590e) with BPF CI-like kconfig.

Pasting perf stat output below.


# llvm-objcopy --strip-all
$ perf stat -r 31 -- ./gen-btf.o_strip.sh

 Performance counter stats for './gen-btf.o_strip.sh' (31 runs):

     1,300,945,256      task-clock:u                     #    0.962 CPUs utilized               ( +-  0.10% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
           327,311      page-faults:u                    #  251.595 K/sec                       ( +-  0.00% )
     1,532,927,570      instructions:u                   #    1.33  insn per cycle            
                                                  #    0.03  stalled cycles per insn     ( +-  0.00% )
     1,155,639,083      cycles:u                         #    0.888 GHz                         ( +-  0.18% )
        53,144,866      stalled-cycles-frontend:u        #    4.60% frontend cycles idle        ( +-  0.99% )
       297,229,466      branches:u                       #  228.472 M/sec                       ( +-  0.00% )
           903,337      branch-misses:u                  #    0.30% of all branches             ( +-  0.02% )

           1.35200 +- 0.00137 seconds time elapsed  ( +-  0.10% )


# GNU objcopy --strip-all
$ perf stat -r 31 -- ./gen-btf.o_strip.sh

 Performance counter stats for './gen-btf.o_strip.sh' (31 runs):

       119,747,488      task-clock:u                     #    0.970 CPUs utilized               ( +-  0.41% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             9,186      page-faults:u                    #   76.711 K/sec                       ( +-  0.01% )
       132,651,881      instructions:u                   #    1.68  insn per cycle            
                                                  #    0.08  stalled cycles per insn     ( +-  0.00% )
        79,191,259      cycles:u                         #    0.661 GHz                         ( +-  1.06% )
        10,136,981      stalled-cycles-frontend:u        #   12.80% frontend cycles idle        ( +-  2.58% )
        28,422,807      branches:u                       #  237.356 M/sec                       ( +-  0.00% )
           354,981      branch-misses:u                  #    1.25% of all branches             ( +-  0.02% )

          0.123415 +- 0.000564 seconds time elapsed  ( +-  0.46% )


# echo "" | clang ...
$ perf stat -r 31 -- ./gen-btf.o_llvm.sh

 Performance counter stats for './gen-btf.o_llvm.sh' (31 runs):

        62,107,490      task-clock:u                     #    0.774 CPUs utilized               ( +-  0.31% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             9,755      page-faults:u                    #  157.066 K/sec                       ( +-  0.01% )
        88,196,854      instructions:u                   #    1.18  insn per cycle            
                                                  #    0.19  stalled cycles per insn     ( +-  0.00% )
        74,944,793      cycles:u                         #    1.207 GHz                         ( +-  0.50% )
        16,494,448      stalled-cycles-frontend:u        #   22.01% frontend cycles idle        ( +-  0.48% )
        17,914,949      branches:u                       #  288.451 M/sec                       ( +-  0.00% )
           459,548      branch-misses:u                  #    2.57% of all branches             ( +-  0.10% )

          0.080237 +- 0.000313 seconds time elapsed  ( +-  0.39% )


# echo "" | gcc ...
$ perf stat -r 31 -- ./gen-btf.o_gnu.sh

 Performance counter stats for './gen-btf.o_gnu.sh' (31 runs):

        53,683,797      task-clock:u                     #    0.770 CPUs utilized               ( +-  0.33% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             8,390      page-faults:u                    #  156.286 K/sec                       ( +-  0.01% )
        69,398,474      instructions:u                   #    1.22  insn per cycle            
                                                  #    0.17  stalled cycles per insn     ( +-  0.00% )
        56,763,954      cycles:u                         #    1.057 GHz                         ( +-  0.39% )
        12,103,546      stalled-cycles-frontend:u        #   21.32% frontend cycles idle        ( +-  0.47% )
        14,064,366      branches:u                       #  261.985 M/sec                       ( +-  0.00% )
           347,383      branch-misses:u                  #    2.47% of all branches             ( +-  0.09% )

          0.069735 +- 0.000253 seconds time elapsed  ( +-  0.36% )


> 
> Cheers,
> Nathan
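
For reference, the ratios in the TL;DR can be recomputed from the "seconds time elapsed" lines above. This is a small sketch that assumes the TL;DR compares each objcopy --strip-all run against the clang run; speedup is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: ratio of two elapsed times, rounded to two decimals.
speedup()
{
	awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# Elapsed times copied from the perf stat output above.
speedup 0.123415 0.080237	# GNU objcopy --strip-all vs. echo "" | clang
speedup 1.35200 0.080237	# llvm-objcopy --strip-all vs. echo "" | clang
```

Under that pairing the two ratios come out to 1.54 and 16.85, in line with the "about 1.5x" and "about 16x" figures.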
Re: [PATCH bpf-next] scripts/gen-btf.sh: Disable LTO when generating initial .o file
Posted by Nathan Chancellor 1 month ago
On Mon, Jan 05, 2026 at 05:06:49PM -0800, Ihor Solodrai wrote:
> I got curious and did a little experiment. Basically, I ran perf stat
> on this part of gen-btf.sh:
> 
> 	echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
> 	${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
> 		--set-section-flags .BTF=alloc,readonly ${btf_data}
> 	${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}
> 
> Replacing ${CC} command with:
> 
> 	${OBJCOPY} --strip-all "${ELF_FILE}" ${btf_data} 2>/dev/null
> 
> for comparison.
> 
> TL;DR is that using ${CC} is:
>   * about 1.5x faster than GNU objcopy --strip-all .tmp_vmlinux1
>   * about 16x (!) faster than llvm-objcopy --strip-all .tmp_vmlinux1
> 
> With obvious caveats that this is a particular machine (Threadripper
> PRO 3975WX), toolchain etc:
>   * clang version 21.1.7
>   * gcc (GCC) 15.2.1 20251211
> 
> This is bpf-next (a069190b590e) with BPF CI-like kconfig.

Oof, that difference between GNU and LLVM's objcopy implementations...
At the same time, it was only a little over a second for llvm-objcopy.
Maybe that gets worse if more is built into the kernel to the point
where it is untenable but maybe it is worth the reduced complexity? That
said, my patch is pretty simple (and a follow up for KBUILD_CPPFLAGS if
needed would be equally simple), your testing demonstrates that there
is some performance improvement, and I cannot imagine there being any
other bugs of this nature in this area going forward. I have no real
strong opinion, I just need my builds to finish :)

Cheers,
Nathan
Re: [PATCH bpf-next] scripts/gen-btf.sh: Disable LTO when generating initial .o file
Posted by Alexei Starovoitov 1 month ago
On Tue, Jan 6, 2026 at 1:53 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Mon, Jan 05, 2026 at 05:06:49PM -0800, Ihor Solodrai wrote:
> > I got curious and did a little experiment. Basically, I ran perf stat
> > on this part of gen-btf.sh:
> >
> >       echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
> >       ${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
> >               --set-section-flags .BTF=alloc,readonly ${btf_data}
> >       ${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}
> >
> > Replacing ${CC} command with:
> >
> >       ${OBJCOPY} --strip-all "${ELF_FILE}" ${btf_data} 2>/dev/null
> >
> > for comparison.
> >
> > TL;DR is that using ${CC} is:
> >   * about 1.5x faster than GNU objcopy --strip-all .tmp_vmlinux1
> >   * about 16x (!) faster than llvm-objcopy --strip-all .tmp_vmlinux1
> >
> > With obvious caveats that this is a particular machine (Threadripper
> > PRO 3975WX), toolchain etc:
> >   * clang version 21.1.7
> >   * gcc (GCC) 15.2.1 20251211
> >
> > This is bpf-next (a069190b590e) with BPF CI-like kconfig.
>
> Oof, that difference between GNU and LLVM's objcopy implementations...
> At the same time, it was only a little over a second for llvm-objcopy.
> Maybe that gets worse if more is built into the kernel to the point
> where it is untenable but maybe it is worth the reduced complexity? That
> said, my patch is pretty simple (and a follow up for KBUILD_CPPFLAGS if
> needed would be equally simple), your testing demonstrates that there
> is some performance improvement, and I cannot imagine there being any
> other bugs of this nature in this area going forward. I have no real
> strong opinion, I just need my builds to finish :)

Please resend both patches, or squash them into one?
It sounds like the current one is incomplete.