After commit 600605853f87 ("scripts/gen-btf.sh: Fix .btf.o generation
when compiling for RISCV"), there is an error from llvm-objcopy when
CONFIG_LTO_CLANG is enabled:
llvm-objcopy: error: '.tmp_vmlinux1.btf.o': The file was not recognized as a valid object file
Failed to generate BTF for vmlinux
KBUILD_CFLAGS includes CC_FLAGS_LTO, which makes clang emit an LLVM IR
object, rather than an ELF one as expected by llvm-objcopy.
Most areas of the kernel deal with this by filtering out CC_FLAGS_LTO
from KBUILD_CFLAGS for the particular object or directory but this is
not so easy to do in bash. Just include '-fno-lto' after KBUILD_CFLAGS
to ensure an ELF object is consistently created as the initial .o file.
Fixes: 600605853f87 ("scripts/gen-btf.sh: Fix .btf.o generation when compiling for RISCV")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
scripts/gen-btf.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/gen-btf.sh b/scripts/gen-btf.sh
index d6457661b9b6..08b46b91c04b 100755
--- a/scripts/gen-btf.sh
+++ b/scripts/gen-btf.sh
@@ -87,7 +87,7 @@ gen_btf_o()
# SHF_ALLOC because .BTF will be part of the vmlinux image. --strip-all
# deletes all symbols including __start_BTF and __stop_BTF, which will
# be redefined in the linker script.
- echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
+ echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -fno-lto -c -x c -o ${btf_data} -
${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
--set-section-flags .BTF=alloc,readonly ${btf_data}
${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}
---
base-commit: a069190b590e108223cd841a1c2d0bfb92230ecc
change-id: 20260105-fix-gen-btf-sh-lto-007fe4908070
Best regards,
--
Nathan Chancellor <nathan@kernel.org>
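For comparison, here is a rough sketch of what filtering CC_FLAGS_LTO out of KBUILD_CFLAGS could look like in shell, which is the approach the commit message describes as not so easy. filter_out is a hypothetical helper that does not exist in scripts/gen-btf.sh, and the flag values are placeholders rather than what a real kernel build passes.

```shell
#!/bin/sh
# Sketch only: filter_out is a hypothetical helper that mimics make's
# $(filter-out ...) for space-separated flag strings.
filter_out() {
	# $1: words to drop, $2: words to filter
	kept=""
	for word in $2; do
		case " $1 " in
		*" ${word} "*) ;;	# word is listed in $1, so drop it
		*) kept="${kept:+${kept} }${word}" ;;
		esac
	done
	printf '%s\n' "${kept}"
}

# Placeholder values; the real variables come from the kernel build.
CC_FLAGS_LTO="-flto=thin -fsplit-lto-unit"
KBUILD_CFLAGS="-O2 -flto=thin -Wall -fsplit-lto-unit"

filter_out "${CC_FLAGS_LTO}" "${KBUILD_CFLAGS}"
```

Appending '-fno-lto' after KBUILD_CFLAGS, as the patch does, avoids this kind of word-by-word loop and instead relies on the compiler honoring the last LTO option on the command line.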
On 1/5/26 1:12 PM, Nathan Chancellor wrote:
> After commit 600605853f87 ("scripts/gen-btf.sh: Fix .btf.o generation
> when compiling for RISCV"), there is an error from llvm-objcopy when
> CONFIG_LTO_CLANG is enabled:
>
> llvm-objcopy: error: '.tmp_vmlinux1.btf.o': The file was not recognized as a valid object file
> Failed to generate BTF for vmlinux
>
> KBUILD_CFLAGS includes CC_FLAGS_LTO, which makes clang emit an LLVM IR
> object, rather than an ELF one as expected by llvm-objcopy.
>
> Most areas of the kernel deal with this by filtering out CC_FLAGS_LTO
> from KBUILD_CFLAGS for the particular object or directory but this is
> not so easy to do in bash. Just include '-fno-lto' after KBUILD_CFLAGS
> to ensure an ELF object is consistently created as the initial .o file.
>
> Fixes: 600605853f87 ("scripts/gen-btf.sh: Fix .btf.o generation when compiling for RISCV")
> Signed-off-by: Nathan Chancellor <nathan@kernel.org>
> ---
> scripts/gen-btf.sh | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/gen-btf.sh b/scripts/gen-btf.sh
> index d6457661b9b6..08b46b91c04b 100755
> --- a/scripts/gen-btf.sh
> +++ b/scripts/gen-btf.sh
> @@ -87,7 +87,7 @@ gen_btf_o()
> # SHF_ALLOC because .BTF will be part of the vmlinux image. --strip-all
> # deletes all symbols including __start_BTF and __stop_BTF, which will
> # be redefined in the linker script.
> - echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
> + echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -fno-lto -c -x c -o ${btf_data} -
> ${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
> --set-section-flags .BTF=alloc,readonly ${btf_data}
> ${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}
Hi Nathan, thank you for the patch.
I'm starting to think it wasn't a good idea to do
echo "" | ${CC} ...
here, given the number of associated bugs.
Before gen-btf.sh was introduced, the .btf.o binary was generated with this [1]:
${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
--strip-all ${1} "${btf_data}" 2>/dev/null
I changed to ${CC} on the assumption that it's a quicker operation than
stripping the entire vmlinux. But maybe it's not worth it and we should
change back to --strip-all? What do you think?
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/link-vmlinux.sh?h=v6.18#n110
>
> ---
> base-commit: a069190b590e108223cd841a1c2d0bfb92230ecc
> change-id: 20260105-fix-gen-btf-sh-lto-007fe4908070
>
> Best regards,
> --
> Nathan Chancellor <nathan@kernel.org>
>
On Mon, Jan 05, 2026 at 02:01:36PM -0800, Ihor Solodrai wrote:
> Hi Nathan, thank you for the patch.
>
> I'm starting to think it wasn't a good idea to do
>
> echo "" | ${CC} ...
>
> here, given the number of associated bugs.
Yeah, I was wondering if a lack of KBUILD_CPPFLAGS would also be a
problem since that contains the endianness flag for some targets. I
cannot imagine any more issues than that but I can understand wanting to
back out of it.
> Before gen-btf.sh was introduced, the .btf.o binary was generated with this [1]:
>
> ${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
> --strip-all ${1} "${btf_data}" 2>/dev/null
>
> I changed to ${CC} on the assumption it's a quicker operation than
> stripping entire vmlinux. But maybe it's not worth it and we should
> change back to --strip-all? wdyt?
That certainly seems more robust to me. I see the logic, but with
'--only-section' and no glob, I would expect that to be a rather quick
operation. I am running out of time today to test and benchmark such
a change, so I will try to do it tomorrow unless someone beats me to it.
Cheers,
Nathan
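Both concerns raised so far (an LLVM IR object instead of ELF, and a possibly wrong endianness without KBUILD_CPPFLAGS) are visible in the first few bytes of the generated object. A minimal sketch of such a check follows, assuming a POSIX shell with od available; obj_kind is a hypothetical helper, not something in the kernel tree, and it only recognizes raw LLVM bitcode (magic 'B' 'C' 0xC0 0xDE) and the ELF e_ident bytes.

```shell
#!/bin/sh
# Hypothetical helper, not part of the kernel tree: describe an object
# file from its first six bytes.
obj_kind() {
	# First six bytes as lowercase hex, e.g. "7f454c460201".
	head=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
	case "${head}" in
	7f454c46??01) echo "elf little-endian" ;;	# EI_DATA == ELFDATA2LSB
	7f454c46??02) echo "elf big-endian" ;;		# EI_DATA == ELFDATA2MSB
	4243c0de*) echo "llvm-bitcode" ;;		# raw bitcode magic
	*) echo "unknown" ;;
	esac
}

# Demo on synthetic headers so the sketch runs without a compiler.
tmp=$(mktemp -d)
printf '\177ELF\002\001' > "${tmp}/le.o"	# 64-bit class, little-endian
printf '\177ELF\002\002' > "${tmp}/be.o"	# 64-bit class, big-endian
printf 'BC\300\336' > "${tmp}/lto.o"		# what an LTO compile would emit
for f in le.o be.o lto.o; do
	printf '%s: %s\n' "${f}" "$(obj_kind "${tmp}/${f}")"
done
rm -rf "${tmp}"
```

Running something like this against the initial .o file would show at a glance whether objcopy is being handed bitcode or an object whose byte order differs from vmlinux.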
On 1/5/26 3:46 PM, Nathan Chancellor wrote:
> On Mon, Jan 05, 2026 at 02:01:36PM -0800, Ihor Solodrai wrote:
>> Hi Nathan, thank you for the patch.
>>
>> I'm starting to think it wasn't a good idea to do
>>
>> echo "" | ${CC} ...
>>
>> here, given the number of associated bugs.
>
> Yeah, I was wondering if a lack of KBUILD_CPPFLAGS would also be a
> problem since that contains the endianness flag for some targets. I
> cannot imagine any more issues than that but I can understand wanting to
> back out of it.
>
>> Before gen-btf.sh was introduced, the .btf.o binary was generated with this [1]:
>>
>> ${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
>> --strip-all ${1} "${btf_data}" 2>/dev/null
>>
>> I changed to ${CC} on the assumption it's a quicker operation than
>> stripping entire vmlinux. But maybe it's not worth it and we should
>> change back to --strip-all? wdyt?
>
> That certainly seems more robust to me. I see the logic but with
> '--only-section' and no glob, I would expect that to be a rather quick
> operation but I am running out of time today to test and benchmark such
> a change. I will try to do it tomorrow unless someone beats me to it.
I got curious and did a little experiment. Basically, I ran perf stat
on this part of gen-btf.sh:
echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
--set-section-flags .BTF=alloc,readonly ${btf_data}
${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}
Replacing ${CC} command with:
${OBJCOPY} --strip-all "${ELF_FILE}" ${btf_data} 2>/dev/null
for comparison.
TL;DR is that using ${CC} is:
* about 1.5x faster than GNU objcopy --strip-all .tmp_vmlinux1
* about 16x (!) faster than llvm-objcopy --strip-all .tmp_vmlinux1
With obvious caveats that this is a particular machine (Threadripper
PRO 3975WX), toolchain etc:
* clang version 21.1.7
* gcc (GCC) 15.2.1 20251211
This is bpf-next (a069190b590e) with BPF CI-like kconfig.
Pasting perf stat output below.
# llvm-objcopy --strip-all
$ perf stat -r 31 -- ./gen-btf.o_strip.sh
Performance counter stats for './gen-btf.o_strip.sh' (31 runs):
1,300,945,256 task-clock:u # 0.962 CPUs utilized ( +- 0.10% )
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
327,311 page-faults:u # 251.595 K/sec ( +- 0.00% )
1,532,927,570 instructions:u # 1.33 insn per cycle
# 0.03 stalled cycles per insn ( +- 0.00% )
1,155,639,083 cycles:u # 0.888 GHz ( +- 0.18% )
53,144,866 stalled-cycles-frontend:u # 4.60% frontend cycles idle ( +- 0.99% )
297,229,466 branches:u # 228.472 M/sec ( +- 0.00% )
903,337 branch-misses:u # 0.30% of all branches ( +- 0.02% )
1.35200 +- 0.00137 seconds time elapsed ( +- 0.10% )
# GNU objcopy --strip-all
$ perf stat -r 31 -- ./gen-btf.o_strip.sh
Performance counter stats for './gen-btf.o_strip.sh' (31 runs):
119,747,488 task-clock:u # 0.970 CPUs utilized ( +- 0.41% )
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
9,186 page-faults:u # 76.711 K/sec ( +- 0.01% )
132,651,881 instructions:u # 1.68 insn per cycle
# 0.08 stalled cycles per insn ( +- 0.00% )
79,191,259 cycles:u # 0.661 GHz ( +- 1.06% )
10,136,981 stalled-cycles-frontend:u # 12.80% frontend cycles idle ( +- 2.58% )
28,422,807 branches:u # 237.356 M/sec ( +- 0.00% )
354,981 branch-misses:u # 1.25% of all branches ( +- 0.02% )
0.123415 +- 0.000564 seconds time elapsed ( +- 0.46% )
# echo "" | clang ...
$ perf stat -r 31 -- ./gen-btf.o_llvm.sh
Performance counter stats for './gen-btf.o_llvm.sh' (31 runs):
62,107,490 task-clock:u # 0.774 CPUs utilized ( +- 0.31% )
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
9,755 page-faults:u # 157.066 K/sec ( +- 0.01% )
88,196,854 instructions:u # 1.18 insn per cycle
# 0.19 stalled cycles per insn ( +- 0.00% )
74,944,793 cycles:u # 1.207 GHz ( +- 0.50% )
16,494,448 stalled-cycles-frontend:u # 22.01% frontend cycles idle ( +- 0.48% )
17,914,949 branches:u # 288.451 M/sec ( +- 0.00% )
459,548 branch-misses:u # 2.57% of all branches ( +- 0.10% )
0.080237 +- 0.000313 seconds time elapsed ( +- 0.39% )
# echo "" | gcc ...
$ perf stat -r 31 -- ./gen-btf.o_gnu.sh
Performance counter stats for './gen-btf.o_gnu.sh' (31 runs):
53,683,797 task-clock:u # 0.770 CPUs utilized ( +- 0.33% )
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
8,390 page-faults:u # 156.286 K/sec ( +- 0.01% )
69,398,474 instructions:u # 1.22 insn per cycle
# 0.17 stalled cycles per insn ( +- 0.00% )
56,763,954 cycles:u # 1.057 GHz ( +- 0.39% )
12,103,546 stalled-cycles-frontend:u # 21.32% frontend cycles idle ( +- 0.47% )
14,064,366 branches:u # 261.985 M/sec ( +- 0.00% )
347,383 branch-misses:u # 2.47% of all branches ( +- 0.09% )
0.069735 +- 0.000253 seconds time elapsed ( +- 0.36% )
>
> Cheers,
> Nathan
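The ratios in the TL;DR can be rechecked from the "seconds time elapsed" lines of the four perf stat runs. The sketch below does that arithmetic with awk. The message does not say which pairs were compared, so printing every objcopy/compiler combination is an assumption, and speedup is a name made up for this example.

```shell
#!/bin/sh
# Elapsed seconds copied from the perf stat output in this message.
llvm_objcopy=1.35200
gnu_objcopy=0.123415
clang_cc=0.080237
gcc_cc=0.069735

# Hypothetical helper: print $1 / $2 with two decimals.
speedup() {
	awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

echo "GNU objcopy  / clang: $(speedup "${gnu_objcopy}" "${clang_cc}")x"
echo "GNU objcopy  / gcc:   $(speedup "${gnu_objcopy}" "${gcc_cc}")x"
echo "llvm-objcopy / clang: $(speedup "${llvm_objcopy}" "${clang_cc}")x"
echo "llvm-objcopy / gcc:   $(speedup "${llvm_objcopy}" "${gcc_cc}")x"
```

In absolute terms, the slowest variant (llvm-objcopy --strip-all) took about 1.35 seconds per run on this machine, against roughly 0.07 to 0.08 seconds for the ${CC} approach.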
On Mon, Jan 05, 2026 at 05:06:49PM -0800, Ihor Solodrai wrote:
> I got curious and did a little experiment. Basically, I ran perf stat
> on this part of gen-btf.sh:
>
> echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
> ${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
> --set-section-flags .BTF=alloc,readonly ${btf_data}
> ${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}
>
> Replacing ${CC} command with:
>
> ${OBJCOPY} --strip-all "${ELF_FILE}" ${btf_data} 2>/dev/null
>
> for comparison.
>
> TL;DR is that using ${CC} is:
> * about 1.5x faster than GNU objcopy --strip-all .tmp_vmlinux1
> * about 16x (!) faster than llvm-objcopy --strip-all .tmp_vmlinux1
>
> With obvious caveats that this is a particular machine (Threadripper
> PRO 3975WX), toolchain etc:
> * clang version 21.1.7
> * gcc (GCC) 15.2.1 20251211
>
> This is bpf-next (a069190b590e) with BPF CI-like kconfig.
Oof, that difference between GNU and LLVM's objcopy implementations...
At the same time, it was only a little over a second for llvm-objcopy.
Maybe that gets worse if more is built into the kernel to the point
where it is untenable but maybe it is worth the reduced complexity? That
said, my patch is pretty simple (and a follow up for KBUILD_CPPFLAGS if
needed would be equally simple), your testing demonstrates that there
is some performance improvement, and I cannot imagine there being any
other bugs of this nature in this area going forward. I have no real
strong opinion, I just need my builds to finish :)
Cheers,
Nathan
On Tue, Jan 6, 2026 at 1:53 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Mon, Jan 05, 2026 at 05:06:49PM -0800, Ihor Solodrai wrote:
> > I got curious and did a little experiment. Basically, I ran perf stat
> > on this part of gen-btf.sh:
> >
> > echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
> > ${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
> > --set-section-flags .BTF=alloc,readonly ${btf_data}
> > ${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}
> >
> > Replacing ${CC} command with:
> >
> > ${OBJCOPY} --strip-all "${ELF_FILE}" ${btf_data} 2>/dev/null
> >
> > for comparison.
> >
> > TL;DR is that using ${CC} is:
> > * about 1.5x faster than GNU objcopy --strip-all .tmp_vmlinux1
> > * about 16x (!) faster than llvm-objcopy --strip-all .tmp_vmlinux1
> >
> > With obvious caveats that this is a particular machine (Threadripper
> > PRO 3975WX), toolchain etc:
> > * clang version 21.1.7
> > * gcc (GCC) 15.2.1 20251211
> >
> > This is bpf-next (a069190b590e) with BPF CI-like kconfig.
>
> Oof, that difference between GNU and LLVM's objcopy implementations...
> At the same time, it was only a little over a second for llvm-objcopy.
> Maybe that gets worse if more is built into the kernel to the point
> where it is untenable but maybe it is worth the reduced complexity? That
> said, my patch is pretty simple (and a follow up for KBUILD_CPPFLAGS if
> needed would be equally simple), your testing demonstrates that there
> is some performance improvement, and I cannot imagine there being any
> other bugs of this nature in this area going forward. I have no real
> strong opinion, I just need my builds to finish :)
Please resend both patches, or squash them into one?
It sounds like the current one is incomplete.