I've been chasing down the following flaky splat, introduced by recent
changes in BTF generation [1]:
------------[ cut here ]------------
BUG: unable to handle page fault for address: ffa000000233d828
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 100000067 P4D 100253067 PUD 100258067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 1 UID: 0 PID: 390 Comm: test_progs Tainted: G W OE 6.19.0-rc1-gf785a31395d9 #331 PREEMPT(full)
Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014
RIP: 0010:simplify_symbols+0x2b2/0x480
Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83 fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20 c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
PKRU: 55555554
Call Trace:
<TASK>
? __kmalloc_node_track_caller_noprof+0x37f/0x740
? __pfx_setup_modinfo_srcversion+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? kstrdup+0x4a/0x70
? srso_alias_return_thunk+0x5/0xfbef5
? setup_modinfo_srcversion+0x1a/0x30
? srso_alias_return_thunk+0x5/0xfbef5
? setup_modinfo+0x12b/0x1e0
load_module+0x133a/0x1610
__x64_sys_finit_module+0x31b/0x450
? entry_SYSCALL_64_after_hwframe+0x76/0x7e
do_syscall_64+0x80/0x2d0
? srso_alias_return_thunk+0x5/0xfbef5
? exc_page_fault+0x95/0xc0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f1c63a2582d
Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff 8 8b 0d bb 15 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe513df128 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1c63a2582d
RDX: 0000000000000000 RSI: 0000000000ee83f9 RDI: 0000000000000016
RBP: 00007ffe513df150 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffe513e3588
R13: 000000000088fad0 R14: 00000000014bddb0 R15: 00007f1c63ba7000
</TASK>
Modules linked in: bpf_testmod(OE)
CR2: ffa000000233d828
---[ end trace 0000000000000000 ]---
RIP: 0010:simplify_symbols+0x2b2/0x480
Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83 fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20 c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
This hasn't happened on BPF CI so far; however, I was able to
reproduce it on a particular x86_64 machine using a kernel built with
LLVM 20.
The crash happens on an attempt to load one of the BPF selftest modules
(tools/testing/selftests/bpf/test_kmods/bpf_test_modorder_x.ko), which
is used by the kfunc_module_order test.
The reason for the crash is that simplify_symbols() doesn't check a
symbol's ELF section index against the bounds of the section header table:
	for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
		const char *name = info->strtab + sym[i].st_name;

		switch (sym[i].st_shndx) {
		case SHN_COMMON:

		[...]

		default:
			/* Divert to percpu allocation if a percpu var. */
			if (sym[i].st_shndx == info->index.pcpu)
				secbase = (unsigned long)mod_percpu(mod);
			else
/** HERE --> **/		secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
			sym[i].st_value += secbase;
			break;
		}
	}
In the case I was able to reproduce, the value 0xffff
(SHN_HIRESERVE, a.k.a. SHN_XINDEX [2]) fell through to the default branch here.
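The register dump is consistent with this. Reading the faulting bytes `48 8b 44 01 10` as `mov rax, [rcx + rax + 0x10]`, preceded by `shl eax, 6`, and assuming RCX holds info->sechdrs (an inference from the disassembly, not something the splat states), the faulting address is the section header array base plus 0xffff headers of 64 bytes each plus the offset of sh_addr. The small C sketch below checks that arithmetic against the RCX and CR2 values from the splat; the helper name sh_addr_load_address is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the splat. RCX is assumed to hold info->sechdrs. */
static const uint64_t splat_rcx = 0xffa0000001f3d858ULL;
static const uint64_t splat_cr2 = 0xffa000000233d828ULL;

/*
 * Address read by an unchecked info->sechdrs[shndx].sh_addr on a 64-bit
 * kernel: array base + shndx * sizeof(Elf64_Shdr) (64 bytes) + the offset
 * of sh_addr within one section header (0x10).
 */
static uint64_t sh_addr_load_address(uint64_t sechdrs, uint16_t shndx)
{
	return sechdrs + (uint64_t)shndx * sizeof(Elf64_Shdr) +
	       offsetof(Elf64_Shdr, sh_addr);
}
```

Here SHN_XINDEX * sizeof(Elf64_Shdr) is 0x3fffc0, which matches RAX, and sh_addr_load_address(splat_rcx, SHN_XINDEX) equals splat_cr2, so an out-of-range section index alone accounts for the fault address.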
This code fragment is between 15 and 20 years old, so evidently
a kmodule symbol is not expected to have such an st_shndx
value. Even so, the kernel should probably fail to load the module
instead of crashing, which is what this patch attempts to fix.
Investigating further, I discovered that the module binary was
corrupted by the `${OBJCOPY} --update-section` operation that updates
the .BTF_ids section data in scripts/gen-btf.sh. This explains why the
bug surfaced after gen-btf.sh was introduced:
$ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (2), but unable to locate the extended symbol index table
llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (3), but unable to locate the extended symbol index table
llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (4), but unable to locate the extended symbol index table
3: 0000000000000000 16 NOTYPE LOCAL DEFAULT RSV[0xffff] __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (16), but unable to locate the extended symbol index table
4: 0000000000000008 4 OBJECT LOCAL DEFAULT RSV[0xffff] __BTF_ID__func__bpf_test_modorder_retx__44417
vs. the expected output:
$ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
3: 0000000000000000 16 NOTYPE LOCAL DEFAULT 6 __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
4: 0000000000000008 4 OBJECT LOCAL DEFAULT 6 __BTF_ID__func__bpf_test_modorder_retx__44417
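A module damaged this way can be spotted without relying on llvm-readelf's warning text by scanning its symbol table directly: the symptom is symbols with st_shndx == SHN_XINDEX in a file that has no SHT_SYMTAB_SHNDX section to resolve them. The C function below is a minimal sketch of such a check, assuming a native-endian ELF64 image already read into suitably aligned memory; the name count_orphan_xindex_syms is hypothetical, and this is not an existing kernel or LLVM tool.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Count symbols whose st_shndx is SHN_XINDEX in an ELF64 image that has no
 * SHT_SYMTAB_SHNDX section, i.e. symbols whose real section index cannot be
 * recovered. Returns the count, or -1 if the image looks malformed.
 */
static long count_orphan_xindex_syms(const unsigned char *img, size_t len)
{
	const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
	const Elf64_Shdr *sh, *symtab = NULL;
	const Elf64_Sym *sym;
	size_t nsyms;
	long count = 0;

	if (len < sizeof(*eh) || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh->e_ident[EI_CLASS] != ELFCLASS64 ||
	    eh->e_shentsize != sizeof(Elf64_Shdr))
		return -1;
	if (eh->e_shoff > len ||
	    (size_t)eh->e_shnum * sizeof(Elf64_Shdr) > len - eh->e_shoff)
		return -1;

	sh = (const Elf64_Shdr *)(img + eh->e_shoff);
	for (unsigned int i = 0; i < eh->e_shnum; i++) {
		if (sh[i].sh_type == SHT_SYMTAB_SHNDX)
			return 0;	/* extended indices can be resolved */
		if (sh[i].sh_type == SHT_SYMTAB)
			symtab = &sh[i];
	}
	if (!symtab)
		return 0;
	if (symtab->sh_offset > len || symtab->sh_size > len - symtab->sh_offset)
		return -1;

	sym = (const Elf64_Sym *)(img + symtab->sh_offset);
	nsyms = symtab->sh_size / sizeof(Elf64_Sym);
	for (size_t i = 1; i < nsyms; i++)	/* entry 0 is the null symbol */
		if (sym[i].st_shndx == SHN_XINDEX)
			count++;
	return count;
}
```

Run against the corrupted bpf_test_modorder_x.ko, a check like this would be expected to report a non-zero count, while a correctly built module should report 0.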
But why? Updating section data without changing its size is not
supposed to affect section indices, right?
With a bit more testing I confirmed that this is an LLVM-specific
issue (it doesn't reproduce with a GCC build), and that it's not
deterministic, because in scripts/link-vmlinux.sh we also do:
${OBJCOPY} --update-section .BTF_ids=${btfids_vmlinux} ${VMLINUX}
However:
$ llvm-readelf -s --wide ~/workspace/prog-aux/linux/vmlinux | grep 0xffff
# no output, which is good
So the suspect is llvm-objcopy. As it turns out, there is a relevant
known bug that explains the flakiness and isn't fixed yet [3].
[1] https://lore.kernel.org/bpf/20251219181825.1289460-3-ihor.solodrai@linux.dev/
[2] https://man7.org/linux/man-pages/man5/elf.5.html
[3] https://github.com/llvm/llvm-project/issues/168060#issuecomment-3533552952
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
RFC
While this llvm-objcopy bug is not fixed, we cannot trust it in the
kernel build pipeline. In the short term we have to come up with a
workaround for the .BTF_ids section update and replace the calls to
${OBJCOPY} --update-section with something else.
One potential workaround is to force the use of GNU objcopy (from
binutils) instead of llvm-objcopy when updating the .BTF_ids section.
Alternatively, we could just dd the .BTF_ids data computed by
resolve_btfids at the right offset in the target ELF file.
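For the dd approach, the only requirement is to overwrite the section's bytes in place without truncating the file. Since the new .BTF_ids contents have the same size as the old ones, a same-size check is a cheap guard. A minimal C sketch of that step using only stdio, assuming the offset and size are already known; patch_bytes_at is a hypothetical helper name:

```c
#include <stdio.h>

/*
 * Overwrite `size` bytes at byte offset `off` of an existing file in place,
 * roughly what `dd ... seek=OFF conv=notrunc` does. `section_size` is the
 * current size of the target section; this sketch only allows same-size
 * updates and never writes past the end of the file.
 * Returns 0 on success, -1 on any error.
 */
static int patch_bytes_at(const char *path, long off, const void *data,
			  size_t size, size_t section_size)
{
	FILE *f;
	long end;
	int ret = -1;

	if (off < 0 || size != section_size)
		return -1;

	f = fopen(path, "r+b");	/* update mode: the file is not truncated */
	if (!f)
		return -1;

	if (fseek(f, 0, SEEK_END) != 0 || (end = ftell(f)) < 0)
		goto out;
	if (end < off || (size_t)(end - off) < size)
		goto out;	/* would extend the file */

	if (fseek(f, off, SEEK_SET) == 0 && fwrite(data, 1, size, f) == size)
		ret = 0;
out:
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

Opening with "r+b" is what avoids truncation here; "wb" would discard the rest of the ELF file.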
Surprisingly, I couldn't find a good command-line way to read a
section's offset and size from an ELF file in a machine-readable format.
Both readelf and {llvm-}objdump produce human-readable output, and it
appears we can't rely on, for example, the column order.
We could still try parsing readelf output with awk/grep, covering
output variants that appear in the kernel build.
We can also do:
	llvm-readobj --elf-output-style=JSON --sections "$elf" | \
		jq -r --arg name .BTF_ids '
			.[0].Sections[] |
			select(.Section.Name.Name == $name) |
			"\(.Section.Offset) \(.Section.Size)"'
...but idk man, doesn't feel right.
The most reliable way to determine the offset and size of the .BTF_ids
section is probably to read them with a C program using libelf, such as
resolve_btfids. Which is quite ironic, given the recent changes.
Setting the irony aside, we could add something like:

	resolve_btfids --section-info=.BTF_ids $elf
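To give a rough idea of how small such a lookup is, here is a standalone C sketch that finds a section by name and reports its file offset and size. It deliberately uses the raw <elf.h> structures instead of libelf, assumes a native-endian ELF64 image already in suitably aligned memory, and does not handle the extended e_shnum/e_shstrndx encodings. find_section is a hypothetical name; this is not resolve_btfids code.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up a section by name and report its file offset and size.
 * Returns 0 on success, -1 if the image is malformed or the section
 * does not exist.
 */
static int find_section(const unsigned char *img, size_t len, const char *name,
			Elf64_Off *off, Elf64_Xword *size)
{
	const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
	const Elf64_Shdr *sh, *strsec;
	const char *strtab;

	if (len < sizeof(*eh) || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh->e_ident[EI_CLASS] != ELFCLASS64 ||
	    eh->e_shentsize != sizeof(Elf64_Shdr))
		return -1;
	if (eh->e_shoff > len ||
	    (size_t)eh->e_shnum * sizeof(Elf64_Shdr) > len - eh->e_shoff)
		return -1;
	/* Rejects SHN_XINDEX-encoded e_shstrndx too (sketch limitation). */
	if (eh->e_shstrndx == SHN_UNDEF || eh->e_shstrndx >= eh->e_shnum)
		return -1;

	sh = (const Elf64_Shdr *)(img + eh->e_shoff);
	strsec = &sh[eh->e_shstrndx];
	if (strsec->sh_offset > len || strsec->sh_size > len - strsec->sh_offset)
		return -1;
	strtab = (const char *)(img + strsec->sh_offset);

	for (unsigned int i = 0; i < eh->e_shnum; i++) {
		Elf64_Word n = sh[i].sh_name;

		/* The name must be NUL-terminated inside the string table. */
		if (n >= strsec->sh_size ||
		    !memchr(strtab + n, '\0', strsec->sh_size - n))
			continue;
		if (strcmp(strtab + n, name) == 0) {
			*off = sh[i].sh_offset;
			*size = sh[i].sh_size;
			return 0;
		}
	}
	return -1;
}
```

The reported offset and size correspond to the Offset and Size fields extracted by the llvm-readobj/jq pipeline above, without depending on any tool's output format.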
Reverting the gen-btf.sh patch is also a possible workaround, but I'd
really like to avoid it, given that BPF features/optimizations in
development depend on it.
I'd appreciate comments and suggestions on this issue. Thank you!
---
kernel/module/main.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 710ee30b3bea..5bf456fad63e 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1568,6 +1568,13 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
 			break;
 
 		default:
+			if (sym[i].st_shndx >= info->hdr->e_shnum) {
+				pr_err("%s: Symbol %s has an invalid section index %u (max %u)\n",
+					mod->name, name, sym[i].st_shndx, info->hdr->e_shnum - 1);
+				ret = -ENOEXEC;
+				break;
+			}
+
 			/* Divert to percpu allocation if a percpu var. */
 			if (sym[i].st_shndx == info->index.pcpu)
 				secbase = (unsigned long)mod_percpu(mod);
--
2.52.0
On 12/24/25 1:57 AM, Ihor Solodrai wrote:
> [...]
> ---
> kernel/module/main.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 710ee30b3bea..5bf456fad63e 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -1568,6 +1568,13 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
>  			break;
>  
>  		default:
> +			if (sym[i].st_shndx >= info->hdr->e_shnum) {
> +				pr_err("%s: Symbol %s has an invalid section index %u (max %u)\n",
> +					mod->name, name, sym[i].st_shndx, info->hdr->e_shnum - 1);
> +				ret = -ENOEXEC;
> +				break;
> +			}
> +
>  			/* Divert to percpu allocation if a percpu var. */
>  			if (sym[i].st_shndx == info->index.pcpu)
>  				secbase = (unsigned long)mod_percpu(mod);
The module loader should always at least get through the signature and
blacklist checks without crashing due to a corrupted ELF file. After
that point, the module content is to be trusted, but we try to error out
for most issues that would cause problems later on.
In this specific case, I think it is useful to add this check because
the code potentially crashes on a valid module that uses SHN_XINDEX. The
loader already rejects sh_link and sh_info values that are above e_shnum
in several places, so the patch is consistent with that behavior.
I suggest adding a proper commit description and sending a non-RFC
version.
--
Thanks,
Petr
On 12/23/25 4:57 PM, Ihor Solodrai wrote:
> [...]
I tried both llvm21 and llvm22 (llvm21 is what BPF CI uses).
Without KASAN, I can reproduce the failure with llvm19/llvm21/llvm22.
I did not test llvm20, but I assume it may fail too.
The following llvm patch
https://github.com/llvm/llvm-project/pull/170462
fixes the issue. It is still under review. The actual diff is:
diff --git a/llvm/lib/ObjCopy/ELF/ELFObject.cpp b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
index e5de17e093df..cc1527d996e2 100644
--- a/llvm/lib/ObjCopy/ELF/ELFObject.cpp
+++ b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
@@ -2168,7 +2168,11 @@ Error Object::updateSectionData(SecPtr &Sec, ArrayRef<uint8_t> Data) {
                              Data.size(), Sec->Name.c_str(), Sec->Size);
 
   if (!Sec->ParentSegment) {
-    Sec = std::make_unique<OwnedDataSection>(*Sec, Data);
+    SectionBase *Replaced = Sec.get();
+    SectionBase *Modified = &addSection<OwnedDataSection>(*Sec, Data);
+    DenseMap<SectionBase *, SectionBase *> Replacements{{Replaced, Modified}};
+    if (auto err = replaceSections(Replacements))
+      return err;
   } else {
     // The segment writer will be in charge of updating these contents.
     Sec->Size = Data.size();
I applied the above patch to the latest llvm21 and llvm22: the crash
is gone and the selftests run properly.
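A plausible reading of the diff above, which the thread does not spell out: the old code overwrote the owning pointer, destroying the original section object while other parts of llvm-objcopy's in-memory model (presumably the symbol table's per-symbol section pointers) still referred to it, whereas replaceSections() retargets those references to the new section. A dangling reference like that would also fit the flakiness, since the index written for a symbol would depend on whatever reuses the freed memory. The toy C model below illustrates only the "retarget, then free" pattern; the struct and function names are hypothetical and this is not LLVM code.

```c
#include <stdlib.h>

/* Toy model: a symbol records its section by pointer, not by index. */
struct section {
	unsigned int index;	/* used to compute st_shndx at write time */
};

struct symbol {
	struct section *defined_in;
};

/*
 * Replace `old` with `repl` in the section table and retarget every symbol
 * that referred to `old` before freeing it. Skipping the symbol loop would
 * leave symbols pointing at freed memory, so the index later written for
 * them would be whatever happens to be stored there.
 */
static void replace_section(struct section **table, size_t nsec,
			    struct symbol *syms, size_t nsyms,
			    struct section *old, struct section *repl)
{
	repl->index = old->index;
	for (size_t i = 0; i < nsec; i++)
		if (table[i] == old)
			table[i] = repl;
	for (size_t i = 0; i < nsyms; i++)
		if (syms[i].defined_in == old)
			syms[i].defined_in = repl;
	free(old);
}
```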
With KASAN, everything is okay for llvm21 and llvm22.
I'm not sure whether the llvm patch
https://github.com/llvm/llvm-project/pull/170462
can make it into llvm21, as it looks like llvm21 is frozen for now (see
https://github.com/llvm/llvm-project/pull/168314#issuecomment-3645797175).
llvm22 will branch into rc mode in January.
I will try to see whether there is a reasonable workaround
for llvm21's llvm-objcopy (for the non-KASAN case).
On 12/23/25 9:36 PM, Yonghong Song wrote:
>
>
> On 12/23/25 4:57 PM, Ihor Solodrai wrote:
>> [...]
>>
>> While this llvm-objcopy bug is not fixed, we can not trust it in the
>> kernel build pipeline. In the short-term we have to come up with a
>> workaround for .BTF_ids section update and replace the calls to
>> ${OBJCOPY} --update-section with something else.
>>
>> One potential workaround is to force the use of the objcopy (from
>> binutils) instead of llvm-objcopy when updating .BTF_ids section.
I think the simplest workaround is this one: use objcopy from binutils
instead of llvm-objcopy when doing --update-section.
There are just 3 places where that happens, so the OBJCOPY
substitution is going to be localized.
Also binutils is a documented requirement for compiling the kernel,
whether with clang or not [1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst?h=v6.18#n29
>>
>> [...]
>
> [...]
>
> I applied the above patch to latest llvm21 and llvm22 and
> the crash is gone and the selftests can run properly.
Hi Yonghong, thank you for confirming the issue.
Patching llvm-objcopy would be great, it should be done. But we are
still going to be stuck with making sure older LLVMs can build the kernel.
So even if they backport the fix to v21, it won't help us much, unfortunately.
Hi Ihor,

On Mon, Dec 29, 2025 at 12:40:10PM -0800, Ihor Solodrai wrote:
> I think the simplest workaround is this one: use objcopy from binutils
> instead of llvm-objcopy when doing --update-section.
>
> There are just 3 places where that happens, so the OBJCOPY
> substitution is going to be localized.
>
> Also binutils is a documented requirement for compiling the kernel,
> whether with clang or not [1].
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst?h=v6.18#n29

This would necessitate always specifying a CROSS_COMPILE variable when
cross compiling with LLVM=1, which I would really like to avoid. The
LLVM variants have generally been drop in substitutes for several
versions now so some groups such as Android may not even have GNU
binutils installed in their build environment (see a recent build
fix [1]).

I would much prefer detecting llvm-objcopy in Kconfig (such as by
creating CONFIG_OBJCOPY_IS_LLVM using the existing check for
llvm-objcopy in X86_X32_ABI in arch/x86/Kconfig) and requiring a working
copy (>= 22.0.0 presuming the fix is soon merged) or an explicit opt
into GNU objcopy via OBJCOPY=...objcopy for CONFIG_DEBUG_INFO_BTF to be
selectable.

> Patching llvm-objcopy would be great, it should be done. But we are
> still going to be stuck with making sure older LLVMs can build the kernel.
> So even if they backport the fix to v21, it won't help us much, unfortunately.

21.1.8 was the last planned 21.x release [2] so I think it is unlikely
that a 21.1.9 would be released for this but we won't know until it is
merged into main. Much agreed on handling the old versions.

[1]: https://lore.kernel.org/20251218175824.3122690-1-cmllamas@google.com/
[2]: https://discourse.llvm.org/t/llvm-21-1-8-released/89144

Cheers,
Nathan
On 12/29/25 1:29 PM, Nathan Chancellor wrote:
> Hi Ihor,
>
> On Mon, Dec 29, 2025 at 12:40:10PM -0800, Ihor Solodrai wrote:
>> I think the simplest workaround is this one: use objcopy from binutils
>> instead of llvm-objcopy when doing --update-section.
>>
>> There are just 3 places where that happens, so the OBJCOPY
>> substitution is going to be localized.
>>
>> Also binutils is a documented requirement for compiling the kernel,
>> whether with clang or not [1].
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst?h=v6.18#n29
>
> This would necessitate always specifying a CROSS_COMPILE variable when
> cross compiling with LLVM=1, which I would really like to avoid. The
> LLVM variants have generally been drop in substitutes for several
> versions now so some groups such as Android may not even have GNU
> binutils installed in their build environment (see a recent build
> fix [1]).
>
> I would much prefer detecting llvm-objcopy in Kconfig (such as by
> creating CONFIG_OBJCOPY_IS_LLVM using the existing check for
> llvm-objcopy in X86_X32_ABI in arch/x86/Kconfig) and requiring a working
> copy (>= 22.0.0 presuming the fix is soon merged) or an explicit opt
> into GNU objcopy via OBJCOPY=...objcopy for CONFIG_DEBUG_INFO_BTF to be
> selectable.
I like the idea of opting into GNU objcopy; however, I think we should
avoid requiring kbuilds that want CONFIG_DEBUG_INFO_BTF to change any
configuration (such as adding an explicit OBJCOPY= to the build command).
I drafted a patch (pasted below) introducing BTF_OBJCOPY, which
defaults to GNU objcopy. This implements the workaround, and it should
be easy to update with an LLVM version check once the bug is fixed.
This bit:
@@ -391,6 +391,7 @@ config DEBUG_INFO_BTF
 	depends on PAHOLE_VERSION >= 122
 	# pahole uses elfutils, which does not have support for Hexagon relocations
 	depends on !HEXAGON
+	depends on $(success,command -v $(BTF_OBJCOPY))
This will turn off DEBUG_INFO_BTF if the relevant GNU objcopy happens
not to be installed.
However, I am not sure this is the right way to fail here: if the
kernel really does need BTF (which is effectively every kernel using
BPF), then we are breaking it anyway, just downstream of the build.
An "objcopy: command not found" error might turn some pipelines red,
but it is very clear how to address it.
Thoughts?
From 7c3b9cce97cc76d0365d8948b1ca36c61faddde3 Mon Sep 17 00:00:00 2001
From: Ihor Solodrai <ihor.solodrai@linux.dev>
Date: Mon, 29 Dec 2025 15:49:51 -0800
Subject: [PATCH] BTF_OBJCOPY
---
Makefile | 6 +++++-
lib/Kconfig.debug | 1 +
scripts/gen-btf.sh | 10 +++++-----
scripts/link-vmlinux.sh | 2 +-
tools/testing/selftests/bpf/Makefile | 4 ++--
5 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/Makefile b/Makefile
index 18adf5502244..b7797a85b8c2 100644
--- a/Makefile
+++ b/Makefile
@@ -534,6 +534,9 @@ CLIPPY_DRIVER = clippy-driver
 BINDGEN		= bindgen
 PAHOLE		= pahole
 RESOLVE_BTFIDS	= $(objtree)/tools/bpf/resolve_btfids/resolve_btfids
+# Always use GNU objcopy when manipulating BTF sections to work around
+# a bug in llvm-objcopy: https://github.com/llvm/llvm-project/issues/168060
+BTF_OBJCOPY	= $(CROSS_COMPILE)objcopy
 LEX		= flex
 YACC		= bison
 AWK		= awk
@@ -627,7 +630,8 @@ export CLIPPY_CONF_DIR := $(srctree)
 export ARCH SRCARCH CONFIG_SHELL BASH HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE LD CC HOSTPKG_CONFIG
 export RUSTC RUSTDOC RUSTFMT RUSTC_OR_CLIPPY_QUIET RUSTC_OR_CLIPPY BINDGEN
 export HOSTRUSTC KBUILD_HOSTRUSTFLAGS
-export CPP AR NM STRIP OBJCOPY OBJDUMP READELF PAHOLE RESOLVE_BTFIDS LEX YACC AWK INSTALLKERNEL
+export CPP AR NM STRIP OBJCOPY OBJDUMP READELF LEX YACC AWK INSTALLKERNEL
+export PAHOLE RESOLVE_BTFIDS BTF_OBJCOPY
 export PERL PYTHON3 CHECK CHECKFLAGS MAKE UTS_MACHINE HOSTCXX
 export KGZIP KBZIP2 KLZOP LZMA LZ4 XZ ZSTD TAR
 export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS KBUILD_PROCMACROLDFLAGS LDFLAGS_MODULE
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 60281c4f9e99..ec9e683244fa 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -391,6 +391,7 @@ config DEBUG_INFO_BTF
depends on PAHOLE_VERSION >= 122
# pahole uses elfutils, which does not have support for Hexagon relocations
depends on !HEXAGON
+ depends on $(success,command -v $(BTF_OBJCOPY))
help
Generate deduplicated BTF type information from DWARF debug info.
Turning this on requires pahole v1.22 or later, which will convert
diff --git a/scripts/gen-btf.sh b/scripts/gen-btf.sh
index 06c6d8becaa2..6ae671523edd 100755
--- a/scripts/gen-btf.sh
+++ b/scripts/gen-btf.sh
@@ -97,9 +97,9 @@ gen_btf_o()
# be redefined in the linker script.
info OBJCOPY "${btf_data}"
echo "" | ${CC} ${CLANG_FLAGS} -c -x c -o ${btf_data} -
- ${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
+ ${BTF_OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
--set-section-flags .BTF=alloc,readonly ${btf_data}
- ${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}
+ ${BTF_OBJCOPY} --only-section=.BTF --strip-all ${btf_data}
# Change e_type to ET_REL so that it can be used to link final vmlinux.
# GNU ld 2.35+ and lld do not allow an ET_EXEC input.
@@ -114,16 +114,16 @@ gen_btf_o()
embed_btf_data()
{
info OBJCOPY "${ELF_FILE}.BTF"
- ${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF ${ELF_FILE}
+ ${BTF_OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF ${ELF_FILE}
# a module might not have a .BTF_ids or .BTF.base section
local btf_base="${ELF_FILE}.BTF.base"
if [ -f "${btf_base}" ]; then
- ${OBJCOPY} --add-section .BTF.base=${btf_base} ${ELF_FILE}
+ ${BTF_OBJCOPY} --add-section .BTF.base=${btf_base} ${ELF_FILE}
fi
local btf_ids="${ELF_FILE}.BTF_ids"
if [ -f "${btf_ids}" ]; then
- ${OBJCOPY} --update-section .BTF_ids=${btf_ids} ${ELF_FILE}
+ ${BTF_OBJCOPY} --update-section .BTF_ids=${btf_ids} ${ELF_FILE}
fi
}
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index e2207e612ac3..4ad04d31f8bc 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -266,7 +266,7 @@ vmlinux_link "${VMLINUX}"
if is_enabled CONFIG_DEBUG_INFO_BTF; then
info OBJCOPY ${btfids_vmlinux}
- ${OBJCOPY} --update-section .BTF_ids=${btfids_vmlinux} ${VMLINUX}
+ ${BTF_OBJCOPY} --update-section .BTF_ids=${btfids_vmlinux} ${VMLINUX}
fi
mksysmap "${VMLINUX}" System.map
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index f28a32b16ff0..e998cac975c1 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -4,7 +4,7 @@ include ../../../scripts/Makefile.arch
include ../../../scripts/Makefile.include
CXX ?= $(CROSS_COMPILE)g++
-OBJCOPY ?= $(CROSS_COMPILE)objcopy
+BTF_OBJCOPY ?= $(CROSS_COMPILE)objcopy
CURDIR := $(abspath .)
TOOLSDIR := $(abspath ../../..)
@@ -657,7 +657,7 @@ $(TRUNNER_TEST_OBJS): $(TRUNNER_OUTPUT)/%.test.o: \
$$(if $$(TEST_NEEDS_BTFIDS), \
$$(call msg,BTFIDS,$(TRUNNER_BINARY),$$@) \
$(RESOLVE_BTFIDS) --btf $(TRUNNER_OUTPUT)/btf_data.bpf.o $$@; \
- $(OBJCOPY) --update-section .BTF_ids=$$@.BTF_ids $$@)
+ $(BTF_OBJCOPY) --update-section .BTF_ids=$$@.BTF_ids $$@)
$(TRUNNER_TEST_OBJS:.o=.d): $(TRUNNER_OUTPUT)/%.test.d: \
$(TRUNNER_TESTS_DIR)/%.c \
--
2.47.3
>
>> Patching llvm-objcopy would be great, it should be done. But we are
>> still going to be stuck with making sure older LLVMs can build the kernel.
>> So even if they backport the fix to v21, it won't help us much, unfortunately.
>
> 21.1.8 was the last planned 21.x release [2] so I think it is unlikely
> that a 21.1.9 would be released for this but we won't know until it is
> merged into main. Much agreed on handling the old versions.
>
> [1]: https://lore.kernel.org/20251218175824.3122690-1-cmllamas@google.com/
> [2]: https://discourse.llvm.org/t/llvm-21-1-8-released/89144
>
> Cheers,
> Nathan
On Mon, Dec 29, 2025 at 4:39 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
>
> [...]

All the makefile hackery looks like overkill and wrong direction.

What's wrong with kernel/module/main.c change?

Module loading already does a bunch of sanity checks for ELF
in elf_validity_cache_copy().

+ if (sym[i].st_shndx >= info->hdr->e_shnum)
is just one more.

Maybe it can be moved to elf_validity*() somewhere,
but that's a minor detail.

iiuc llvm-objcopy affects only bpf testmod, so not a general
issue that needs top level makefile changes.
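Alexei's point is that the st_shndx bound is one more ELF sanity check of the kind the loader already performs. A rough user-space sketch of such a check, using glibc's <elf.h> types, is below; check_sym_shndx is a hypothetical name, not a kernel function, and only SHN_UNDEF, SHN_ABS and SHN_COMMON are treated as special here (the kernel's simplify_symbols() also has a SHN_LIVEPATCH case, which is omitted).

```c
#include <elf.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: reject any symbol whose st_shndx points past the
 * section header table, unless it is one of the special indices that
 * have their own case in simplify_symbols(). Mirrors the proposed
 * "sym[i].st_shndx >= info->hdr->e_shnum" check.
 */
static int check_sym_shndx(const Elf64_Ehdr *hdr, const Elf64_Sym *syms,
                           size_t nsyms)
{
    /* Symbol 0 is the reserved null symbol, as in simplify_symbols(). */
    for (size_t i = 1; i < nsyms; i++) {
        unsigned int shndx = syms[i].st_shndx;

        switch (shndx) {
        case SHN_UNDEF:
        case SHN_ABS:
        case SHN_COMMON:
            continue; /* handled by dedicated cases, never used as an index */
        default:
            if (shndx >= hdr->e_shnum) {
                fprintf(stderr,
                        "symbol %zu has an invalid section index %u (max %u)\n",
                        i, shndx, hdr->e_shnum - 1u);
                return -ENOEXEC;
            }
        }
    }
    return 0;
}
```

With a check like this, the corrupted module from the original report, whose BTF_ID symbols carried st_shndx 0xffff, would be refused with -ENOEXEC instead of causing an out-of-bounds read of info->sechdrs[].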
On 12/29/25 4:50 PM, Alexei Starovoitov wrote:
> On Mon, Dec 29, 2025 at 4:39 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
>>
>> [...]
>
> All the makefile hackery looks like overkill and wrong direction.
>
> [...]
>
> iiuc llvm-objcopy affects only bpf testmod, so not a general
> issue that needs top level makefile changes.

By the way, we don't have to put the BTF_OBJCOPY variable in the top-level
Makefile. It can be defined in Makefile.btf, which is included only
with CONFIG_DEBUG_INFO_BTF=y.

We have to define BTF_OBJCOPY in the top-level Makefile *if* we want
CONFIG_DEBUG_INFO_BTF to depend on it, and get disabled if BTF_OBJCOPY
is not set/available.

I was trying to address Nathan's concern that some kernel build
environments might not have GNU binutils installed, and kconfig should
detect that. IMO putting BTF_OBJCOPY in Makefile.btf is more
appropriate, assuming the BTF_OBJCOPY variable is at all an acceptable
workaround for the llvm-objcopy bug.
On Tue, Dec 30, 2025 at 10:45 AM Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
>
> On 12/29/25 4:50 PM, Alexei Starovoitov wrote:
> [...]
>
> I was trying to address Nathan's concern that some kernel build
> environments might not have GNU binutils installed, and kconfig should
> detect that. IMO putting BTF_OBJCOPY in Makefile.btf is more
> appropriate, assuming the BTF_OBJCOPY variable is at all an acceptable
> workaround for the llvm-objcopy bug.

I feel that fallback to binutils objcopy is going to have its own issues.
It could have issues with cross compiling too.
Asking developers to set up cross compile with llvm toolchain and
binutils at the same time is imo too much.
If we cannot rely on objcopy then resolve_btfids should do the whole thing.
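If objcopy cannot be relied on, the first thing resolve_btfids (or any other replacement for --update-section) needs is the file offset and size of .BTF_ids, read straight from the section headers rather than parsed out of readelf output. A minimal sketch of that lookup follows, assuming a 64-bit, native-endian ELF image already loaded into memory; find_section is a hypothetical helper, not an existing resolve_btfids function, and it does not handle the extended-numbering escapes (e_shnum == 0, e_shstrndx == SHN_XINDEX).

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: find section `name` in an in-memory ELF64 image and
 * report its file offset and size. Returns 0 on success, -1 otherwise.
 * Headers are memcpy'd out to avoid unaligned or aliasing accesses.
 */
static int find_section(const unsigned char *img, size_t len, const char *name,
                        uint64_t *off, uint64_t *size)
{
    Elf64_Ehdr eh;
    Elf64_Shdr strsec, sh;

    if (len < sizeof(eh))
        return -1;
    memcpy(&eh, img, sizeof(eh));
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    /* The section header table must lie inside the image, and the name
     * table index must be a real section (this rejects SHN_XINDEX too). */
    if (eh.e_shoff > len ||
        eh.e_shnum > (len - eh.e_shoff) / sizeof(Elf64_Shdr) ||
        eh.e_shstrndx >= eh.e_shnum)
        return -1;

    memcpy(&strsec, img + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof(Elf64_Shdr),
           sizeof(strsec));
    if (strsec.sh_offset > len || strsec.sh_size > len - strsec.sh_offset)
        return -1;
    const char *strtab = (const char *)img + strsec.sh_offset;

    for (size_t i = 0; i < eh.e_shnum; i++) {
        memcpy(&sh, img + eh.e_shoff + i * sizeof(Elf64_Shdr), sizeof(sh));
        if (sh.sh_name >= strsec.sh_size)
            continue;
        const char *s = strtab + sh.sh_name;
        /* Skip names that are not NUL-terminated inside the name table. */
        if (memchr(s, '\0', strsec.sh_size - sh.sh_name) == NULL)
            continue;
        if (strcmp(s, name) == 0) {
            *off = sh.sh_offset;
            *size = sh.sh_size;
            return 0;
        }
    }
    return -1;
}
```

With the offset and size known, new contents could be written back with pwrite(2) at sh_offset after checking that the size is unchanged, which is essentially the dd-based alternative from the original RFC without the fragile text parsing.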
On 12/29/25 4:50 PM, Alexei Starovoitov wrote:
> On Mon, Dec 29, 2025 at 4:39 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
>>
>> [...]
>>
>>
>> From 7c3b9cce97cc76d0365d8948b1ca36c61faddde3 Mon Sep 17 00:00:00 2001
>> From: Ihor Solodrai <ihor.solodrai@linux.dev>
>> Date: Mon, 29 Dec 2025 15:49:51 -0800
>> Subject: [PATCH] BTF_OBJCOPY
>>
>> ---
>> Makefile | 6 +++++-
>> lib/Kconfig.debug | 1 +
>> scripts/gen-btf.sh | 10 +++++-----
>> scripts/link-vmlinux.sh | 2 +-
>> tools/testing/selftests/bpf/Makefile | 4 ++--
>> 5 files changed, 14 insertions(+), 9 deletions(-)
>
> All the makefile hackery looks like overkill and wrong direction.
>
> What's wrong with kernel/module/main.c change?
>
> Module loading already does a bunch of sanity checks for ELF
> in elf_validity_cache_copy().
>
> + if (sym[i].st_shndx >= info->hdr->e_shnum)
> is just one more.
>
> Maybe it can be moved to elf_validity*() somewhere,
> but that's a minor detail.
>
> iiuc llvm-objcopy affects only bpf testmod, so not a general
> issue that needs top level makefile changes.
AFAIU, the problem is that the llvm-objcopy bug is essentially a
use-after-free [1] that may (or may not) corrupt the st_shndx value of
some symbols when executing --update-section.
And so we can't trust this command anywhere in the kernel build, even
though it has only manifested itself in a BPF test module.
With the gen-btf.sh changes, ${OBJCOPY} --update-section is called for
all binaries with .BTF_ids: vmlinux and all modules.
The module.c change addresses an independent kernel bug, which is
hopefully fixed by the st_shndx check.
[1] https://github.com/llvm/llvm-project/issues/168060#issuecomment-3533552952
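One way to check whether a given module was hit by this corruption is to look for the symptom llvm-readelf warned about in the original report: symbols whose st_shndx is SHN_XINDEX (0xffff) in a file that has no SHT_SYMTAB_SHNDX section to resolve them. A minimal sketch, assuming the section headers and symbol table have already been loaded (for example with libelf, not shown); count_dangling_xindex is a hypothetical name.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: count symbols that claim an extended section index
 * (SHN_XINDEX) although the file has no SHT_SYMTAB_SHNDX table. A small
 * kernel module should never need extended indices, so any hit points to
 * a damaged symbol table.
 */
static size_t count_dangling_xindex(const Elf64_Shdr *shdrs, size_t nshdrs,
                                    const Elf64_Sym *syms, size_t nsyms)
{
    bool has_shndx_table = false;
    size_t bad = 0;

    for (size_t i = 0; i < nshdrs; i++)
        if (shdrs[i].sh_type == SHT_SYMTAB_SHNDX)
            has_shndx_table = true;

    /* With a SHT_SYMTAB_SHNDX table present, SHN_XINDEX may be legitimate;
     * resolving it through that table is out of scope for this sketch. */
    if (has_shndx_table)
        return 0;

    for (size_t i = 1; i < nsyms; i++) /* skip the null symbol */
        if (syms[i].st_shndx == SHN_XINDEX)
            bad++;
    return bad;
}
```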
On 12/23/25 9:36 PM, Yonghong Song wrote:
>
>
> On 12/23/25 4:57 PM, Ihor Solodrai wrote:
>> I've been chasing down the following flaky splat, introduced by recent
>> changes in BTF generation [1]:
>>
>> ------------[ cut here ]------------
>> BUG: unable to handle page fault for address: ffa000000233d828
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> PGD 100000067 P4D 100253067 PUD 100258067 PMD 0
>> Oops: Oops: 0000 [#1] SMP NOPTI
>> CPU: 1 UID: 0 PID: 390 Comm: test_progs Tainted: G W OE 6.19.0-rc1-gf785a31395d9 #331 PREEMPT(full)
>> Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014
>> RIP: 0010:simplify_symbols+0x2b2/0x480
>> 9.737179] Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83 fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20 c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
>> RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
>> RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
>> RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
>> RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
>> R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
>> R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
>> FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
>> PKRU: 55555554
>> Call Trace:
>> <TASK>
>> ? __kmalloc_node_track_caller_noprof+0x37f/0x740
>> ? __pfx_setup_modinfo_srcversion+0x10/0x10
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? kstrdup+0x4a/0x70
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? setup_modinfo_srcversion+0x1a/0x30
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? setup_modinfo+0x12b/0x1e0
>> load_module+0x133a/0x1610
>> __x64_sys_finit_module+0x31b/0x450
>> ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> do_syscall_64+0x80/0x2d0
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? exc_page_fault+0x95/0xc0
>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> RIP: 0033:0x7f1c63a2582d
>> 9.794028] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff 8 8b 0d bb 15 0f 00 f7 d8 64 89 01 48
>> RSP: 002b:00007ffe513df128 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1c63a2582d
>> RDX: 0000000000000000 RSI: 0000000000ee83f9 RDI: 0000000000000016
>> RBP: 00007ffe513df150 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffe513e3588
>> R13: 000000000088fad0 R14: 00000000014bddb0 R15: 00007f1c63ba7000
>> </TASK>
>> Modules linked in: bpf_testmod(OE)
>> CR2: ffa000000233d828
>> ---[ end trace 0000000000000000 ]---
>> RIP: 0010:simplify_symbols+0x2b2/0x480
>> 9.821595] Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83 fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20 c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
>> RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
>> RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
>> RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
>> RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
>> R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
>> R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
>> FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
>> PKRU: 55555554
>> Kernel panic - not syncing: Fatal exception
>> Kernel Offset: disabled
>>
>> So far this hasn't happened on BPF CI, for example; however, I was able
>> to reproduce it on a particular x64 machine using a kernel built with
>> LLVM 20.
>>
>> The crash happens on an attempt to load one of the BPF selftest modules
>> (tools/testing/selftests/bpf/test_kmods/bpf_test_modorder_x.ko) which
>> is used by kfunc_module_order test.
>>
>> The reason for the crash is that simplify_symbols() doesn't bounds-check
>> the ELF section index:
>>
>> 	for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
>> 		const char *name = info->strtab + sym[i].st_name;
>>
>> 		switch (sym[i].st_shndx) {
>> 		case SHN_COMMON:
>>
>> 		[...]
>>
>> 		default:
>> 			/* Divert to percpu allocation if a percpu var. */
>> 			if (sym[i].st_shndx == info->index.pcpu)
>> 				secbase = (unsigned long)mod_percpu(mod);
>> 			else
>> /** HERE --> **/		secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
>> 			sym[i].st_value += secbase;
>> 			break;
>> 		}
>> 	}
>>
>> And in the case I was able to reproduce, the value 0xffff
>> (SHN_HIRESERVE aka SHN_XINDEX [2]) fell through here.
>>
>> Now, this code fragment is between 15 and 20 years old, so evidently
>> a module symbol is not expected to have such an st_shndx
>> value. Even so, the kernel should probably fail to load the module
>> instead of crashing, which is what this patch attempts to fix.
>>
>> Investigating further, I discovered that the module binary had been
>> corrupted by the `${OBJCOPY} --update-section` operation that updates
>> the .BTF_ids section data in scripts/gen-btf.sh. This explains why the
>> bug surfaced after gen-btf.sh was introduced:
>>
>> $ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
>> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (2), but unable to locate the extended symbol index table
>> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (3), but unable to locate the extended symbol index table
>> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (4), but unable to locate the extended symbol index table
>> 3: 0000000000000000 16 NOTYPE LOCAL DEFAULT RSV[0xffff] __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
>> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended symbol index (16), but unable to locate the extended symbol index table
>> 4: 0000000000000008 4 OBJECT LOCAL DEFAULT RSV[0xffff] __BTF_ID__func__bpf_test_modorder_retx__44417
>>
>> vs expected
>>
>> $ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
>> 3: 0000000000000000 16 NOTYPE LOCAL DEFAULT 6 __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
>> 4: 0000000000000008 4 OBJECT LOCAL DEFAULT 6 __BTF_ID__func__bpf_test_modorder_retx__44417
>>
>> But why? Updating section data without changing its size is not
>> supposed to affect section indices, right?
>>
>> With a bit more testing I confirmed that this is an LLVM-specific
>> issue (it doesn't reproduce with a GCC kbuild), and that it's not
>> deterministic, because in link-vmlinux.sh we also do:
>>
>> ${OBJCOPY} --update-section .BTF_ids=${btfids_vmlinux} ${VMLINUX}
>>
>> However:
>>
>> $ llvm-readelf -s --wide ~/workspace/prog-aux/linux/vmlinux | grep 0xffff
>> # no output, which is good
>>
>> So the suspect is the implementation of llvm-objcopy. As it turns out
>> there is a relevant known bug that explains the flakiness and isn't
>> fixed yet [3].
>>
>> [1] https://lore.kernel.org/bpf/20251219181825.1289460-3-ihor.solodrai@linux.dev/
>> [2] https://man7.org/linux/man-pages/man5/elf.5.html
>> [3] https://github.com/llvm/llvm-project/issues/168060#issuecomment-3533552952
>>
>> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
>>
>> ---
>>
>> RFC
>>
>> Until this llvm-objcopy bug is fixed, we cannot trust it in the
>> kernel build pipeline. In the short term we have to come up with a
>> workaround for the .BTF_ids section update and replace the calls to
>> ${OBJCOPY} --update-section with something else.
>>
>> One potential workaround is to force the use of objcopy (from
>> binutils) instead of llvm-objcopy when updating the .BTF_ids section.
>>
>> Alternatively, we could just dd the .BTF_ids data computed by
>> resolve_btfids at the right offset in the target ELF file.
>>
>> Surprisingly, I couldn't find a good way to read a section's offset and
>> size from an ELF file in a well-defined format on the command line. Both
>> readelf and {llvm-}objdump produce human-readable output, and it
>> appears we can't rely on the column order, for example.
>>
>> We could still try parsing readelf output with awk/grep, covering
>> output variants that appear in the kernel build.
>>
>> We can also do:
>>
>> llvm-readobj --elf-output-style=JSON --sections "$elf" | \
>> jq -r --arg name .BTF_ids '
>> .[0].Sections[] |
>> select(.Section.Name.Name == $name) |
>> "\(.Section.Offset) \(.Section.Size)"'
>>
>> ...but idk man, doesn't feel right.
>>
>> The most reliable way to determine the size and offset of the .BTF_ids
>> section is probably to read them from a C program using libelf, such as
>> resolve_btfids. Which is quite ironic, given the recent
>> changes. Setting the irony aside, we could add something like:
>> resolve_btfids --section-info=.BTF_ids $elf
>>
>> Reverting the gen-btf.sh patch is also a possible workaround, but I'd
>> really like to avoid it, given that BPF features/optimizations in
>> development depend on it.
>>
>> I'd appreciate comments and suggestions on this issue. Thank you!
>> ---
>> kernel/module/main.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/kernel/module/main.c b/kernel/module/main.c
>> index 710ee30b3bea..5bf456fad63e 100644
>> --- a/kernel/module/main.c
>> +++ b/kernel/module/main.c
>> @@ -1568,6 +1568,13 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
>> break;
>> default:
>> + if (sym[i].st_shndx >= info->hdr->e_shnum) {
>> + pr_err("%s: Symbol %s has an invalid section index %u (max %u)\n",
>> + mod->name, name, sym[i].st_shndx, info->hdr->e_shnum - 1);
>> + ret = -ENOEXEC;
>> + break;
>> + }
>> +
>> /* Divert to percpu allocation if a percpu var. */
>> if (sym[i].st_shndx == info->index.pcpu)
>> secbase = (unsigned long)mod_percpu(mod);
>
> I tried both llvm21 and llvm22 (where llvm21 is used in bpf ci).
>
> Without KASAN, I can reproduce the failure for llvm19/llvm21/llvm22.
> I did not test llvm20 and I assume it may fail too.
>
> The following llvm patch
> https://github.com/llvm/llvm-project/pull/170462
> can fix the issue. Currently it is still in review stage. The actual
> diff is
>
> diff --git a/llvm/lib/ObjCopy/ELF/ELFObject.cpp b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> index e5de17e093df..cc1527d996e2 100644
> --- a/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> +++ b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> @@ -2168,7 +2168,11 @@ Error Object::updateSectionData(SecPtr &Sec, ArrayRef<uint8_t> Data) {
> Data.size(), Sec->Name.c_str(), Sec->Size);
>
> if (!Sec->ParentSegment) {
> - Sec = std::make_unique<OwnedDataSection>(*Sec, Data);
> + SectionBase *Replaced = Sec.get();
> + SectionBase *Modified = &addSection<OwnedDataSection>(*Sec, Data);
> + DenseMap<SectionBase *, SectionBase *> Replacements{{Replaced, Modified}};
> + if (auto err = replaceSections(Replacements))
> + return err;
> } else {
> // The segment writer will be in charge of updating these contents.
> Sec->Size = Data.size();
>
> I applied the above patch to latest llvm21 and llvm22 and
> the crash is gone and the selftests can run properly.
>
> With KASAN, everything is okay for llvm21 and llvm22.
>
> Not sure whether the llvm patch
> https://github.com/llvm/llvm-project/pull/170462
> can make it into llvm21, as it looks like llvm21 is intended to
> stay frozen for now. See
> https://github.com/llvm/llvm-project/pull/168314#issuecomment-3645797175
> llvm22 will branch into rc mode in January.
>
> I will try to see whether we can have a reasonable workaround
> for llvm21 llvm-objcopy (for without KASAN).
>
I commented on the llvm patch https://github.com/llvm/llvm-project/pull/170462
and hopefully the fix can land soon.
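The difference between the old and fixed updateSectionData() code quoted above can be illustrated with a C analogy. As I understand it, assigning a new std::unique_ptr to Sec destroys the old section object while the symbol table still points at it, whereas the patch adds a new section and calls replaceSections(), which retargets those references before the old object goes away. The sketch below shows only that retarget-then-free ordering; struct section, struct symbol and replace_section_data are hypothetical stand-ins, not LLVM's actual classes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for llvm-objcopy's objects. */
struct section {
    char name[16];
    unsigned char *data; /* heap-owned by the section */
    size_t size;
};

struct symbol {
    const char *name;
    struct section *defined_in; /* non-owning back-reference */
};

/*
 * Replace sections[idx] with a new object holding a copy of `data`.
 * Every symbol still pointing at the old object is retargeted *before*
 * the old object is freed; freeing first would leave defined_in dangling.
 */
static int replace_section_data(struct section **sections, size_t idx,
                                struct symbol *syms, size_t nsyms,
                                const unsigned char *data, size_t size)
{
    struct section *old = sections[idx];
    struct section *repl = malloc(sizeof(*repl));

    if (!repl)
        return -1;
    *repl = *old; /* copy name and other metadata */
    repl->data = malloc(size ? size : 1);
    if (!repl->data) {
        free(repl);
        return -1;
    }
    memcpy(repl->data, data, size);
    repl->size = size;

    for (size_t i = 0; i < nsyms; i++) /* retarget references first */
        if (syms[i].defined_in == old)
            syms[i].defined_in = repl;

    sections[idx] = repl;
    free(old->data); /* only now is freeing the old section safe */
    free(old);
    return 0;
}
```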
I didn't find a good solution. Currently, if there are kfuncs in the module,
a .BTF_ids section will be created. Previously, resolve_btfids resolved
.BTF_ids in place, so the counts and BTF IDs were filled in by resolve_btfids
itself.
With the current approach, resolve_btfids does not write the *correct* contents
into the .BTF_ids section. Rather, it creates another file and then relies on
update-section. This should work, but it may not, due to the llvm bug.
One possible workaround is to have resolve_btfids populate the .BTF_ids section
with the correct contents and remove the update-section step for .BTF_ids.
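Populating .BTF_ids in place means writing the resolved IDs into the existing section bytes without changing the section size. A sketch for a single set8 follows, assuming the layout of struct btf_id_set8 from include/linux/btf_ids.h (u32 cnt; u32 flags; then pairs of u32 id, u32 flags) and native endianness; fill_set8 is a hypothetical helper, and any ordering of pairs that the real resolve_btfids applies is left out. The 16-byte __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids symbol with a func ID at offset 8 in the earlier readelf output matches this layout for a one-entry set.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror struct btf_id_set8: header followed by (id, flags) pairs. */
struct set8_hdr  { uint32_t cnt; uint32_t flags; };
struct set8_pair { uint32_t id;  uint32_t flags; };

/*
 * Hypothetical helper: write `n` resolved BTF IDs into the set8 located at
 * `set_off` inside the raw .BTF_ids section buffer `sec`. Fails if the set
 * would not fit, since the section size must not change. Flags emitted at
 * compile time (set-level and per-pair) are preserved.
 */
static int fill_set8(unsigned char *sec, size_t sec_size, size_t set_off,
                     const uint32_t *ids, uint32_t n)
{
    struct set8_hdr hdr;
    size_t need = sizeof(hdr) + (size_t)n * sizeof(struct set8_pair);

    if (set_off > sec_size || need > sec_size - set_off)
        return -1;

    memcpy(&hdr, sec + set_off, sizeof(hdr));
    hdr.cnt = n; /* keep hdr.flags as emitted by the compiler */
    memcpy(sec + set_off, &hdr, sizeof(hdr));

    for (uint32_t i = 0; i < n; i++) {
        struct set8_pair p;
        unsigned char *slot = sec + set_off + sizeof(hdr) + (size_t)i * sizeof(p);

        memcpy(&p, slot, sizeof(p));
        p.id = ids[i]; /* per-pair flags stay untouched */
        memcpy(slot, &p, sizeof(p));
    }
    return 0;
}
```

Because the bytes are patched where they already live, nothing needs to be copied back with --update-section, which is the step that trips over the llvm-objcopy bug.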
© 2016 - 2026 Red Hat, Inc.