Makefile | 6 + arch/Kconfig | 8 + arch/arm64/Kconfig | 3 + arch/arm64/Kconfig.debug | 10 + arch/arm64/include/asm/stacktrace/common.h | 6 + arch/arm64/include/asm/thread_info.h | 4 +- arch/arm64/kernel/entry-common.c | 4 + arch/arm64/kernel/entry.S | 10 + arch/arm64/kernel/setup.c | 2 + arch/arm64/kernel/stacktrace.c | 102 ++++++++++ include/asm-generic/vmlinux.lds.h | 12 ++ include/linux/sframe_lookup.h | 43 +++++ kernel/Makefile | 1 + kernel/sframe.h | 215 +++++++++++++++++++++ kernel/sframe_lookup.c | 196 +++++++++++++++++++ 15 files changed, 621 insertions(+), 1 deletion(-) create mode 100644 include/linux/sframe_lookup.h create mode 100644 kernel/sframe.h create mode 100644 kernel/sframe_lookup.c
This patchset implements a generic kernel sframe-based [1] unwinder.
The main goal is to support reliable stacktraces on arm64.
On x86 orc unwinder provides reliable stacktraces. But arm64 misses the
required support from objtool: it cannot generate orc unwind tables for
arm64.
Currently, there's already a sframe unwinder proposed for userspace: [2].
Since the sframe unwind table algorithm is similar, these two proposal
could integrate common functionality in the future.
There are some incomplete features or challenges:
- The unwinder doesn't yet work with kernel modules. The `start_addr` of
FRE from kernel modules doesn't appear correct, preventing us from
unwinding functions from kernel modules.
- Currently, only GCC supports sframe.
Ref:
[1]: https://sourceware.org/binutils/docs/sframe-spec.html
[2]: https://lore.kernel.org/lkml/cover.1730150953.git.jpoimboe@kernel.org/
Madhavan T. Venkataraman (1):
arm64: Define TIF_PATCH_PENDING for livepatch
Weinan Liu (7):
unwind: build kernel with sframe info
arm64: entry: add unwind info for various kernel entries
unwind: add sframe v2 header
unwind: Implement generic sframe unwinder library
unwind: arm64: Add sframe unwinder on arm64
unwind: arm64: add reliable stacktrace support for arm64
arm64: Enable livepatch for ARM64
Makefile | 6 +
arch/Kconfig | 8 +
arch/arm64/Kconfig | 3 +
arch/arm64/Kconfig.debug | 10 +
arch/arm64/include/asm/stacktrace/common.h | 6 +
arch/arm64/include/asm/thread_info.h | 4 +-
arch/arm64/kernel/entry-common.c | 4 +
arch/arm64/kernel/entry.S | 10 +
arch/arm64/kernel/setup.c | 2 +
arch/arm64/kernel/stacktrace.c | 102 ++++++++++
include/asm-generic/vmlinux.lds.h | 12 ++
include/linux/sframe_lookup.h | 43 +++++
kernel/Makefile | 1 +
kernel/sframe.h | 215 +++++++++++++++++++++
kernel/sframe_lookup.c | 196 +++++++++++++++++++
15 files changed, 621 insertions(+), 1 deletion(-)
create mode 100644 include/linux/sframe_lookup.h
create mode 100644 kernel/sframe.h
create mode 100644 kernel/sframe_lookup.c
--
2.48.1.262.g85cc9f2d1e-goog
Weinan Liu <wnliu@google.com> writes: > This patchset implements a generic kernel sframe-based [1] unwinder. > The main goal is to support reliable stacktraces on arm64. > > On x86 orc unwinder provides reliable stacktraces. But arm64 misses the > required support from objtool: it cannot generate orc unwind tables for > arm64. > > Currently, there's already a sframe unwinder proposed for userspace: [2]. > Since the sframe unwind table algorithm is similar, these two proposal > could integrate common functionality in the future. > > There are some incomplete features or challenges: > - The unwinder doesn't yet work with kernel modules. The `start_addr` of > FRE from kernel modules doesn't appear correct, preventing us from > unwinding functions from kernel modules. > - Currently, only GCC supports sframe. > > Ref: > [1]: https://sourceware.org/binutils/docs/sframe-spec.html > [2]: https://lore.kernel.org/lkml/cover.1730150953.git.jpoimboe@kernel.org/ > Hi Weinan, Thanks for working on this. I tested this set on my setup and faced some issues, here are the details: Here is my setup [on AWS c6gd.16xlarge instance]: ------------------------------------------------- [root@ip-172-31-32-86 linux-upstream]# uname -a Linux ip-172-31-32-86.ec2.internal 6.14.0-rc1+ #1 SMP Tue Feb 4 14:15:55 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux [root@ip-172-31-32-86 linux-upstream]# git log --oneline e9a702365 (HEAD -> master) arm64: Enable livepatch for ARM64 5dedc956e arm64: Define TIF_PATCH_PENDING for livepatch ba563b31a unwind: arm64: add reliable stacktrace support for arm64 d807d392d unwind: arm64: Add sframe unwinder on arm64 7872f050b unwind: Implement generic sframe unwinder library 03d2ad003 unwind: add sframe v2 header 5e95cc051 arm64: entry: add unwind info for various kernel entries faff6cbc3 unwind: build kernel with sframe info 0de63bb7d (origin/master, origin/HEAD) Merge tag 'pull-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 902e09c8a fix braino in "9p: fix ->rename_sem exclusion" f286757b6 Merge tag 'timers-urgent-2025-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip a360f3ffd (grafted) Merge tag 'irq-urgent-2025-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip bb2784d9a (grafted) jiffies: Cast to unsigned long in secs_to_jiffies() conversion 30d61efe1 (grafted) 9p: fix ->rename_sem exclusion [root@ip-172-31-32-86 linux-upstream]# grep SFRAME .config CONFIG_AS_HAS_SFRAME_SUPPORT=y CONFIG_SFRAME_UNWIND_TABLE=y CONFIG_SFRAME_UNWINDER=y [root@ip-172-31-32-86 linux-upstream]# grep LIVEPATCH .config CONFIG_HAVE_LIVEPATCH=y CONFIG_LIVEPATCH=y CONFIG_SAMPLE_LIVEPATCH=m [root@ip-172-31-32-86 linux-upstream]# as --version GNU assembler version 2.41-50.al2023.0.2 Copyright (C) 2023 Free Software Foundation, Inc. This program is free software; you may redistribute it under the terms of the GNU General Public License version 3 or later. This program has absolutely no warranty. This assembler was configured for a target of `aarch64-amazon-linux'. [root@ip-172-31-32-86 linux-upstream]# gcc --version gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2) Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Loading the livepatch-sameple module: ------------------------------------- [root@ip-172-31-32-86 linux-upstream]# kpatch load /lib/modules/6.14.0-rc1+/kernel/samples/livepatch/livepatch-sample.ko loading patch module: /lib/modules/6.14.0-rc1+/kernel/samples/livepatch/livepatch-sample.ko waiting (up to 15 seconds) for patch transition to complete... patch transition has stalled! <4>kpatch: Livepatch process signaling is performed automatically on your system. <4>kpatch: Skipping manual process signaling. waiting (up to 60 seconds) for patch transition to complete... Stalled processes: 340 kdevtmpfs stack: [<0>] devtmpfs_work_loop+0x2cc/0x2d8 [<0>] devtmpfsd+0x4c/0x58 [<0>] kthread+0xf0/0x100 [<0>] ret_from_fork+0x10/0x20 module livepatch_sample did not complete its transition, unloading... disabling patch module: livepatch_sample waiting (up to 15 seconds) for patch transition to complete... transition complete (3 seconds) unloading patch module: livepatch_sample <4>kpatch: error: failed to load module livepatch_sample (transition stalled) Useful messages from kernel log [pr_debug enabled]: --------------------------------------------------- livepatch: enabling patch 'livepatch_sample' livepatch: 'livepatch_sample': initializing patching transition livepatch: 'livepatch_sample': starting patching transition livepatch: klp_try_switch_task: kdevtmpfs:340 has an unreliable stack livepatch: klp_try_switch_task: insmod:9226 has an unreliable stack livepatch: klp_try_switch_task: swapper/63:0 is running [......SNIP.......] livepatch: klp_try_switch_task: kdevtmpfs:340 has an unreliable stack [......SNIP.......] livepatch: signaling remaining tasks livepatch: klp_try_switch_task: kdevtmpfs:340 has an unreliable stack livepatch: 'livepatch_sample': reversing transition from patching to unpatching livepatch: 'livepatch_sample': starting unpatching transition livepatch: klp_try_switch_task: swapper/45:0 is running livepatch: 'livepatch_sample': completing unpatching transition livepatch: 'livepatch_sample': unpatching complete Please let me know if you are aware of this already or if this is expected behaviour with this version. I will try to debug this from my side as well. Also let me know if you need more details for debugging this. P.S. - I also saw multiple build warning like: ld: warning: orphan section `.eh_frame' from `arch/arm64/kernel/entry.o' being placed in section `.eh_frame' ld: warning: orphan section `.init.sframe' from `arch/arm64/kernel/pi/lib-fdt.pi.o' being placed in section `.init.sframe' Thanks, Puranjay
Puranjay Mohan <puranjay@kernel.org> writes:
> Weinan Liu <wnliu@google.com> writes:
>
>> This patchset implements a generic kernel sframe-based [1] unwinder.
>> The main goal is to support reliable stacktraces on arm64.
>>
>> On x86 orc unwinder provides reliable stacktraces. But arm64 misses the
>> required support from objtool: it cannot generate orc unwind tables for
>> arm64.
>>
>> Currently, there's already a sframe unwinder proposed for userspace: [2].
>> Since the sframe unwind table algorithm is similar, these two proposal
>> could integrate common functionality in the future.
>>
>> There are some incomplete features or challenges:
>> - The unwinder doesn't yet work with kernel modules. The `start_addr` of
>> FRE from kernel modules doesn't appear correct, preventing us from
>> unwinding functions from kernel modules.
>> - Currently, only GCC supports sframe.
>>
>> Ref:
>> [1]: https://sourceware.org/binutils/docs/sframe-spec.html
>> [2]: https://lore.kernel.org/lkml/cover.1730150953.git.jpoimboe@kernel.org/
>>
>
> Hi Weinan,
> Thanks for working on this.
>
> I tested this set on my setup and faced some issues, here are the
> details:
>
> Here is my setup [on AWS c6gd.16xlarge instance]:
> -------------------------------------------------
>
> [root@ip-172-31-32-86 linux-upstream]# uname -a
> Linux ip-172-31-32-86.ec2.internal 6.14.0-rc1+ #1 SMP Tue Feb 4 14:15:55 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux
>
> [root@ip-172-31-32-86 linux-upstream]# git log --oneline
> e9a702365 (HEAD -> master) arm64: Enable livepatch for ARM64
> 5dedc956e arm64: Define TIF_PATCH_PENDING for livepatch
> ba563b31a unwind: arm64: add reliable stacktrace support for arm64
> d807d392d unwind: arm64: Add sframe unwinder on arm64
> 7872f050b unwind: Implement generic sframe unwinder library
> 03d2ad003 unwind: add sframe v2 header
> 5e95cc051 arm64: entry: add unwind info for various kernel entries
> faff6cbc3 unwind: build kernel with sframe info
> 0de63bb7d (origin/master, origin/HEAD) Merge tag 'pull-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> 902e09c8a fix braino in "9p: fix ->rename_sem exclusion"
> f286757b6 Merge tag 'timers-urgent-2025-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> a360f3ffd (grafted) Merge tag 'irq-urgent-2025-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> bb2784d9a (grafted) jiffies: Cast to unsigned long in secs_to_jiffies() conversion
> 30d61efe1 (grafted) 9p: fix ->rename_sem exclusion
>
> [root@ip-172-31-32-86 linux-upstream]# grep SFRAME .config
> CONFIG_AS_HAS_SFRAME_SUPPORT=y
> CONFIG_SFRAME_UNWIND_TABLE=y
> CONFIG_SFRAME_UNWINDER=y
> [root@ip-172-31-32-86 linux-upstream]# grep LIVEPATCH .config
> CONFIG_HAVE_LIVEPATCH=y
> CONFIG_LIVEPATCH=y
> CONFIG_SAMPLE_LIVEPATCH=m
>
> [root@ip-172-31-32-86 linux-upstream]# as --version
> GNU assembler version 2.41-50.al2023.0.2
> Copyright (C) 2023 Free Software Foundation, Inc.
> This program is free software; you may redistribute it under the terms of
> the GNU General Public License version 3 or later.
> This program has absolutely no warranty.
> This assembler was configured for a target of `aarch64-amazon-linux'.
>
> [root@ip-172-31-32-86 linux-upstream]# gcc --version
> gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)
> Copyright (C) 2021 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> Loading the livepatch-sameple module:
> -------------------------------------
>
> [root@ip-172-31-32-86 linux-upstream]# kpatch load /lib/modules/6.14.0-rc1+/kernel/samples/livepatch/livepatch-sample.ko
> loading patch module: /lib/modules/6.14.0-rc1+/kernel/samples/livepatch/livepatch-sample.ko
> waiting (up to 15 seconds) for patch transition to complete...
> patch transition has stalled!
> <4>kpatch: Livepatch process signaling is performed automatically on your system.
> <4>kpatch: Skipping manual process signaling.
> waiting (up to 60 seconds) for patch transition to complete...
>
> Stalled processes:
> 340 kdevtmpfs
> stack:
> [<0>] devtmpfs_work_loop+0x2cc/0x2d8
> [<0>] devtmpfsd+0x4c/0x58
> [<0>] kthread+0xf0/0x100
> [<0>] ret_from_fork+0x10/0x20
> module livepatch_sample did not complete its transition, unloading...
> disabling patch module: livepatch_sample
> waiting (up to 15 seconds) for patch transition to complete...
> transition complete (3 seconds)
> unloading patch module: livepatch_sample
> <4>kpatch: error: failed to load module livepatch_sample (transition stalled)
After some debugging this is what I found:
devtmpfsd() calls devtmpfs_work_loop() which is marked '__noreturn' and has an
infinite loop. The compiler puts the `bl` to devtmpfs_work_loop() as the the
last instruction in devtmpfsd() and therefore on entry to devtmpfs_work_loop(),
LR points to an instruction beyond devtmpfsd() and this consfuses the unwinder.
ffff800080d9a070 <devtmpfsd>:
ffff800080d9a070: d503201f nop
ffff800080d9a074: d503201f nop
ffff800080d9a078: d503233f paciasp
ffff800080d9a07c: a9be7bfd stp x29, x30, [sp, #-32]!
ffff800080d9a080: 910003fd mov x29, sp
ffff800080d9a084: f9000bf3 str x19, [sp, #16]
ffff800080d9a088: 943378e8 bl ffff800081a78428 <devtmpfs_setup>
ffff800080d9a08c: 90006ca1 adrp x1, ffff800081b2e000 <unique_processor_ids+0x3758>
ffff800080d9a090: 2a0003f3 mov w19, w0
ffff800080d9a094: 912de021 add x1, x1, #0xb78
ffff800080d9a098: 91002020 add x0, x1, #0x8
ffff800080d9a09c: 97cd2a43 bl ffff8000800e49a8 <complete>
ffff800080d9a0a0: 340000d3 cbz w19, ffff800080d9a0b8 <devtmpfsd+0x48>
ffff800080d9a0a4: 2a1303e0 mov w0, w19
ffff800080d9a0a8: f9400bf3 ldr x19, [sp, #16]
ffff800080d9a0ac: a8c27bfd ldp x29, x30, [sp], #32
ffff800080d9a0b0: d50323bf autiasp
ffff800080d9a0b4: d65f03c0 ret
ffff800080d9a0b8: 97f06526 bl ffff8000809b3550 <devtmpfs_work_loop>
ffff800080d9a0bc: 00000000 udf #0
ffff800080d9a0c0: d503201f nop
ffff800080d9a0c4: d503201f nop
find_fde() got pc=0xffff800080d9a0bc which is not in [sfde_func_start_address, sfde_func_size)
output for readelf --sframe for devtmpfsd()
func idx [51825]: pc = 0xffff800080d9a070, size = 76 bytes
STARTPC CFA FP RA
ffff800080d9a070 sp+0 u u
ffff800080d9a07c sp+0 u u[s]
ffff800080d9a080 sp+32 c-32 c-24[s]
ffff800080d9a0b0 sp+0 u u[s]
ffff800080d9a0b4 sp+0 u u
ffff800080d9a0b8 sp+32 c-32 c-24[s]
The unwinder and all the related infra is assuming that the return address
will be part of a valid function which is not the case here.
I am not sure which component needs to be fixed here, but the following
patch(which is a hack) fixes the issue by considering the return address as
part of the function descriptor entry.
-- 8< --
diff --git a/kernel/sframe_lookup.c b/kernel/sframe_lookup.c
index 846f1da95..28bec5064 100644
--- a/kernel/sframe_lookup.c
+++ b/kernel/sframe_lookup.c
@@ -82,7 +82,7 @@ static struct sframe_fde *find_fde(const struct sframe_table *tbl, unsigned long
if (f >= tbl->sfhdr_p->num_fdes || f < 0)
return NULL;
fdep = tbl->fde_p + f;
- if (ip < fdep->start_addr || ip >= fdep->start_addr + fdep->size)
+ if (ip < fdep->start_addr || ip > fdep->start_addr + fdep->size)
return NULL;
return fdep;
@@ -106,7 +106,7 @@ static int find_fre(const struct sframe_table *tbl, unsigned long pc,
else
ip_off = (int32_t)(pc - (unsigned long)tbl->sfhdr_p) - fdep->start_addr;
- if (ip_off < 0 || ip_off >= fdep->size)
+ if (ip_off < 0 || ip_off > fdep->size)
return -EINVAL;
/*
-- >8 --
Thanks,
Puranjay
> After some debugging this is what I found: > > devtmpfsd() calls devtmpfs_work_loop() which is marked '__noreturn' and has an > infinite loop. The compiler puts the `bl` to devtmpfs_work_loop() as the the > last instruction in devtmpfsd() and therefore on entry to devtmpfs_work_loop(), > LR points to an instruction beyond devtmpfsd() and this consfuses the unwinder. > > ffff800080d9a070 <devtmpfsd>: > ffff800080d9a070: d503201f nop > ffff800080d9a074: d503201f nop > ffff800080d9a078: d503233f paciasp > ffff800080d9a07c: a9be7bfd stp x29, x30, [sp, #-32]! > ffff800080d9a080: 910003fd mov x29, sp > ffff800080d9a084: f9000bf3 str x19, [sp, #16] > ffff800080d9a088: 943378e8 bl ffff800081a78428 <devtmpfs_setup> > ffff800080d9a08c: 90006ca1 adrp x1, ffff800081b2e000 <unique_processor_ids+0x3758> > ffff800080d9a090: 2a0003f3 mov w19, w0 > ffff800080d9a094: 912de021 add x1, x1, #0xb78 > ffff800080d9a098: 91002020 add x0, x1, #0x8 > ffff800080d9a09c: 97cd2a43 bl ffff8000800e49a8 <complete> > ffff800080d9a0a0: 340000d3 cbz w19, ffff800080d9a0b8 <devtmpfsd+0x48> > ffff800080d9a0a4: 2a1303e0 mov w0, w19 > ffff800080d9a0a8: f9400bf3 ldr x19, [sp, #16] > ffff800080d9a0ac: a8c27bfd ldp x29, x30, [sp], #32 > ffff800080d9a0b0: d50323bf autiasp > ffff800080d9a0b4: d65f03c0 ret > ffff800080d9a0b8: 97f06526 bl ffff8000809b3550 <devtmpfs_work_loop> > ffff800080d9a0bc: 00000000 udf #0 > ffff800080d9a0c0: d503201f nop > ffff800080d9a0c4: d503201f nop > > find_fde() got pc=0xffff800080d9a0bc which is not in [sfde_func_start_address, sfde_func_size) > > output for readelf --sframe for devtmpfsd() > > func idx [51825]: pc = 0xffff800080d9a070, size = 76 bytes > STARTPC CFA FP RA > ffff800080d9a070 sp+0 u u > ffff800080d9a07c sp+0 u u[s] > ffff800080d9a080 sp+32 c-32 c-24[s] > ffff800080d9a0b0 sp+0 u u[s] > ffff800080d9a0b4 sp+0 u u > ffff800080d9a0b8 sp+32 c-32 c-24[s] > > The unwinder and all the related infra is assuming that the return address > will be part of a valid function which is not the case here. > > I am not sure which component needs to be fixed here, but the following > patch(which is a hack) fixes the issue by considering the return address as > part of the function descriptor entry. > > -- 8< -- > > diff --git a/kernel/sframe_lookup.c b/kernel/sframe_lookup.c > index 846f1da95..28bec5064 100644 > --- a/kernel/sframe_lookup.c > +++ b/kernel/sframe_lookup.c > @@ -82,7 +82,7 @@ static struct sframe_fde *find_fde(const struct sframe_table *tbl, unsigned long > if (f >= tbl->sfhdr_p->num_fdes || f < 0) > return NULL; > fdep = tbl->fde_p + f; > - if (ip < fdep->start_addr || ip >= fdep->start_addr + fdep->size) > + if (ip < fdep->start_addr || ip > fdep->start_addr + fdep->size) > return NULL; > > return fdep; > @@ -106,7 +106,7 @@ static int find_fre(const struct sframe_table *tbl, unsigned long pc, > else > ip_off = (int32_t)(pc - (unsigned long)tbl->sfhdr_p) - fdep->start_addr; > > - if (ip_off < 0 || ip_off >= fdep->size) > + if (ip_off < 0 || ip_off > fdep->size) > return -EINVAL; > > /* > > -- >8 -- > > Thanks, > Puranjay Thank you for reporting this issue. I just found out that Josh also intentionally uses '>' instead of '>=' for the same reason https://lore.kernel.org/lkml/20250122225257.h64ftfnorofe7cb4@jpoimboe/T/#m6d70a20ed9f5b3bbe5b24b24b8c5dcc603a79101 QQ, do we need to care the stacktrace after '__noreturn' function?
Weinan Liu <wnliu@google.com> writes: >> After some debugging this is what I found: >> >> devtmpfsd() calls devtmpfs_work_loop() which is marked '__noreturn' and has an >> infinite loop. The compiler puts the `bl` to devtmpfs_work_loop() as the the >> last instruction in devtmpfsd() and therefore on entry to devtmpfs_work_loop(), >> LR points to an instruction beyond devtmpfsd() and this consfuses the unwinder. >> >> ffff800080d9a070 <devtmpfsd>: >> ffff800080d9a070: d503201f nop >> ffff800080d9a074: d503201f nop >> ffff800080d9a078: d503233f paciasp >> ffff800080d9a07c: a9be7bfd stp x29, x30, [sp, #-32]! >> ffff800080d9a080: 910003fd mov x29, sp >> ffff800080d9a084: f9000bf3 str x19, [sp, #16] >> ffff800080d9a088: 943378e8 bl ffff800081a78428 <devtmpfs_setup> >> ffff800080d9a08c: 90006ca1 adrp x1, ffff800081b2e000 <unique_processor_ids+0x3758> >> ffff800080d9a090: 2a0003f3 mov w19, w0 >> ffff800080d9a094: 912de021 add x1, x1, #0xb78 >> ffff800080d9a098: 91002020 add x0, x1, #0x8 >> ffff800080d9a09c: 97cd2a43 bl ffff8000800e49a8 <complete> >> ffff800080d9a0a0: 340000d3 cbz w19, ffff800080d9a0b8 <devtmpfsd+0x48> >> ffff800080d9a0a4: 2a1303e0 mov w0, w19 >> ffff800080d9a0a8: f9400bf3 ldr x19, [sp, #16] >> ffff800080d9a0ac: a8c27bfd ldp x29, x30, [sp], #32 >> ffff800080d9a0b0: d50323bf autiasp >> ffff800080d9a0b4: d65f03c0 ret >> ffff800080d9a0b8: 97f06526 bl ffff8000809b3550 <devtmpfs_work_loop> >> ffff800080d9a0bc: 00000000 udf #0 >> ffff800080d9a0c0: d503201f nop >> ffff800080d9a0c4: d503201f nop >> >> find_fde() got pc=0xffff800080d9a0bc which is not in [sfde_func_start_address, sfde_func_size) >> >> output for readelf --sframe for devtmpfsd() >> >> func idx [51825]: pc = 0xffff800080d9a070, size = 76 bytes >> STARTPC CFA FP RA >> ffff800080d9a070 sp+0 u u >> ffff800080d9a07c sp+0 u u[s] >> ffff800080d9a080 sp+32 c-32 c-24[s] >> ffff800080d9a0b0 sp+0 u u[s] >> ffff800080d9a0b4 sp+0 u u >> ffff800080d9a0b8 sp+32 c-32 c-24[s] >> >> The unwinder and all the related infra is assuming that the return address >> will be part of a valid function which is not the case here. >> >> I am not sure which component needs to be fixed here, but the following >> patch(which is a hack) fixes the issue by considering the return address as >> part of the function descriptor entry. >> >> -- 8< -- >> >> diff --git a/kernel/sframe_lookup.c b/kernel/sframe_lookup.c >> index 846f1da95..28bec5064 100644 >> --- a/kernel/sframe_lookup.c >> +++ b/kernel/sframe_lookup.c >> @@ -82,7 +82,7 @@ static struct sframe_fde *find_fde(const struct sframe_table *tbl, unsigned long >> if (f >= tbl->sfhdr_p->num_fdes || f < 0) >> return NULL; >> fdep = tbl->fde_p + f; >> - if (ip < fdep->start_addr || ip >= fdep->start_addr + fdep->size) >> + if (ip < fdep->start_addr || ip > fdep->start_addr + fdep->size) >> return NULL; >> >> return fdep; >> @@ -106,7 +106,7 @@ static int find_fre(const struct sframe_table *tbl, unsigned long pc, >> else >> ip_off = (int32_t)(pc - (unsigned long)tbl->sfhdr_p) - fdep->start_addr; >> >> - if (ip_off < 0 || ip_off >= fdep->size) >> + if (ip_off < 0 || ip_off > fdep->size) >> return -EINVAL; >> >> /* >> >> -- >8 -- >> >> Thanks, >> Puranjay > > Thank you for reporting this issue. > I just found out that Josh also intentionally uses '>' instead of '>=' for the same reason > https://lore.kernel.org/lkml/20250122225257.h64ftfnorofe7cb4@jpoimboe/T/#m6d70a20ed9f5b3bbe5b24b24b8c5dcc603a79101 > > QQ, do we need to care the stacktrace after '__noreturn' function? Yes, I think we should, but others people could add more to this. I have been testing this series with Kpatch and created a PR that works with this unwinder: https://github.com/dynup/kpatch/pull/1439 For the modules, I think we need per module sframe tables that are initialised when the module is loaded. And the unwinder should use the module specific table if the IP is in a module's code. Have you already started working on it? if not I would like to help and work on that. Thanks, Puranjay
On Fri, Feb 7, 2025 at 4:16 AM Puranjay Mohan <puranjay@kernel.org> wrote: > > Yes, I think we should, but others people could add more to this. > > I have been testing this series with Kpatch and created a PR that works > with this unwinder: https://github.com/dynup/kpatch/pull/1439 > > For the modules, I think we need per module sframe tables that are > initialised when the module is loaded. And the unwinder should use the > module specific table if the IP is in a module's code. > > Have you already started working on it? if not I would like to help and > work on that. > > Thanks, > Puranjay Thanks for updating the arm64 kpatch PR so quickly. So we can use kpatch to test this proposal. I already have a WIP patch to add sframe support to the kernel module. However, it is not yet working. I had trouble unwinding frames for the kernel module using the current algorithm. Indu has likely identified the issue and will be addressing it from the toolchain side. https://sourceware.org/bugzilla/show_bug.cgi?id=32666
On Mon, Feb 10, 2025 at 12:30 AM Weinan Liu <wnliu@google.com> wrote:
> I already have a WIP patch to add sframe support to the kernel module.
> However, it is not yet working. I had trouble unwinding frames for the
> kernel module using the current algorithm.
>
> Indu has likely identified the issue and will be addressing it from the
> toolchain side.
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=32666
I have a working in progress patch that adds sframe support for kernel
module.
https://github.com/heuza/linux/tree/sframe_unwinder.rfc
According to the sframe table values I got during runtime testing, looks
like the offsets are not correct .
When unwind symbols init_module(0xffff80007b155048) from the kernel
module(livepatch-sample.ko), the start_address of the FDE entries in the
sframe table of the kernel modules appear incorrect.
For instance, the first FDE's start_addr is reported as -20564. Adding
this offset to the module's sframe section address (0xffff80007b15a040)
yields 0xffff80007b154fec, which is not within the livepatch-sample.ko
memory region(It should be larger than 0xffff80007b155000).
Here are the sframe table values of the livepatch-samples.ko that I print
by qemu + gdb.
```
$ /usr/bin/aarch64-linux-gnu-objdump -L --sframe=.sframe ./samples/livepatch/livepatch-sample.ko
./samples/livepatch/livepatch-sample.ko: file format elf64-littleaarch64
Contents of the SFrame section .sframe:
Header :
Version: SFRAME_VERSION_2
Flags: SFRAME_F_FDE_SORTED
Num FDEs: 3
Num FREs: 11
Function Index :
func idx [0]: pc = 0x0, size = 12 bytes
STARTPC CFA FP RA
0000000000000000 sp+0 u u
func idx [1]: pc = 0x0, size = 44 bytes
STARTPC CFA FP RA
0000000000000000 sp+0 u u
000000000000000c sp+0 u u[s]
0000000000000010 sp+16 c-16 c-8[s]
0000000000000024 sp+0 u u[s]
0000000000000028 sp+0 u u
func idx [2]: pc = 0x0, size = 56 bytes
STARTPC CFA FP RA
0000000000000000 sp+0 u u
000000000000000c sp+0 u u[s]
0000000000000010 sp+16 c-16 c-8[s]
0000000000000030 sp+0 u u[s]
0000000000000034 sp+0 u u
(gdb) bt
#0 find_fde (tbl=0xffff80007b157708, pc=18446603338286190664) at kernel/sframe_lookup.c:75
#1 0xffff80008031e260 in sframe_find_pc (pc=18446603338286190664, entry=0xffff800086f83800) at kernel/sframe_lookup.c:175
#2 0xffff800080035a48 in unwind_next_frame_sframe (state=0xffff800086f83828) at arch/arm64/kernel/stacktrace.c:270
#3 kunwind_next (state=0xffff800086f83828) at arch/arm64/kernel/stacktrace.c:332
...
(gdb) lx-symbols
loading vmlinux
scanning for modules in /home/wnliu/kernel
loading @0xffff80007b155000: /home/wnliu/kernel/samples/livepatch/livepatch-sample.ko
loading @0xffff80007b14d000: /home/wnliu/kernel/fs/fat/vfat.ko
loading @0xffff80007b130000: /home/wnliu/kernel/fs/fat/fat.ko
(gdb) p/x *tbl->sfhdr_p
$5 = {preamble = {magic = 0xdee2, version = 0x2, flags = 0x1}, abi_arch = 0x2, cfa_fixed_fp_offset = 0x0, cfa_fixed_ra_offset = 0x0, auxhdr_len = 0x0, num_fdes = 0x3, num_fres = 0xb, fre_len = 0x25, fdes_off = 0x0, fres_off = 0x3c}
(gdb) p/x tbl->sfhdr_p
$6 = 0xffff80007b15a040
(gdb) p *tbl->fde_p
$7 = {start_addr = -20564, size = 12, fres_off = 0, fres_num = 1, info = 0 '\000', rep_size = 0 '\000', padding = 0}
(gdb) p *(tbl->fde_p + 1)
$11 = {start_addr = -20552, size = 44, fres_off = 3, fres_num = 5, info = 0 '\000', rep_size = 0 '\000', padding = 0}
(gdb) p *(tbl->fde_p + 2)
$12 = {start_addr = -20508, size = 56, fres_off = 20, fres_num = 5, info = 0 '\000', rep_size = 0 '\000', padding = 0}
/* -20564 + 0xffff80007b15a040 = 0xffff80007b154fec */
(gdb) info symbol 0xffff80007b154fec
No symbol matches 0xffff80007b154fec
```
On Mon, Feb 10, 2025 at 12:30 AM Weinan Liu <wnliu@google.com> wrote:
>> I already have a WIP patch to add sframe support to the kernel module.
>> However, it is not yet working. I had trouble unwinding frames for the
>> kernel module using the current algorithm.
>>
>> Indu has likely identified the issue and will be addressing it from the
>> toolchain side.
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=32666
>
> I have a working in progress patch that adds sframe support for kernel
> module.
> https://github.com/heuza/linux/tree/sframe_unwinder.rfc
>
> According to the sframe table values I got during runtime testing, looks
> like the offsets are not correct .
>
I hope to sanitize the fix for 32666 and post upstream soon (I had to
address other related issues). Unless fixed, relocating .sframe
sections using the .rela.sframe is expected to generate incorrect output.
> When unwind symbols init_module(0xffff80007b155048) from the kernel
> module(livepatch-sample.ko), the start_address of the FDE entries in the
> sframe table of the kernel modules appear incorrect.
init_module will apply the relocations on the .sframe section, isnt it ?
> For instance, the first FDE's start_addr is reported as -20564. Adding
> this offset to the module's sframe section address (0xffff80007b15a040)
> yields 0xffff80007b154fec, which is not within the livepatch-sample.ko
> memory region(It should be larger than 0xffff80007b155000).
>
Hmm..something seems off here. Having tested a potential fix for 32666
locally, I do not expect the first FDE to show this symptom.
> Here are the sframe table values of the livepatch-samples.ko that I print
> by qemu + gdb.
>
> ```
> $ /usr/bin/aarch64-linux-gnu-objdump -L --sframe=.sframe ./samples/livepatch/livepatch-sample.ko
> ./samples/livepatch/livepatch-sample.ko: file format elf64-littleaarch64
>
> Contents of the SFrame section .sframe:
> Header :
>
> Version: SFRAME_VERSION_2
> Flags: SFRAME_F_FDE_SORTED
> Num FDEs: 3
> Num FREs: 11
>
> Function Index :
>
> func idx [0]: pc = 0x0, size = 12 bytes
> STARTPC CFA FP RA
> 0000000000000000 sp+0 u u
>
> func idx [1]: pc = 0x0, size = 44 bytes
> STARTPC CFA FP RA
> 0000000000000000 sp+0 u u
> 000000000000000c sp+0 u u[s]
> 0000000000000010 sp+16 c-16 c-8[s]
> 0000000000000024 sp+0 u u[s]
> 0000000000000028 sp+0 u u
>
> func idx [2]: pc = 0x0, size = 56 bytes
> STARTPC CFA FP RA
> 0000000000000000 sp+0 u u
> 000000000000000c sp+0 u u[s]
> 0000000000000010 sp+16 c-16 c-8[s]
> 0000000000000030 sp+0 u u[s]
> 0000000000000034 sp+0 u u
>
>
>
> (gdb) bt
> #0 find_fde (tbl=0xffff80007b157708, pc=18446603338286190664) at kernel/sframe_lookup.c:75
> #1 0xffff80008031e260 in sframe_find_pc (pc=18446603338286190664, entry=0xffff800086f83800) at kernel/sframe_lookup.c:175
> #2 0xffff800080035a48 in unwind_next_frame_sframe (state=0xffff800086f83828) at arch/arm64/kernel/stacktrace.c:270
> #3 kunwind_next (state=0xffff800086f83828) at arch/arm64/kernel/stacktrace.c:332
> ...
>
> (gdb) lx-symbols
> loading vmlinux
> scanning for modules in /home/wnliu/kernel
> loading @0xffff80007b155000: /home/wnliu/kernel/samples/livepatch/livepatch-sample.ko
> loading @0xffff80007b14d000: /home/wnliu/kernel/fs/fat/vfat.ko
> loading @0xffff80007b130000: /home/wnliu/kernel/fs/fat/fat.ko
>
> (gdb) p/x *tbl->sfhdr_p
> $5 = {preamble = {magic = 0xdee2, version = 0x2, flags = 0x1}, abi_arch = 0x2, cfa_fixed_fp_offset = 0x0, cfa_fixed_ra_offset = 0x0, auxhdr_len = 0x0, num_fdes = 0x3, num_fres = 0xb, fre_len = 0x25, fdes_off = 0x0, fres_off = 0x3c}
>
> (gdb) p/x tbl->sfhdr_p
> $6 = 0xffff80007b15a040
>
> (gdb) p *tbl->fde_p
> $7 = {start_addr = -20564, size = 12, fres_off = 0, fres_num = 1, info = 0 '\000', rep_size = 0 '\000', padding = 0}
>
> (gdb) p *(tbl->fde_p + 1)
> $11 = {start_addr = -20552, size = 44, fres_off = 3, fres_num = 5, info = 0 '\000', rep_size = 0 '\000', padding = 0}
>
> (gdb) p *(tbl->fde_p + 2)
> $12 = {start_addr = -20508, size = 56, fres_off = 20, fres_num = 5, info = 0 '\000', rep_size = 0 '\000', padding = 0}
>
> /* -20564 + 0xffff80007b15a040 = 0xffff80007b154fec */
> (gdb) info symbol 0xffff80007b154fec
> No symbol matches 0xffff80007b154fec
> ```
On Tue, Feb 25, 2025 at 11:38 AM Indu Bhagat <indu.bhagat@oracle.com> wrote: > > On Mon, Feb 10, 2025 at 12:30 AM Weinan Liu <wnliu@google.com> wrote: > >> I already have a WIP patch to add sframe support to the kernel module. > >> However, it is not yet working. I had trouble unwinding frames for the > >> kernel module using the current algorithm. > >> > >> Indu has likely identified the issue and will be addressing it from the > >> toolchain side. > >> > >> https://sourceware.org/bugzilla/show_bug.cgi?id=32666 > > > > I have a working in progress patch that adds sframe support for kernel > > module. > > https://github.com/heuza/linux/tree/sframe_unwinder.rfc > > > > According to the sframe table values I got during runtime testing, looks > > like the offsets are not correct . > > > > I hope to sanitize the fix for 32666 and post upstream soon (I had to > address other related issues). Unless fixed, relocating .sframe > sections using the .rela.sframe is expected to generate incorrect output. > > > When unwind symbols init_module(0xffff80007b155048) from the kernel > > module(livepatch-sample.ko), the start_address of the FDE entries in the > > sframe table of the kernel modules appear incorrect. > > init_module will apply the relocations on the .sframe section, isnt it ? > > > For instance, the first FDE's start_addr is reported as -20564. Adding > > this offset to the module's sframe section address (0xffff80007b15a040) > > yields 0xffff80007b154fec, which is not within the livepatch-sample.ko > > memory region(It should be larger than 0xffff80007b155000). > > > > Hmm..something seems off here. Having tested a potential fix for 32666 > locally, I do not expect the first FDE to show this symptom. > Yes, I think init_module will apply the relocation as well. To further investigate, here's the relevant relocation and symbol table information for the kernel module: Relocation section '.rela.sframe' at offset 0x28350 contains 3 entries: Offset Info Type Sym. Value Sym. Name + Addend 00000000001c 000100000105 R_AARCH64_PREL32 0000000000000000 .text + 8 000000000030 000100000105 R_AARCH64_PREL32 0000000000000000 .text + 28 000000000044 000100000105 R_AARCH64_PREL32 0000000000000000 .text + 68 Symbol table '.symtab' contains 68 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text ... 32: 0000000000000008 12 FUNC LOCAL DEFAULT 1 livepatch_exit 33: 0000000000000008 0 NOTYPE LOCAL DEFAULT 3 $d 34: 0000000000000028 44 FUNC LOCAL DEFAULT 1 livepatch_init 35: 0000000000000000 0 NOTYPE LOCAL DEFAULT 9 $d 36: 0000000000000010 0 NOTYPE LOCAL DEFAULT 3 $d 37: 0000000000000068 56 FUNC LOCAL DEFAULT 1 livepatch_cmdlin[...] ... 63: 0000000000000008 12 FUNC GLOBAL DEFAULT 1 cleanup_module 64: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND klp_enable_patch 65: 0000000000000028 44 FUNC GLOBAL DEFAULT 1 init_module
On 2/25/25 3:54 PM, Weinan Liu wrote: > On Tue, Feb 25, 2025 at 11:38 AM Indu Bhagat <indu.bhagat@oracle.com> wrote: >> >> On Mon, Feb 10, 2025 at 12:30 AM Weinan Liu <wnliu@google.com> wrote: >>>> I already have a WIP patch to add sframe support to the kernel module. >>>> However, it is not yet working. I had trouble unwinding frames for the >>>> kernel module using the current algorithm. >>>> >>>> Indu has likely identified the issue and will be addressing it from the >>>> toolchain side. >>>> >>>> https://sourceware.org/bugzilla/show_bug.cgi?id=32666 >>> >>> I have a working in progress patch that adds sframe support for kernel >>> module. >>> https://github.com/heuza/linux/tree/sframe_unwinder.rfc >>> >>> According to the sframe table values I got during runtime testing, looks >>> like the offsets are not correct . >>> >> >> I hope to sanitize the fix for 32666 and post upstream soon (I had to >> address other related issues). Unless fixed, relocating .sframe >> sections using the .rela.sframe is expected to generate incorrect output. >> >>> When unwind symbols init_module(0xffff80007b155048) from the kernel >>> module(livepatch-sample.ko), the start_address of the FDE entries in the >>> sframe table of the kernel modules appear incorrect. >> >> init_module will apply the relocations on the .sframe section, isnt it ? >> >>> For instance, the first FDE's start_addr is reported as -20564. Adding >>> this offset to the module's sframe section address (0xffff80007b15a040) >>> yields 0xffff80007b154fec, which is not within the livepatch-sample.ko >>> memory region(It should be larger than 0xffff80007b155000). >>> >> >> Hmm..something seems off here. Having tested a potential fix for 32666 >> locally, I do not expect the first FDE to show this symptom. >> > > Yes, I think init_module will apply the relocation as well. > To further investigate, here's the relevant relocation and symbol table > information for the kernel module: > > Relocation section '.rela.sframe' at offset 0x28350 contains 3 entries: > Offset Info Type Sym. Value Sym. Name + Addend > 00000000001c 000100000105 R_AARCH64_PREL32 0000000000000000 .text + 8 > 000000000030 000100000105 R_AARCH64_PREL32 0000000000000000 .text + 28 > 000000000044 000100000105 R_AARCH64_PREL32 0000000000000000 .text + 68 > The offsets look OK.. > Symbol table '.symtab' contains 68 entries: > Num: Value Size Type Bind Vis Ndx Name > 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND > 1: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text > ... > 32: 0000000000000008 12 FUNC LOCAL DEFAULT 1 livepatch_exit > 33: 0000000000000008 0 NOTYPE LOCAL DEFAULT 3 $d > 34: 0000000000000028 44 FUNC LOCAL DEFAULT 1 livepatch_init > 35: 0000000000000000 0 NOTYPE LOCAL DEFAULT 9 $d > 36: 0000000000000010 0 NOTYPE LOCAL DEFAULT 3 $d > 37: 0000000000000068 56 FUNC LOCAL DEFAULT 1 livepatch_cmdlin[...] > ... > 63: 0000000000000008 12 FUNC GLOBAL DEFAULT 1 cleanup_module > 64: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND klp_enable_patch > 65: 0000000000000028 44 FUNC GLOBAL DEFAULT 1 init_module
Indu Bhagat <indu.bhagat@oracle.com> writes: > On 2/25/25 3:54 PM, Weinan Liu wrote: >> On Tue, Feb 25, 2025 at 11:38 AM Indu Bhagat <indu.bhagat@oracle.com> wrote: >>> >>> On Mon, Feb 10, 2025 at 12:30 AM Weinan Liu <wnliu@google.com> wrote: >>>>> I already have a WIP patch to add sframe support to the kernel module. >>>>> However, it is not yet working. I had trouble unwinding frames for the >>>>> kernel module using the current algorithm. >>>>> >>>>> Indu has likely identified the issue and will be addressing it from the >>>>> toolchain side. >>>>> >>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=32666 >>>> >>>> I have a working in progress patch that adds sframe support for kernel >>>> module. >>>> https://github.com/heuza/linux/tree/sframe_unwinder.rfc >>>> >>>> According to the sframe table values I got during runtime testing, looks >>>> like the offsets are not correct . >>>> >>> >>> I hope to sanitize the fix for 32666 and post upstream soon (I had to >>> address other related issues). Unless fixed, relocating .sframe >>> sections using the .rela.sframe is expected to generate incorrect output. >>> >>>> When unwind symbols init_module(0xffff80007b155048) from the kernel >>>> module(livepatch-sample.ko), the start_address of the FDE entries in the >>>> sframe table of the kernel modules appear incorrect. >>> >>> init_module will apply the relocations on the .sframe section, isnt it ? >>> >>>> For instance, the first FDE's start_addr is reported as -20564. Adding >>>> this offset to the module's sframe section address (0xffff80007b15a040) >>>> yields 0xffff80007b154fec, which is not within the livepatch-sample.ko >>>> memory region(It should be larger than 0xffff80007b155000). >>>> >>> >>> Hmm..something seems off here. Having tested a potential fix for 32666 >>> locally, I do not expect the first FDE to show this symptom. >>> >> Hi, Sorry for not responding in the past few days. I was on PTO and was trying to improve my snowboarding technique, I am back now!! I think what we are seeing is expected behaviour: | For instance, the first FDE's start_addr is reported as -20564. Adding | this offset to the module's sframe section address (0xffff80007b15a040) | yields 0xffff80007b154fec, which is not within the livepatch-sample.ko | memory region(It should be larger than 0xffff80007b155000). Let me explain using a __dummy__ example. Assume Memory layout before relocation: | Address | Element | Relocation | .... | .... | | 60 | init_module (start address) | | 72 | init_module (end address) | | .... | ..... | | 100 | Sframe section header start address | | 128 | First FDE's start address | RELOC_OP_PREL -> Put init_module address (60) - current address (128) So, after relocation First FDE's start address has value 60 - 128 = -68 Now, while doing unwinding we Try to add this value to the sframe section header's start address which is in this example 100, so 100 + (-68) = 32 So, 32 is not within [60, 72], i.e. within init_module. You can see that it is possible for this value to be less than the start address of the module's memory region when this function's address is very close to the start of the memory region. The crux is that the offset in the FDE's start address is calculated based on the address of the FDE's start_address and not based on the address of the sframe section. Thanks, Puranjay
On 2/26/25 2:23 AM, Puranjay Mohan wrote: > Indu Bhagat <indu.bhagat@oracle.com> writes: > >> On 2/25/25 3:54 PM, Weinan Liu wrote: >>> On Tue, Feb 25, 2025 at 11:38 AM Indu Bhagat <indu.bhagat@oracle.com> wrote: >>>> >>>> On Mon, Feb 10, 2025 at 12:30 AM Weinan Liu <wnliu@google.com> wrote: >>>>>> I already have a WIP patch to add sframe support to the kernel module. >>>>>> However, it is not yet working. I had trouble unwinding frames for the >>>>>> kernel module using the current algorithm. >>>>>> >>>>>> Indu has likely identified the issue and will be addressing it from the >>>>>> toolchain side. >>>>>> >>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=32666 >>>>> >>>>> I have a working in progress patch that adds sframe support for kernel >>>>> module. >>>>> https://github.com/heuza/linux/tree/sframe_unwinder.rfc >>>>> >>>>> According to the sframe table values I got during runtime testing, looks >>>>> like the offsets are not correct . >>>>> >>>> >>>> I hope to sanitize the fix for 32666 and post upstream soon (I had to >>>> address other related issues). Unless fixed, relocating .sframe >>>> sections using the .rela.sframe is expected to generate incorrect output. >>>> >>>>> When unwind symbols init_module(0xffff80007b155048) from the kernel >>>>> module(livepatch-sample.ko), the start_address of the FDE entries in the >>>>> sframe table of the kernel modules appear incorrect. >>>> >>>> init_module will apply the relocations on the .sframe section, isnt it ? >>>> >>>>> For instance, the first FDE's start_addr is reported as -20564. Adding >>>>> this offset to the module's sframe section address (0xffff80007b15a040) >>>>> yields 0xffff80007b154fec, which is not within the livepatch-sample.ko >>>>> memory region(It should be larger than 0xffff80007b155000). >>>>> >>>> >>>> Hmm..something seems off here. Having tested a potential fix for 32666 >>>> locally, I do not expect the first FDE to show this symptom. >>>> >>> > > Hi, > > Sorry for not responding in the past few days. I was on PTO and was > trying to improve my snowboarding technique, I am back now!! > > I think what we are seeing is expected behaviour: > > | For instance, the first FDE's start_addr is reported as -20564. Adding > | this offset to the module's sframe section address (0xffff80007b15a040) > | yields 0xffff80007b154fec, which is not within the livepatch-sample.ko > | memory region(It should be larger than 0xffff80007b155000). > > > Let me explain using a __dummy__ example. > > Assume Memory layout before relocation: > > | Address | Element | Relocation > | .... | .... | > | 60 | init_module (start address) | > | 72 | init_module (end address) | > | .... | ..... | > | 100 | Sframe section header start address | > | 128 | First FDE's start address | RELOC_OP_PREL -> Put init_module address (60) - current address (128) > > So, after relocation First FDE's start address has value 60 - 128 = -68 > For SFrame FDE function start address is : "Signed 32-bit integral field denoting the virtual memory address of the described function, for which the SFrame FDE applies. The value encoded in the ‘sfde_func_start_address’ field is the offset in bytes of the function’s start address, from the SFrame section." So, in your case, after applying the relocations, you will get: S + A - P = 60 - 128 = -68 This is the distance of the function start address (60) from the current location in SFrame section (128) But what we intend to store is the distance of the function start address from the start of the SFrame section. So we need to do an additional step for SFrame FDE: Value += r_offset -68 + 28 = -40 Where 28 is the r_offset in the RELA. So we really expect a -40 in the relocated SFrame section instead of -68 above. IOW, the RELAs of SFrame section will need an additional step after relocation. > Now, while doing unwinding we Try to add this value to the sframe > section header's start address which is in this example 100, > > so 100 + (-68) = 32 > > So, 32 is not within [60, 72], i.e. within init_module. > > You can see that it is possible for this value to be less than the start > address of the module's memory region when this function's address is > very close to the start of the memory region. > > The crux is that the offset in the FDE's start address is calculated > based on the address of the FDE's start_address and not based on the > address of the sframe section. > > > Thanks, > Puranjay
Indu Bhagat <indu.bhagat@oracle.com> writes: > On 2/26/25 2:23 AM, Puranjay Mohan wrote: >> Indu Bhagat <indu.bhagat@oracle.com> writes: >> >>> On 2/25/25 3:54 PM, Weinan Liu wrote: >>>> On Tue, Feb 25, 2025 at 11:38 AM Indu Bhagat <indu.bhagat@oracle.com> wrote: >>>>> >>>>> On Mon, Feb 10, 2025 at 12:30 AM Weinan Liu <wnliu@google.com> wrote: >>>>>>> I already have a WIP patch to add sframe support to the kernel module. >>>>>>> However, it is not yet working. I had trouble unwinding frames for the >>>>>>> kernel module using the current algorithm. >>>>>>> >>>>>>> Indu has likely identified the issue and will be addressing it from the >>>>>>> toolchain side. >>>>>>> >>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=32666 >>>>>> >>>>>> I have a working in progress patch that adds sframe support for kernel >>>>>> module. >>>>>> https://github.com/heuza/linux/tree/sframe_unwinder.rfc >>>>>> >>>>>> According to the sframe table values I got during runtime testing, looks >>>>>> like the offsets are not correct . >>>>>> >>>>> >>>>> I hope to sanitize the fix for 32666 and post upstream soon (I had to >>>>> address other related issues). Unless fixed, relocating .sframe >>>>> sections using the .rela.sframe is expected to generate incorrect output. >>>>> >>>>>> When unwind symbols init_module(0xffff80007b155048) from the kernel >>>>>> module(livepatch-sample.ko), the start_address of the FDE entries in the >>>>>> sframe table of the kernel modules appear incorrect. >>>>> >>>>> init_module will apply the relocations on the .sframe section, isnt it ? >>>>> >>>>>> For instance, the first FDE's start_addr is reported as -20564. Adding >>>>>> this offset to the module's sframe section address (0xffff80007b15a040) >>>>>> yields 0xffff80007b154fec, which is not within the livepatch-sample.ko >>>>>> memory region(It should be larger than 0xffff80007b155000). >>>>>> >>>>> >>>>> Hmm..something seems off here. Having tested a potential fix for 32666 >>>>> locally, I do not expect the first FDE to show this symptom. >>>>> >>>> >> >> Hi, >> >> Sorry for not responding in the past few days. I was on PTO and was >> trying to improve my snowboarding technique, I am back now!! >> >> I think what we are seeing is expected behaviour: >> >> | For instance, the first FDE's start_addr is reported as -20564. Adding >> | this offset to the module's sframe section address (0xffff80007b15a040) >> | yields 0xffff80007b154fec, which is not within the livepatch-sample.ko >> | memory region(It should be larger than 0xffff80007b155000). >> >> >> Let me explain using a __dummy__ example. >> >> Assume Memory layout before relocation: >> >> | Address | Element | Relocation >> | .... | .... | >> | 60 | init_module (start address) | >> | 72 | init_module (end address) | >> | .... | ..... | >> | 100 | Sframe section header start address | >> | 128 | First FDE's start address | RELOC_OP_PREL -> Put init_module address (60) - current address (128) >> >> So, after relocation First FDE's start address has value 60 - 128 = -68 >> > > For SFrame FDE function start address is : > > "Signed 32-bit integral field denoting the virtual memory address of the > described function, for which the SFrame FDE applies. The value encoded > in the ‘sfde_func_start_address’ field is the offset in bytes of the > function’s start address, from the SFrame section." > > So, in your case, after applying the relocations, you will get: > S + A - P = 60 - 128 = -68 > > This is the distance of the function start address (60) from the current > location in SFrame section (128) > > But what we intend to store is the distance of the function start > address from the start of the SFrame section. So we need to do an > additional step for SFrame FDE: Value += r_offset Thanks for the explaination, now it makes sense. But I couldn't find a relocation type in AARCH64 that does this extra += r_offset along with PREL32. The kernel's module loader is only doing the R_AARCH64_PREL32 which is why we see this issue. How is this working even for the kernel itself? or for that matter, any other binary compiled with sframe? From my limited undestanding, the way to fix this would be to hack the relocator to do this additional step while relocating .sframe sections. Or the 'addend' values in .rela.sframe should already have the +r_offset added to it, then no change to the relocator would be needed. > -68 + 28 = -40 > Where 28 is the r_offset in the RELA. > > So we really expect a -40 in the relocated SFrame section instead of -68 > above. IOW, the RELAs of SFrame section will need an additional step > after relocation. > Thanks, Puranjay
On 2/27/25 1:38 AM, Puranjay Mohan wrote: > Indu Bhagat <indu.bhagat@oracle.com> writes: > >> On 2/26/25 2:23 AM, Puranjay Mohan wrote: >>> Indu Bhagat <indu.bhagat@oracle.com> writes: >>> >>>> On 2/25/25 3:54 PM, Weinan Liu wrote: >>>>> On Tue, Feb 25, 2025 at 11:38 AM Indu Bhagat <indu.bhagat@oracle.com> wrote: >>>>>> >>>>>> On Mon, Feb 10, 2025 at 12:30 AM Weinan Liu <wnliu@google.com> wrote: >>>>>>>> I already have a WIP patch to add sframe support to the kernel module. >>>>>>>> However, it is not yet working. I had trouble unwinding frames for the >>>>>>>> kernel module using the current algorithm. >>>>>>>> >>>>>>>> Indu has likely identified the issue and will be addressing it from the >>>>>>>> toolchain side. >>>>>>>> >>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=32666 >>>>>>> >>>>>>> I have a working in progress patch that adds sframe support for kernel >>>>>>> module. >>>>>>> https://github.com/heuza/linux/tree/sframe_unwinder.rfc >>>>>>> >>>>>>> According to the sframe table values I got during runtime testing, looks >>>>>>> like the offsets are not correct . >>>>>>> >>>>>> >>>>>> I hope to sanitize the fix for 32666 and post upstream soon (I had to >>>>>> address other related issues). Unless fixed, relocating .sframe >>>>>> sections using the .rela.sframe is expected to generate incorrect output. >>>>>> >>>>>>> When unwind symbols init_module(0xffff80007b155048) from the kernel >>>>>>> module(livepatch-sample.ko), the start_address of the FDE entries in the >>>>>>> sframe table of the kernel modules appear incorrect. >>>>>> >>>>>> init_module will apply the relocations on the .sframe section, isnt it ? >>>>>> >>>>>>> For instance, the first FDE's start_addr is reported as -20564. Adding >>>>>>> this offset to the module's sframe section address (0xffff80007b15a040) >>>>>>> yields 0xffff80007b154fec, which is not within the livepatch-sample.ko >>>>>>> memory region(It should be larger than 0xffff80007b155000). >>>>>>> >>>>>> >>>>>> Hmm..something seems off here. Having tested a potential fix for 32666 >>>>>> locally, I do not expect the first FDE to show this symptom. >>>>>> >>>>> >>> >>> Hi, >>> >>> Sorry for not responding in the past few days. I was on PTO and was >>> trying to improve my snowboarding technique, I am back now!! >>> >>> I think what we are seeing is expected behaviour: >>> >>> | For instance, the first FDE's start_addr is reported as -20564. Adding >>> | this offset to the module's sframe section address (0xffff80007b15a040) >>> | yields 0xffff80007b154fec, which is not within the livepatch-sample.ko >>> | memory region(It should be larger than 0xffff80007b155000). >>> >>> >>> Let me explain using a __dummy__ example. >>> >>> Assume Memory layout before relocation: >>> >>> | Address | Element | Relocation >>> | .... | .... | >>> | 60 | init_module (start address) | >>> | 72 | init_module (end address) | >>> | .... | ..... | >>> | 100 | Sframe section header start address | >>> | 128 | First FDE's start address | RELOC_OP_PREL -> Put init_module address (60) - current address (128) >>> >>> So, after relocation First FDE's start address has value 60 - 128 = -68 >>> >> >> For SFrame FDE function start address is : >> >> "Signed 32-bit integral field denoting the virtual memory address of the >> described function, for which the SFrame FDE applies. The value encoded >> in the ‘sfde_func_start_address’ field is the offset in bytes of the >> function’s start address, from the SFrame section." >> >> So, in your case, after applying the relocations, you will get: >> S + A - P = 60 - 128 = -68 >> >> This is the distance of the function start address (60) from the current >> location in SFrame section (128) >> >> But what we intend to store is the distance of the function start >> address from the start of the SFrame section. So we need to do an >> additional step for SFrame FDE: Value += r_offset > > Thanks for the explaination, now it makes sense. > > But I couldn't find a relocation type in AARCH64 that does this extra += > r_offset along with PREL32. > > The kernel's module loader is only doing the R_AARCH64_PREL32 which is > why we see this issue. > > How is this working even for the kernel itself? or for that matter, any > other binary compiled with sframe? > For the usual executables or shared objects, the calculations are applied by ld.bfd at this time. Hence, the issue manifests in relocatable files. > From my limited undestanding, the way to fix this would be to hack the > relocator to do this additional step while relocating .sframe sections. > Or the 'addend' values in .rela.sframe should already have the +r_offset > added to it, then no change to the relocator would be needed. > Of the two, adjusting the addend values in .rela.sframe may be a reasonable way to go about it. Let me try it out in GAS and ld.bfd. >> -68 + 28 = -40 >> Where 28 is the r_offset in the RELA. >> >> So we really expect a -40 in the relocated SFrame section instead of -68 >> above. IOW, the RELAs of SFrame section will need an additional step >> after relocation. >> > > Thanks, > Puranjay
On 2/27/25 10:47 PM, Indu Bhagat wrote: > On 2/27/25 1:38 AM, Puranjay Mohan wrote: >> Indu Bhagat <indu.bhagat@oracle.com> writes: >> >>> On 2/26/25 2:23 AM, Puranjay Mohan wrote: >>>> Indu Bhagat <indu.bhagat@oracle.com> writes: >>>> >>>>> On 2/25/25 3:54 PM, Weinan Liu wrote: >>>>>> On Tue, Feb 25, 2025 at 11:38 AM Indu Bhagat >>>>>> <indu.bhagat@oracle.com> wrote: >>>>>>> >>>>>>> On Mon, Feb 10, 2025 at 12:30 AM Weinan Liu <wnliu@google.com> >>>>>>> wrote: >>>>>>>>> I already have a WIP patch to add sframe support to the kernel >>>>>>>>> module. >>>>>>>>> However, it is not yet working. I had trouble unwinding frames >>>>>>>>> for the >>>>>>>>> kernel module using the current algorithm. >>>>>>>>> >>>>>>>>> Indu has likely identified the issue and will be addressing it >>>>>>>>> from the >>>>>>>>> toolchain side. >>>>>>>>> >>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=32666 >>>>>>>> >>>>>>>> I have a working in progress patch that adds sframe support for >>>>>>>> kernel >>>>>>>> module. >>>>>>>> https://github.com/heuza/linux/tree/sframe_unwinder.rfc >>>>>>>> >>>>>>>> According to the sframe table values I got during runtime >>>>>>>> testing, looks >>>>>>>> like the offsets are not correct . >>>>>>>> >>>>>>> >>>>>>> I hope to sanitize the fix for 32666 and post upstream soon (I >>>>>>> had to >>>>>>> address other related issues). Unless fixed, relocating .sframe >>>>>>> sections using the .rela.sframe is expected to generate incorrect >>>>>>> output. >>>>>>> >>>>>>>> When unwind symbols init_module(0xffff80007b155048) from the kernel >>>>>>>> module(livepatch-sample.ko), the start_address of the FDE >>>>>>>> entries in the >>>>>>>> sframe table of the kernel modules appear incorrect. >>>>>>> >>>>>>> init_module will apply the relocations on the .sframe section, >>>>>>> isnt it ? >>>>>>> >>>>>>>> For instance, the first FDE's start_addr is reported as -20564. >>>>>>>> Adding >>>>>>>> this offset to the module's sframe section address >>>>>>>> (0xffff80007b15a040) >>>>>>>> yields 0xffff80007b154fec, which is not within the livepatch- >>>>>>>> sample.ko >>>>>>>> memory region(It should be larger than 0xffff80007b155000). >>>>>>>> >>>>>>> >>>>>>> Hmm..something seems off here. Having tested a potential fix for >>>>>>> 32666 >>>>>>> locally, I do not expect the first FDE to show this symptom. >>>>>>> >>>>>> >>>> >>>> Hi, >>>> >>>> Sorry for not responding in the past few days. I was on PTO and was >>>> trying to improve my snowboarding technique, I am back now!! >>>> >>>> I think what we are seeing is expected behaviour: >>>> >>>> | For instance, the first FDE's start_addr is reported as -20564. >>>> Adding >>>> | this offset to the module's sframe section address >>>> (0xffff80007b15a040) >>>> | yields 0xffff80007b154fec, which is not within the livepatch- >>>> sample.ko >>>> | memory region(It should be larger than 0xffff80007b155000). >>>> >>>> >>>> Let me explain using a __dummy__ example. >>>> >>>> Assume Memory layout before relocation: >>>> >>>> | Address | Element | Relocation >>>> | .... | .... | >>>> | 60 | init_module (start address) | >>>> | 72 | init_module (end address) | >>>> | .... | ..... | >>>> | 100 | Sframe section header start address | >>>> | 128 | First FDE's start address | >>>> RELOC_OP_PREL -> Put init_module address (60) - current address (128) >>>> >>>> So, after relocation First FDE's start address has value 60 - 128 = -68 >>>> >>> >>> For SFrame FDE function start address is : >>> >>> "Signed 32-bit integral field denoting the virtual memory address of the >>> described function, for which the SFrame FDE applies. The value encoded >>> in the ‘sfde_func_start_address’ field is the offset in bytes of the >>> function’s start address, from the SFrame section." >>> >>> So, in your case, after applying the relocations, you will get: >>> S + A - P = 60 - 128 = -68 >>> >>> This is the distance of the function start address (60) from the current >>> location in SFrame section (128) >>> >>> But what we intend to store is the distance of the function start >>> address from the start of the SFrame section. So we need to do an >>> additional step for SFrame FDE: Value += r_offset >> >> Thanks for the explaination, now it makes sense. >> >> But I couldn't find a relocation type in AARCH64 that does this extra += >> r_offset along with PREL32. >> >> The kernel's module loader is only doing the R_AARCH64_PREL32 which is >> why we see this issue. >> >> How is this working even for the kernel itself? or for that matter, any >> other binary compiled with sframe? >> > > For the usual executables or shared objects, the calculations are > applied by ld.bfd at this time. Hence, the issue manifests in > relocatable files. > >> From my limited undestanding, the way to fix this would be to hack the >> relocator to do this additional step while relocating .sframe sections. >> Or the 'addend' values in .rela.sframe should already have the +r_offset >> added to it, then no change to the relocator would be needed. >> > > Of the two, adjusting the addend values in .rela.sframe may be a > reasonable way to go about it. Let me try it out in GAS and ld.bfd. > A fix for this is in the works (being discussed on the binutils@sourceware list). I will keep you posted. Thanks Indu
On Tue, Feb 25, 2025 at 01:02:24AM +0000, Weinan Liu wrote: > On Mon, Feb 10, 2025 at 12:30 AM Weinan Liu <wnliu@google.com> wrote: > > I already have a WIP patch to add sframe support to the kernel module. > > However, it is not yet working. I had trouble unwinding frames for the > > kernel module using the current algorithm. > > > > Indu has likely identified the issue and will be addressing it from the > > toolchain side. > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=32666 > > > I have a working in progress patch that adds sframe support for kernel > module. > https://github.com/heuza/linux/tree/sframe_unwinder.rfc > > According to the sframe table values I got during runtime testing, looks > like the offsets are not correct . > > When unwind symbols init_module(0xffff80007b155048) from the kernel > module(livepatch-sample.ko), the start_address of the FDE entries in the > sframe table of the kernel modules appear incorrect. > For instance, the first FDE's start_addr is reported as -20564. Adding > this offset to the module's sframe section address (0xffff80007b15a040) > yields 0xffff80007b154fec, which is not within the livepatch-sample.ko > memory region(It should be larger than 0xffff80007b155000). I assume kpatch create-diff-object needs to copy over a subset of the .sframe section. Similar to what kpatch_regenerate_orc_sections() does. -- Josh
On Tue, Feb 25, 2025 at 10:13 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote: > > On Tue, Feb 25, 2025 at 01:02:24AM +0000, Weinan Liu wrote: > > On Mon, Feb 10, 2025 at 12:30 AM Weinan Liu <wnliu@google.com> wrote: > > > I already have a WIP patch to add sframe support to the kernel module. > > > However, it is not yet working. I had trouble unwinding frames for the > > > kernel module using the current algorithm. > > > > > > Indu has likely identified the issue and will be addressing it from the > > > toolchain side. > > > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=32666 > > > > > > I have a working in progress patch that adds sframe support for kernel > > module. > > https://github.com/heuza/linux/tree/sframe_unwinder.rfc > > > > According to the sframe table values I got during runtime testing, looks > > like the offsets are not correct . > > > > When unwind symbols init_module(0xffff80007b155048) from the kernel > > module(livepatch-sample.ko), the start_address of the FDE entries in the > > sframe table of the kernel modules appear incorrect. > > For instance, the first FDE's start_addr is reported as -20564. Adding > > this offset to the module's sframe section address (0xffff80007b15a040) > > yields 0xffff80007b154fec, which is not within the livepatch-sample.ko > > memory region(It should be larger than 0xffff80007b155000). > > I assume kpatch create-diff-object needs to copy over a subset of the > .sframe section. Similar to what kpatch_regenerate_orc_sections() does. You're right that we need to process the sframe section like what kpatch_regenerate_orc_sections() does when building livepatch by kpatch. However, livepatch-sample.ko is not generated by kpatch. It is built directly from samples/livepatch/livepatch-sample.c by gcc during the kernel build
On Fri, Feb 07, 2025 at 12:16:29PM +0000, Puranjay Mohan wrote: > Weinan Liu <wnliu@google.com> writes: > > Thank you for reporting this issue. > > I just found out that Josh also intentionally uses '>' instead of '>=' for the same reason > > https://lore.kernel.org/lkml/20250122225257.h64ftfnorofe7cb4@jpoimboe/T/#m6d70a20ed9f5b3bbe5b24b24b8c5dcc603a79101 > > > > QQ, do we need to care the stacktrace after '__noreturn' function? > > Yes, I think we should, but others people could add more to this. FYI, here's how ORC handles this: /* * Find the orc_entry associated with the text address. * * For a call frame (as opposed to a signal frame), state->ip points to * the instruction after the call. That instruction's stack layout * could be different from the call instruction's layout, for example * if the call was to a noreturn function. So get the ORC data for the * call instruction itself. */ orc = orc_find(state->signal ? state->ip : state->ip - 1); and state->signal is only set for exceptions/interrupts (including preemption and page faults) and newly forked tasks which haven't run yet. -- Josh
I missed this set before sending my RFC set. If this set works well, we won't need the other set. I will give this one a try. Thanks, Song On Mon, Jan 27, 2025 at 1:33 PM Weinan Liu <wnliu@google.com> wrote: > > This patchset implements a generic kernel sframe-based [1] unwinder. > The main goal is to support reliable stacktraces on arm64. > > On x86 orc unwinder provides reliable stacktraces. But arm64 misses the > required support from objtool: it cannot generate orc unwind tables for > arm64. > > Currently, there's already a sframe unwinder proposed for userspace: [2]. > Since the sframe unwind table algorithm is similar, these two proposal > could integrate common functionality in the future. > > There are some incomplete features or challenges: > - The unwinder doesn't yet work with kernel modules. The `start_addr` of > FRE from kernel modules doesn't appear correct, preventing us from > unwinding functions from kernel modules. > - Currently, only GCC supports sframe. > > Ref: > [1]: https://sourceware.org/binutils/docs/sframe-spec.html > [2]: https://lore.kernel.org/lkml/cover.1730150953.git.jpoimboe@kernel.org/ > > Madhavan T. Venkataraman (1): > arm64: Define TIF_PATCH_PENDING for livepatch > > Weinan Liu (7): > unwind: build kernel with sframe info > arm64: entry: add unwind info for various kernel entries > unwind: add sframe v2 header > unwind: Implement generic sframe unwinder library > unwind: arm64: Add sframe unwinder on arm64 > unwind: arm64: add reliable stacktrace support for arm64 > arm64: Enable livepatch for ARM64 > > Makefile | 6 + > arch/Kconfig | 8 + > arch/arm64/Kconfig | 3 + > arch/arm64/Kconfig.debug | 10 + > arch/arm64/include/asm/stacktrace/common.h | 6 + > arch/arm64/include/asm/thread_info.h | 4 +- > arch/arm64/kernel/entry-common.c | 4 + > arch/arm64/kernel/entry.S | 10 + > arch/arm64/kernel/setup.c | 2 + > arch/arm64/kernel/stacktrace.c | 102 ++++++++++ > include/asm-generic/vmlinux.lds.h | 12 ++ > include/linux/sframe_lookup.h | 43 +++++ > kernel/Makefile | 1 + > kernel/sframe.h | 215 +++++++++++++++++++++ > kernel/sframe_lookup.c | 196 +++++++++++++++++++ > 15 files changed, 621 insertions(+), 1 deletion(-) > create mode 100644 include/linux/sframe_lookup.h > create mode 100644 kernel/sframe.h > create mode 100644 kernel/sframe_lookup.c > > -- > 2.48.1.262.g85cc9f2d1e-goog > >
On Thu, Jan 30, 2025 at 9:59 AM Song Liu <song@kernel.org> wrote: > > I missed this set before sending my RFC set. If this set works well, we > won't need the other set. I will give this one a try. I just realized that llvm doesn't support sframe yet. So we (Meta) still need some sframe-less approach before llvm supports sframe. IIRC, Google also uses llvm to compile the kernel. Weinan, would you mind share your thoughts on how we can adopt this before llvm supports sframe? (compile arm64 kernel with gcc?) Thanks, Song
On Thu, Jan 30, 2025 at 10:34:09AM -0800, Song Liu wrote: > On Thu, Jan 30, 2025 at 9:59 AM Song Liu <song@kernel.org> wrote: > > > > I missed this set before sending my RFC set. If this set works well, we > > won't need the other set. I will give this one a try. > > I just realized that llvm doesn't support sframe yet. So we (Meta) still > need some sframe-less approach before llvm supports sframe. > > IIRC, Google also uses llvm to compile the kernel. Weinan, would > you mind share your thoughts on how we can adopt this before > llvm supports sframe? (compile arm64 kernel with gcc?) Hi Song, the plan is to start the work on adding sframe support to llvm in parallel to landing these changes upstream. From the initial assessment it shouldn't be too hard. Thanks
Hi Roman, On Thu, Jan 30, 2025 at 11:01 AM Roman Gushchin <roman.gushchin@linux.dev> wrote: > > On Thu, Jan 30, 2025 at 10:34:09AM -0800, Song Liu wrote: > > On Thu, Jan 30, 2025 at 9:59 AM Song Liu <song@kernel.org> wrote: > > > > > > I missed this set before sending my RFC set. If this set works well, we > > > won't need the other set. I will give this one a try. > > > > I just realized that llvm doesn't support sframe yet. So we (Meta) still > > need some sframe-less approach before llvm supports sframe. > > > > IIRC, Google also uses llvm to compile the kernel. Weinan, would > > you mind share your thoughts on how we can adopt this before > > llvm supports sframe? (compile arm64 kernel with gcc?) > > Hi Song, > > the plan is to start the work on adding sframe support to llvm > in parallel to landing these changes upstream. From the initial > assessment it shouldn't be too hard. Thanks for the information! Song
I run some tests with this set and my RFC set [1]. Most of the test is done with kpatch-build. I tested both Puranjay's version [3] and my version [4]. For gcc 14.2.1, I have seen the following issue with this test [2]. This happens with both upstream and 6.13.2. The livepatch loaded fine, but the system spilled out the following warning quickly. On the other hand, the same test works with LLVM and my RFC set (LLVM doesn't support SFRAME, and thus doesn't work with this set yet). Thanks, Song [ 81.250437] ------------[ cut here ]------------ [ 81.250818] refcount_t: saturated; leaking memory. [ 81.251201] WARNING: CPU: 0 PID: 95 at lib/refcount.c:22 refcount_warn_saturate+0x6c/0x140 [ 81.251841] Modules linked in: livepatch_special_static(OEK) [ 81.252277] CPU: 0 UID: 0 PID: 95 Comm: bash Tainted: G OE K 6.13.2-00321-g52d2813b4b07 #49 [ 81.253003] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH [ 81.253503] Hardware name: linux,dummy-virt (DT) [ 81.253856] pstate: 634000c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) [ 81.254383] pc : refcount_warn_saturate+0x6c/0x140 [ 81.254748] lr : refcount_warn_saturate+0x6c/0x140 [ 81.255114] sp : ffff800085a6fc00 [ 81.255371] x29: ffff800085a6fc00 x28: 0000000001200000 x27: ffff0000c2966180 [ 81.255918] x26: 0000000000000000 x25: ffff8000829c0000 x24: ffff0000c2e9b608 [ 81.256462] x23: ffff800083351000 x22: ffff0000c2e9af80 x21: ffff0000c062e140 [ 81.257006] x20: ffff0000c1c10c00 x19: ffff800085a6fd80 x18: ffffffffffffffff [ 81.257544] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000006 [ 81.258083] x14: 0000000000000000 x13: 2e79726f6d656d20 x12: 676e696b61656c20 [ 81.258625] x11: ffff8000829f7d70 x10: 0000000000000147 x9 : ffff8000801546b4 [ 81.259165] x8 : 00000000fffeffff x7 : 00000000ffff0000 x6 : ffff800082f77d70 [ 81.259709] x5 : 80000000ffff0000 x4 : 0000000000000000 x3 : 0000000000000001 [ 81.260257] x2 : ffff8000829f7a88 x1 : ffff8000829f7a88 x0 : 0000000000000026 [ 81.260824] Call trace: [ 81.261015] refcount_warn_saturate+0x6c/0x140 (P) [ 81.261387] __refcount_add.constprop.0+0x60/0x70 [ 81.261748] copy_process+0xfdc/0xfd58 [livepatch_special_static] [ 81.262217] kernel_clone+0x80/0x3e0 [ 81.262499] __do_sys_clone+0x5c/0x88 [ 81.262786] __arm64_sys_clone+0x24/0x38 [ 81.263085] invoke_syscall+0x4c/0x108 [ 81.263378] el0_svc_common.constprop.0+0x44/0xe8 [ 81.263734] do_el0_svc+0x20/0x30 [ 81.263993] el0_svc+0x34/0xf8 [ 81.264231] el0t_64_sync_handler+0x104/0x130 [ 81.264561] el0t_64_sync+0x184/0x188 [ 81.264846] ---[ end trace 0000000000000000 ]--- [ 82.335559] ------------[ cut here ]------------ [ 82.335931] refcount_t: underflow; use-after-free. [ 82.336307] WARNING: CPU: 1 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0xec/0x140 [ 82.336949] Modules linked in: livepatch_special_static(OEK) [ 82.337389] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G W OE K 6.13.2-00321-g52d2813b4b07 #49 [ 82.338148] Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH [ 82.338721] Hardware name: linux,dummy-virt (DT) [ 82.339083] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) [ 82.339617] pc : refcount_warn_saturate+0xec/0x140 [ 82.340007] lr : refcount_warn_saturate+0xec/0x140 [ 82.340378] sp : ffff80008370fe40 [ 82.340637] x29: ffff80008370fe40 x28: 0000000000000000 x27: 0000000000000000 [ 82.341188] x26: 000000000000000a x25: ffff0000fdaf7ab8 x24: 0000000000000014 [ 82.341737] x23: ffff8000829c8d30 x22: ffff80008370ff28 x21: ffff0000fe020000 [ 82.342286] x20: ffff0000c062e140 x19: ffff0000c2e9af80 x18: ffffffffffffffff [ 82.342839] x17: ffff80007b7a0000 x16: ffff800083700000 x15: 0000000000000006 [ 82.343389] x14: 0000000000000000 x13: 2e656572662d7265 x12: 7466612d65737520 [ 82.343944] x11: ffff8000829f7d70 x10: 000000000000016a x9 : ffff8000801546b4 [ 82.344499] x8 : 00000000fffeffff x7 : 00000000ffff0000 x6 : ffff800082f77d70 [ 82.345051] x5 : 80000000ffff0000 x4 : 0000000000000000 x3 : 0000000000000001 [ 82.345604] x2 : ffff8000829f7a88 x1 : ffff8000829f7a88 x0 : 0000000000000026 [ 82.346163] Call trace: [ 82.346359] refcount_warn_saturate+0xec/0x140 (P) [ 82.346736] __put_task_struct+0x130/0x170 [ 82.347063] delayed_put_task_struct+0xbc/0xe8 [ 82.347411] rcu_core+0x20c/0x5f8 [ 82.347680] rcu_core_si+0x14/0x28 [ 82.347952] handle_softirqs+0x124/0x308 [ 82.348260] __do_softirq+0x18/0x20 [ 82.348536] ____do_softirq+0x14/0x28 [ 82.348828] call_on_irq_stack+0x24/0x30 [ 82.349137] do_softirq_own_stack+0x20/0x38 [ 82.349465] __irq_exit_rcu+0xcc/0x108 [ 82.349764] irq_exit_rcu+0x14/0x28 [ 82.350038] el1_interrupt+0x34/0x50 [ 82.350321] el1h_64_irq_handler+0x14/0x20 [ 82.350642] el1h_64_irq+0x6c/0x70 [ 82.350911] default_idle_call+0x30/0xd0 (P) [ 82.351248] do_idle+0x1d0/0x200 [ 82.351506] cpu_startup_entry+0x38/0x48 [ 82.351818] secondary_start_kernel+0x124/0x150 [ 82.352176] __secondary_switched+0xac/0xb0 [ 82.352505] ---[ end trace 0000000000000000 ]--- [1] SFRAME-less livepatch RFC https://lore.kernel.org/live-patching/20250129232936.1795412-1-song@kernel.org/ [2] special-static test from kpatch https://github.com/dynup/kpatch/blob/master/test/integration/linux-6.2.0/special-static.patch [3] Puranjay's kpatch with arm64 support https://github.com/puranjaymohan/kpatch/tree/arm64 [4] My version of kpatch with arm64 and LTO support https://github.com/liu-song-6/kpatch/tree/fb-6.13-v2 On Mon, Jan 27, 2025 at 1:33 PM Weinan Liu <wnliu@google.com> wrote: > > This patchset implements a generic kernel sframe-based [1] unwinder. > The main goal is to support reliable stacktraces on arm64. > > On x86 orc unwinder provides reliable stacktraces. But arm64 misses the > required support from objtool: it cannot generate orc unwind tables for > arm64. > > Currently, there's already a sframe unwinder proposed for userspace: [2]. > Since the sframe unwind table algorithm is similar, these two proposal > could integrate common functionality in the future. > > There are some incomplete features or challenges: > - The unwinder doesn't yet work with kernel modules. The `start_addr` of > FRE from kernel modules doesn't appear correct, preventing us from > unwinding functions from kernel modules. > - Currently, only GCC supports sframe. > > Ref: > [1]: https://sourceware.org/binutils/docs/sframe-spec.html > [2]: https://lore.kernel.org/lkml/cover.1730150953.git.jpoimboe@kernel.org/ > > Madhavan T. Venkataraman (1): > arm64: Define TIF_PATCH_PENDING for livepatch > > Weinan Liu (7): > unwind: build kernel with sframe info > arm64: entry: add unwind info for various kernel entries > unwind: add sframe v2 header > unwind: Implement generic sframe unwinder library > unwind: arm64: Add sframe unwinder on arm64 > unwind: arm64: add reliable stacktrace support for arm64 > arm64: Enable livepatch for ARM64 > > Makefile | 6 + > arch/Kconfig | 8 + > arch/arm64/Kconfig | 3 + > arch/arm64/Kconfig.debug | 10 + > arch/arm64/include/asm/stacktrace/common.h | 6 + > arch/arm64/include/asm/thread_info.h | 4 +- > arch/arm64/kernel/entry-common.c | 4 + > arch/arm64/kernel/entry.S | 10 + > arch/arm64/kernel/setup.c | 2 + > arch/arm64/kernel/stacktrace.c | 102 ++++++++++ > include/asm-generic/vmlinux.lds.h | 12 ++ > include/linux/sframe_lookup.h | 43 +++++ > kernel/Makefile | 1 + > kernel/sframe.h | 215 +++++++++++++++++++++ > kernel/sframe_lookup.c | 196 +++++++++++++++++++ > 15 files changed, 621 insertions(+), 1 deletion(-) > create mode 100644 include/linux/sframe_lookup.h > create mode 100644 kernel/sframe.h > create mode 100644 kernel/sframe_lookup.c > > -- > 2.48.1.262.g85cc9f2d1e-goog > >
On Wed, Feb 12, 2025 at 03:32:40PM -0800, Song Liu wrote: > [ 81.250437] ------------[ cut here ]------------ > [ 81.250818] refcount_t: saturated; leaking memory. > [ 81.251201] WARNING: CPU: 0 PID: 95 at lib/refcount.c:22 > refcount_warn_saturate+0x6c/0x140 > [ 81.251841] Modules linked in: livepatch_special_static(OEK) > [ 81.252277] CPU: 0 UID: 0 PID: 95 Comm: bash Tainted: G > OE K 6.13.2-00321-g52d2813b4b07 #49 > [ 81.253003] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH > [ 81.253503] Hardware name: linux,dummy-virt (DT) > [ 81.253856] pstate: 634000c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) > [ 81.254383] pc : refcount_warn_saturate+0x6c/0x140 > [ 81.254748] lr : refcount_warn_saturate+0x6c/0x140 > [ 81.255114] sp : ffff800085a6fc00 > [ 81.255371] x29: ffff800085a6fc00 x28: 0000000001200000 x27: ffff0000c2966180 > [ 81.255918] x26: 0000000000000000 x25: ffff8000829c0000 x24: ffff0000c2e9b608 > [ 81.256462] x23: ffff800083351000 x22: ffff0000c2e9af80 x21: ffff0000c062e140 > [ 81.257006] x20: ffff0000c1c10c00 x19: ffff800085a6fd80 x18: ffffffffffffffff > [ 81.257544] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000006 > [ 81.258083] x14: 0000000000000000 x13: 2e79726f6d656d20 x12: 676e696b61656c20 > [ 81.258625] x11: ffff8000829f7d70 x10: 0000000000000147 x9 : ffff8000801546b4 > [ 81.259165] x8 : 00000000fffeffff x7 : 00000000ffff0000 x6 : ffff800082f77d70 > [ 81.259709] x5 : 80000000ffff0000 x4 : 0000000000000000 x3 : 0000000000000001 > [ 81.260257] x2 : ffff8000829f7a88 x1 : ffff8000829f7a88 x0 : 0000000000000026 > [ 81.260824] Call trace: > [ 81.261015] refcount_warn_saturate+0x6c/0x140 (P) > [ 81.261387] __refcount_add.constprop.0+0x60/0x70 > [ 81.261748] copy_process+0xfdc/0xfd58 [livepatch_special_static] Does that copy_process+0xfdc/0xfd58 resolve to this line in copy_process()? refcount_inc(¤t->signal->sigcnt); Maybe the klp rela reference to 'current' is bogus, or resolving to the wrong address somehow? -- Josh
On Wed, Feb 12, 2025 at 3:49 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote: > > On Wed, Feb 12, 2025 at 03:32:40PM -0800, Song Liu wrote: > > [ 81.250437] ------------[ cut here ]------------ > > [ 81.250818] refcount_t: saturated; leaking memory. > > [ 81.251201] WARNING: CPU: 0 PID: 95 at lib/refcount.c:22 > > refcount_warn_saturate+0x6c/0x140 > > [ 81.251841] Modules linked in: livepatch_special_static(OEK) > > [ 81.252277] CPU: 0 UID: 0 PID: 95 Comm: bash Tainted: G > > OE K 6.13.2-00321-g52d2813b4b07 #49 > > [ 81.253003] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH > > [ 81.253503] Hardware name: linux,dummy-virt (DT) > > [ 81.253856] pstate: 634000c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) > > [ 81.254383] pc : refcount_warn_saturate+0x6c/0x140 > > [ 81.254748] lr : refcount_warn_saturate+0x6c/0x140 > > [ 81.255114] sp : ffff800085a6fc00 > > [ 81.255371] x29: ffff800085a6fc00 x28: 0000000001200000 x27: ffff0000c2966180 > > [ 81.255918] x26: 0000000000000000 x25: ffff8000829c0000 x24: ffff0000c2e9b608 > > [ 81.256462] x23: ffff800083351000 x22: ffff0000c2e9af80 x21: ffff0000c062e140 > > [ 81.257006] x20: ffff0000c1c10c00 x19: ffff800085a6fd80 x18: ffffffffffffffff > > [ 81.257544] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000006 > > [ 81.258083] x14: 0000000000000000 x13: 2e79726f6d656d20 x12: 676e696b61656c20 > > [ 81.258625] x11: ffff8000829f7d70 x10: 0000000000000147 x9 : ffff8000801546b4 > > [ 81.259165] x8 : 00000000fffeffff x7 : 00000000ffff0000 x6 : ffff800082f77d70 > > [ 81.259709] x5 : 80000000ffff0000 x4 : 0000000000000000 x3 : 0000000000000001 > > [ 81.260257] x2 : ffff8000829f7a88 x1 : ffff8000829f7a88 x0 : 0000000000000026 > > [ 81.260824] Call trace: > > [ 81.261015] refcount_warn_saturate+0x6c/0x140 (P) > > [ 81.261387] __refcount_add.constprop.0+0x60/0x70 > > [ 81.261748] copy_process+0xfdc/0xfd58 [livepatch_special_static] > > Does that copy_process+0xfdc/0xfd58 resolve to this line in > copy_process()? > > refcount_inc(¤t->signal->sigcnt); > > Maybe the klp rela reference to 'current' is bogus, or resolving to the > wrong address somehow? It resolves the following line. p->signal->tty = tty_kref_get(current->signal->tty); I am not quite sure how 'current' should be resolved. The size of copy_process (0xfd58) is wrong. It is only about 5.5kB in size. Also, the copy_process function in the .ko file looks very broken. I will try a few more things. Thanks, Song
On Wed, Feb 12, 2025 at 06:36:04PM -0800, Song Liu wrote: > > > [ 81.261748] copy_process+0xfdc/0xfd58 [livepatch_special_static] > > > > Does that copy_process+0xfdc/0xfd58 resolve to this line in > > copy_process()? > > > > refcount_inc(¤t->signal->sigcnt); > > > > Maybe the klp rela reference to 'current' is bogus, or resolving to the > > wrong address somehow? > > It resolves the following line. > > p->signal->tty = tty_kref_get(current->signal->tty); > > I am not quite sure how 'current' should be resolved. Hm, on arm64 it looks like the value of 'current' is stored in the SP_EL0 register. So I guess that shouldn't need any relocations. > The size of copy_process (0xfd58) is wrong. It is only about > 5.5kB in size. Also, the copy_process function in the .ko file > looks very broken. I will try a few more things. Ah ok, sounds like it's pretty borked. -- Josh
On Wed, Feb 12, 2025 at 6:45 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Wed, Feb 12, 2025 at 06:36:04PM -0800, Song Liu wrote:
> > > > [ 81.261748] copy_process+0xfdc/0xfd58 [livepatch_special_static]
> > >
> > > Does that copy_process+0xfdc/0xfd58 resolve to this line in
> > > copy_process()?
> > >
> > > refcount_inc(¤t->signal->sigcnt);
> > >
> > > Maybe the klp rela reference to 'current' is bogus, or resolving to the
> > > wrong address somehow?
> >
> > It resolves the following line.
> >
> > p->signal->tty = tty_kref_get(current->signal->tty);
> >
> > I am not quite sure how 'current' should be resolved.
>
> Hm, on arm64 it looks like the value of 'current' is stored in the
> SP_EL0 register. So I guess that shouldn't need any relocations.
>
> > The size of copy_process (0xfd58) is wrong. It is only about
> > 5.5kB in size. Also, the copy_process function in the .ko file
> > looks very broken. I will try a few more things.
When I try each step of kpatch-build, the copy_process function
looks reasonable (according to gdb-disassemble) in fork.o and
output.o. However, copy_process looks weird in livepatch-special-static.o,
which is generated by ld:
ld -EL -maarch64linux -z norelro -z noexecstack
--no-warn-rwx-segments -T ././kpatch.lds -r -o
livepatch-special-static.o ./patch-hook.o ./output.o
I have attached these files to the email. I am not sure whether
the email server will let them through.
Indu, does this look like an issue with ld?
Thanks,
Song
ELF � � @ @ 2 1 � � � ՠ ��{��� ��S�� �@�� �`@� � � �@�`������ ������ ��SA��{¨�_��_� � � � ��{�� �� ��S� @��[� ���c����� ��# � �t@�� T���A�6 @���? � T#@�b �C �5` � ����@����b@�A ���"