This patchset implements a generic kernel sframe-based [1] unwinder.
The main goal is to support reliable stacktraces on arm64.

On x86, the ORC unwinder provides reliable stacktraces. But arm64 lacks
the required support from objtool: it cannot generate ORC unwind tables
for arm64.

There is already an sframe unwinder proposed for userspace: [2].
Since the sframe unwind table algorithm is similar, these two proposals
could integrate common functionality in the future.

Currently, only GCC supports sframe.

These patches are based on v6.17-rc4 and are available on GitHub [3].

Ref:
[1]: https://sourceware.org/binutils/docs/sframe-spec.html
[2]: https://lore.kernel.org/lkml/cover.1730150953.git.jpoimboe@kernel.org/
[3]: https://github.com/dylanbhatch/linux/tree/sframe-v2

Changes since v1:
https://lore.kernel.org/live-patching/20250127213310.2496133-1-wnliu@google.com/
- Fixed detection of sframe support in the compiler (Josh, Jens)
- Adopted the latest sframe v2 header definition from the userspace patch
  series (Josh)
- Folded together the unwinder/stacktrace patches (Prasanna)
- Fixed "orphan section" warnings for .init.sframe sections (Puranjay,
  Indu, Josh)
- Build the VDSO without sframe (Dylan)
- Added support for modules (Weinan)

Dylan Hatch (2):
  unwind: build kernel with sframe info
  unwind: add sframe v2 header

Weinan Liu (4):
  arm64: entry: add unwind info for various kernel entries
  unwind: Implement generic sframe unwinder library
  arm64/module, unwind: Add sframe support for modules.
  unwind: arm64: Add reliable stacktrace with sframe unwinder.
 Makefile                                   |   8 +
 arch/Kconfig                               |   6 +
 arch/arm64/Kconfig.debug                   |  10 +
 arch/arm64/include/asm/module.h            |   6 +
 arch/arm64/include/asm/stacktrace/common.h |   6 +
 arch/arm64/kernel/entry.S                  |  10 +
 arch/arm64/kernel/module.c                 |   5 +
 arch/arm64/kernel/setup.c                  |   2 +
 arch/arm64/kernel/stacktrace.c             | 102 +++++++++
 arch/arm64/kernel/vdso/Makefile            |   2 +-
 include/asm-generic/vmlinux.lds.h          |  15 ++
 include/linux/sframe_lookup.h              |  45 ++++
 kernel/Makefile                            |   1 +
 kernel/sframe.h                            |  75 +++++++
 kernel/sframe_lookup.c                     | 232 +++++++++++++++++++++
 15 files changed, 524 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/sframe_lookup.h
 create mode 100644 kernel/sframe.h
 create mode 100644 kernel/sframe_lookup.c

--
2.51.0.355.g5224444f11-goog
On Thu, Sep 4, 2025 at 3:39 PM Dylan Hatch <dylanbhatch@google.com> wrote:
>
> This patchset implements a generic kernel sframe-based [1] unwinder.
> The main goal is to support reliable stacktraces on arm64.
>
> On x86 orc unwinder provides reliable stacktraces. But arm64 misses the
> required support from objtool: it cannot generate orc unwind tables for
> arm64.
>
> Currently, there's already a sframe unwinder proposed for userspace: [2].
> Since the sframe unwind table algorithm is similar, these two proposals
> could integrate common functionality in the future.
>
> Currently, only GCC supports sframe.
>
> These patches are based on v6.17-rc4 and are available on github [3].
>
> Ref:
> [1]: https://sourceware.org/binutils/docs/sframe-spec.html
> [2]: https://lore.kernel.org/lkml/cover.1730150953.git.jpoimboe@kernel.org/
> [3]: https://github.com/dylanbhatch/linux/tree/sframe-v2
I ran the following test on this sframe-v2 branch:
bpftrace -e 'kprobe:security_file_open {printf("%s",
kstack);@count+=1; if (@count > 1) {exit();}}'
security_file_open+0
bpf_prog_eaca355a0dcdca7f_kprobe_security_file_open_1+16641632@./bpftrace.bpf.o:0
path_openat+1892
do_filp_open+132
do_open_execat+84
alloc_bprm+44
do_execveat_common.isra.0+116
__arm64_sys_execve+72
invoke_syscall+76
el0_svc_common.constprop.0+68
do_el0_svc+32
el0_svc+56
el0t_64_sync_handler+152
el0t_64_sync+388
This looks wrong. The right call trace should be:
do_filp_open
=> path_openat
=> vfs_open
=> do_dentry_open
=> security_file_open
=> bpf_prog_eaca355a0dcdca7f_...
I am not sure whether this is just a problem with the bpf program,
or also with something else.
Thanks,
Song
On Mon, Sep 29, 2025 at 9:46 PM Song Liu <song@kernel.org> wrote:
>
> On Thu, Sep 4, 2025 at 3:39 PM Dylan Hatch <dylanbhatch@google.com> wrote:
> >
> > This patchset implements a generic kernel sframe-based [1] unwinder.
> > The main goal is to support reliable stacktraces on arm64.
> >
> > On x86 orc unwinder provides reliable stacktraces. But arm64 misses the
> > required support from objtool: it cannot generate orc unwind tables for
> > arm64.
> >
> > Currently, there's already a sframe unwinder proposed for userspace: [2].
> > Since the sframe unwind table algorithm is similar, these two proposals
> > could integrate common functionality in the future.
> >
> > Currently, only GCC supports sframe.
> >
> > These patches are based on v6.17-rc4 and are available on github [3].
> >
> > Ref:
> > [1]: https://sourceware.org/binutils/docs/sframe-spec.html
> > [2]: https://lore.kernel.org/lkml/cover.1730150953.git.jpoimboe@kernel.org/
> > [3]: https://github.com/dylanbhatch/linux/tree/sframe-v2
>
> I run the following test on this sframe-v2 branch:
>
> bpftrace -e 'kprobe:security_file_open {printf("%s",
> kstack);@count+=1; if (@count > 1) {exit();}}'
>
> security_file_open+0
> bpf_prog_eaca355a0dcdca7f_kprobe_security_file_open_1+16641632@./bpftrace.bpf.o:0
> path_openat+1892
> do_filp_open+132
> do_open_execat+84
> alloc_bprm+44
> do_execveat_common.isra.0+116
> __arm64_sys_execve+72
> invoke_syscall+76
> el0_svc_common.constprop.0+68
> do_el0_svc+32
> el0_svc+56
> el0t_64_sync_handler+152
> el0t_64_sync+388
>
> This looks wrong. The right call trace should be:
>
> do_filp_open
> => path_openat
> => vfs_open
> => do_dentry_open
> => security_file_open
> => bpf_prog_eaca355a0dcdca7f_...
>
> I am not sure whether this is just a problem with the bpf program,
> or also with something else.
I will try to debug this more, but am just curious about BPF's
interactions with sframe.

The sframe data for BPF programs doesn't exist, so we would need to add
that support, and that wouldn't be trivial, given that BPF programs are
JITed.
Thanks,
Puranjay
On Mon, Sep 29, 2025 at 12:55 PM Puranjay Mohan <puranjay12@gmail.com> wrote:
>
> I will try to debug this more but am just curious about BPF's
> interactions with sframe.
> The sframe data for bpf programs doesn't exist, so we would need to
> add that support
> and that wouldn't be trivial, given the BPF programs are JITed.
>
> Thanks,
> Puranjay

From what I can tell, the ORC unwinder in x86 falls back to using
frame pointers in cases of generated code, like BPF. Would matching
this behavior in the sframe unwinder be a reasonable approach, at
least for the purposes of enabling reliable unwind for livepatch?

Thanks,
Dylan
On Fri, Nov 14, 2025 at 10:50:16PM -0800, Dylan Hatch wrote:
> On Mon, Sep 29, 2025 at 12:55 PM Puranjay Mohan <puranjay12@gmail.com> wrote:
> >
> > I will try to debug this more but am just curious about BPF's
> > interactions with sframe.
> > The sframe data for bpf programs doesn't exist, so we would need to
> > add that support
> > and that wouldn't be trivial, given the BPF programs are JITed.
> >
> > Thanks,
> > Puranjay
>
> From what I can tell, the ORC unwinder in x86 falls back to using
> frame pointers in cases of generated code, like BPF. Would matching
> this behavior in the sframe unwinder be a reasonable approach, at
> least for the purposes of enabling reliable unwind for livepatch?

The ORC unwinder marks the unwind "unreliable" if it has to fall back to
frame pointers.

But that's not a problem for livepatch because it only[*] unwinds
blocked/sleeping tasks, which shouldn't have BPF on their stack anyway.

[*] with one exception: the task calling into livepatch

--
Josh
On Tue, Nov 18, 2025 at 12:06 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Fri, Nov 14, 2025 at 10:50:16PM -0800, Dylan Hatch wrote:
> > On Mon, Sep 29, 2025 at 12:55 PM Puranjay Mohan <puranjay12@gmail.com> wrote:
> > >
> > > I will try to debug this more but am just curious about BPF's
> > > interactions with sframe.
> > > The sframe data for bpf programs doesn't exist, so we would need to
> > > add that support
> > > and that wouldn't be trivial, given the BPF programs are JITed.
> > >
> > > Thanks,
> > > Puranjay
> >
> > From what I can tell, the ORC unwinder in x86 falls back to using
> > frame pointers in cases of generated code, like BPF. Would matching
> > this behavior in the sframe unwinder be a reasonable approach, at
> > least for the purposes of enabling reliable unwind for livepatch?
>
> The ORC unwinder marks the unwind "unreliable" if it has to fall back to
> frame pointers.
>
> But that's not a problem for livepatch because it only[*] unwinds
> blocked/sleeping tasks, which shouldn't have BPF on their stack anyway.
>

BPF programs can sleep, so wouldn't they show up in the stack? Like if I
am tracing a syscall with a bpf program attached using fentry, and the
BPF program calls bpf_arena_alloc_pages(), which can sleep.
On Mon, 17 Nov 2025 15:06:32 -0800
Josh Poimboeuf <jpoimboe@kernel.org> wrote:

> The ORC unwinder marks the unwind "unreliable" if it has to fall back to
> frame pointers.
>
> But that's not a problem for livepatch because it only[*] unwinds
> blocked/sleeping tasks, which shouldn't have BPF on their stack anyway.
>
> [*] with one exception: the task calling into livepatch

It may be a problem with preempted tasks, right? I believe with PREEMPT_LAZY
(and definitely with PREEMPT_RT) BPF programs can be preempted.

-- Steve
On Mon, Nov 17, 2025 at 06:42:23PM -0500, Steven Rostedt wrote:
> On Mon, 17 Nov 2025 15:06:32 -0800
> Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> > The ORC unwinder marks the unwind "unreliable" if it has to fall back to
> > frame pointers.
> >
> > But that's not a problem for livepatch because it only[*] unwinds
> > blocked/sleeping tasks, which shouldn't have BPF on their stack anyway.
> >
> > [*] with one exception: the task calling into livepatch
>
> It may be a problem with preempted tasks right? I believe with PREEMPT_LAZY
> (and definitely with PREEMPT_RT) BPF programs can be preempted.

In that case, then yes, that stack would be marked unreliable and
livepatch would have to go try and patch the task later.

If it were an isolated case, that would be fine, but if BPF were
consistently on the same task's stack, it could stall the completion of
the livepatch indefinitely.

I haven't (yet?) heard of BPF-induced livepatch stalls happening in
reality, but maybe it's only a matter of time :-/

To fix that, I suppose we would need some kind of dynamic ORC
registration interface. Similar to what has been discussed with
sframe+JIT.

If BPF were to always use frame pointers then there would be only a very
limited set of ORC entries (either "frame pointer" or "undefined") for a
given BPF function and it shouldn't be too complicated.

--
Josh
On Tue, Nov 18, 2025 at 1:10 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Mon, Nov 17, 2025 at 06:42:23PM -0500, Steven Rostedt wrote:
> > On Mon, 17 Nov 2025 15:06:32 -0800
> > Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > > The ORC unwinder marks the unwind "unreliable" if it has to fall back to
> > > frame pointers.
> > >
> > > But that's not a problem for livepatch because it only[*] unwinds
> > > blocked/sleeping tasks, which shouldn't have BPF on their stack anyway.
> > >
> > > [*] with one exception: the task calling into livepatch
> >
> > It may be a problem with preempted tasks right? I believe with PREEMPT_LAZY
> > (and definitely with PREEMPT_RT) BPF programs can be preempted.
>
> In that case, then yes, that stack would be marked unreliable and
> livepatch would have to go try and patch the task later.
>
> If it were an isolated case, that would be fine, but if BPF were
> consistently on the same task's stack, it could stall the completion of
> the livepatch indefinitely.
>
> I haven't (yet?) heard of BPF-induced livepatch stalls happening in
> reality, but maybe it's only a matter of time :-/
>
> To fix that, I suppose we would need some kind of dynamic ORC
> registration interface. Similar to what has been discussed with
> sframe+JIT.

I work with the BPF JITs and would be interested in exploring this further,
can you point me to this discussion if it happened on the list.

> If BPF were to always use frame pointers then there would be only a very
> limited set of ORC entries (either "frame pointer" or "undefined") for a
> given BPF function and it shouldn't be too complicated.
>
> --
> Josh
On 11/17/25 4:49 PM, Puranjay Mohan wrote:
> On Tue, Nov 18, 2025 at 1:10 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>>
>> On Mon, Nov 17, 2025 at 06:42:23PM -0500, Steven Rostedt wrote:
>>> On Mon, 17 Nov 2025 15:06:32 -0800
>>> Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>>>
>>>> The ORC unwinder marks the unwind "unreliable" if it has to fall back to
>>>> frame pointers.
>>>>
>>>> But that's not a problem for livepatch because it only[*] unwinds
>>>> blocked/sleeping tasks, which shouldn't have BPF on their stack anyway.
>>>>
>>>> [*] with one exception: the task calling into livepatch
>>>
>>> It may be a problem with preempted tasks right? I believe with PREEMPT_LAZY
>>> (and definitely with PREEMPT_RT) BPF programs can be preempted.
>>
>> In that case, then yes, that stack would be marked unreliable and
>> livepatch would have to go try and patch the task later.
>>
>> If it were an isolated case, that would be fine, but if BPF were
>> consistently on the same task's stack, it could stall the completion of
>> the livepatch indefinitely.
>>
>> I haven't (yet?) heard of BPF-induced livepatch stalls happening in
>> reality, but maybe it's only a matter of time :-/
>>
>> To fix that, I suppose we would need some kind of dynamic ORC
>> registration interface. Similar to what has been discussed with
>> sframe+JIT.
>
> I work with the BPF JITs and would be interested in exploring this further,
> can you point me to this discussion if it happened on the list.
>

We discussed the SFrame/JIT topic earlier this year in our monthly SFrame
meetings. I can point you to the meeting notes in a separate email.

We had some discussion around:
- SFrame specification: Allow efficient addition, removal and update of
  data in SFrame sections. A part of the challenge is in representing
  the variety of frames a JIT may use.
- SFrame APIs with JIT: Efficient SFrame stack trace data manipulation
  by the JIT.
- Interface with the Linux kernel: Efficient registration and update of
  SFrame stack trace data.

It will be great to have more collaboration and brainstorming, and to
include BPF/JIT in the discussions.

>>
>> If BPF were to always use frame pointers then there would be only a very
>> limited set of ORC entries (either "frame pointer" or "undefined") for a
>> given BPF function and it shouldn't be too complicated.
>>
>> --
>> Josh
On Tue, Nov 18, 2025 at 01:49:06AM +0100, Puranjay Mohan wrote:
> On Tue, Nov 18, 2025 at 1:10 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > On Mon, Nov 17, 2025 at 06:42:23PM -0500, Steven Rostedt wrote:
> > > On Mon, 17 Nov 2025 15:06:32 -0800
> > > Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > >
> > > > The ORC unwinder marks the unwind "unreliable" if it has to fall back to
> > > > frame pointers.
> > > >
> > > > But that's not a problem for livepatch because it only[*] unwinds
> > > > blocked/sleeping tasks, which shouldn't have BPF on their stack anyway.
> > > >
> > > > [*] with one exception: the task calling into livepatch
> > >
> > > It may be a problem with preempted tasks right? I believe with PREEMPT_LAZY
> > > (and definitely with PREEMPT_RT) BPF programs can be preempted.
> >
> > In that case, then yes, that stack would be marked unreliable and
> > livepatch would have to go try and patch the task later.
> >
> > If it were an isolated case, that would be fine, but if BPF were
> > consistently on the same task's stack, it could stall the completion of
> > the livepatch indefinitely.
> >
> > I haven't (yet?) heard of BPF-induced livepatch stalls happening in
> > reality, but maybe it's only a matter of time :-/
> >
> > To fix that, I suppose we would need some kind of dynamic ORC
> > registration interface. Similar to what has been discussed with
> > sframe+JIT.
>
> I work with the BPF JITs and would be interested in exploring this further,
> can you point me to this discussion if it happened on the list.

Sorry, nothing specific has been discussed that I'm aware of :-)

--
Josh
On Mon, 17 Nov 2025 21:18:41 -0800
Josh Poimboeuf <jpoimboe@kernel.org> wrote:

> > > To fix that, I suppose we would need some kind of dynamic ORC
> > > registration interface. Similar to what has been discussed with
> > > sframe+JIT.
> >
> > I work with the BPF JITs and would be interested in exploring this further,
> > can you point me to this discussion if it happened on the list.
>
> Sorry, nothing specific has been discussed that I'm aware of :-)

Right, the only discussions have been at the monthly SFrame meetings about
needing to be able to handle this. But the actual implementation details
have not been figured out yet.

-- Steve