decode_stacktrace: Support heuristic caller address search

[PATCH] decode_stacktrace: Support heuristic caller address search

Posted by Masami Hiramatsu (Google) 1 month ago

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Add a -c option to decode_stacktrace that searches for the call address.
This tries to decode line info backwards, starting from 1 byte before
the return address, and displays the first line info it finds as
the caller address.
If it tries up to 10 bytes before (or down to the symbol address) and
still cannot find it, it gives up and decodes the return address.

With -c option:
 Call Trace:
  <TASK>
  dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
  lockdep_rcu_suspicious (kernel/locking/lockdep.c:6876)
  event_filter_pid_sched_process_fork (kernel/trace/trace_events.c:1057)
  kernel_clone (include/trace/events/sched.h:396 include/trace/events/sched.h:396 kernel/fork.c:2664)
  __x64_sys_clone (kernel/fork.c:2795 kernel/fork.c:2779 kernel/fork.c:2779)
  do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
  ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
  ? trace_irq_disable (include/trace/events/preemptirq.h:36)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)


Without -c option:
 Call Trace:
  <TASK>
  dump_stack_lvl (lib/dump_stack.c:122)
  lockdep_rcu_suspicious (kernel/locking/lockdep.c:6877)
  event_filter_pid_sched_process_fork (kernel/trace/trace_events.c:?)
  kernel_clone (include/trace/events/sched.h:? include/trace/events/sched.h:396 kernel/fork.c:2664)
  __x64_sys_clone (kernel/fork.c:2779)
  do_syscall_64 (arch/x86/entry/syscall_64.c:?)
  ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
  ? trace_irq_disable (include/trace/events/preemptirq.h:36)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 scripts/decode_stacktrace.sh |   51 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 46 insertions(+), 5 deletions(-)

diff --git a/scripts/decode_stacktrace.sh b/scripts/decode_stacktrace.sh
index 8d01b741de62..78e0810af476 100755
--- a/scripts/decode_stacktrace.sh
+++ b/scripts/decode_stacktrace.sh
@@ -5,9 +5,11 @@
 
 usage() {
 	echo "Usage:"
-	echo "	$0 -r <release>"
-	echo "	$0 [<vmlinux> [<base_path>|auto [<modules_path>]]]"
+	echo "	$0 [-c] -r <release>"
+	echo "	$0 [-c] [<vmlinux> [<base_path>|auto [<modules_path>]]]"
 	echo "	$0 -h"
+	echo "Options:"
+	echo "   -c: Decode heuristically searched call address."
 }
 
 # Try to find a Rust demangler
@@ -33,11 +35,17 @@ fi
 READELF=${UTIL_PREFIX}readelf${UTIL_SUFFIX}
 ADDR2LINE=${UTIL_PREFIX}addr2line${UTIL_SUFFIX}
 NM=${UTIL_PREFIX}nm${UTIL_SUFFIX}
+call_search=false
 
 if [[ $1 == "-h" ]] ; then
 	usage
 	exit 0
-elif [[ $1 == "-r" ]] ; then
+elif [[ $1 == "-c" ]] ; then
+	call_search=true
+	shift 1
+fi
+
+if [[ $1 == "-r" ]] ; then
 	vmlinux=""
 	basepath="auto"
 	modpath=""
@@ -123,6 +131,28 @@ find_module() {
 	return 1
 }
 
+UNKNOWN_LINE="??:0"
+
+search_call_site() {
+	# Instead of using the return address, use the nearest line info
+	# address before given address.
+	local return_addr=${2}
+	local max=${3}
+	local i
+
+	for i in $(seq 1 ${max}); do
+		local expr=$((0x$return_addr-$i))
+		local address=$(printf "%x\n" "$expr")
+
+		local code=$(${ADDR2LINE} -i -e "${1}" "$address" 2>/dev/null)
+		local first=${code% *}
+		if [[ "$code" != "" && "$code" != ${UNKNOWN_LINE} && "${first#*:}" != "?" ]]; then
+			echo "$code"
+			break
+		fi
+	done
+}
+
 parse_symbol() {
 	# The structure of symbol at this point is:
 	#   ([name]+[offset]/[total length])
@@ -176,6 +206,9 @@ parse_symbol() {
 	# Let's start doing the math to get the exact address into the
 	# symbol. First, strip out the symbol total length.
 	local expr=${symbol%/*}
+	# Also parse the offset from symbol.
+	local offset=${expr#*+}
+	offset=$((offset))
 
 	# Now, replace the symbol name with the base address we found
 	# before.
@@ -190,7 +223,15 @@ parse_symbol() {
 	if [[ $aarray_support == true && "${cache[$module,$address]+isset}" == "isset" ]]; then
 		local code=${cache[$module,$address]}
 	else
-		local code=$(${ADDR2LINE} -i -e "$objfile" "$address" 2>/dev/null)
+		local code
+		if [[ $call_search == true && $offset != 0 ]]; then
+			code=$(search_call_site "$objfile" "$address" "$offset")
+		fi
+
+		if [[ "$code" == "" ]]; then
+			code=$(${ADDR2LINE} -i -e "$objfile" "$address" 2>/dev/null)
+		fi
+
 		if [[ $aarray_support == true ]]; then
 			cache[$module,$address]=$code
 		fi
@@ -199,7 +240,7 @@ parse_symbol() {
 	# addr2line doesn't return a proper error code if it fails, so
 	# we detect it using the value it prints so that we could preserve
 	# the offset/size into the function and bail out
-	if [[ $code == "??:0" ]]; then
+	if [[ $code == ${UNKNOWN_LINE} ]]; then
 		return
 	fi

Re: [PATCH] decode_stacktrace: Support heuristic caller address search

Posted by Sasha Levin 1 month ago

On Thu,  5 Mar 2026 14:12:19 +0900, Masami Hiramatsu (Google) wrote:
> Add -c option to search call address search to decode_stacktrace.
> This tries to decode line info backwards, starting from 1byte before
> the return address, and displays the first line info it founds as
> the caller address.
> If it tries up to 10bytes before (or the symbol address) and still
> can not find it, it gives up and decodes the return address.

The commit message says "up to 10bytes" but the code passes $offset
(the function offset from the symbol) as the max iteration count to
search_call_site(). There's no 10-byte cap anywhere in the code.
$offset can easily be hundreds or thousands of bytes into a function.
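
For illustration only, a minimal sketch of the cap the commit message
describes. search_limit and MAX_SEARCH_BYTES are hypothetical names that
do not appear in the patch, and the value 10 is taken from the commit
message, not from the code:

```bash
#!/bin/bash
# Hypothetical helper (not in the patch): bound the backward search.
MAX_SEARCH_BYTES=10	# assumed cap, taken from the commit message

# Print the number of bytes to search: the symbol offset, capped at
# MAX_SEARCH_BYTES so the loop never walks far back into the function.
search_limit() {
	local offset=$(( $1 ))	# accepts decimal or 0x-prefixed hex
	if (( offset > MAX_SEARCH_BYTES )); then
		echo "${MAX_SEARCH_BYTES}"
	else
		echo "${offset}"
	fi
}
```

With something like this, parse_symbol() could pass
"$(search_limit "$offset")" instead of "$offset" as the third argument.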

> +search_call_site() {
> +     # Instead of using the return address, use the nearest line info
> +     # address before given address.
> +     local return_addr=${2}
> +     local max=${3}
> +     local i
> +
> +     for i in $(seq 1 ${max}); do
> +             local expr=$((0x$return_addr-$i))
> +             local address=$(printf "%x\n" "$expr")
> +
> +             local code=$(${ADDR2LINE} -i -e "${1}" "$address" 2>/dev/null)
> +             local first=${code% *}
> +             if [[ "$code" != "" && "$code" != ${UNKNOWN_LINE} && "${first#*:}" != "?" ]]; then

To also address Matthieu's question about performance: I think this
whole iterative search could be replaced by simply subtracting 1 from
the return address before passing it to addr2line.

DWARF line tables map address *ranges* to source lines, so any address
within the CALL instruction resolves to the correct source line.
return_addr-1 is guaranteed to land inside the CALL instruction (it's
the last byte of it), so a single addr2line call is sufficient.

This is exactly what the kernel itself does in sprint_backtrace()
(kernel/kallsyms.c:570): it passes symbol_offset=-1 to
__sprint_symbol(), which does `address += symbol_offset` before
lookup. GDB, perf, and libunwind all use the same addr-1 trick for
the same reason.

That would make this both correct and free.
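
A rough sketch of what I mean, assuming the address is a hex string
without a 0x prefix as in the existing script. call_site_addr and
decode_call_site are hypothetical names, not functions in
decode_stacktrace.sh:

```bash
#!/bin/bash
# Sketch only: call_site_addr and decode_call_site are hypothetical names.
ADDR2LINE=${ADDR2LINE:-addr2line}

# Print the address one byte before the return address
# (hex in, hex out, no 0x prefix).
call_site_addr() {
	printf "%x\n" $(( 0x$1 - 1 ))
}

# Decode the call site with a single addr2line invocation.
decode_call_site() {
	local objfile=$1
	local return_addr=$2

	${ADDR2LINE} -i -e "$objfile" \
		"$(call_site_addr "$return_addr")" 2>/dev/null
}
```

The existing $offset != 0 check would still be needed so that an address
at the very start of a symbol is not moved into the previous function.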

> +             if [[ "$code" != "" && "$code" != ${UNKNOWN_LINE} && "${first#*:}" != "?" ]]; then

Minor: ${UNKNOWN_LINE} is "??:0" -- when unquoted on the RHS of != inside
[[ ]], the ? characters are interpreted as glob wildcards (each matching
any single character). It happens to work here because ? also matches '?'
itself, but it should be quoted as "${UNKNOWN_LINE}" for correctness.
Same issue on the other != ${UNKNOWN_LINE} below.
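
A small standalone demonstration of the difference (the helper names are
made up for this example; "ab:0" is just an arbitrary string that is not
the unknown-line marker):

```bash
#!/bin/bash
UNKNOWN_LINE="??:0"

# Unquoted: the right-hand side is a pattern, so each ? matches any
# single character.
matches_unquoted() {
	[[ $1 == ${UNKNOWN_LINE} ]]
}

# Quoted: the right-hand side is compared as a literal string.
matches_quoted() {
	[[ $1 == "${UNKNOWN_LINE}" ]]
}

# matches_unquoted "ab:0" succeeds, matches_quoted "ab:0" fails.
# Both succeed for the literal "??:0".
```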

-- 
Thanks,
Sasha

Re: [PATCH] decode_stacktrace: Support heuristic caller address search

Posted by Masami Hiramatsu (Google) 1 month ago

On Thu, 5 Mar 2026 10:51:47 -0500
Sasha Levin <sashal@kernel.org> wrote:

> On Thu,  5 Mar 2026 14:12:19 +0900, Masami Hiramatsu (Google) wrote:
> > Add -c option to search call address search to decode_stacktrace.
> > This tries to decode line info backwards, starting from 1byte before
> > the return address, and displays the first line info it founds as
> > the caller address.
> > If it tries up to 10bytes before (or the symbol address) and still
> > can not find it, it gives up and decodes the return address.
> 
> The commit message says "up to 10bytes" but the code passes $offset
> (the function offset from the symbol) as the max iteration count to
> search_call_site(). There's no 10-byte cap anywhere in the code?
> $offset can easily be hundreds or thousands of bytes into a function.

Ah, sorry. I forgot to set the maximum :(

> 
> > +search_call_site() {
> > +     # Instead of using the return address, use the nearest line info
> > +     # address before given address.
> > +     local return_addr=${2}
> > +     local max=${3}
> > +     local i
> > +
> > +     for i in $(seq 1 ${max}); do
> > +             local expr=$((0x$return_addr-$i))
> > +             local address=$(printf "%x\n" "$expr")
> > +
> > +             local code=$(${ADDR2LINE} -i -e "${1}" "$address" 2>/dev/null)
> > +             local first=${code% *}
> > +             if [[ "$code" != "" && "$code" != ${UNKNOWN_LINE} && "${first#*:}" != "?" ]]; then
> 
> To also address Matthieu's question about performance: I think this
> whole iterative search could be replaced by simply subtracting 1 from
> the return address before passing it to addr2line.
> 
> DWARF line tables map address *ranges* to source lines, so any address
> within the CALL instruction resolves to the correct source line.
> return_addr-1 is guaranteed to land inside the CALL instruction (it's
> the last byte of it), so a single addr2line call is sufficient.

Ah, got it, OK. I also confirmed "addr-1" works. But if there is no lineinfo
entry for the call instruction, shouldn't we check more instructions before
the call?

> 
> This is exactly what the kernel itself does in sprint_backtrace()
> (kernel/kallsyms.c:570): it passes symbol_offset=-1 to
> __sprint_symbol(), which does `address += symbol_offset` before
> lookup. GDB, perf, and libunwind all use the same addr-1 trick for
> the same reason.

OK.

> 
> That would make this both correct and free.
> 
> > +             if [[ "$code" != "" && "$code" != ${UNKNOWN_LINE} && "${first#*:}" != "?" ]]; then
> 
> Minor: ${UNKNOWN_LINE} is "??:0" -- when unquoted on the RHS of != inside
> [[ ]], the ? characters are interpreted as glob wildcards (each matching
> any single character). It happens to work here because ? also matches '?'
> itself, but it should be quoted as "${UNKNOWN_LINE}" for correctness.
> Same issue on the other != ${UNKNOWN_LINE} below.

Ah, OK. Let me fix it.

Thanks,

> 
> -- 
> Thanks,
> Sasha


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

Re: [PATCH] decode_stacktrace: Support heuristic caller address search

Posted by Sasha Levin 1 month ago

On Fri, Mar 06, 2026 at 01:32:41AM +0900, Masami Hiramatsu wrote:
>On Thu, 5 Mar 2026 10:51:47 -0500
>Sasha Levin <sashal@kernel.org> wrote:
>> DWARF line tables map address *ranges* to source lines, so any address
>> within the CALL instruction resolves to the correct source line.
>> return_addr-1 is guaranteed to land inside the CALL instruction (it's
>> the last byte of it), so a single addr2line call is sufficient.
>
>Ah, got it, OK. I also confirmed "addr-1" works. But if there is no lineinfo
>entry for the call instruction, shouldn't we check more instructions before
>the call?

There's no such thing as "no lineinfo entry for the call instruction" - DWARF
line tables are range-based, not discrete points. Each row covers all addresses
up to the next row, so every address within a function resolves to some source
line. addr-1 lands inside the CALL instruction and will always resolve to the
same line as the CALL itself.

We only show "??:0" when the address we passed falls outside of any DWARF
compilation unit altogether.
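
To make the range semantics concrete, here is a toy model of the lookup.
It is not how addr2line is implemented; the table, the addresses and the
line numbers are all made up. It assumes a 5-byte CALL at 0x1004 whose
return address is 0x1009:

```bash
#!/bin/bash
# Toy line table: each row is "start_address line", sorted by address.
# All values are invented for illustration.
ROWS=("0x1000 10" "0x1004 11" "0x1009 12")
FIRST=0x1000	# start of the covered range
END=0x1010	# hypothetical end of the covered range

# Print the line of the last row whose start address is <= the given
# address, or "??" if the address is outside the covered range.
lookup_line() {
	local addr=$(( $1 ))
	local row start line result="??"

	if (( addr < FIRST || addr >= END )); then
		echo "??"
		return
	fi
	for row in "${ROWS[@]}"; do
		read -r start line <<< "$row"
		if (( start <= addr )); then
			result=$line
		fi
	done
	echo "$result"
}
```

Here lookup_line 0x1009 (the return address) gives 12, the line after the
call, while lookup_line 0x1008 (return address minus 1) gives 11, the
line of the CALL. Only an address outside the table, such as 0x2000,
gives "??".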

-- 
Thanks,
Sasha

Re: [PATCH] decode_stacktrace: Support heuristic caller address search

Posted by Matthieu Baerts 1 month ago

Hi Masami,

On 05/03/2026 06:12, Masami Hiramatsu (Google) wrote:
> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> Add -c option to search call address search to decode_stacktrace.
> This tries to decode line info backwards, starting from 1byte before
> the return address, and displays the first line info it founds as
> the caller address.
> If it tries up to 10bytes before (or the symbol address) and still
> can not find it, it gives up and decodes the return address.

Thank you for this new option!

> With -c option:
>  Call Trace:
>   <TASK>
>   dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
>   lockdep_rcu_suspicious (kernel/locking/lockdep.c:6876)
>   event_filter_pid_sched_process_fork (kernel/trace/trace_events.c:1057)
>   kernel_clone (include/trace/events/sched.h:396 include/trace/events/sched.h:396 kernel/fork.c:2664)
>   __x64_sys_clone (kernel/fork.c:2795 kernel/fork.c:2779 kernel/fork.c:2779)
>   do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
>   ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
>   ? trace_irq_disable (include/trace/events/preemptirq.h:36)
>   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
> 
> 
> Without -c option:
>  Call Trace:
>   <TASK>
>   dump_stack_lvl (lib/dump_stack.c:122)
>   lockdep_rcu_suspicious (kernel/locking/lockdep.c:6877)
>   event_filter_pid_sched_process_fork (kernel/trace/trace_events.c:?)
>   kernel_clone (include/trace/events/sched.h:? include/trace/events/sched.h:396 kernel/fork.c:2664)
>   __x64_sys_clone (kernel/fork.c:2779)
>   do_syscall_64 (arch/x86/entry/syscall_64.c:?)
>   ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
>   ? trace_irq_disable (include/trace/events/preemptirq.h:36)
>   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

That's better indeed!

Do we need a new option for that? Could it not be the new default
behaviour? Or are there any downsides with it?

"addr2line" will be called more, but if it is worth it, it is probably
not an issue, or is it?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH] decode_stacktrace: Support heuristic caller address search

Posted by Masami Hiramatsu (Google) 1 month ago

On Thu, 5 Mar 2026 15:56:13 +0100
Matthieu Baerts <matttbe@kernel.org> wrote:

> Hi Masami,
> 
> On 05/03/2026 06:12, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > 
> > Add -c option to search call address search to decode_stacktrace.
> > This tries to decode line info backwards, starting from 1byte before
> > the return address, and displays the first line info it founds as
> > the caller address.
> > If it tries up to 10bytes before (or the symbol address) and still
> > can not find it, it gives up and decodes the return address.
> 
> Thank you for this new option!
> 
> > With -c option:
> >  Call Trace:
> >   <TASK>
> >   dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
> >   lockdep_rcu_suspicious (kernel/locking/lockdep.c:6876)
> >   event_filter_pid_sched_process_fork (kernel/trace/trace_events.c:1057)
> >   kernel_clone (include/trace/events/sched.h:396 include/trace/events/sched.h:396 kernel/fork.c:2664)
> >   __x64_sys_clone (kernel/fork.c:2795 kernel/fork.c:2779 kernel/fork.c:2779)
> >   do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
> >   ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
> >   ? trace_irq_disable (include/trace/events/preemptirq.h:36)
> >   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
> > 
> > 
> > Without -c option:
> >  Call Trace:
> >   <TASK>
> >   dump_stack_lvl (lib/dump_stack.c:122)
> >   lockdep_rcu_suspicious (kernel/locking/lockdep.c:6877)
> >   event_filter_pid_sched_process_fork (kernel/trace/trace_events.c:?)
> >   kernel_clone (include/trace/events/sched.h:? include/trace/events/sched.h:396 kernel/fork.c:2664)
> >   __x64_sys_clone (kernel/fork.c:2779)
> >   do_syscall_64 (arch/x86/entry/syscall_64.c:?)
> >   ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> >   ? trace_irq_disable (include/trace/events/preemptirq.h:36)
> >   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> That's better indeed!
> 
> Do we need a new option for that? Could it not be the new default
> behaviour? Or are there any downsides with it?

AFAIK, this may not work well on architectures that have a delay
slot (I have not tested), which execute one more instruction
after the branch before actually branching. In that case, the return
address will not be the instruction immediately after the call.
But I think such architectures are not popular anymore, so we can switch
the default behavior, and maybe select it based on architecture.

Thank you,

> 
> "addr2line" will be called more, but if it is worth it, it is probably
> not an issue, or is it?
> 
> Cheers,
> Matt
> -- 
> Sponsored by the NGI0 Core fund.
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>