[v1] riscv: stacktrace: Fix return address handling in non-FP unwinder

[PATCH v1 0/2] riscv: stacktrace: Fix return address handling in non-FP unwinder

Posted by Rui Qi 4 days, 18 hours ago

This 2-patch series fixes how the non-frame-pointer version of
walk_stackframe handles return addresses, addressing both the
unwinder layer and the display layer.

Background
==========

In the original RISC-V stacktrace code (commit 5d8544e2d007), both
the FP and non-FP versions of walk_stackframe subtracted 0x4 from
return addresses read from the stack:

  FP version:      pc = frame->ra - 0x4;
  Non-FP version:  pc = (*ksp++) - 0x4;

The intent was to convert the return address (the instruction after
a call) back to the call site for better symbol resolution. However,
this is incorrect on RISC-V because the RVC (compressed instruction)
extension allows 2-byte instructions (e.g. c.jal). Subtracting a
fixed 0x4 assumes all call instructions are 4 bytes, which produces
a wrong address for compressed calls.

Commit b785ec129bd9 ("riscv/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
support") fixed the FP version by replacing "pc = frame->ra - 0x4"
with ftrace_graph_ret_addr(). The commit message explicitly noted:

  "the original calculation, pc = frame->ra - 4, is buggy when the
   instruction at the return address happened to be a compressed inst."

Later, commit 877425424d6c removed the fallback "#else" branch
entirely since HAVE_FUNCTION_GRAPH_RET_ADDR_PTR is always defined
for RISC-V. However, the non-FP version was simply overlooked and
still carries the bogus -0x4 offset.

Patch 1: Remove -0x4 from the unwinder
=======================================

Remove the -0x4 offset from the non-FP walk_stackframe so it reports
raw return addresses, consistent with the FP version and with other
architectures (ARM64, x86, ARM all report return addresses as-is).

Patch 2: Use %pB for backtrace display
========================================

With the -0x4 removed, a pre-existing issue becomes more visible:
when a noreturn function (e.g. panic, do_exit) is the last call in
a function, the saved return address points to the start of the next
function, and kallsyms resolves it to the wrong symbol.

This is not a new problem -- the FP version has the same issue since
it also does not subtract from return addresses. And it is not
specific to RISC-V; other architectures face it too.

The kernel provides sprint_backtrace() (invoked via %pB) specifically
for this purpose: it subtracts 1 from the address before symbol
lookup, which is sufficient to fall back into the calling function's
range. x86 uses %pB in its printk_stack_address() for this reason.

Replace print_ip_sym() (which uses %pS) with a direct printk using
%pB, matching the pattern used by x86.

Design principle
================

The two patches follow a clear separation of concerns:

  - The unwinder reports raw return addresses (no offset adjustment)
  - The display layer handles symbol resolution adjustments (%pB)

This matches the convention used by other architectures and ensures
consistent behavior between the FP and non-FP unwinders.

Jiakai Xu [1] and Wang Han [2] have separate patch series in flight
for other stacktrace issues; these patches are orthogonal to their
work.

  [1] https://lore.kernel.org/all/20260517143704.659416-1-xujiakai2025@iscas.ac.cn/
  [2] https://lore.kernel.org/all/20260528082310.1994388-1-wanghan@linux.alibaba.com/

Rui Qi (2):
  riscv: stacktrace: Remove bogus -0x4 offset in non-FP walk_stackframe
  riscv: stacktrace: Use %pB for backtrace display

 arch/riscv/kernel/stacktrace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.20.1

[PATCH v2 0/2] riscv: stacktrace: Fix return address handling in non-FP unwinder

Posted by Rui Qi 2 days, 22 hours ago

This 2-patch series fixes how the non-frame-pointer version of
walk_stackframe handles return addresses, addressing both the
unwinder layer and the display layer.

Background
==========

In the original RISC-V stacktrace code (commit 5d8544e2d007), both
the FP and non-FP versions of walk_stackframe subtracted 0x4 from
return addresses read from the stack:

  FP version:      pc = frame->ra - 0x4;
  Non-FP version:  pc = (*ksp++) - 0x4;

The intent was to convert the return address (the instruction after
a call) back to the call site for better symbol resolution. However,
this is incorrect on RISC-V because the RVC (compressed instruction)
extension allows 2-byte instructions (e.g. c.jal). Subtracting a
fixed 0x4 assumes all call instructions are 4 bytes, which produces
a wrong address for compressed calls.

Commit b785ec129bd9 ("riscv/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
support") fixed the FP version by replacing "pc = frame->ra - 0x4"
with ftrace_graph_ret_addr(). The commit message explicitly noted:

  "the original calculation, pc = frame->ra - 4, is buggy when the
   instruction at the return address happened to be a compressed inst."

Later, commit 877425424d6c removed the fallback "#else" branch
entirely since HAVE_FUNCTION_GRAPH_RET_ADDR_PTR is always defined
for RISC-V. However, the non-FP version was simply overlooked and
still carries the bogus -0x4 offset.

Patch 1: Remove -0x4 from the unwinder
=======================================

Remove the -0x4 offset from the non-FP walk_stackframe so it reports
raw return addresses, consistent with the FP version and with other
architectures (ARM64, x86, ARM all report return addresses as-is).

Patch 2: Use %pB for return addresses and %pS for exact IPs
=============================================================

With the -0x4 removed, a pre-existing issue becomes more visible:
when a noreturn function (e.g. panic, do_exit) is the last call in
a function, the saved return address points to the start of the next
function, and kallsyms resolves it to the wrong symbol.

The kernel provides sprint_backtrace() (invoked via %pB) specifically
for this purpose: it subtracts 1 from the address before symbol
lookup, which is sufficient to fall back into the calling function's
range. x86 uses %pB in its printk_stack_address() for this reason.

However, %pB must only be used for return addresses. walk_stackframe
passes two kinds of addresses to its callback: exact instruction
pointers (instruction_pointer(regs) and epc from exception frames)
and return addresses (frame->ra, regs->ra, task->thread.ra). For
exact IPs, subtracting 1 can cause kallsyms to resolve to the
preceding function if the IP is at offset 0 of a function.

Add an 'is_ra' parameter to the walk_stackframe callback to indicate
whether the address is a return address.  Then use %pB for return
addresses and %pS for exact instruction pointers, matching the x86
approach which uses %pS for regs->ip via show_ip() and %pB for
stack return addresses via printk_stack_address().

Known limitation: the non-FP version of walk_stackframe() scans the
stack blindly and cannot distinguish pt_regs->epc values from return
addresses, so all stack-scanned values are treated as return addresses
(is_ra = true).  This means epc values from pt_regs encountered during
the scan may still use %pB incorrectly.  The FP version handles this
case correctly by detecting exception frames explicitly.

Changes since v1
================

- Patch 2: Split %pB usage to only apply to return addresses (is_ra =
  true), using %pS for exact instruction pointers (is_ra = false).
  This addresses the review feedback from Paul Walmsley and the
  Sashiko review noting that x86 uses %pS for exact IPs and %pB only
  for return addresses.

- Patch 2: Add 'is_ra' parameter to walk_stackframe callback, update
  the declaration in asm/stacktrace.h, and adapt all callers
  (print_trace_address, save_wchan, fill_callchain_walk wrapper for
  perf, stack_walk_wrapper for arch_stack_walk).

Design principle
================

The two patches follow a clear separation of concerns:

  - The unwinder reports raw return addresses (no offset adjustment)
  - The display layer handles symbol resolution adjustments (%pB for
    return addresses, %pS for exact IPs)

This matches the convention used by other architectures and ensures
consistent behavior between the FP and non-FP unwinders.

Jiakai Xu [1] and Wang Han [2] have separate patch series in flight
for other stacktrace issues; these patches are orthogonal to their
work.

  [1] https://lore.kernel.org/all/20260517143704.659416-1-xujiakai2025@iscas.ac.cn/
  [2] https://lore.kernel.org/all/20260528082310.1994388-1-wanghan@linux.alibaba.com/

Rui Qi (2):
  riscv: stacktrace: Remove bogus -0x4 offset in non-FP walk_stackframe
  riscv: stacktrace: Use %pB for return addresses and %pS for exact IPs

 arch/riscv/include/asm/stacktrace.h |  2 +-
 arch/riscv/kernel/perf_callchain.c  |  7 +++-
 arch/riscv/kernel/stacktrace.c      | 56 +++++++++++++++++++++++------
 3 files changed, 53 insertions(+), 12 deletions(-)

--
2.20.1

[PATCH v2 1/2] riscv: stacktrace: Remove bogus -0x4 offset in non-FP walk_stackframe

Posted by Rui Qi 2 days, 22 hours ago

In the non-frame-pointer version of walk_stackframe, each value read
from the stack is treated as a potential return address and has 0x4
subtracted before being used as the program counter. This was intended
to convert the return address (the instruction after a call) back to
the call site, but it is incorrect:

1. RISC-V has variable-length instructions due to the RVC (compressed
   instruction) extension. A call instruction can be either 4 bytes
   (regular) or 2 bytes (compressed, e.g. c.jal). Subtracting a fixed
   0x4 assumes all call instructions are 4 bytes, which is wrong for
   compressed instructions.

2. Stack traces conventionally report return addresses, not call sites.
   Other architectures (ARM64, x86, ARM) do not subtract instruction
   size from return addresses in their stack unwinding code.

3. The frame-pointer version of walk_stackframe already dropped the
   -0x4 offset. Commit b785ec129bd9 ("riscv/ftrace: Add
   HAVE_FUNCTION_GRAPH_RET_ADDR_PTR support") replaced "pc =
   frame->ra - 0x4" with ftrace_graph_ret_addr(), and the commit
   message explicitly noted that "the original calculation, pc =
   frame->ra - 4, is buggy when the instruction at the return address
   happened to be a compressed inst." The non-FP version was simply
   overlooked.

Remove the bogus -0x4 offset to match the FP version and the
conventions used by other architectures.

Fixes: 5d8544e2d007 ("RISC-V: Generic library routines and assembly")
Signed-off-by: Rui Qi <qirui.001@bytedance.com>
---
 arch/riscv/kernel/stacktrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
index 2692d3a06afa..c7555447149b 100644
--- a/arch/riscv/kernel/stacktrace.c
+++ b/arch/riscv/kernel/stacktrace.c
@@ -129,7 +129,7 @@ void notrace walk_stackframe(struct task_struct *task,
 	while (!kstack_end(ksp)) {
 		if (__kernel_text_address(pc) && unlikely(!fn(arg, pc)))
 			break;
-		pc = READ_ONCE_NOCHECK(*ksp++) - 0x4;
+		pc = READ_ONCE_NOCHECK(*ksp++);
 	}
 }
 
-- 
2.20.1

[PATCH v2 2/2] riscv: stacktrace: Use %pB for return addresses and %pS for exact IPs

Posted by Rui Qi 2 days, 22 hours ago

The print_trace_address callback uses print_ip_sym which formats
addresses with %pS. This does not adjust the address before symbol
lookup, so when a noreturn function (e.g. panic, do_exit) is the last
call in a function, the saved return address can point to the start of
the next function, and kallsyms resolves it to the wrong symbol.

The kernel provides %pB (sprint_backtrace) specifically for this
purpose: it subtracts 1 from the address before symbol lookup, which
is sufficient to fall back into the calling function range.

However, %pB must only be used for return addresses. walk_stackframe
passes two kinds of addresses to its callback:
- Exact instruction pointers: instruction_pointer(regs) and the epc
  from exception frames on the stack. These should use %pS since they
  are not return addresses and subtracting 1 could cause kallsyms to
  resolve them to the preceding function.
- Return addresses: frame->ra, regs->ra, and task->thread.ra. These
  should use %pB for correct symbol resolution, especially after
  noreturn calls.

Add an 'is_ra' parameter to the walk_stackframe callback to indicate
whether the address is a return address, then use %pB for return
addresses and %pS for exact instruction pointers. This matches the x86
approach which uses %pS for regs->ip via show_ip() and %pB for stack
return addresses via printk_stack_address().

Also update the walk_stackframe() declaration in asm/stacktrace.h and
the fill_callchain() callback in perf_callchain.c to match the new
signature.

Signed-off-by: Rui Qi <qirui.001@bytedance.com>
---
 arch/riscv/include/asm/stacktrace.h |  2 +-
 arch/riscv/kernel/perf_callchain.c  |  7 +++-
 arch/riscv/kernel/stacktrace.c      | 54 ++++++++++++++++++++++++-----
 3 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/include/asm/stacktrace.h b/arch/riscv/include/asm/stacktrace.h
index b1495a7e06ce..6771d0e0f656 100644
--- a/arch/riscv/include/asm/stacktrace.h
+++ b/arch/riscv/include/asm/stacktrace.h
@@ -12,7 +12,7 @@ struct stackframe {
 };
 
 extern void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
-				    bool (*fn)(void *, unsigned long), void *arg);
+				    bool (*fn)(void *, unsigned long, bool), void *arg);
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *task,
 			   const char *loglvl);
 
diff --git a/arch/riscv/kernel/perf_callchain.c b/arch/riscv/kernel/perf_callchain.c
index b465bc9eb870..0a5678f0d400 100644
--- a/arch/riscv/kernel/perf_callchain.c
+++ b/arch/riscv/kernel/perf_callchain.c
@@ -11,6 +11,11 @@ static bool fill_callchain(void *entry, unsigned long pc)
 	return perf_callchain_store(entry, pc) == 0;
 }
 
+static bool fill_callchain_walk(void *entry, unsigned long pc, bool is_ra)
+{
+	return fill_callchain(entry, pc);
+}
+
 /*
  * This will be called when the target is in user mode
  * This function will only be called when we use
@@ -44,5 +49,5 @@ void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 		return;
 	}
 
-	walk_stackframe(NULL, regs, fill_callchain, entry);
+	walk_stackframe(NULL, regs, fill_callchain_walk, entry);
 }
diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
index c7555447149b..502729ab257b 100644
--- a/arch/riscv/kernel/stacktrace.c
+++ b/arch/riscv/kernel/stacktrace.c
@@ -46,32 +46,36 @@ static inline int fp_is_valid(unsigned long fp, unsigned long sp)
 }
 
 void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
-			     bool (*fn)(void *, unsigned long), void *arg)
+			     bool (*fn)(void *, unsigned long, bool), void *arg)
 {
 	unsigned long fp, sp, pc;
 	int graph_idx = 0;
 	int level = 0;
+	bool is_ra;
 
 	if (regs) {
 		fp = frame_pointer(regs);
 		sp = user_stack_pointer(regs);
 		pc = instruction_pointer(regs);
+		is_ra = false;	/* exact instruction pointer */
 	} else if (task == NULL || task == current) {
 		fp = (unsigned long)__builtin_frame_address(0);
 		sp = current_stack_pointer;
 		pc = (unsigned long)walk_stackframe;
+		is_ra = false;	/* function entry address */
 		level = -1;
 	} else {
 		/* task blocked in __switch_to */
 		fp = task->thread.s[0];
 		sp = task->thread.sp;
 		pc = task->thread.ra;
+		is_ra = true;	/* return address */
 	}
 
 	for (;;) {
 		struct stackframe *frame;
 
-		if (unlikely(!__kernel_text_address(pc) || (level++ >= 0 && !fn(arg, pc))))
+		if (unlikely(!__kernel_text_address(pc) || (level++ >= 0 && !fn(arg, pc, is_ra))))
 			break;
 
 		if (unlikely(!fp_is_valid(fp, sp)))
@@ -84,18 +88,21 @@ void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
 			/* We hit function where ra is not saved on the stack */
 			fp = frame->ra;
 			pc = regs->ra;
+			is_ra = true;	/* return address */
 		} else {
 			fp = READ_ONCE_TASK_STACK(task, frame->fp);
 			pc = READ_ONCE_TASK_STACK(task, frame->ra);
 			pc = ftrace_graph_ret_addr(task, &graph_idx, pc,
 						   &frame->ra);
+			is_ra = true;	/* return address */
 			if (pc >= (unsigned long)handle_exception &&
 			    pc < (unsigned long)&ret_from_exception_end) {
-				if (unlikely(!fn(arg, pc)))
+				if (unlikely(!fn(arg, pc, is_ra)))
 					break;
 
 				pc = ((struct pt_regs *)sp)->epc;
 				fp = ((struct pt_regs *)sp)->s0;
+				is_ra = false;	/* exact instruction pointer */
 			}
 		}
 
@@ -105,41 +112,58 @@ void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
 #else /* !CONFIG_FRAME_POINTER */
 
 void notrace walk_stackframe(struct task_struct *task,
-	struct pt_regs *regs, bool (*fn)(void *, unsigned long), void *arg)
+	struct pt_regs *regs, bool (*fn)(void *, unsigned long, bool), void *arg)
 {
 	unsigned long sp, pc;
 	unsigned long *ksp;
+	bool is_ra;
 
 	if (regs) {
 		sp = user_stack_pointer(regs);
 		pc = instruction_pointer(regs);
+		is_ra = false;	/* exact instruction pointer */
 	} else if (task == NULL || task == current) {
 		sp = current_stack_pointer;
 		pc = (unsigned long)walk_stackframe;
+		is_ra = false;	/* function entry address */
 	} else {
 		/* task blocked in __switch_to */
 		sp = task->thread.sp;
 		pc = task->thread.ra;
+		is_ra = true;	/* return address */
 	}
 
 	if (unlikely(sp & 0x7))
 		return;
 
+	/*
+	 * Scan the stack for kernel text addresses.  Without frame pointers
+	 * we cannot distinguish return addresses from other saved text
+	 * addresses (e.g. pt_regs->epc), so all stack values are treated
+	 * as return addresses (is_ra = true).  This means pt_regs->epc
+	 * values found during the scan may be displayed with %pB instead
+	 * of the correct %pS, which can cause misattribution when the
+	 * exact IP is at offset 0 of a function.
+	 */
 	ksp = (unsigned long *)sp;
 	while (!kstack_end(ksp)) {
-		if (__kernel_text_address(pc) && unlikely(!fn(arg, pc)))
+		if (__kernel_text_address(pc) && unlikely(!fn(arg, pc, is_ra)))
 			break;
 		pc = READ_ONCE_NOCHECK(*ksp++);
+		is_ra = true;
 	}
 }
 
 #endif /* CONFIG_FRAME_POINTER */
 
-static bool print_trace_address(void *arg, unsigned long pc)
+static bool print_trace_address(void *arg, unsigned long pc, bool is_ra)
 {
 	const char *loglvl = arg;
 
-	print_ip_sym(loglvl, pc);
+	if (is_ra)
+		printk("%s[<%px>] %pB\n", loglvl, (void *)pc, (void *)pc);
+	else
+		printk("%s[<%px>] %pS\n", loglvl, (void *)pc, (void *)pc);
 	return true;
 }
 
@@ -155,7 +179,7 @@ void show_stack(struct task_struct *task, unsigned long *sp, const char *loglvl)
 	dump_backtrace(NULL, task, loglvl);
 }
 
-static bool save_wchan(void *arg, unsigned long pc)
+static bool save_wchan(void *arg, unsigned long pc, bool is_ra)
 {
 	if (!in_sched_functions(pc)) {
 		unsigned long *p = arg;
@@ -176,10 +200,22 @@ unsigned long __get_wchan(struct task_struct *task)
 	return pc;
 }
 
+struct stack_walk_cookie {
+	stack_trace_consume_fn consume;
+	void *cookie;
+};
+
+static bool stack_walk_wrapper(void *arg, unsigned long pc, bool is_ra)
+{
+	struct stack_walk_cookie *ctx = arg;
+	return ctx->consume(ctx->cookie, pc);
+}
+
 noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 		     struct task_struct *task, struct pt_regs *regs)
 {
-	walk_stackframe(task, regs, consume_entry, cookie);
+	struct stack_walk_cookie ctx = { .consume = consume_entry, .cookie = cookie };
+	walk_stackframe(task, regs, stack_walk_wrapper, &ctx);
 }
 
 /*
-- 
2.20.1