[PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()

Tiezhu Yang posted 2 patches 3 weeks, 2 days ago
Posted by Tiezhu Yang 3 weeks, 2 days ago
When testing kernel live patching with "modprobe livepatch-sample",
there is a delay of over 15 seconds from "starting patching transition"
to "patching complete", and dmesg shows "unreliable stack" for user tasks
in debug mode. Executing "rmmod livepatch-sample" exhibits a similar
issue.

As on x86, arch_stack_walk_reliable() should return 0 for user tasks.
To do so, first set regs->csr_prmd to task->thread.csr_prmd, then
use user_mode() to check whether the task is in userspace.

Here is the call chain:

  klp_enable_patch()
    klp_try_complete_transition()
      klp_try_switch_task()
        klp_check_and_switch_task()
          klp_check_stack()
            stack_trace_save_tsk_reliable()
              arch_stack_walk_reliable()

With this patch, patching and unpatching each complete within about one second.

Before:

  # modprobe livepatch-sample
  # dmesg -T | tail -3
  [Sat Sep  6 11:00:20 2025] livepatch: 'livepatch_sample': starting patching transition
  [Sat Sep  6 11:00:35 2025] livepatch: signaling remaining tasks
  [Sat Sep  6 11:00:36 2025] livepatch: 'livepatch_sample': patching complete

  # echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
  # rmmod livepatch_sample
  rmmod: ERROR: Module livepatch_sample is in use
  # rmmod livepatch_sample
  # dmesg -T | tail -3
  [Sat Sep  6 11:06:05 2025] livepatch: 'livepatch_sample': starting unpatching transition
  [Sat Sep  6 11:06:20 2025] livepatch: signaling remaining tasks
  [Sat Sep  6 11:06:21 2025] livepatch: 'livepatch_sample': unpatching complete

After:

  # modprobe livepatch-sample
  # dmesg -T | tail -2
  [Sat Sep  6 11:19:00 2025] livepatch: 'livepatch_sample': starting patching transition
  [Sat Sep  6 11:19:01 2025] livepatch: 'livepatch_sample': patching complete

  # echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
  # rmmod livepatch_sample
  # dmesg -T | tail -2
  [Sat Sep  6 11:21:10 2025] livepatch: 'livepatch_sample': starting unpatching transition
  [Sat Sep  6 11:21:11 2025] livepatch: 'livepatch_sample': unpatching complete

While at it, do the same for arch_stack_walk() to avoid
potential issues.

Cc: stable@vger.kernel.org # v6.9+
Fixes: 199cc14cb4f1 ("LoongArch: Add kernel livepatching support")
Reported-by: Xi Zhang <zhangxi@kylinos.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 arch/loongarch/kernel/stacktrace.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/loongarch/kernel/stacktrace.c b/arch/loongarch/kernel/stacktrace.c
index 9a038d1070d7..0454cce3b667 100644
--- a/arch/loongarch/kernel/stacktrace.c
+++ b/arch/loongarch/kernel/stacktrace.c
@@ -30,10 +30,15 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 		}
 		regs->regs[1] = 0;
 		regs->regs[22] = 0;
+		regs->csr_prmd = task->thread.csr_prmd;
 	}
 
 	for (unwind_start(&state, task, regs);
 	     !unwind_done(&state); unwind_next_frame(&state)) {
+		/* Success path for user tasks */
+		if (user_mode(regs))
+			return;
+
 		addr = unwind_get_return_address(&state);
 		if (!addr || !consume_entry(cookie, addr))
 			break;
@@ -57,9 +62,14 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
 	}
 	regs->regs[1] = 0;
 	regs->regs[22] = 0;
+	regs->csr_prmd = task->thread.csr_prmd;
 
 	for (unwind_start(&state, task, regs);
 	     !unwind_done(&state) && !unwind_error(&state); unwind_next_frame(&state)) {
+		/* Success path for user tasks */
+		if (user_mode(regs))
+			return 0;
+
 		addr = unwind_get_return_address(&state);
 
 		/*
-- 
2.42.0
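
The user_mode() check that the patch relies on can be illustrated with a small user-space sketch. It assumes user_mode() compares the privilege-level bits of csr_prmd against PLV_USER, which is how the LoongArch headers appear to define it; the mock_* types and task_was_in_user_mode() are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Privilege-level field of the PRMD CSR (assumed: PLV0 = kernel, PLV3 = user). */
#define PLV_KERN 0UL
#define PLV_USER 3UL
#define PLV_MASK 3UL

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct mock_pt_regs { unsigned long csr_prmd; };
struct mock_thread  { unsigned long csr_prmd; };
struct mock_task    { struct mock_thread thread; };

/* Same kind of test user_mode() is assumed to perform on csr_prmd. */
static bool mock_user_mode(const struct mock_pt_regs *regs)
{
	assert(regs != NULL);
	return (regs->csr_prmd & PLV_MASK) == PLV_USER;
}

/* Seed the synthetic regs from the task, as the patch does, then test. */
static bool task_was_in_user_mode(const struct mock_task *task)
{
	struct mock_pt_regs regs = { 0 };

	assert(task != NULL);
	regs.csr_prmd = task->thread.csr_prmd;
	return mock_user_mode(&regs);
}
```

Without the copy from task->thread.csr_prmd, the synthetic regs in this sketch would not carry the task's privilege level, so the check could not distinguish user tasks from kernel tasks.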
Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
Posted by Miroslav Benes 3 weeks ago
Hi,

On Tue, 9 Sep 2025, Tiezhu Yang wrote:

> When testing kernel live patching with "modprobe livepatch-sample",
> there is a delay of over 15 seconds from "starting patching transition"
> to "patching complete", and dmesg shows "unreliable stack" for user tasks
> in debug mode. Executing "rmmod livepatch-sample" exhibits a similar
> issue.
> 
> As on x86, arch_stack_walk_reliable() should return 0 for user tasks.
> To do so, first set regs->csr_prmd to task->thread.csr_prmd, then
> use user_mode() to check whether the task is in userspace.

It is a nice optimization for sure, but the "unreliable stack" messages point
to the fact that the unwinding of these tasks is probably suboptimal and
could be improved, no?

It would also be nice to include these messages (not for all tasks) in the
commit message.

Regards
Miroslav
Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
Posted by Tiezhu Yang 2 weeks, 3 days ago
On 2025/9/11 9:44 PM, Miroslav Benes wrote:
> Hi,
> 
> On Tue, 9 Sep 2025, Tiezhu Yang wrote:
> 
>> When testing kernel live patching with "modprobe livepatch-sample",
>> there is a delay of over 15 seconds from "starting patching transition"
>> to "patching complete", and dmesg shows "unreliable stack" for user tasks
>> in debug mode. Executing "rmmod livepatch-sample" exhibits a similar
>> issue.
>>
>> As on x86, arch_stack_walk_reliable() should return 0 for user tasks.
>> To do so, first set regs->csr_prmd to task->thread.csr_prmd, then
>> use user_mode() to check whether the task is in userspace.
> 
> It is a nice optimization for sure, but the "unreliable stack" messages point
> to the fact that the unwinding of these tasks is probably suboptimal and
> could be improved, no?

Yes, that makes sense. I will fix the "unreliable stack" issue in the next version.

> It would also be nice to include these messages (not for all tasks) in the
> commit message.

OK, will do it.

Thanks,
Tiezhu

Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
Posted by Jinyang He 3 weeks, 1 day ago
On 2025-09-09 19:31, Tiezhu Yang wrote:

> When testing kernel live patching with "modprobe livepatch-sample",
> there is a delay of over 15 seconds from "starting patching transition"
> to "patching complete", and dmesg shows "unreliable stack" for user tasks
> in debug mode. Executing "rmmod livepatch-sample" exhibits a similar
> issue.
>
> As on x86, arch_stack_walk_reliable() should return 0 for user tasks.
> To do so, first set regs->csr_prmd to task->thread.csr_prmd, then
> use user_mode() to check whether the task is in userspace.
>
> Here is the call chain:
>
>    klp_enable_patch()
>      klp_try_complete_transition()
>        klp_try_switch_task()
>          klp_check_and_switch_task()
>            klp_check_stack()
>              stack_trace_save_tsk_reliable()
>                arch_stack_walk_reliable()
>
> With this patch, patching and unpatching each complete within about one second.
>
> Before:
>
>    # modprobe livepatch-sample
>    # dmesg -T | tail -3
>    [Sat Sep  6 11:00:20 2025] livepatch: 'livepatch_sample': starting patching transition
>    [Sat Sep  6 11:00:35 2025] livepatch: signaling remaining tasks
>    [Sat Sep  6 11:00:36 2025] livepatch: 'livepatch_sample': patching complete
>
>    # echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
>    # rmmod livepatch_sample
>    rmmod: ERROR: Module livepatch_sample is in use
>    # rmmod livepatch_sample
>    # dmesg -T | tail -3
>    [Sat Sep  6 11:06:05 2025] livepatch: 'livepatch_sample': starting unpatching transition
>    [Sat Sep  6 11:06:20 2025] livepatch: signaling remaining tasks
>    [Sat Sep  6 11:06:21 2025] livepatch: 'livepatch_sample': unpatching complete
>
> After:
>
>    # modprobe livepatch-sample
>    # dmesg -T | tail -2
>    [Sat Sep  6 11:19:00 2025] livepatch: 'livepatch_sample': starting patching transition
>    [Sat Sep  6 11:19:01 2025] livepatch: 'livepatch_sample': patching complete
>
>    # echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
>    # rmmod livepatch_sample
>    # dmesg -T | tail -2
>    [Sat Sep  6 11:21:10 2025] livepatch: 'livepatch_sample': starting unpatching transition
>    [Sat Sep  6 11:21:11 2025] livepatch: 'livepatch_sample': unpatching complete
>
> While at it, do the same for arch_stack_walk() to avoid
> potential issues.
>
> Cc: stable@vger.kernel.org # v6.9+
> Fixes: 199cc14cb4f1 ("LoongArch: Add kernel livepatching support")
> Reported-by: Xi Zhang <zhangxi@kylinos.cn>
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> ---
>   arch/loongarch/kernel/stacktrace.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/arch/loongarch/kernel/stacktrace.c b/arch/loongarch/kernel/stacktrace.c
> index 9a038d1070d7..0454cce3b667 100644
> --- a/arch/loongarch/kernel/stacktrace.c
> +++ b/arch/loongarch/kernel/stacktrace.c
> @@ -30,10 +30,15 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
>   		}
>   		regs->regs[1] = 0;
>   		regs->regs[22] = 0;
> +		regs->csr_prmd = task->thread.csr_prmd;
>   	}
>   
>   	for (unwind_start(&state, task, regs);
>   	     !unwind_done(&state); unwind_next_frame(&state)) {
> +		/* Success path for user tasks */
> +		if (user_mode(regs))
> +			return;
> +
>   		addr = unwind_get_return_address(&state);
>   		if (!addr || !consume_entry(cookie, addr))
>   			break;
> @@ -57,9 +62,14 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
>   	}
>   	regs->regs[1] = 0;
>   	regs->regs[22] = 0;
> +	regs->csr_prmd = task->thread.csr_prmd;
>   
>   	for (unwind_start(&state, task, regs);
>   	     !unwind_done(&state) && !unwind_error(&state); unwind_next_frame(&state)) {
> +		/* Success path for user tasks */
> +		if (user_mode(regs))
> +			return 0;
> +
>   		addr = unwind_get_return_address(&state);
>   
>   		/*
Hi, Tiezhu,

We update the stack info via get_stack_info() when we meet ORC_TYPE_REGS
in unwind_next_frame(). And in arch_stack_walk(_reliable), we always
call unwind_done() before unwind_next_frame(). So is there some
error in get_stack_info() that causes regs to be user_mode while the
stack type is not STACK_TYPE_UNKNOWN?
Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
Posted by Tiezhu Yang 3 weeks ago
On 2025/9/10 9:11 AM, Jinyang He wrote:
> On 2025-09-09 19:31, Tiezhu Yang wrote:
> 
>> When testing kernel live patching with "modprobe livepatch-sample",
>> there is a delay of over 15 seconds from "starting patching transition"
>> to "patching complete", and dmesg shows "unreliable stack" for user tasks
>> in debug mode. Executing "rmmod livepatch-sample" exhibits a similar
>> issue.

...

>> @@ -57,9 +62,14 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
>>       }
>>       regs->regs[1] = 0;
>>       regs->regs[22] = 0;
>> +    regs->csr_prmd = task->thread.csr_prmd;
>>       for (unwind_start(&state, task, regs);
>>            !unwind_done(&state) && !unwind_error(&state); unwind_next_frame(&state)) {
>> +        /* Success path for user tasks */
>> +        if (user_mode(regs))
>> +            return 0;
>> +
>>           addr = unwind_get_return_address(&state);
>>           /*
> Hi, Tiezhu,
> 
> We update the stack info via get_stack_info() when we meet ORC_TYPE_REGS
> in unwind_next_frame(). And in arch_stack_walk(_reliable), we always
> call unwind_done() before unwind_next_frame(). So is there some
> error in get_stack_info() that causes regs to be user_mode while the
> stack type is not STACK_TYPE_UNKNOWN?

When testing the kernel live patching, the error code path in
unwind_next_frame() is:

   switch (orc->fp_reg) {
           case ORC_REG_PREV_SP:
                   p = (unsigned long *)(state->sp + orc->fp_offset);
                   if (!stack_access_ok(state, (unsigned long)p, sizeof(unsigned long)))
                           goto err;

In this case, get_stack_info() does not return 0 because in_task_stack()
is false, so the code jumps to err, which sets state->stack_info.type =
STACK_TYPE_UNKNOWN and state->error = true. In arch_stack_walk_reliable(),
the loop then terminates and the function returns -EINVAL, which is what
causes the "unreliable stack" report.

Alternatively, get_stack_info() could check whether the task is in
userspace and set state->stack_info.type = STACK_TYPE_UNKNOWN, but I
think there is no need to do that because it would have a similar
effect to this patch.

Thanks,
Tiezhu
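
The failure path described above can be modeled in user space. The sketch below is only an illustration under stated assumptions: the mock_* names and the fail_at field are hypothetical, and the loop mirrors the shape of arch_stack_walk_reliable() as quoted in the patch (stop on done or error, then report -EINVAL if the unwinder flagged an error).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical unwinder state: a fixed list of return addresses. */
struct mock_unwind_state {
	const unsigned long *frames; /* innermost frame first */
	size_t nr_frames;
	size_t pos;                  /* index of the current frame */
	size_t fail_at;              /* stepping from this index fails; >= nr_frames: never */
	bool error;
};

static bool mock_unwind_done(const struct mock_unwind_state *s)
{
	return s->pos >= s->nr_frames;
}

static void mock_unwind_next_frame(struct mock_unwind_state *s)
{
	if (s->pos == s->fail_at) {
		/* Models the "goto err" branch: stack access check failed. */
		s->error = true;
		return;
	}
	s->pos++;
}

/* Returns 0 for a complete walk, -EINVAL if the unwinder hit an error. */
static int mock_walk_reliable(struct mock_unwind_state *s, size_t *nr_seen)
{
	assert(s != NULL && nr_seen != NULL);
	*nr_seen = 0;

	for (; !mock_unwind_done(s) && !s->error; mock_unwind_next_frame(s)) {
		if (!s->frames[s->pos])
			return -EINVAL;
		(*nr_seen)++;
	}

	return s->error ? -EINVAL : 0;
}
```

In this model, a walk that fails partway still visits some frames but is reported as unreliable, which matches the behavior livepatch sees as "unreliable stack".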

Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
Posted by Jinyang He 2 weeks, 6 days ago
On 2025-09-11 19:49, Tiezhu Yang wrote:

> On 2025/9/10 9:11 AM, Jinyang He wrote:
>> On 2025-09-09 19:31, Tiezhu Yang wrote:
>>
>>> When testing kernel live patching with "modprobe livepatch-sample",
>>> there is a delay of over 15 seconds from "starting patching transition"
>>> to "patching complete", and dmesg shows "unreliable stack" for user tasks
>>> in debug mode. Executing "rmmod livepatch-sample" exhibits a similar
>>> issue.
>
> ...
>
>>> @@ -57,9 +62,14 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
>>>       }
>>>       regs->regs[1] = 0;
>>>       regs->regs[22] = 0;
>>> +    regs->csr_prmd = task->thread.csr_prmd;
>>>       for (unwind_start(&state, task, regs);
>>>            !unwind_done(&state) && !unwind_error(&state); unwind_next_frame(&state)) {
>>> +        /* Success path for user tasks */
>>> +        if (user_mode(regs))
>>> +            return 0;
>>> +
>>>           addr = unwind_get_return_address(&state);
>>>           /*
>> Hi, Tiezhu,
>>
>> We update the stack info via get_stack_info() when we meet ORC_TYPE_REGS
>> in unwind_next_frame(). And in arch_stack_walk(_reliable), we always
>> call unwind_done() before unwind_next_frame(). So is there some
>> error in get_stack_info() that causes regs to be user_mode while the
>> stack type is not STACK_TYPE_UNKNOWN?
>
> When testing the kernel live patching, the error code path in
> unwind_next_frame() is:
>
>   switch (orc->fp_reg) {
>           case ORC_REG_PREV_SP:
>                   p = (unsigned long *)(state->sp + orc->fp_offset);
>                   if (!stack_access_ok(state, (unsigned long)p, sizeof(unsigned long)))
>                           goto err;
>
> In this case, get_stack_info() does not return 0 because in_task_stack()
> is false, so the code jumps to err, which sets state->stack_info.type =
> STACK_TYPE_UNKNOWN and state->error = true. In arch_stack_walk_reliable(),
> the loop then terminates and the function returns -EINVAL, which is what
> causes the "unreliable stack" report.
A complete stack backtrace on LoongArch should stop either at the top of
the task stack or at an address for which is_entry_func() is true.
Otherwise, it is not a complete stack backtrace, and thus I think it
is an "unreliable stack".
I'm curious what the ORC info at this PC is.
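
The completeness criterion stated above can be written as a small predicate. This is a user-space sketch under assumptions, not kernel code: the mock_* types, the single entry-code address range, and backtrace_is_complete() are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address range covering the kernel's entry code. */
struct mock_entry_range { unsigned long start, end; };

/* Stand-in for an is_entry_func()-style check on a program counter. */
static bool mock_is_entry_func(const struct mock_entry_range *r, unsigned long pc)
{
	return pc >= r->start && pc < r->end;
}

/* Final position reached by an unwinder (hypothetical summary type). */
struct mock_unwind_end { unsigned long sp, pc; };

/* Complete if the walk ended at the stack top or inside entry code. */
static bool backtrace_is_complete(const struct mock_unwind_end *end,
				  unsigned long stack_top,
				  const struct mock_entry_range *entry)
{
	assert(end != NULL && entry != NULL);
	return end->sp == stack_top || mock_is_entry_func(entry, end->pc);
}
```

A walk that stops anywhere else, as in the failing case discussed in this thread, would be classified as incomplete by this predicate.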

Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in arch_stack_walk_reliable()
Posted by Tiezhu Yang 2 weeks, 3 days ago
On 2025/9/12 9:55 AM, Jinyang He wrote:
> On 2025-09-11 19:49, Tiezhu Yang wrote:
> 
>> On 2025/9/10 9:11 AM, Jinyang He wrote:
>>> On 2025-09-09 19:31, Tiezhu Yang wrote:
>>>
>>>> When testing kernel live patching with "modprobe livepatch-sample",
>>>> there is a delay of over 15 seconds from "starting patching transition"
>>>> to "patching complete", and dmesg shows "unreliable stack" for user tasks
>>>> in debug mode. Executing "rmmod livepatch-sample" exhibits a similar
>>>> issue.

...

>> In this case, get_stack_info() does not return 0 because in_task_stack()
>> is false, so the code jumps to err, which sets state->stack_info.type =
>> STACK_TYPE_UNKNOWN and state->error = true. In arch_stack_walk_reliable(),
>> the loop then terminates and the function returns -EINVAL, which is what
>> causes the "unreliable stack" report.
> A complete stack backtrace on LoongArch should stop either at the top of
> the task stack or at an address for which is_entry_func() is true.
> Otherwise, it is not a complete stack backtrace, and thus I think it
> is an "unreliable stack".
> I'm curious what the ORC info at this PC is.

The unwind process has a problem. I have found the root cause and am
working on a fix for the "unreliable stack" issue. The unwinder should
and can find the last frame, and then the user_mode() check is not
necessary.

Thanks,
Tiezhu