kernel/trace/rethook.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
When a task calls schedule() to yield the CPU, its state remains
TASK_RUNNING, but its stack is frozen and safe to walk.
Replace task_is_running(tsk) with tsk->on_cpu to avoid overly
conservative rejections.
Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
kernel/trace/rethook.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
index 5a8bdf88999a..bd5e5f455e85 100644
--- a/kernel/trace/rethook.c
+++ b/kernel/trace/rethook.c
@@ -250,7 +250,7 @@ unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame
if (WARN_ON_ONCE(!cur))
return 0;
- if (tsk != current && task_is_running(tsk))
+ if (tsk != current && tsk->on_cpu)
return 0;
do {
--
2.34.1
On Mon, 25 May 2026 21:22:53 +0800
Tengda Wu <wutengda@huaweicloud.com> wrote:
> When a task calls schedule() to yield the CPU, its state remains
> TASK_RUNNING, but its stack is frozen and safe to walk.
>
> Replace task_is_running(tsk) with tsk->on_cpu to avoid overly
> conservative rejections.
Please see the Sashiko's comment.
https://sashiko.dev/#/patchset/20260525132253.1889726-1-wutengda%40huaweicloud.com
When calling Unwind on a task other than the current, IMHO, it is
the responsibility of the caller of this function to ensure that the
stack trace of that task is safe.
We also should not use tsk->on_cpu, but should use task_on_cpu(tsk).
BTW, should task_on_cpu() use READ_ONCE() etc?
wait_task_inactive() seems a bit fragile.
Thanks,
>
> Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
> ---
> kernel/trace/rethook.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
> index 5a8bdf88999a..bd5e5f455e85 100644
> --- a/kernel/trace/rethook.c
> +++ b/kernel/trace/rethook.c
> @@ -250,7 +250,7 @@ unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame
> if (WARN_ON_ONCE(!cur))
> return 0;
>
> - if (tsk != current && task_is_running(tsk))
> + if (tsk != current && tsk->on_cpu)
> return 0;
>
> do {
> --
> 2.34.1
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
Hi Masami,
thanks for the review and feedback.
On 2026/5/26 11:37, Masami Hiramatsu wrote:
> On Mon, 25 May 2026 21:22:53 +0800
> Tengda Wu <wutengda@huaweicloud.com> wrote:
>
>> When a task calls schedule() to yield the CPU, its state remains
>> TASK_RUNNING, but its stack is frozen and safe to walk.
>>
>> Replace task_is_running(tsk) with tsk->on_cpu to avoid overly
>> conservative rejections.
>
> Please see the Sashiko's comment.
>
> https://sashiko.dev/#/patchset/20260525132253.1889726-1-wutengda%40huaweicloud.com
>
> When calling Unwind on a task other than the current, IMHO, it is
> the responsibility of the caller of this function to ensure that the
> stack trace of that task is safe.
Agree.
> We also should not use tsk->on_cpu, but should use task_on_cpu(tsk).
>
> BTW, should task_on_cpu() use READ_ONCE() etc?
> wait_task_inactive() seems a bit fragile.
>
> Thanks,
>
It seems that using task_on_cpu() is not necessary here because:
1. It requires an additional 'rq' parameter not available in the rethook context.
2. It just returns p->on_cpu, which is identical to our current use of tsk->on_cpu.
/* file: kernel/sched/sched.h */
static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
{
return p->on_cpu;
}
Given these constraints, staying with tsk->on_cpu seems more straightforward
for the rethook context.
Thanks,
Tengda
>>
>> Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
>> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
>> ---
>> kernel/trace/rethook.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
>> index 5a8bdf88999a..bd5e5f455e85 100644
>> --- a/kernel/trace/rethook.c
>> +++ b/kernel/trace/rethook.c
>> @@ -250,7 +250,7 @@ unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame
>> if (WARN_ON_ONCE(!cur))
>> return 0;
>>
>> - if (tsk != current && task_is_running(tsk))
>> + if (tsk != current && tsk->on_cpu)
>> return 0;
>>
>> do {
>> --
>> 2.34.1
>>
>>
>
>
On Fri, 29 May 2026 11:39:49 +0800
Tengda Wu <wutengda@huaweicloud.com> wrote:
> > We also should not use tsk->on_cpu, but should use task_on_cpu(tsk).
> >
> > BTW, should task_on_cpu() use READ_ONCE() etc?
> > wait_task_inactive() seems a bit fragile.
> >
> > Thanks,
> >
>
> It seems that using task_on_cpu() is not necessary here because:
>
> 1. It requires an additional 'rq' parameter not available in the rethook context.
> 2. It just returns p->on_cpu, which is identical to our current use of tsk->on_cpu.
OK. We may need to cleanup task_on_cpu() to remove unused rq, which
was historically introduced by commit 029632fbb7b7 ("sched: Make
separate sched*.c translation units") as a parameter for "task_running()"
because it called task_current(), but was cleaned by commit 0b9d46fc5ef7
("sched: Rename task_running() to task_on_cpu()") and now it is not
used.
>
> /* file: kernel/sched/sched.h */
> static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
> {
> return p->on_cpu;
> }
>
> Given these constraints, staying with tsk->on_cpu seems more straightforward
> for the rethook context.
The reason I asked you to use a wrapper function instead of directly
accessing that on_cpu is to ensure that this part isn't overlooked when
changing the scheduler's behavior in the future.
They may be equivalent at this point in time, but that doesn't necessarily
mean they will remain so in the future.
IMHO, an API interface should represent the behavior and resources
required for that operation, and when accessing it from outside the
subsystem, we should use that API whenever possible. Otherwise, even
if we want to update the internal implementation of one subsystem,
we will have to make changes to other subsystems as well.
Peter, is it OK to drop @rq from task_on_cpu()? Then we can use
it from rethook.
Thank you,
>
> Thanks,
> Tengda
>
> >>
> >> Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
> >> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
> >> ---
> >> kernel/trace/rethook.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
> >> index 5a8bdf88999a..bd5e5f455e85 100644
> >> --- a/kernel/trace/rethook.c
> >> +++ b/kernel/trace/rethook.c
> >> @@ -250,7 +250,7 @@ unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame
> >> if (WARN_ON_ONCE(!cur))
> >> return 0;
> >>
> >> - if (tsk != current && task_is_running(tsk))
> >> + if (tsk != current && tsk->on_cpu)
> >> return 0;
> >>
> >> do {
> >> --
> >> 2.34.1
> >>
> >>
> >
> >
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
On Mon, Jun 01, 2026 at 08:40:01AM +0900, Masami Hiramatsu wrote: > Peter, is it OK to drop @rq from task_on_cpu()? Sure. > Then we can use it from rethook. Well, it is in sched/sched.h, which is an internal header, and no you cannot use that header in rethook. But lets step back first, what is the actual problem here, why are we looking at ->on_cpu at all?
On Thu, 4 Jun 2026 11:34:45 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, Jun 01, 2026 at 08:40:01AM +0900, Masami Hiramatsu wrote: > > > Peter, is it OK to drop @rq from task_on_cpu()? > > Sure. > > > Then we can use it from rethook. > > Well, it is in sched/sched.h, which is an internal header, and no you > cannot use that header in rethook. Ah, OK. Hmm, then we should not use it. Maybe ->on_cpu is also internal state? > > But lets step back first, what is the actual problem here, why are we > looking at ->on_cpu at all? Tengda, can you explain it? I think you want to take a stacktrace on !current process, and rethook_find_ret_addr() is rejected i the task is running state. But if you can share actual situation what you need, it is helpful for us to understand. Thank you, -- Masami Hiramatsu (Google) <mhiramat@kernel.org>
On 2026/6/5 21:43, Masami Hiramatsu wrote:
> On Thu, 4 Jun 2026 11:34:45 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Mon, Jun 01, 2026 at 08:40:01AM +0900, Masami Hiramatsu wrote:
>>
>>> Peter, is it OK to drop @rq from task_on_cpu()?
>>
>> Sure.
>>
>>> Then we can use it from rethook.
>>
>> Well, it is in sched/sched.h, which is an internal header, and no you
>> cannot use that header in rethook.
>
> Ah, OK. Hmm, then we should not use it. Maybe ->on_cpu is also internal
> state?
>
>>
>> But lets step back first, what is the actual problem here, why are we
>> looking at ->on_cpu at all?
>
> Tengda, can you explain it?
> I think you want to take a stacktrace on !current process, and
> rethook_find_ret_addr() is rejected i the task is running state.
>
> But if you can share actual situation what you need, it is
> helpful for us to understand.
>
> Thank you,
>
>
Sure.
Background: We are verifying the support of live patches for functions that
have a kretprobe. The specific verification method is as follows:
We construct a function foo() that calls bar():
void bar(void)
{
for (;;) {
schedule();
}
}
void foo(void)
{
bar();
}
A kretprobe is attached to bar():
echo 'r:rp1 bar' > /sys/kernel/tracing/kprobe_events
echo 1 > /sys/kernel/tracing/events/kprobes/rp1/enable
Then foo() is triggered. The expected behavior is that bar() will call
schedule() and yield the CPU.
After that, the live patch is activated to attempt replacing the implementation
of foo(). The expectation is that this should succeed.
However, in reality, because the task that called schedule() is still in the
RUNNING state, the condition task_is_running(tsk) inside rethook_find_ret_addr()
is not satisfied, causing the function to return early. This, in turn,
prevents stack_trace_save_tsk_reliable() from determining the stack as
reliable, leading to a failure in activating the live patch.
**Not sure if this is correct:**
We believe that after a task voluntarily calls schedule(), when the stack
is expected to be reliable, it is a safe time to activate a live patch.
Additionally, a similar tsk->on_cpu check can be found elsewhere in the
kernel (See task_on_another_cpu() in arch/x86/include/asm/unwind.h).
Therefore, we propose changing the task_is_running(tsk) condition to
tsk->on_cpu.
Thanks,
Tengda
+Live patching folks
On Mon, Jun 08, 2026 at 09:52:37AM +0800, Tengda Wu wrote:
> Background: We are verifying the support of live patches for functions that
> have a kretprobe. The specific verification method is as follows:
>
> We construct a function foo() that calls bar():
>
> void bar(void)
> {
> for (;;) {
> schedule();
> }
> }
>
> void foo(void)
> {
> bar();
> }
>
> A kretprobe is attached to bar():
>
> echo 'r:rp1 bar' > /sys/kernel/tracing/kprobe_events
> echo 1 > /sys/kernel/tracing/events/kprobes/rp1/enable
>
> Then foo() is triggered. The expected behavior is that bar() will call
> schedule() and yield the CPU.
>
> After that, the live patch is activated to attempt replacing the implementation
> of foo(). The expectation is that this should succeed.
This wholly depends on how foo() calls bar(), if it is a normal call,
then no, it should not succeed, because foo() is still on the stack.
If it is a tail-call, then yes, because foo() is no longer relevant.
> However, in reality, because the task that called schedule() is still in the
> RUNNING state,
So calling schedule() without setting state is dodgy in the first place.
Who is doing this? All wait primitives will set this to
TASK_UNINTERRUPTIBLE or something along those lines.
> the condition task_is_running(tsk) inside rethook_find_ret_addr()
> is not satisfied, causing the function to return early. This, in turn,
> prevents stack_trace_save_tsk_reliable() from determining the stack as
> reliable, leading to a failure in activating the live patch.
>
> **Not sure if this is correct:**
>
> We believe that after a task voluntarily calls schedule(), when the stack
> is expected to be reliable, it is a safe time to activate a live patch.
Calling schedule() without setting state is a no-op and really shouldn't
count much at all.
> Additionally, a similar tsk->on_cpu check can be found elsewhere in the
> kernel (See task_on_another_cpu() in arch/x86/include/asm/unwind.h).
> Therefore, we propose changing the task_is_running(tsk) condition to
> tsk->on_cpu.
Anyway, I'm wondering what the purpose of this check here is, there is
no real comment, and commit 5120d167e21c ("rethook: Remove warning
messages printed for finding return address of a frame.") is just pure
voodoo as well.
Also, note the comment that goes with the usage of
task_on_another_cpu(); that thing is racy as all heck.
So it really comes down to what the purpose of this check is.
I suspect the issue at hand is that tsk->rethook elements, such as
iterated by __rethook_find_ret_addr() are not safe to be accessed for a
running task.
Notably while rethook_recycle() has some RCU thing on, that objpool
thing (and the recycle name itself) seems to strongly suggest iterating
these things is not sound (you could start with things from this task,
hit a recycled entry and continue iterating rethooks from another task).
Also note that the current check is also racy, nothing really prevents a
wakeup from happening right after you observe task_is_running() being
false. The task can then get scheduled in on another CPU and tear down
its rethooks concurrent with __rethook_find_ret_addr().
Now, livepatch itself calls unwind from a proper context, but unwinds in
general are not. This rethook stuff doesn't seem to be sound in general.
On Mon, Jun 08, 2026 at 11:34:49AM +0200, Peter Zijlstra wrote:
>
> +Live patching folks
>
> On Mon, Jun 08, 2026 at 09:52:37AM +0800, Tengda Wu wrote:
>
> > Background: We are verifying the support of live patches for functions that
> > have a kretprobe. The specific verification method is as follows:
> >
> > We construct a function foo() that calls bar():
> >
> > void bar(void)
> > {
> > for (;;) {
> > schedule();
> > }
> > }
> >
> > void foo(void)
> > {
> > bar();
> > }
> >
> > A kretprobe is attached to bar():
> >
> > echo 'r:rp1 bar' > /sys/kernel/tracing/kprobe_events
> > echo 1 > /sys/kernel/tracing/events/kprobes/rp1/enable
> >
> > Then foo() is triggered. The expected behavior is that bar() will call
> > schedule() and yield the CPU.
> >
> > After that, the live patch is activated to attempt replacing the implementation
> > of foo(). The expectation is that this should succeed.
>
> This wholly depends on how foo() calls bar(), if it is a normal call,
> then no, it should not succeed, because foo() is still on the stack.
>
> If it is a tail-call, then yes, because foo() is no longer relevant.
>
> > However, in reality, because the task that called schedule() is still in the
> > RUNNING state,
>
> So calling schedule() without setting state is dodgy in the first place.
> Who is doing this? All wait primitives will set this to
> TASK_UNINTERRUPTIBLE or something along those lines.
>
> > the condition task_is_running(tsk) inside rethook_find_ret_addr()
> > is not satisfied, causing the function to return early. This, in turn,
> > prevents stack_trace_save_tsk_reliable() from determining the stack as
> > reliable, leading to a failure in activating the live patch.
> >
> > **Not sure if this is correct:**
> >
> > We believe that after a task voluntarily calls schedule(), when the stack
> > is expected to be reliable, it is a safe time to activate a live patch.
>
> Calling schedule() without setting state is a no-op and really shouldn't
> count much at all.
>
> > Additionally, a similar tsk->on_cpu check can be found elsewhere in the
> > kernel (See task_on_another_cpu() in arch/x86/include/asm/unwind.h).
> > Therefore, we propose changing the task_is_running(tsk) condition to
> > tsk->on_cpu.
>
> Anyway, I'm wondering what the purpose of this check here is, there is
> no real comment, and commit 5120d167e21c ("rethook: Remove warning
> messages printed for finding return address of a frame.") is just pure
> voodoo as well.
FWIW, you should have had this discussion then.
> Also, note the comment that goes with the usage of
> task_on_another_cpu(); that thing is racy as all heck.
>
> So it really comes down to what the purpose of this check is.
>
> I suspect the issue at hand is that tsk->rethook elements, such as
> iterated by __rethook_find_ret_addr() are not safe to be accessed for a
> running task.
>
> Notably while rethook_recycle() has some RCU thing on, that objpool
> thing (and the recycle name itself) seems to strongly suggest iterating
> these things is not sound (you could start with things from this task,
> hit a recycled entry and continue iterating rethooks from another task).
>
> Also note that the current check is also racy, nothing really prevents a
> wakeup from happening right after you observe task_is_running() being
> false. The task can then get scheduled in on another CPU and tear down
> its rethooks concurrent with __rethook_find_ret_addr().
>
>
> Now, livepatch itself calls unwind from a proper context, but unwinds in
> general are not. This rethook stuff doesn't seem to be sound in general.
I suspect just entirely removing the check is the sanest option at this
point. Callers that do it right (livepatch) are guaranteed consistent
data, and the rest gets whatever pieces.
Notably, unwind_next() holds rcu, so the iteration is protected from any
of those rethook_node things getting freed. Its just that the iteration
can go sideways and you might not get a sane answer.
The very worst possible option is getting stuck in an infinite loop when
concurrent with agressive rethook re-use or something daft like that,
but that seems extremely unlikely.
On Mon, 8 Jun 2026 12:23:26 +0200
Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Jun 08, 2026 at 11:34:49AM +0200, Peter Zijlstra wrote:
> >
> > +Live patching folks
> >
> > On Mon, Jun 08, 2026 at 09:52:37AM +0800, Tengda Wu wrote:
> >
> > > Background: We are verifying the support of live patches for functions that
> > > have a kretprobe. The specific verification method is as follows:
> > >
> > > We construct a function foo() that calls bar():
> > >
> > > void bar(void)
> > > {
> > > for (;;) {
> > > schedule();
> > > }
> > > }
> > >
> > > void foo(void)
> > > {
> > > bar();
> > > }
> > >
> > > A kretprobe is attached to bar():
> > >
> > > echo 'r:rp1 bar' > /sys/kernel/tracing/kprobe_events
> > > echo 1 > /sys/kernel/tracing/events/kprobes/rp1/enable
> > >
> > > Then foo() is triggered. The expected behavior is that bar() will call
> > > schedule() and yield the CPU.
> > >
> > > After that, the live patch is activated to attempt replacing the implementation
> > > of foo(). The expectation is that this should succeed.
> >
> > This wholly depends on how foo() calls bar(), if it is a normal call,
> > then no, it should not succeed, because foo() is still on the stack.
> >
> > If it is a tail-call, then yes, because foo() is no longer relevant.
> >
> > > However, in reality, because the task that called schedule() is still in the
> > > RUNNING state,
> >
> > So calling schedule() without setting state is dodgy in the first place.
> > Who is doing this? All wait primitives will set this to
> > TASK_UNINTERRUPTIBLE or something along those lines.
> >
> > > the condition task_is_running(tsk) inside rethook_find_ret_addr()
> > > is not satisfied, causing the function to return early. This, in turn,
> > > prevents stack_trace_save_tsk_reliable() from determining the stack as
> > > reliable, leading to a failure in activating the live patch.
> > >
> > > **Not sure if this is correct:**
> > >
> > > We believe that after a task voluntarily calls schedule(), when the stack
> > > is expected to be reliable, it is a safe time to activate a live patch.
> >
> > Calling schedule() without setting state is a no-op and really shouldn't
> > count much at all.
> >
> > > Additionally, a similar tsk->on_cpu check can be found elsewhere in the
> > > kernel (See task_on_another_cpu() in arch/x86/include/asm/unwind.h).
> > > Therefore, we propose changing the task_is_running(tsk) condition to
> > > tsk->on_cpu.
> >
> > Anyway, I'm wondering what the purpose of this check here is, there is
> > no real comment, and commit 5120d167e21c ("rethook: Remove warning
> > messages printed for finding return address of a frame.") is just pure
> > voodoo as well.
>
> FWIW, you should have had this discussion then.
Indeed. The rethook is making a shadow stack by list, thus caller must
guarantee the target process is blocked at least during this function.
The commit messages suggest that when BPF takes a backtrace, it also
includes other running tasks. Is that safe?
>
> > Also, note the comment that goes with the usage of
> > task_on_another_cpu(); that thing is racy as all heck.
> >
> > So it really comes down to what the purpose of this check is.
This check has been introduced when it is copied from
kretprobe_find_ret_addr(). It has the comment:
* The @tsk must be 'current' or a task which is not running. @fp is a hint
IIRC, I added this check to explicitly verify this condition.
> >
> > I suspect the issue at hand is that tsk->rethook elements, such as
> > iterated by __rethook_find_ret_addr() are not safe to be accessed for a
> > running task.
> >
> > Notably while rethook_recycle() has some RCU thing on, that objpool
> > thing (and the recycle name itself) seems to strongly suggest iterating
> > these things is not sound (you could start with things from this task,
> > hit a recycled entry and continue iterating rethooks from another task).
> >
> > Also note that the current check is also racy, nothing really prevents a
> > wakeup from happening right after you observe task_is_running() being
> > false. The task can then get scheduled in on another CPU and tear down
> > its rethooks concurrent with __rethook_find_ret_addr().
Yeah, but is there any way to ensure the task is blocked? Even if it is
blocked, like TASK_UNINTERRUPTIBLE, unless holding the actual lock in
the rethook, it may not be possible to ensure it?
Of course, we could give up on checking within this function and leave
everything to the caller to guarantee - as kretprobe does.
BTW, the reason why we made it possible to pass tasks other than current
is that the stack unwinding code itself supported unwinding tasks other
than current, so we had no choice but to create this interface.
However, it is a bad idea to check this in deep inside of unwinding.
> > Now, livepatch itself calls unwind from a proper context, but unwinds in
> > general are not. This rethook stuff doesn't seem to be sound in general.
>
> I suspect just entirely removing the check is the sanest option at this
> point. Callers that do it right (livepatch) are guaranteed consistent
> data, and the rest gets whatever pieces.
Agreed.
>
> Notably, unwind_next() holds rcu, so the iteration is protected from any
> of those rethook_node things getting freed. Its just that the iteration
> can go sideways and you might not get a sane answer.
>
> The very worst possible option is getting stuck in an infinite loop when
> concurrent with agressive rethook re-use or something daft like that,
> but that seems extremely unlikely.
OK, thanks for your review!
Tengda, can you send a patch to just remove the check?
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
On Mon, Jun 08, 2026 at 10:08:11PM +0900, Masami Hiramatsu wrote:
> > > Anyway, I'm wondering what the purpose of this check here is, there is
> > > no real comment, and commit 5120d167e21c ("rethook: Remove warning
> > > messages printed for finding return address of a frame.") is just pure
> > > voodoo as well.
> >
> > FWIW, you should have had this discussion then.
>
> Indeed. The rethook is making a shadow stack by list, thus caller must
> guarantee the target process is blocked at least during this function.
>
> The commit messages suggest that when BPF takes a backtrace, it also
> includes other running tasks. Is that safe?
Well, you get to keep the pieces. At this point safe only pertains to
'doesn't-crash', all correctness is out the window.
I always forget the crazy BPF does ;-)
> > > Also, note the comment that goes with the usage of
> > > task_on_another_cpu(); that thing is racy as all heck.
> > >
> > > So it really comes down to what the purpose of this check is.
>
> This check has been introduced when it is copied from
> kretprobe_find_ret_addr(). It has the comment:
>
> * The @tsk must be 'current' or a task which is not running. @fp is a hint
>
> IIRC, I added this check to explicitly verify this condition.
Right, but it is a prescriptive comment, not an explanatory one. That
is, it doesn't explain the condition.
> > > I suspect the issue at hand is that tsk->rethook elements, such as
> > > iterated by __rethook_find_ret_addr() are not safe to be accessed for a
> > > running task.
> > >
> > > Notably while rethook_recycle() has some RCU thing on, that objpool
> > > thing (and the recycle name itself) seems to strongly suggest iterating
> > > these things is not sound (you could start with things from this task,
> > > hit a recycled entry and continue iterating rethooks from another task).
> > >
> > > Also note that the current check is also racy, nothing really prevents a
> > > wakeup from happening right after you observe task_is_running() being
> > > false. The task can then get scheduled in on another CPU and tear down
> > > its rethooks concurrent with __rethook_find_ret_addr().
>
> Yeah, but is there any way to ensure the task is blocked? Even if it is
> blocked, like TASK_UNINTERRUPTIBLE, unless holding the actual lock in
> the rethook, it may not be possible to ensure it?
>
> Of course, we could give up on checking within this function and leave
> everything to the caller to guarantee - as kretprobe does.
>
> BTW, the reason why we made it possible to pass tasks other than current
> is that the stack unwinding code itself supported unwinding tasks other
> than current, so we had no choice but to create this interface.
>
> However, it is a bad idea to check this in deep inside of unwinding.
This, you cannot take locks in unwinding. The only thing you can do is
try to do the best you can without crashing.
Typically unwind only happens on self -- this is natural, a task crashes
and unwinds itself, or a task does something (takes a lock, hits a
tracepoint, etc) and takes a snapshot of its own stack, and this is
safe.
Things like live-patch use task_call_func(), which ensures the callback
function is done while holding sufficient locks for the task to not
change state.
On Mon, 8 Jun 2026 09:52:37 +0800
Tengda Wu <wutengda@huaweicloud.com> wrote:
>
>
> On 2026/6/5 21:43, Masami Hiramatsu wrote:
> > On Thu, 4 Jun 2026 11:34:45 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >> On Mon, Jun 01, 2026 at 08:40:01AM +0900, Masami Hiramatsu wrote:
> >>
> >>> Peter, is it OK to drop @rq from task_on_cpu()?
> >>
> >> Sure.
> >>
> >>> Then we can use it from rethook.
> >>
> >> Well, it is in sched/sched.h, which is an internal header, and no you
> >> cannot use that header in rethook.
> >
> > Ah, OK. Hmm, then we should not use it. Maybe ->on_cpu is also internal
> > state?
> >
> >>
> >> But lets step back first, what is the actual problem here, why are we
> >> looking at ->on_cpu at all?
> >
> > Tengda, can you explain it?
> > I think you want to take a stacktrace on !current process, and
> > rethook_find_ret_addr() is rejected i the task is running state.
> >
> > But if you can share actual situation what you need, it is
> > helpful for us to understand.
> >
> > Thank you,
> >
> >
>
>
> Sure.
>
> Background: We are verifying the support of live patches for functions that
> have a kretprobe. The specific verification method is as follows:
>
> We construct a function foo() that calls bar():
>
> void bar(void)
> {
> for (;;) {
> schedule();
> }
> }
>
> void foo(void)
> {
> bar();
> }
>
> A kretprobe is attached to bar():
>
> echo 'r:rp1 bar' > /sys/kernel/tracing/kprobe_events
> echo 1 > /sys/kernel/tracing/events/kprobes/rp1/enable
>
> Then foo() is triggered. The expected behavior is that bar() will call
> schedule() and yield the CPU.
>
> After that, the live patch is activated to attempt replacing the implementation
> of foo(). The expectation is that this should succeed.
>
> However, in reality, because the task that called schedule() is still in the
> RUNNING state, the condition task_is_running(tsk) inside rethook_find_ret_addr()
> is not satisfied, causing the function to return early. This, in turn,
> prevents stack_trace_save_tsk_reliable() from determining the stack as
> reliable, leading to a failure in activating the live patch.
Hmm is the bar() doing infinite loop, or limited loop but take a long time
so just yield a while? Anyway, it seems like a non-good design pattern.
Is it possible to avoid busy loops and instead use Workers, or wait for
something to complete or for input within a loop?
>
> **Not sure if this is correct:**
>
> We believe that after a task voluntarily calls schedule(), when the stack
> is expected to be reliable, it is a safe time to activate a live patch.
In this case, I don't know how to block the loop inside the bar.
Even if !tsk->on_cpu, the tsk can restart running right after checking
the flag.
> Additionally, a similar tsk->on_cpu check can be found elsewhere in the
> kernel (See task_on_another_cpu() in arch/x86/include/asm/unwind.h).
> Therefore, we propose changing the task_is_running(tsk) condition to
> tsk->on_cpu.
Yes, but the caller said there is another check to ensure the race.
/*
* Refuse to unwind the stack of a task while it's executing on another
* CPU. This check is racy, but that's ok: the unwinder has other
* checks to prevent it from going off the rails.
*/
if (task_on_another_cpu(task))
goto err;
Josh, do you know how this avoid the race case?
Thank you,
>
> Thanks,
> Tengda
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
On 2026/6/8 10:56, Masami Hiramatsu wrote:
> On Mon, 8 Jun 2026 09:52:37 +0800
> Tengda Wu <wutengda@huaweicloud.com> wrote:
>
>>
>>
>> On 2026/6/5 21:43, Masami Hiramatsu wrote:
>>> On Thu, 4 Jun 2026 11:34:45 +0200
>>> Peter Zijlstra <peterz@infradead.org> wrote:
>>>
>>>> On Mon, Jun 01, 2026 at 08:40:01AM +0900, Masami Hiramatsu wrote:
>>>>
>>>>> Peter, is it OK to drop @rq from task_on_cpu()?
>>>>
>>>> Sure.
>>>>
>>>>> Then we can use it from rethook.
>>>>
>>>> Well, it is in sched/sched.h, which is an internal header, and no you
>>>> cannot use that header in rethook.
>>>
>>> Ah, OK. Hmm, then we should not use it. Maybe ->on_cpu is also internal
>>> state?
>>>
>>>>
>>>> But lets step back first, what is the actual problem here, why are we
>>>> looking at ->on_cpu at all?
>>>
>>> Tengda, can you explain it?
>>> I think you want to take a stacktrace on !current process, and
>>> rethook_find_ret_addr() is rejected i the task is running state.
>>>
>>> But if you can share actual situation what you need, it is
>>> helpful for us to understand.
>>>
>>> Thank you,
>>>
>>>
>>
>>
>> Sure.
>>
>> Background: We are verifying the support of live patches for functions that
>> have a kretprobe. The specific verification method is as follows:
>>
>> We construct a function foo() that calls bar():
>>
>> void bar(void)
>> {
>> for (;;) {
>> schedule();
>> }
>> }
>>
>> void foo(void)
>> {
>> bar();
>> }
>>
>> A kretprobe is attached to bar():
>>
>> echo 'r:rp1 bar' > /sys/kernel/tracing/kprobe_events
>> echo 1 > /sys/kernel/tracing/events/kprobes/rp1/enable
>>
>> Then foo() is triggered. The expected behavior is that bar() will call
>> schedule() and yield the CPU.
>>
>> After that, the live patch is activated to attempt replacing the implementation
>> of foo(). The expectation is that this should succeed.
>>
>> However, in reality, because the task that called schedule() is still in the
>> RUNNING state, the condition task_is_running(tsk) inside rethook_find_ret_addr()
>> is not satisfied, causing the function to return early. This, in turn,
>> prevents stack_trace_save_tsk_reliable() from determining the stack as
>> reliable, leading to a failure in activating the live patch.
>
> Hmm is the bar() doing infinite loop, or limited loop but take a long time
> so just yield a while? Anyway, it seems like a non-good design pattern.
> Is it possible to avoid busy loops and instead use Workers, or wait for
> something to complete or for input within a loop?
>
>>
>> **Not sure if this is correct:**
>>
>> We believe that after a task voluntarily calls schedule(), when the stack
>> is expected to be reliable, it is a safe time to activate a live patch.
>
> In this case, I don't know how to block the loop inside the bar.
> Even if !tsk->on_cpu, the tsk can restart running right after checking
> the flag.
>
The infinite loop in bar() is indeed a poor design pattern. This test
case is only artificial, not from real-world code. It is merely
intended to verify live patch support for various cases.
However, the point you raised has indeed made me think. I realize that
checking only tsk->on_cpu is not sufficient -- there is also a race
condition where the task could be scheduled back onto a CPU right after
the check. I need to re-examine the validity of this test case and
whether it represents a safe live patch activation scenario.
Thank you again for your patience and for pointing out these
fundamental issues. Your guidance is much appreciated.
Best regards,
Tengda
On 2026/6/1 7:40, Masami Hiramatsu wrote:
> On Fri, 29 May 2026 11:39:49 +0800
> Tengda Wu <wutengda@huaweicloud.com> wrote:
>
>>> We also should not use tsk->on_cpu, but should use task_on_cpu(tsk).
>>>
>>> BTW, should task_on_cpu() use READ_ONCE() etc?
>>> wait_task_inactive() seems a bit fragile.
>>>
>>> Thanks,
>>>
>>
>> It seems that using task_on_cpu() is not necessary here because:
>>
>> 1. It requires an additional 'rq' parameter not available in the rethook context.
>> 2. It just returns p->on_cpu, which is identical to our current use of tsk->on_cpu.
>
> OK. We may need to cleanup task_on_cpu() to remove unused rq, which
> was historically introduced by commit 029632fbb7b7 ("sched: Make
> separate sched*.c translation units") as a parameter for "task_running()"
> because it called task_current(), but was cleaned by commit 0b9d46fc5ef7
> ("sched: Rename task_running() to task_on_cpu()") and now it is not
> used.
>
>>
>> /* file: kernel/sched/sched.h */
>> static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
>> {
>> return p->on_cpu;
>> }
>>
>> Given these constraints, staying with tsk->on_cpu seems more straightforward
>> for the rethook context.
>
> The reason I asked you to use a wrapper function instead of directly
> accessing that on_cpu is to ensure that this part isn't overlooked when
> changing the scheduler's behavior in the future.
>
> They may be equivalent at this point in time, but that doesn't necessarily
> mean they will remain so in the future.
>
> IMHO, an API interface should represent the behavior and resources
> required for that operation, and when accessing it from outside the
> subsystem, we should use that API whenever possible. Otherwise, even
> if we want to update the internal implementation of one subsystem,
> we will have to make changes to other subsystems as well.
>
> Peter, is it OK to drop @rq from task_on_cpu()? Then we can use
> it from rethook.
>
> Thank you,
>
Thank you for the detailed explanation -- that makes perfect sense.
I looked into task_on_cpu() and saw it has many callers.
Once it's confirmed that the parameter can be safely dropped, I'll
follow up with my patch to use the wrapper instead of directly
accessing on_cpu.
Best,
Tengda
>>>>
>>>> Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
>>>> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
>>>> ---
>>>> kernel/trace/rethook.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
>>>> index 5a8bdf88999a..bd5e5f455e85 100644
>>>> --- a/kernel/trace/rethook.c
>>>> +++ b/kernel/trace/rethook.c
>>>> @@ -250,7 +250,7 @@ unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame
>>>> if (WARN_ON_ONCE(!cur))
>>>> return 0;
>>>>
>>>> - if (tsk != current && task_is_running(tsk))
>>>> + if (tsk != current && tsk->on_cpu)
>>>> return 0;
>>>>
>>>> do {
>>>> --
>>>> 2.34.1
>>>>
>>>>
>>>
>>>
>>
>
>
© 2016 - 2026 Red Hat, Inc.