[PATCH] tracing: fgraph: Protect return handler from recursion loop

Masami Hiramatsu (Google) posted 1 patch 1 week, 5 days ago
kernel/trace/fgraph.c |   12 ++++++++++++
1 file changed, 12 insertions(+)
[PATCH] tracing: fgraph: Protect return handler from recursion loop
Posted by Masami Hiramatsu (Google) 1 week, 5 days ago
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

function_graph_enter_regs() prevents itself from recursion by
ftrace_test_recursion_trylock(), but __ftrace_return_to_handler(),
which is called at the exit, does not prevent such recursion.
Therefore, while it can prevent recursive calls from
fgraph_ops::entryfunc(), it is not able to prevent recursive calls
to fgraph from fgraph_ops::retfunc(), resulting in a recursive loop.
This can lead an unexpected recursion bug reported by Menglong.

 is_endbr() is called in __ftrace_return_to_handler -> fprobe_return
  -> kprobe_multi_link_exit_handler -> is_endbr.

To fix this issue, acquire ftrace_test_recursion_trylock() in the
__ftrace_return_to_handler() after unwind the shadow stack to mark
this section must prevent recursive call of fgraph inside user-defined
fgraph_ops::retfunc().

This is essentially a fix to commit 4346ba160409 ("fprobe: Rewrite
fprobe on function-graph tracer"), because before that fgraph was
only used from the function graph tracer. Fprobe allowed user to run
any callbacks from fgraph after that commit.

Reported-by: Menglong Dong <menglong8.dong@gmail.com>
Closes: https://lore.kernel.org/all/20250918120939.1706585-1-dongml2@chinatelecom.cn/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 kernel/trace/fgraph.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 1e3b32b1e82c..08dde420635b 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -815,6 +815,7 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
 	unsigned long bitmap;
 	unsigned long ret;
 	int offset;
+	int bit;
 	int i;
 
 	ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer, &offset);
@@ -829,6 +830,15 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
 	if (fregs)
 		ftrace_regs_set_instruction_pointer(fregs, ret);
 
+	bit = ftrace_test_recursion_trylock(trace.func, ret);
+	/*
+	 * This must be succeeded because the entry handler returns before
+	 * modifying the return address if it is nested. Anyway, we need to
+	 * avoid calling user callbacks if it is nested.
+	 */
+	if (WARN_ON_ONCE(bit < 0))
+		goto out;
+
 #ifdef CONFIG_FUNCTION_GRAPH_RETVAL
 	trace.retval = ftrace_regs_get_return_value(fregs);
 #endif
@@ -852,6 +862,8 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
 		}
 	}
 
+	ftrace_test_recursion_unlock(bit);
+out:
 	/*
 	 * The ftrace_graph_return() may still access the current
 	 * ret_stack structure, we need to make sure the update of
Re: [PATCH] tracing: fgraph: Protect return handler from recursion loop
Posted by Steven Rostedt 1 week, 5 days ago
On Fri, 19 Sep 2025 20:57:36 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> function_graph_enter_regs() prevents itself from recursion by
> ftrace_test_recursion_trylock(), but __ftrace_return_to_handler(),
> which is called at the exit, does not prevent such recursion.
> Therefore, while it can prevent recursive calls from
> fgraph_ops::entryfunc(), it is not able to prevent recursive calls
> to fgraph from fgraph_ops::retfunc(), resulting in a recursive loop.
> This can lead an unexpected recursion bug reported by Menglong.
> 
>  is_endbr() is called in __ftrace_return_to_handler -> fprobe_return
>   -> kprobe_multi_link_exit_handler -> is_endbr.  

So basically its if the handler for the return part calls something that it
is tracing, it can trigger the recursion?

> 
> To fix this issue, acquire ftrace_test_recursion_trylock() in the
> __ftrace_return_to_handler() after unwind the shadow stack to mark
> this section must prevent recursive call of fgraph inside user-defined
> fgraph_ops::retfunc().
> 
> This is essentially a fix to commit 4346ba160409 ("fprobe: Rewrite
> fprobe on function-graph tracer"), because before that fgraph was
> only used from the function graph tracer. Fprobe allowed user to run
> any callbacks from fgraph after that commit.

I would actually say it's because before this commit, the return handler
callers never called anything that the entry handlers didn't already call.
If there was recursion, the entry handler would catch it (and the entry
tells fgraph if the exit handler should be called).

The difference here is with fprobes, you can have the exit handler calling
functions that the entry handler does not, which exposes more cases where
recursion could happen.

> 
> Reported-by: Menglong Dong <menglong8.dong@gmail.com>
> Closes: https://lore.kernel.org/all/20250918120939.1706585-1-dongml2@chinatelecom.cn/
> Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
> Cc: stable@vger.kernel.org
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> ---
>  kernel/trace/fgraph.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index 1e3b32b1e82c..08dde420635b 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -815,6 +815,7 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
>  	unsigned long bitmap;
>  	unsigned long ret;
>  	int offset;
> +	int bit;
>  	int i;
>  
>  	ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer, &offset);
> @@ -829,6 +830,15 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
>  	if (fregs)
>  		ftrace_regs_set_instruction_pointer(fregs, ret);
>  
> +	bit = ftrace_test_recursion_trylock(trace.func, ret);
> +	/*
> +	 * This must be succeeded because the entry handler returns before
> +	 * modifying the return address if it is nested. Anyway, we need to
> +	 * avoid calling user callbacks if it is nested.
> +	 */
> +	if (WARN_ON_ONCE(bit < 0))

I'm not so sure we need the warn on here. We should probably hook it to the
recursion detection infrastructure that the function tracer has.

The reason I would say not to have the warn on, is because we don't have a
warn on for recursion happening at the entry handler. Because this now is
exposed by fprobe allowing different routines to be called at exit than
what is used in entry, it can easily be triggered.

-- Steve



> +		goto out;
> +
>  #ifdef CONFIG_FUNCTION_GRAPH_RETVAL
>  	trace.retval = ftrace_regs_get_return_value(fregs);
>  #endif
> @@ -852,6 +862,8 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
>  		}
>  	}
>  
> +	ftrace_test_recursion_unlock(bit);
> +out:
>  	/*
>  	 * The ftrace_graph_return() may still access the current
>  	 * ret_stack structure, we need to make sure the update of
Re: [PATCH] tracing: fgraph: Protect return handler from recursion loop
Posted by Masami Hiramatsu (Google) 1 week, 4 days ago
On Fri, 19 Sep 2025 11:27:46 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 19 Sep 2025 20:57:36 +0900
> "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> 
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > 
> > function_graph_enter_regs() prevents itself from recursion by
> > ftrace_test_recursion_trylock(), but __ftrace_return_to_handler(),
> > which is called at the exit, does not prevent such recursion.
> > Therefore, while it can prevent recursive calls from
> > fgraph_ops::entryfunc(), it is not able to prevent recursive calls
> > to fgraph from fgraph_ops::retfunc(), resulting in a recursive loop.
> > This can lead an unexpected recursion bug reported by Menglong.
> > 
> >  is_endbr() is called in __ftrace_return_to_handler -> fprobe_return
> >   -> kprobe_multi_link_exit_handler -> is_endbr.  
> 
> So basically its if the handler for the return part calls something that it
> is tracing, it can trigger the recursion?
> 
> > 
> > To fix this issue, acquire ftrace_test_recursion_trylock() in the
> > __ftrace_return_to_handler() after unwind the shadow stack to mark
> > this section must prevent recursive call of fgraph inside user-defined
> > fgraph_ops::retfunc().
> > 
> > This is essentially a fix to commit 4346ba160409 ("fprobe: Rewrite
> > fprobe on function-graph tracer"), because before that fgraph was
> > only used from the function graph tracer. Fprobe allowed user to run
> > any callbacks from fgraph after that commit.
> 
> I would actually say it's because before this commit, the return handler
> callers never called anything that the entry handlers didn't already call.
> If there was recursion, the entry handler would catch it (and the entry
> tells fgraph if the exit handler should be called).
> 
> The difference here is with fprobes, you can have the exit handler calling
> functions that the entry handler does not, which exposes more cases where
> recursion could happen.
> 
> > 
> > Reported-by: Menglong Dong <menglong8.dong@gmail.com>
> > Closes: https://lore.kernel.org/all/20250918120939.1706585-1-dongml2@chinatelecom.cn/
> > Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > ---
> >  kernel/trace/fgraph.c |   12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> > index 1e3b32b1e82c..08dde420635b 100644
> > --- a/kernel/trace/fgraph.c
> > +++ b/kernel/trace/fgraph.c
> > @@ -815,6 +815,7 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
> >  	unsigned long bitmap;
> >  	unsigned long ret;
> >  	int offset;
> > +	int bit;
> >  	int i;
> >  
> >  	ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer, &offset);
> > @@ -829,6 +830,15 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
> >  	if (fregs)
> >  		ftrace_regs_set_instruction_pointer(fregs, ret);
> >  
> > +	bit = ftrace_test_recursion_trylock(trace.func, ret);
> > +	/*
> > +	 * This must be succeeded because the entry handler returns before
> > +	 * modifying the return address if it is nested. Anyway, we need to
> > +	 * avoid calling user callbacks if it is nested.
> > +	 */
> > +	if (WARN_ON_ONCE(bit < 0))
> 
> I'm not so sure we need the warn on here. We should probably hook it to the
> recursion detection infrastructure that the function tracer has.

The reason I added WARN_ON here is this should not happen. The recursion
should be detected at the entry, not at exit. The __ftrace_return_to_handler()
is installed only if the entry does NOT detect the recursion. In that case
I think the exit should succeed recursion_trylock().

> 
> The reason I would say not to have the warn on, is because we don't have a
> warn on for recursion happening at the entry handler. Because this now is
> exposed by fprobe allowing different routines to be called at exit than
> what is used in entry, it can easily be triggered.

At the entry, if it detect recursion, it exits soon. But here we have to
process stack unwind to get the return address. This recursion_trylock()
is to mark this is in the critical section, not detect it.

Thank you,

> 
> -- Steve
> 
> 
> 
> > +		goto out;
> > +
> >  #ifdef CONFIG_FUNCTION_GRAPH_RETVAL
> >  	trace.retval = ftrace_regs_get_return_value(fregs);
> >  #endif
> > @@ -852,6 +862,8 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
> >  		}
> >  	}
> >  
> > +	ftrace_test_recursion_unlock(bit);
> > +out:
> >  	/*
> >  	 * The ftrace_graph_return() may still access the current
> >  	 * ret_stack structure, we need to make sure the update of
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
Re: [PATCH] tracing: fgraph: Protect return handler from recursion loop
Posted by Steven Rostedt 1 week, 3 days ago
On Sun, 21 Sep 2025 13:05:19 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

>  
> > The reason I would say not to have the warn on, is because we don't have a
> > warn on for recursion happening at the entry handler. Because this now is
> > exposed by fprobe allowing different routines to be called at exit than
> > what is used in entry, it can easily be triggered.  
> 
> At the entry, if it detect recursion, it exits soon. But here we have to
> process stack unwind to get the return address. This recursion_trylock()
> is to mark this is in the critical section, not detect it.

Ah, because the first instance of the exit callback sets the recursion
bit. This will cause recursed entry calls to detect the recursion bit
and return without setting the exit handler to be called.

That is, by setting the recursion bit in the exit handler, it will cause
a recursion in entry to fail before the exit is called again.

I'd like to update the comment:

+	bit = ftrace_test_recursion_trylock(trace.func, ret);
+	/*
+	 * This must be succeeded because the entry handler returns before
+	 * modifying the return address if it is nested. Anyway, we need to
+	 * avoid calling user callbacks if it is nested.
+	 */
+	if (WARN_ON_ONCE(bit < 0))
+		goto out;
+

to:

	/*
	 * Setting the recursion bit here will cause the graph entry to
	 * detect recursion before the exit handle will. If the ext
	 * handler detects recursion, something went wrong.
	 */
	if (WARN_ON_ONCE(bit < 0))

-- Steve