[v1] bpf trampoline support "jmp" mode

[PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode

Posted by Menglong Dong 2 months, 3 weeks ago

Currently, the bpf trampoline is invoked with the "call" instruction. However,
this breaks the RSB and introduces extra overhead on x86_64.

For example, if we hook the function "foo" with fexit, the call and return
logic looks like this:
  call foo -> call trampoline -> call foo-body ->
  return foo-body -> return foo

As shown above, there are 3 calls but only 2 returns, which breaks the RSB
balance. We could add a pseudo "return" here, but that's not the best choice,
as it would still cause one RSB miss:
  call foo -> call trampoline -> call foo-body ->
  return foo-body -> return dummy -> return foo

The "return dummy" doesn't pair with the "call trampoline", so it can also
cause an RSB miss.

Therefore, we introduce the "jmp" mode for bpf trampoline, as advised by
Alexei in [1]. And the logic will become this:
  call foo -> jmp trampoline -> call foo-body ->
  return foo-body -> return foo

As we can see above, the RSB is totally balanced. After the modification,
the performance of fexit increases from 76M/s to 130M/s.

In this series, we introduce the FTRACE_OPS_FL_JMP for ftrace to make it
use the "jmp" instruction instead of "call".

And we introduce the bpf_arch_text_poke_type(), which is able to specify
both the current and new opcode.

I'm not sure whether I should split the first 2 patches into a separate
series and send them to the ftrace tree.

Link: https://lore.kernel.org/bpf/CAADnVQLX54sVi1oaHrkSiLqjJaJdm3TQjoVrgU-LZimK6iDcSA@mail.gmail.com/ [1]

Menglong Dong (7):
  ftrace: introduce FTRACE_OPS_FL_JMP
  x86/ftrace: implement DYNAMIC_FTRACE_WITH_JMP
  bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME
  bpf,x86: adjust the "jmp" mode for bpf trampoline
  bpf: introduce bpf_arch_text_poke_type
  bpf,x86: implement bpf_arch_text_poke_type for x86_64
  bpf: implement "jmp" mode for trampoline

 arch/riscv/net/bpf_jit_comp64.c |  2 +-
 arch/x86/Kconfig                |  1 +
 arch/x86/kernel/ftrace.c        |  7 ++++-
 arch/x86/kernel/ftrace_64.S     | 12 +++++++-
 arch/x86/net/bpf_jit_comp.c     | 45 ++++++++++++++++++++--------
 include/linux/bpf.h             | 22 ++++++++++++++
 include/linux/ftrace.h          | 48 +++++++++++++++++++++++++++++
 kernel/bpf/core.c               | 10 +++++++
 kernel/bpf/trampoline.c         | 53 +++++++++++++++++++++++++++------
 kernel/trace/Kconfig            | 12 ++++++++
 kernel/trace/ftrace.c           |  9 +++++-
 11 files changed, 195 insertions(+), 26 deletions(-)

-- 
2.51.2

Re: [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode

Posted by Steven Rostedt 2 months, 3 weeks ago

On Fri, 14 Nov 2025 17:24:43 +0800
Menglong Dong <menglong8.dong@gmail.com> wrote:

> Therefore, we introduce the "jmp" mode for bpf trampoline, as advised by
> Alexei in [1]. And the logic will become this:
>   call foo -> jmp trampoline -> call foo-body ->
>   return foo-body -> return foo

This obviously only works when there's a single function used by that
trampoline. It also doesn't allow tracing of the return side (it's
basically just the function tracer for a single function).

Is there any mechanism to make sure that the trampoline being called is
only used by that one function? I haven't looked at the code yet, but
should there be a test that makes sure a trampoline isn't registered for
two or more different functions?

-- Steve

Re: [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode

Posted by Menglong Dong 2 months, 3 weeks ago

On 2025/11/14 21:38, Steven Rostedt wrote:
> On Fri, 14 Nov 2025 17:24:43 +0800
> Menglong Dong <menglong8.dong@gmail.com> wrote:
> 
> > Therefore, we introduce the "jmp" mode for bpf trampoline, as advised by
> > Alexei in [1]. And the logic will become this:
> >   call foo -> jmp trampoline -> call foo-body ->
> >   return foo-body -> return foo
> 
> This obviously only works when there's a single function used by that
> trampoline. It also doesn't allow tracing of the return side (it's
> basically just the function tracer for a single function).

Hi, Steven. I think you misunderstood something? For fentry/fexit,
the whole process is:

call foo -> jmp trampoline -> call all the fentry bpf progs ->
call foo-body -> return foo-body -> call all the fexit bpf progs
-> return foo.

The "call foo-body" is the "original call"; its return value is
stored on the stack so that the fexit progs can get it.

So it can trace the return side with "fexit". And it's almost the
same as the original logic of the bpf trampoline:

call foo -> call trampoline -> call all the fentry bpf progs ->
call foo-body -> return foo-body -> call all the fexit bpf progs
-> skip the rip -> return foo.

What I did here is just replace the "call trampoline" with
"jmp trampoline".

> 
> Is there any mechanism to make sure that the trampoline being called is
> only used by that one function? I haven't looked at the code yet, but
> should there be a test that makes sure a trampoline isn't registered for
> two or more different functions?

For now, the bpf trampoline is per-function. Every trampoline
has a unique key, and we find the trampoline for the target function
by that key. So it can't be used by two or more different functions.

If the trampoline needs to get the ip of the original call from the stack,
such as in the BPF_TRAMP_F_SHARE_IPMODIFY case, we fall back to the
"call" mode, as we can't get the rip from the stack in the "jmp" mode.
And I think this is what you meant by "only works for a single function"?
Yeah, we fall back in such cases.

Thanks!
Menglong Dong

> 
> -- Steve
> 
>

Re: [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode

Posted by Steven Rostedt 2 months, 3 weeks ago

On Fri, 14 Nov 2025 21:58:34 +0800
Menglong Dong <menglong.dong@linux.dev> wrote:

> On 2025/11/14 21:38, Steven Rostedt wrote:
> > On Fri, 14 Nov 2025 17:24:43 +0800
> > Menglong Dong <menglong8.dong@gmail.com> wrote:
> >   
> > > Therefore, we introduce the "jmp" mode for bpf trampoline, as advised by
> > > Alexei in [1]. And the logic will become this:
> > >   call foo -> jmp trampoline -> call foo-body ->
> > >   return foo-body -> return foo  
> > 
> > This obviously only works when there's a single function used by that
> > trampoline. It also doesn't allow tracing of the return side (it's
> > basically just the function tracer for a single function).  
> 
> Hi, Steven. I think you misunderstand something? For the fentry/fexit,
> the whole process is:

Yeah, I got a bit confused by the notation above.

> 
> call foo -> jmp trampoline -> call all the fentry bpf progs ->
> call foo-body -> return foo-body -> call all the fexit bpf progs
> -> return foo.  
> 
> The "call foo-body" means "origin call", and it will store the
> return value of the traced function to the stack, therefore the
> fexit progs can get it.
> 
> So it can trace the return side with the "fexit". And it's almost the
> same as the origin logic of the bpf trampoline:

OK, so this is just the way it always works.

> 
> call foo -> call trampoline -> call all the fentry bpf progs ->
> call foo-body -> return foo-body -> call all the fexit bpf progs
> -> skip the rip -> return foo.  
> 
> What I did here is just replace the "call trampoline" to
> "jmp trampoline".
> 
> > 
> > Is there any mechanism to make sure that the trampoline being called is
> > only used by that one function? I haven't looked at the code yet, but
> > should there be a test that makes sure a trampoline isn't registered for
> > two or more different functions?  
> 
> As for now, the bpf trampoline is per-function. Every trampoline
> has a unique key, and we find the trampoline for the target function
> by that key. So it can't be used by two or more different functions.
> 
> If the trampoline need to get the ip of the origin call from the stack,
> such as BPF_TRAMP_F_SHARE_IPMODIFY case, we will fallback to the
> "call" mode, as we can't get the rip from the stack in the "jmp" mode.
> And I think this is what you mean "only work for a single function"?
> Yeah, we fallback on such case.


OK, I got lost in the notation. It doesn't need a "call" because each
trampoline is only for a single function. Hence it doesn't need to know the
return address.

-- Steve