[PATCH 0/9] ftrace,bpf: Use single direct ops for bpf trampolines

Jiri Olsa posted 9 patches 1 week, 1 day ago
arch/x86/Kconfig              |   1 +
include/linux/bpf.h           |   7 +-
include/linux/ftrace.h        |  48 ++++++++++---
kernel/bpf/trampoline.c       | 128 +++++++++++++++++++++++++--------
kernel/trace/Kconfig          |   3 +
kernel/trace/ftrace.c         | 477 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------------
kernel/trace/trace.h          |   8 ---
kernel/trace/trace_selftest.c |   5 +-
8 files changed, 447 insertions(+), 230 deletions(-)
hi,
while poking at the multi-tracing interface I ended up with just one ftrace_ops
object to attach all trampolines.

This change allows future code to use fewer direct API calls when changing
attachments, which in effect speeds up the attachment.

Even in the current code we get a speedup from using just a single ftrace_ops
object. I got the following speedup when measuring a simple attach/detach
repeated 300 times [1].

- with current code:

  perf stat -e cycles:k,cycles:u,instructions:u,instructions:k -- ./test_progs -t krava -w0
  #158     krava:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

  Performance counter stats for './test_progs -t krava -w0':

     12,003,420,519      cycles:k                                                       
         63,239,794      cycles:u                                                       
        102,155,625      instructions:u                   #    1.62  insn per cycle     
     11,614,183,764      instructions:k                   #    0.97  insn per cycle     

       35.448142362 seconds time elapsed

        0.011032000 seconds user
        2.478243000 seconds sys


- with the fix:

  perf stat -e cycles:k,cycles:u,instructions:u,instructions:k -- ./test_progs -t krava -w0
  #158     krava:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

  Performance counter stats for './test_progs -t krava -w0':

     14,327,218,656      cycles:k                                                       
         46,285,275      cycles:u                                                       
        102,125,592      instructions:u                   #    2.21  insn per cycle     
     14,620,692,457      instructions:k                   #    1.02  insn per cycle     

        2.860883990 seconds time elapsed

        0.009884000 seconds user
        2.777032000 seconds sys
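
For reference, the two runs can be compared with a few lines of arithmetic. This is only a sketch built from the numbers printed by perf stat above; the variable names are made up for illustration. It shows that the gain is in elapsed (wall-clock) time, while kernel cycles and sys time do not go down.

```python
# Numbers copied verbatim from the two perf stat runs above.
before = {"elapsed_s": 35.448142362, "user_s": 0.011032000,
          "sys_s": 2.478243000, "cycles_k": 12_003_420_519}
after = {"elapsed_s": 2.860883990, "user_s": 0.009884000,
         "sys_s": 2.777032000, "cycles_k": 14_327_218_656}

# Wall-clock speedup of the whole test run.
elapsed_speedup = before["elapsed_s"] / after["elapsed_s"]

# Kernel cycles with the series applied, relative to the current code.
cycles_k_ratio = after["cycles_k"] / before["cycles_k"]

# Time the run was not accounted as user or sys CPU time.
off_cpu_before = before["elapsed_s"] - (before["user_s"] + before["sys_s"])
off_cpu_after = after["elapsed_s"] - (after["user_s"] + after["sys_s"])

print(f"elapsed speedup:   {elapsed_speedup:.2f}x")
print(f"cycles:k ratio:    {cycles_k_ratio:.2f}")
print(f"off-CPU time (s):  {off_cpu_before:.2f} -> {off_cpu_after:.2f}")
```

Most of the elapsed time in the first run is not CPU time at all, which suggests the test was mostly waiting rather than executing.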


The speedup seems to be related to the fact that with a single ftrace_ops object
we don't call ftrace_shutdown anymore (we use ftrace_update_ops instead), so
we skip the 300 synchronize rcu calls (each ~100ms) at the end of that function.
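
As a rough sanity check of that explanation, the estimated wait can be set against the measured difference in elapsed time. This is a back-of-the-envelope sketch: the ~100ms per call is the approximate figure quoted above, not a measured value, and the names are illustrative only.

```python
ITERATIONS = 300       # attach/detach rounds in the test [1]
SYNC_WAIT_S = 0.100    # ~100ms per synchronize call (approximate, from the text)

# Elapsed times from the two perf stat runs.
elapsed_before_s = 35.448142362
elapsed_after_s = 2.860883990

# Total time expected to be spent waiting if each round pays one sync.
estimated_wait_s = ITERATIONS * SYNC_WAIT_S

# Elapsed time actually saved with the series applied.
observed_saving_s = elapsed_before_s - elapsed_after_s

print(f"estimated wait:  {estimated_wait_s:.1f} s")
print(f"observed saving: {observed_saving_s:.1f} s")
```

The two values are of the same order, which is consistent with the skipped synchronization being the main source of the speedup, though it does not prove it.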

v1 changes:
- make the change x86 specific, after discussing arm64 options with
  Mark [Mark]

thanks,
jirka


[1] https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=test&id=1b7fc74c36a93e90816f58c37a84522f0949095a
---
Jiri Olsa (9):
      ftrace: Make alloc_and_copy_ftrace_hash direct friendly
      ftrace: Add register_ftrace_direct_hash function
      ftrace: Add unregister_ftrace_direct_hash function
      ftrace: Add modify_ftrace_direct_hash function
      ftrace: Export some of hash related functions
      ftrace: Use direct hash interface in direct functions
      bpf: Add trampoline ip hash table
      ftrace: Factor ftrace_ops ops_func interface
      bpf, x86: Use single ftrace_ops for direct calls
