Make kprobe_multi_link_prog_run() always inline to get better
performance. The function is executed on every single kprobe-multi hit,
so it sits on a hot path, and forcing the compiler to inline it removes
the call overhead there.
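For context, a plain "static" function is only inlined at the
compiler's discretion, while __always_inline (in current kernels
defined as "inline __attribute__((__always_inline__))" in
compiler_attributes.h) forces the decision. Below is a minimal
user-space sketch of the effect; hot_helper() and run_many() are
made-up names, not kernel code, and a helper this small would likely
be inlined anyway; the attribute matters for bigger functions like
kprobe_multi_link_prog_run() that the inlining heuristics may reject:

  /* Stand-in for the kernel's definition in compiler_attributes.h. */
  #define __always_inline inline __attribute__((__always_inline__))

  /* With __always_inline the compiler must fold the body into every
   * caller: no call/return sequence is emitted and, as long as the
   * function's address is never taken, no standalone copy of it is
   * kept in the object file. */
  static __always_inline int hot_helper(int x)
  {
          return x + 1;
  }

  int run_many(int n)
  {
          int sum = 0, i;

          /* Compiles to straight-line code inside the loop. */
          for (i = 0; i < n; i++)
                  sum += hot_helper(i);
          return sum;
  }

Before this patch, the benchmark results are: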
./bench trig-kprobe-multi
Setting up benchmark 'trig-kprobe-multi'...
Benchmark 'trig-kprobe-multi' started.
Iter 0 ( 95.485us): hits 62.462M/s ( 62.462M/prod), [...]
Iter 1 (-80.054us): hits 62.486M/s ( 62.486M/prod), [...]
Iter 2 ( 13.572us): hits 62.287M/s ( 62.287M/prod), [...]
Iter 3 ( 76.961us): hits 62.293M/s ( 62.293M/prod), [...]
Iter 4 (-77.698us): hits 62.394M/s ( 62.394M/prod), [...]
Iter 5 (-13.399us): hits 62.319M/s ( 62.319M/prod), [...]
Iter 6 ( 77.573us): hits 62.250M/s ( 62.250M/prod), [...]
Summary: hits 62.338 ± 0.083M/s ( 62.338M/prod)
After this patch, the results are:
Iter 0 (454.148us): hits 66.900M/s ( 66.900M/prod), [...]
Iter 1 (-435.540us): hits 68.925M/s ( 68.925M/prod), [...]
Iter 2 ( 8.223us): hits 68.795M/s ( 68.795M/prod), [...]
Iter 3 (-12.347us): hits 68.880M/s ( 68.880M/prod), [...]
Iter 4 ( 2.291us): hits 68.767M/s ( 68.767M/prod), [...]
Iter 5 ( -1.446us): hits 68.756M/s ( 68.756M/prod), [...]
Iter 6 ( 13.882us): hits 68.657M/s ( 68.657M/prod), [...]
Summary: hits 68.792 ± 0.087M/s ( 68.792M/prod)
As we can see, the throughput of kprobe-multi increases from about
62.3M/s to 68.8M/s, an improvement of roughly 10%.
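One way to verify the change took effect (assuming an in-tree build
and that the function's address is never taken) is to check the object
file after a rebuild: "nm kernel/trace/bpf_trace.o | grep
kprobe_multi_link_prog_run" should no longer print a local text symbol
for it once the function is always inlined.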
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
kernel/trace/bpf_trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index a795f7afbf3d..d57727abaade 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2529,7 +2529,7 @@ static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
return run_ctx->entry_ip;
}
-static int
+static __always_inline int
kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
unsigned long entry_ip, struct ftrace_regs *fregs,
bool is_return, void *data)
--
2.52.0