[PATCH 0/1] tracing: warn in trace_event_dyn_put_ref

Krister Johansen posted 1 patch 3 years, 8 months ago
kernel/trace/trace_event_perf.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
[PATCH 0/1] tracing: warn in trace_event_dyn_put_ref
Posted by Krister Johansen 3 years, 8 months ago
Hi Steven,
I have systems that are regularly hitting a WARN in trace_event_dyn_put_ref.

The exact message is:

WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 trace_event_dyn_put_ref+0x15/0x20

With the following stacktrace:

 perf_trace_init+0x8f/0xd0
 perf_tp_event_init+0x1f/0x40
 perf_try_init_event+0x4a/0x130
 perf_event_alloc+0x497/0xf40
 __do_sys_perf_event_open+0x1d4/0xf70
 __x64_sys_perf_event_open+0x20/0x30
 do_syscall_64+0x5c/0xc0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

I've debugged this and think that it's caused by changes in 5.15 that
introduced a refcounter for dynamic events.

Specifically, perf_trace_init() manages the refcount for the events it
finds in the ftrace_events list.  However, perf_trace_event_init() ->
perf_trace_event_unreg() can decrement this count unexpectedly, if
perf_trace_event_open() encounters an error.

I'm able to reproduce reliably with the following shell script.  This is
from an Ubuntu 20.04 userland.  If the uprobes in the script don't line
up with exactly what's on your system, any should do.

--- >8 cut here 8< ---

#!/usr/bin/bash

# A simple-ish reproducer for trace_event_dyn_put_ref() WARN from
# perf_trace_init().  Triggering an -EINTR during uprobe attach is enough to
# land in the error case that does a double-decrement.

cleanup() {
        rm -f perf.data
        jobs -p | xargs -r kill -9 2>/dev/null
        perf probe -d '*'
        exit
}

setup() {
        perf probe -d '*' 2>/dev/null
        perf probe -x /proc/1/root/lib/x86_64-linux-gnu/libc.so.6 getaddrinfo
        perf probe -x /proc/1/root/lib/x86_64-linux-gnu/libc.so.6 getaddrinfo%return
        perf probe -x /proc/1/root/lib/x86_64-linux-gnu/libc.so.6 backtrace
        perf probe -x /proc/1/root/lib/x86_64-linux-gnu/libaudit.so.1 audit_log_acct_message
        perf probe -x /proc/1/root/lib/x86_64-linux-gnu/libaudit.so.1 audit_log_acct_message%return

}

setup

trap cleanup INT TERM

while true; do

  perf record -e 'task:task_newtask' \
    -e 'sched:sched_process_exit' \
    -e 'raw_syscalls:sys_enter' \
    -e 'raw_syscalls:sys_exit' \
    -e 'probe_libaudit:audit_log_acct_message' \
    -e 'probe_libaudit:audit_log_acct_message__return' \
    -e 'probe_libc:backtrace' \
    -e 'probe_libc:getaddrinfo' \
    -e 'probe_libc:getaddrinfo__return' \
    -a &
  sleep "0.$RANDOM"
  jobs -p | xargs -r kill -9
  rm -f perf.data

done

--- >8 cut here 8< ---

I'm sending along a patch that, when applied, does not warn when run
with the script above.  I've tested kprobes and uprobes with it a bit,
and did not observe any obvious regressions.  I don't have the full
battery of tests that you do, though.  However, if you do have something
else I should test against prior to submission, I'd be happy to do so.

That patch is in the e-mail that follows, and its commit message should
explain the approach to the solution, hopefully to your satisfaction.

Thanks,

-K


Krister Johansen (1):
  tracing: fix a WARN from trace_event_dyn_put_ref

 kernel/trace/trace_event_perf.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

-- 
2.25.1