[PATCH v16 0/4] perf: Support the deferred unwinding infrastructure

Posted by Steven Rostedt 4 months ago
This is based on top of tip/perf/core commit: 6d48436560e91be85

Then I added the patches from Peter Zijlstra:

    https://lore.kernel.org/all/20250924075948.579302904@infradead.org/

This series implements the perf interface to use deferred user space stack
tracing.

The patches for the user space side should still work with this series:

  https://lore.kernel.org/linux-trace-kernel/20250908175319.841517121@kernel.org

Patch 1 updates the deferred unwinding infrastructure. It adds a new
function, unwind_deferred_task_init(), which is used when a tracer (perf)
only needs to follow a single task. The descriptor it returns can be used
the same way as the one returned by unwind_deferred_init(), but the tracer
must only use it on one task at a time.
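
For illustration, a minimal sketch of how a tracer might use it, assuming the
interface mirrors unwind_deferred_init() (the callback signature and trace
fields are from the patches below; record_ip() is a hypothetical consumer and
error handling is omitted):

	static void my_unwind_cb(struct unwind_work *work,
				 struct unwind_stacktrace *trace, u64 cookie)
	{
		/* Runs in task context, shortly before returning to user space */
		for (unsigned int i = 0; i < trace->nr; i++)
			record_ip(trace->entries[i], cookie);
	}

	static struct unwind_work my_work;

	/* At attach time, for a tracer following a single task: */
	unwind_deferred_task_init(&my_work, my_unwind_cb);

	/* From the event handler (e.g. in NMI context), request the unwind: */
	u64 cookie;
	unwind_deferred_request(&my_work, &cookie);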

Patch 2 adds per-task deferred stack traces to perf. It adds a new event
type, PERF_RECORD_CALLCHAIN_DEFERRED, that is recorded when a task is about
to return to user space, at a point where pages may be faulted in. It also
adds a new callchain context, PERF_CONTEXT_USER_DEFERRED, that is used as a
placeholder in a kernel callchain to mark where the deferred user space
stack trace should be appended.

Patch 3 adds the user stack trace context cookie to the kernel callchain,
right after the PERF_CONTEXT_USER_DEFERRED context, so that the user space
side can map the request to the deferred user space stack trace.
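
Conceptually, the sampled kernel callchain for a deferred request then ends
like this (a sketch of the layout described above):

	PERF_CONTEXT_KERNEL
	<kernel return addresses ...>
	PERF_CONTEXT_USER_DEFERRED	/* placeholder for the user stack */
	<cookie>			/* matches a later PERF_RECORD_CALLCHAIN_DEFERRED */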

Patch 4 adds support for per-CPU perf events, allowing the kernel to
associate each per-CPU perf event buffer with a single application. This is
needed so that when a deferred stack trace is requested for a task that then
migrates to another CPU, the kernel knows which CPU buffer to record the
stack trace in. More than one perf tool may be running at a time, and a
request made by one perf tool should have its deferred trace go to that same
tool's per-CPU event buffer. A global list of descriptors, one per perf tool
using deferred stack tracing, is created to manage this.
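
Purely as an illustration of that bookkeeping (the struct and field names
below are hypothetical, not taken from the patch):

	/* Hypothetical: one descriptor per perf tool using deferred callchains */
	struct perf_unwind_deferred {
		struct list_head	list;		/* on the global descriptor list */
		struct unwind_work	unwind_work;	/* deferred unwind registration */
		/* ... plus the tool's per-CPU events, to pick the output buffer */
	};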

Changes since v15: https://lore.kernel.org/linux-trace-kernel/20250825180638.877627656@kernel.org/

- The main update was that I moved the code to do single-task deferred
  stack tracing into the unwind code. That allowed reusing the code for
  tracing all tasks, and simplified the perf code in doing so.

  The first patch updates the unwind deferred code to have this
  infrastructure. It only added a new function:
    unwind_deferred_task_init()
  This is the same as unwind_deferred_init(), but it is used when the
  tracer will only trace a single task. The descriptor returned has its
  own task_work callback, and it allows any number of callers, not the
  limited set that the "all task" deferred unwinding has.

- The new code also removed the need to expose the generation of the
  cookie.

Josh Poimboeuf (1):
      perf: Support deferred user callchains

Steven Rostedt (3):
      unwind: Add interface to allow tracing a single task
      perf: Have the deferred request record the user context cookie
      perf: Support deferred user callchains for per CPU events

----
 include/linux/perf_event.h            |   9 +-
 include/linux/unwind_deferred.h       |  15 ++
 include/uapi/linux/perf_event.h       |  25 ++-
 kernel/bpf/stackmap.c                 |   4 +-
 kernel/events/callchain.c             |  14 +-
 kernel/events/core.c                  | 362 +++++++++++++++++++++++++++++++++-
 kernel/unwind/deferred.c              | 283 ++++++++++++++++++++++----
 tools/include/uapi/linux/perf_event.h |  25 ++-
 8 files changed, 686 insertions(+), 51 deletions(-)
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 3 months, 2 weeks ago
On Tue, Oct 07, 2025 at 05:40:08PM -0400, Steven Rostedt wrote:

>  include/linux/perf_event.h            |   9 +-
>  include/linux/unwind_deferred.h       |  15 ++
>  include/uapi/linux/perf_event.h       |  25 ++-
>  kernel/bpf/stackmap.c                 |   4 +-
>  kernel/events/callchain.c             |  14 +-
>  kernel/events/core.c                  | 362 +++++++++++++++++++++++++++++++++-
>  kernel/unwind/deferred.c              | 283 ++++++++++++++++++++++----
>  tools/include/uapi/linux/perf_event.h |  25 ++-
>  8 files changed, 686 insertions(+), 51 deletions(-)

After staring at this some, I mostly threw it all out and wrote the
below.

I also have some hackery on the userspace patches to go along with this,
and it all sits in my unwind/cleanup branch.

Trouble is, pretty much every unwind is 510 entries long -- this cannot
be right. I'm sure there's a silly mistake in unwind/user.c but I'm too
tired to find it just now. I'll try again tomorrow.
  
---
 include/linux/perf_event.h            |    2 
 include/linux/unwind_deferred.h       |   12 -----
 include/linux/unwind_deferred_types.h |   13 +++++
 include/uapi/linux/perf_event.h       |   21 ++++++++-
 kernel/bpf/stackmap.c                 |    4 -
 kernel/events/callchain.c             |   14 +++++-
 kernel/events/core.c                  |   79 +++++++++++++++++++++++++++++++++-
 tools/include/uapi/linux/perf_event.h |   21 ++++++++-
 8 files changed, 146 insertions(+), 20 deletions(-)

--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1720,7 +1720,7 @@ extern void perf_callchain_user(struct p
 extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs);
 extern struct perf_callchain_entry *
 get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
-		   u32 max_stack, bool crosstask, bool add_mark);
+		   u32 max_stack, bool crosstask, bool add_mark, u64 defer_cookie);
 extern int get_callchain_buffers(int max_stack);
 extern void put_callchain_buffers(void);
 extern struct perf_callchain_entry *get_callchain_entry(int *rctx);
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -6,18 +6,6 @@
 #include <linux/unwind_user.h>
 #include <linux/unwind_deferred_types.h>
 
-struct unwind_work;
-
-typedef void (*unwind_callback_t)(struct unwind_work *work,
-				  struct unwind_stacktrace *trace,
-				  u64 cookie);
-
-struct unwind_work {
-	struct list_head		list;
-	unwind_callback_t		func;
-	int				bit;
-};
-
 #ifdef CONFIG_UNWIND_USER
 
 enum {
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -39,4 +39,17 @@ struct unwind_task_info {
 	union unwind_task_id	id;
 };
 
+struct unwind_work;
+struct unwind_stacktrace;
+
+typedef void (*unwind_callback_t)(struct unwind_work *work,
+				  struct unwind_stacktrace *trace,
+				  u64 cookie);
+
+struct unwind_work {
+	struct list_head		list;
+	unwind_callback_t		func;
+	int				bit;
+};
+
 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -463,7 +463,9 @@ struct perf_event_attr {
 				inherit_thread :  1, /* children only inherit if cloned with CLONE_THREAD */
 				remove_on_exec :  1, /* event is removed from task on exec */
 				sigtrap        :  1, /* send synchronous SIGTRAP on event */
-				__reserved_1   : 26;
+				defer_callchain:  1, /* request PERF_RECORD_CALLCHAIN_DEFERRED records */
+				defer_output   :  1, /* output PERF_RECORD_CALLCHAIN_DEFERRED records */
+				__reserved_1   : 24;
 
 	union {
 		__u32		wakeup_events;	  /* wake up every n events */
@@ -1239,6 +1241,22 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_AUX_OUTPUT_HW_ID		= 21,
 
+	/*
+	 * This user callchain capture was deferred until shortly before
+	 * returning to user space.  Previous samples would have kernel
+	 * callchains only and they need to be stitched with this to make full
+	 * callchains.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				cookie;
+	 *	u64				nr;
+	 *	u64				ips[nr];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CALLCHAIN_DEFERRED		= 22,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
@@ -1269,6 +1287,7 @@ enum perf_callchain_context {
 	PERF_CONTEXT_HV				= (__u64)-32,
 	PERF_CONTEXT_KERNEL			= (__u64)-128,
 	PERF_CONTEXT_USER			= (__u64)-512,
+	PERF_CONTEXT_USER_DEFERRED		= (__u64)-640,
 
 	PERF_CONTEXT_GUEST			= (__u64)-2048,
 	PERF_CONTEXT_GUEST_KERNEL		= (__u64)-2176,
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -315,7 +315,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_re
 		max_depth = sysctl_perf_event_max_stack;
 
 	trace = get_perf_callchain(regs, kernel, user, max_depth,
-				   false, false);
+				   false, false, 0);
 
 	if (unlikely(!trace))
 		/* couldn't fetch the stack trace */
@@ -452,7 +452,7 @@ static long __bpf_get_stack(struct pt_re
 		trace = get_callchain_entry_for_task(task, max_depth);
 	else
 		trace = get_perf_callchain(regs, kernel, user, max_depth,
-					   crosstask, false);
+					   crosstask, false, 0);
 
 	if (unlikely(!trace) || trace->nr < skip) {
 		if (may_fault)
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -218,7 +218,7 @@ static void fixup_uretprobe_trampoline_e
 
 struct perf_callchain_entry *
 get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
-		   u32 max_stack, bool crosstask, bool add_mark)
+		   u32 max_stack, bool crosstask, bool add_mark, u64 defer_cookie)
 {
 	struct perf_callchain_entry *entry;
 	struct perf_callchain_entry_ctx ctx;
@@ -251,6 +251,18 @@ get_perf_callchain(struct pt_regs *regs,
 			regs = task_pt_regs(current);
 		}
 
+		if (defer_cookie) {
+			/*
+			 * Foretell the coming of PERF_RECORD_CALLCHAIN_DEFERRED
+			 * which can be stitched to this one, and add
+			 * the cookie after it (it will be cut off when the
+			 * user stack is copied to the callchain).
+			 */
+			perf_callchain_store_context(&ctx, PERF_CONTEXT_USER_DEFERRED);
+			perf_callchain_store_context(&ctx, defer_cookie);
+			goto exit_put;
+		}
+
 		if (add_mark)
 			perf_callchain_store_context(&ctx, PERF_CONTEXT_USER);
 
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -56,6 +56,7 @@
 #include <linux/buildid.h>
 #include <linux/task_work.h>
 #include <linux/percpu-rwsem.h>
+#include <linux/unwind_deferred.h>
 
 #include "internal.h"
 
@@ -8200,6 +8201,8 @@ static u64 perf_get_page_size(unsigned l
 
 static struct perf_callchain_entry __empty_callchain = { .nr = 0, };
 
+static struct unwind_work perf_unwind_work;
+
 struct perf_callchain_entry *
 perf_callchain(struct perf_event *event, struct pt_regs *regs)
 {
@@ -8208,8 +8211,11 @@ perf_callchain(struct perf_event *event,
 		!(current->flags & (PF_KTHREAD | PF_USER_WORKER));
 	/* Disallow cross-task user callchains. */
 	bool crosstask = event->ctx->task && event->ctx->task != current;
+	bool defer_user = IS_ENABLED(CONFIG_UNWIND_USER) && user &&
+			  event->attr.defer_callchain;
 	const u32 max_stack = event->attr.sample_max_stack;
 	struct perf_callchain_entry *callchain;
+	u64 defer_cookie;
 
 	if (!current->mm)
 		user = false;
@@ -8217,8 +8223,13 @@ perf_callchain(struct perf_event *event,
 	if (!kernel && !user)
 		return &__empty_callchain;
 
-	callchain = get_perf_callchain(regs, kernel, user,
-				       max_stack, crosstask, true);
+	if (!(user && defer_user && !crosstask &&
+	      unwind_deferred_request(&perf_unwind_work, &defer_cookie) >= 0))
+		defer_cookie = 0;
+
+	callchain = get_perf_callchain(regs, kernel, user, max_stack,
+				       crosstask, true, defer_cookie);
+
 	return callchain ?: &__empty_callchain;
 }
 
@@ -10003,6 +10014,67 @@ void perf_event_bpf_event(struct bpf_pro
 	perf_iterate_sb(perf_event_bpf_output, &bpf_event, NULL);
 }
 
+struct perf_callchain_deferred_event {
+	struct unwind_stacktrace *trace;
+	struct {
+		struct perf_event_header	header;
+		u64				cookie;
+		u64				nr;
+		u64				ips[];
+	} event;
+};
+
+static void perf_callchain_deferred_output(struct perf_event *event, void *data)
+{
+	struct perf_callchain_deferred_event *deferred_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	int ret, size = deferred_event->event.header.size;
+
+	if (!event->attr.defer_output)
+		return;
+
+	/* XXX do we really need sample_id_all for this ??? */
+	perf_event_header__init_id(&deferred_event->event.header, &sample, event);
+
+	ret = perf_output_begin(&handle, &sample, event,
+				deferred_event->event.header.size);
+	if (ret)
+		goto out;
+
+	perf_output_put(&handle, deferred_event->event);
+	for (int i = 0; i < deferred_event->trace->nr; i++) {
+		u64 entry = deferred_event->trace->entries[i];
+		perf_output_put(&handle, entry);
+	}
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+out:
+	deferred_event->event.header.size = size;
+}
+
+/* Deferred unwinding callback for task specific events */
+static void perf_unwind_deferred_callback(struct unwind_work *work,
+					 struct unwind_stacktrace *trace, u64 cookie)
+{
+	struct perf_callchain_deferred_event deferred_event = {
+		.trace = trace,
+		.event = {
+			.header = {
+				.type = PERF_RECORD_CALLCHAIN_DEFERRED,
+				.misc = PERF_RECORD_MISC_USER,
+				.size = sizeof(deferred_event.event) +
+					(trace->nr * sizeof(u64)),
+			},
+			.cookie = cookie,
+			.nr = trace->nr,
+		},
+	};
+
+	perf_iterate_sb(perf_callchain_deferred_output, &deferred_event, NULL);
+}
+
 struct perf_text_poke_event {
 	const void		*old_bytes;
 	const void		*new_bytes;
@@ -14799,6 +14871,9 @@ void __init perf_event_init(void)
 
 	idr_init(&pmu_idr);
 
+	unwind_deferred_init(&perf_unwind_work,
+			     perf_unwind_deferred_callback);
+
 	perf_event_init_all_cpus();
 	init_srcu_struct(&pmus_srcu);
 	perf_pmu_register(&perf_swevent, "software", PERF_TYPE_SOFTWARE);
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -463,7 +463,9 @@ struct perf_event_attr {
 				inherit_thread :  1, /* children only inherit if cloned with CLONE_THREAD */
 				remove_on_exec :  1, /* event is removed from task on exec */
 				sigtrap        :  1, /* send synchronous SIGTRAP on event */
-				__reserved_1   : 26;
+				defer_callchain:  1, /* request PERF_RECORD_CALLCHAIN_DEFERRED records */
+				defer_output   :  1, /* output PERF_RECORD_CALLCHAIN_DEFERRED records */
+				__reserved_1   : 24;
 
 	union {
 		__u32		wakeup_events;	  /* wake up every n events */
@@ -1239,6 +1241,22 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_AUX_OUTPUT_HW_ID		= 21,
 
+	/*
+	 * This user callchain capture was deferred until shortly before
+	 * returning to user space.  Previous samples would have kernel
+	 * callchains only and they need to be stitched with this to make full
+	 * callchains.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				cookie;
+	 *	u64				nr;
+	 *	u64				ips[nr];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CALLCHAIN_DEFERRED		= 22,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
@@ -1269,6 +1287,7 @@ enum perf_callchain_context {
 	PERF_CONTEXT_HV				= (__u64)-32,
 	PERF_CONTEXT_KERNEL			= (__u64)-128,
 	PERF_CONTEXT_USER			= (__u64)-512,
+	PERF_CONTEXT_USER_DEFERRED		= (__u64)-640,
 
 	PERF_CONTEXT_GUEST			= (__u64)-2048,
 	PERF_CONTEXT_GUEST_KERNEL		= (__u64)-2176,
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 3 months, 2 weeks ago
On Thu, Oct 23, 2025 at 05:00:02PM +0200, Peter Zijlstra wrote:

> Trouble is, pretty much every unwind is 510 entries long -- this cannot
> be right. I'm sure there's a silly mistake in unwind/user.c but I'm too
> tired to find it just now. I'll try again tomorrow.

PEBKAC
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 11:29:26AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 23, 2025 at 05:00:02PM +0200, Peter Zijlstra wrote:
> 
> > Trouble is, pretty much every unwind is 510 entries long -- this cannot
> > be right. I'm sure there's a silly mistake in unwind/user.c but I'm too
> > tired to find it just now. I'll try again tomorrow.
> 
> PEBKAC

Anyway, while staring at this, I noted that the perf userspace unwind
code has a few bits that are missing from the new shiny thing.

How about something like so? This adds an optional arch specific unwinder
at the very highest priority (bit 0) and uses that to do a few extra
bits before disabling itself and falling back to whatever lower prio
unwinder to do the actual unwinding.

---
 arch/x86/events/core.c             |   40 ---------------------------
 arch/x86/include/asm/unwind_user.h |    4 ++
 arch/x86/include/asm/uprobes.h     |    9 ++++++
 arch/x86/kernel/unwind_user.c      |   53 +++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/uprobes.c          |   32 ++++++++++++++++++++++
 include/linux/unwind_user_types.h  |    5 ++-
 kernel/unwind/user.c               |    7 ++++
 7 files changed, 109 insertions(+), 41 deletions(-)

--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2845,46 +2845,6 @@ static unsigned long get_segment_base(un
 	return get_desc_base(desc);
 }
 
-#ifdef CONFIG_UPROBES
-/*
- * Heuristic-based check if uprobe is installed at the function entry.
- *
- * Under assumption of user code being compiled with frame pointers,
- * `push %rbp/%ebp` is a good indicator that we indeed are.
- *
- * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
- * If we get this wrong, captured stack trace might have one extra bogus
- * entry, but the rest of stack trace will still be meaningful.
- */
-static bool is_uprobe_at_func_entry(struct pt_regs *regs)
-{
-	struct arch_uprobe *auprobe;
-
-	if (!current->utask)
-		return false;
-
-	auprobe = current->utask->auprobe;
-	if (!auprobe)
-		return false;
-
-	/* push %rbp/%ebp */
-	if (auprobe->insn[0] == 0x55)
-		return true;
-
-	/* endbr64 (64-bit only) */
-	if (user_64bit_mode(regs) && is_endbr((u32 *)auprobe->insn))
-		return true;
-
-	return false;
-}
-
-#else
-static bool is_uprobe_at_func_entry(struct pt_regs *regs)
-{
-	return false;
-}
-#endif /* CONFIG_UPROBES */
-
 #ifdef CONFIG_IA32_EMULATION
 
 #include <linux/compat.h>
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -8,4 +8,8 @@
 	.fp_off		= -2*(ws),			\
 	.use_fp		= true,
 
+#define HAVE_UNWIND_USER_ARCH 1
+
+extern int unwind_user_next_arch(struct unwind_user_state *state);
+
 #endif /* _ASM_X86_UNWIND_USER_H */
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -62,4 +62,13 @@ struct arch_uprobe_task {
 	unsigned int			saved_tf;
 };
 
+#ifdef CONFIG_UPROBES
+extern bool is_uprobe_at_func_entry(struct pt_regs *regs);
+#else
+static inline bool is_uprobe_at_func_entry(struct pt_regs *regs)
+{
+	return false;
+}
+#endif /* CONFIG_UPROBES */
+
 #endif	/* _ASM_UPROBES_H */
--- /dev/null
+++ b/arch/x86/kernel/unwind_user.c
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/unwind_user.h>
+#include <linux/uprobes.h>
+#include <linux/uaccess.h>
+#include <linux/sched/task_stack.h>
+#include <asm/processor.h>
+#include <asm/tlbflush.h>
+
+int unwind_user_next_arch(struct unwind_user_state *state)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+
+	/* only once, on the first iteration */
+	state->available_types &= ~UNWIND_USER_TYPE_ARCH;
+
+	/* We don't know how to unwind VM86 stacks. */
+	if (regs->flags & X86_VM_MASK) {
+		state->done = true;
+		return 0;
+	}
+
+	/*
+	 * If we are called from uprobe handler, and we are indeed at the very
+	 * entry to user function (which is normally a `push %rbp` instruction,
+	 * under assumption of application being compiled with frame pointers),
+	 * we should read return address from *regs->sp before proceeding
+	 * to follow frame pointers, otherwise we'll skip immediate caller
+	 * as %rbp is not yet setup.
+	 */
+	if (!is_uprobe_at_func_entry(regs))
+		return -EINVAL;
+
+#ifdef CONFIG_COMPAT
+	if (state->ws == sizeof(int)) {
+		unsigned int retaddr;
+		int ret = get_user(retaddr, (unsigned int __user *)regs->sp);
+		if (ret)
+			return ret;
+
+		state->ip = retaddr;
+		return 0;
+	}
+#endif
+	unsigned long retaddr;
+	int ret = get_user(retaddr, (unsigned long __user *)regs->sp);
+	if (ret)
+		return ret;
+
+	state->ip = retaddr;
+	return 0;
+}
+
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -1791,3 +1791,35 @@ bool arch_uretprobe_is_alive(struct retu
 	else
 		return regs->sp <= ret->stack;
 }
+
+/*
+ * Heuristic-based check if uprobe is installed at the function entry.
+ *
+ * Under assumption of user code being compiled with frame pointers,
+ * `push %rbp/%ebp` is a good indicator that we indeed are.
+ *
+ * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
+ * If we get this wrong, captured stack trace might have one extra bogus
+ * entry, but the rest of stack trace will still be meaningful.
+ */
+bool is_uprobe_at_func_entry(struct pt_regs *regs)
+{
+	struct arch_uprobe *auprobe;
+
+	if (!current->utask)
+		return false;
+
+	auprobe = current->utask->auprobe;
+	if (!auprobe)
+		return false;
+
+	/* push %rbp/%ebp */
+	if (auprobe->insn[0] == 0x55)
+		return true;
+
+	/* endbr64 (64-bit only) */
+	if (user_64bit_mode(regs) && is_endbr((u32 *)auprobe->insn))
+		return true;
+
+	return false;
+}
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -3,13 +3,15 @@
 #define _LINUX_UNWIND_USER_TYPES_H
 
 #include <linux/types.h>
+#include <linux/bits.h>
 
 /*
  * Unwind types, listed in priority order: lower numbers are attempted first if
  * available.
  */
 enum unwind_user_type_bits {
-	UNWIND_USER_TYPE_FP_BIT =		0,
+	UNWIND_USER_TYPE_ARCH_BIT = 0,
+	UNWIND_USER_TYPE_FP_BIT,
 
 	NR_UNWIND_USER_TYPE_BITS,
 };
@@ -17,6 +19,7 @@ enum unwind_user_type_bits {
 enum unwind_user_type {
 	/* Type "none" for the start of stack walk iteration. */
 	UNWIND_USER_TYPE_NONE =			0,
+	UNWIND_USER_TYPE_ARCH =			BIT(UNWIND_USER_TYPE_ARCH_BIT),
 	UNWIND_USER_TYPE_FP =			BIT(UNWIND_USER_TYPE_FP_BIT),
 };
 
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -79,6 +79,10 @@ static int unwind_user_next(struct unwin
 
 		state->current_type = type;
 		switch (type) {
+		case UNWIND_USER_TYPE_ARCH:
+			if (!unwind_user_next_arch(state))
+				return 0;
+			continue;
 		case UNWIND_USER_TYPE_FP:
 			if (!unwind_user_next_fp(state))
 				return 0;
@@ -107,6 +111,9 @@ static int unwind_user_start(struct unwi
 		return -EINVAL;
 	}
 
+	if (HAVE_UNWIND_USER_ARCH)
+		state->available_types |= UNWIND_USER_TYPE_ARCH;
+
 	if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
 		state->available_types |= UNWIND_USER_TYPE_FP;
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Jens Remus 3 months, 2 weeks ago
Hello Peter!

On 10/24/2025 12:41 PM, Peter Zijlstra wrote:
> On Fri, Oct 24, 2025 at 11:29:26AM +0200, Peter Zijlstra wrote:
>> On Thu, Oct 23, 2025 at 05:00:02PM +0200, Peter Zijlstra wrote:
>>
>>> Trouble is, pretty much every unwind is 510 entries long -- this cannot
>>> be right. I'm sure there's a silly mistake in unwind/user.c but I'm too
>>> tired to find it just now. I'll try again tomorrow.
>>
>> PEBKAC
> 
> Anyway, while staring at this, I noted that the perf userspace unwind
> code has a few bits that are missing from the new shiny thing.
> 
> How about something like so? This adds an optional arch specific unwinder
> at the very highest priority (bit 0) and uses that to do a few extra
> bits before disabling itself and falling back to whatever lower prio
> unwinder to do the actual unwinding.

unwind user sframe does not need any of this special handling, because
it knows for each IP whether the SP or FP is the CFA base register
and whether the FP and RA have been saved.

Isn't this actually specific to unwind user fp?  If the IP is at
function entry, then the FP has not been set up yet.  I think unwind user
fp could handle this using an arch specific is_uprobe_at_func_entry() to
determine whether to use a new frame_fp_entry instead of frame_fp.  For
x86 the following frame_fp_entry should work, if I am not wrong:

#define ARCH_INIT_USER_FP_ENTRY_FRAME(ws)	\
	.cfa_off	=  1*(ws),		\
	.ra_off		= -1*(ws),		\
	.fp_off		= 0,			\
	.use_fp		= false,
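
For 64-bit (ws = 8) that encodes: CFA = SP + 8 (the caller's SP at the call
site), RA = *(CFA - 8) = *SP (the just-pushed return address), and FP left
untouched, since %rbp has not been saved yet:

	/*
	 * x86-64 stack at function entry, before `push %rbp` (ws = 8):
	 *
	 *	SP -> [ return address ]	RA = *SP = *(CFA - 8)
	 *	CFA = SP + 8			caller's SP at the call site
	 *	FP unchanged			fp_off = 0, use_fp = false
	 */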

Following roughly outlines the required changes:

diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c

-static int unwind_user_next_fp(struct unwind_user_state *state)
+static int unwind_user_next_common(struct unwind_user_state *state,
+                                  const struct unwind_user_frame *frame,
+                                  struct pt_regs *regs)

@@ -71,6 +83,7 @@ static int unwind_user_next_common(struct unwind_user_state *state,
        state->sp = sp;
        if (frame->fp_off)
                state->fp = fp;
+       state->topmost = false;
        return 0;
 }
@@ -154,6 +167,7 @@ static int unwind_user_start(struct unwind_user_state *state)
        state->sp = user_stack_pointer(regs);
        state->fp = frame_pointer(regs);
        state->ws = compat_user_mode(regs) ? sizeof(int) : sizeof(long);
+       state->topmost = true;

        return 0;
 }

static int unwind_user_next_fp(struct unwind_user_state *state)
{
	const struct unwind_user_frame fp_frame = {
		ARCH_INIT_USER_FP_FRAME(state->ws)
	};
	const struct unwind_user_frame fp_entry_frame = {
		ARCH_INIT_USER_FP_ENTRY_FRAME(state->ws)
	};
	struct pt_regs *regs = task_pt_regs(current);

	if (state->topmost && is_uprobe_at_func_entry(regs))
		return unwind_user_next_common(state, &fp_entry_frame, regs);
	else
		return unwind_user_next_common(state, &fp_frame, regs);
}

diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
@@ -43,6 +43,7 @@ struct unwind_user_state {
        unsigned int                            ws;
        enum unwind_user_type                   current_type;
        unsigned int                            available_types;
+       bool                                    topmost;
        bool                                    done;
 };

What do you think?

> +++ b/arch/x86/kernel/unwind_user.c
> @@ -0,0 +1,53 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#include <linux/unwind_user.h>
> +#include <linux/uprobes.h>
> +#include <linux/uaccess.h>
> +#include <linux/sched/task_stack.h>
> +#include <asm/processor.h>
> +#include <asm/tlbflush.h>
> +
> +int unwind_user_next_arch(struct unwind_user_state *state)
> +{
> +	struct pt_regs *regs = task_pt_regs(current);
> +
> +	/* only once, on the first iteration */
> +	state->available_types &= ~UNWIND_USER_TYPE_ARCH;
> +
> +	/* We don't know how to unwind VM86 stacks. */
> +	if (regs->flags & X86_VM_MASK) {
> +		state->done = true;
> +		return 0;
> +	}
> +
> +	/*
> +	 * If we are called from uprobe handler, and we are indeed at the very
> +	 * entry to user function (which is normally a `push %rbp` instruction,
> +	 * under assumption of application being compiled with frame pointers),
> +	 * we should read return address from *regs->sp before proceeding
> +	 * to follow frame pointers, otherwise we'll skip immediate caller
> +	 * as %rbp is not yet setup.
> +	 */
> +	if (!is_uprobe_at_func_entry(regs))
> +		return -EINVAL;
> +
> +#ifdef CONFIG_COMPAT
> +	if (state->ws == sizeof(int)) {
> +		unsigned int retaddr;
> +		int ret = get_user(retaddr, (unsigned int __user *)regs->sp);
> +		if (ret)
> +			return ret;
> +
> +		state->ip = retaddr;
> +		return 0;
> +	}
> +#endif
> +	unsigned long retaddr;
> +	int ret = get_user(retaddr, (unsigned long __user *)regs->sp);
> +	if (ret)
> +		return ret;
> +
> +	state->ip = retaddr;
> +	return 0;
> +}

Above would then not be needed, as the unwind_user_next_fp() logic would
do the right thing.

> +++ b/arch/x86/kernel/uprobes.c
> @@ -1791,3 +1791,35 @@ bool arch_uretprobe_is_alive(struct retu
>  	else
>  		return regs->sp <= ret->stack;
>  }
> +
> +/*
> + * Heuristic-based check if uprobe is installed at the function entry.
> + *
> + * Under assumption of user code being compiled with frame pointers,
> + * `push %rbp/%ebp` is a good indicator that we indeed are.
> + *
> + * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
> + * If we get this wrong, captured stack trace might have one extra bogus
> + * entry, but the rest of stack trace will still be meaningful.
> + */
> +bool is_uprobe_at_func_entry(struct pt_regs *regs)
> +{
> +	struct arch_uprobe *auprobe;
> +
> +	if (!current->utask)
> +		return false;
> +
> +	auprobe = current->utask->auprobe;
> +	if (!auprobe)
> +		return false;
> +
> +	/* push %rbp/%ebp */
> +	if (auprobe->insn[0] == 0x55)
> +		return true;
> +
> +	/* endbr64 (64-bit only) */
> +	if (user_64bit_mode(regs) && is_endbr((u32 *)auprobe->insn))
> +		return true;
> +
> +	return false;
> +}
Regards,
Jens

Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 03:58:20PM +0200, Jens Remus wrote:
> Hello Peter!
> 
> On 10/24/2025 12:41 PM, Peter Zijlstra wrote:
> > On Fri, Oct 24, 2025 at 11:29:26AM +0200, Peter Zijlstra wrote:
> >> On Thu, Oct 23, 2025 at 05:00:02PM +0200, Peter Zijlstra wrote:
> >>
> >>> Trouble is, pretty much every unwind is 510 entries long -- this cannot
> >>> be right. I'm sure there's a silly mistake in unwind/user.c but I'm too
> >>> tired to find it just now. I'll try again tomorrow.
> >>
> >> PEBKAC
> > 
> > Anyway, while staring at this, I noted that the perf userspace unwind
> > code has a few bits that are missing from the new shiny thing.
> > 
> > How about something like so? This adds an optional arch specific unwinder
> > at the very highest priority (bit 0) and uses that to do a few extra
> > bits before disabling itself and falling back to whatever lower prio
> > unwinder to do the actual unwinding.
> 
> unwind user sframe does not need any of this special handling, because
> it knows for each IP whether the SP or FP is the CFA base register
> and whether the FP and RA have been saved.

It still can't unwind VM86 stacks. But yes, it should do lots better
with that start of function hack.

> Isn't this actually specific to unwind user fp?  If the IP is at
> function entry, then the FP has not been set up yet.  I think unwind user
> fp could handle this using an arch specific is_uprobe_at_func_entry() to
> determine whether to use a new frame_fp_entry instead of frame_fp.  For
> x86 the following frame_fp_entry should work, if I am not wrong:
> 
> #define ARCH_INIT_USER_FP_ENTRY_FRAME(ws)	\
> 	.cfa_off	=  1*(ws),		\
> 	.ra_off		= -1*(ws),		\
> 	.fp_off		= 0,			\
> 	.use_fp		= false,
> 
> Following roughly outlines the required changes:
> 
> diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
> 
> -static int unwind_user_next_fp(struct unwind_user_state *state)
> +static int unwind_user_next_common(struct unwind_user_state *state,
> +                                  const struct unwind_user_frame *frame,
> +                                  struct pt_regs *regs)
> 
> @@ -71,6 +83,7 @@ static int unwind_user_next_common(struct unwind_user_state *state,
>         state->sp = sp;
>         if (frame->fp_off)
>                 state->fp = fp;
> +       state->topmost = false;
>         return 0;
>  }
> @@ -154,6 +167,7 @@ static int unwind_user_start(struct unwind_user_state *state)
>         state->sp = user_stack_pointer(regs);
>         state->fp = frame_pointer(regs);
>         state->ws = compat_user_mode(regs) ? sizeof(int) : sizeof(long);
> +       state->topmost = true;
> 
>         return 0;
>  }
> 
> static int unwind_user_next_fp(struct unwind_user_state *state)
> {
> 	const struct unwind_user_frame fp_frame = {
> 		ARCH_INIT_USER_FP_FRAME(state->ws)
> 	};
> 	const struct unwind_user_frame fp_entry_frame = {
> 		ARCH_INIT_USER_FP_ENTRY_FRAME(state->ws)
> 	};
> 	struct pt_regs *regs = task_pt_regs(current);
> 
> 	if (state->topmost && is_uprobe_at_func_entry(regs))
> 		return unwind_user_next_common(state, &fp_entry_frame, regs);
> 	else
> 		return unwind_user_next_common(state, &fp_frame, regs);
> }
> 
> diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
> @@ -43,6 +43,7 @@ struct unwind_user_state {
>         unsigned int                            ws;
>         enum unwind_user_type                   current_type;
>         unsigned int                            available_types;
> +       bool                                    topmost;
>         bool                                    done;
>  };
> 
> What do you think?

Yeah, I suppose that should work. Let me rework things accordingly.
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 04:08:15PM +0200, Peter Zijlstra wrote:

> Yeah, I suppose that should work. Let me rework things accordingly.

---
Subject: unwind_user/x86: Teach FP unwind about start of function
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Oct 24 12:31:10 CEST 2025

When userspace is interrupted at the start of a function, before we
get a chance to complete the frame, unwind will miss one caller.

X86 has a uprobe specific fixup for this, add bits to the generic
unwinder to support this.

Suggested-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/events/core.c             |   40 -------------------------------------
 arch/x86/include/asm/unwind_user.h |   12 +++++++++++
 arch/x86/include/asm/uprobes.h     |    9 ++++++++
 arch/x86/kernel/uprobes.c          |   32 +++++++++++++++++++++++++++++
 include/linux/unwind_user_types.h  |    1 
 kernel/unwind/user.c               |   35 ++++++++++++++++++++++++--------
 6 files changed, 80 insertions(+), 49 deletions(-)

--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2845,46 +2845,6 @@ static unsigned long get_segment_base(un
 	return get_desc_base(desc);
 }
 
-#ifdef CONFIG_UPROBES
-/*
- * Heuristic-based check if uprobe is installed at the function entry.
- *
- * Under assumption of user code being compiled with frame pointers,
- * `push %rbp/%ebp` is a good indicator that we indeed are.
- *
- * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
- * If we get this wrong, captured stack trace might have one extra bogus
- * entry, but the rest of stack trace will still be meaningful.
- */
-static bool is_uprobe_at_func_entry(struct pt_regs *regs)
-{
-	struct arch_uprobe *auprobe;
-
-	if (!current->utask)
-		return false;
-
-	auprobe = current->utask->auprobe;
-	if (!auprobe)
-		return false;
-
-	/* push %rbp/%ebp */
-	if (auprobe->insn[0] == 0x55)
-		return true;
-
-	/* endbr64 (64-bit only) */
-	if (user_64bit_mode(regs) && is_endbr((u32 *)auprobe->insn))
-		return true;
-
-	return false;
-}
-
-#else
-static bool is_uprobe_at_func_entry(struct pt_regs *regs)
-{
-	return false;
-}
-#endif /* CONFIG_UPROBES */
-
 #ifdef CONFIG_IA32_EMULATION
 
 #include <linux/compat.h>
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -3,6 +3,7 @@
 #define _ASM_X86_UNWIND_USER_H
 
 #include <asm/ptrace.h>
+#include <asm/uprobes.h>
 
 #define ARCH_INIT_USER_FP_FRAME(ws)			\
 	.cfa_off	=  2*(ws),			\
@@ -10,6 +11,12 @@
 	.fp_off		= -2*(ws),			\
 	.use_fp		= true,
 
+#define ARCH_INIT_USER_FP_ENTRY_FRAME(ws)		\
+	.cfa_off	=  1*(ws),			\
+	.ra_off		= -1*(ws),			\
+	.fp_off		= 0,				\
+	.use_fp		= false,
+
 static inline int unwind_user_word_size(struct pt_regs *regs)
 {
 	/* We can't unwind VM86 stacks */
@@ -22,4 +29,9 @@ static inline int unwind_user_word_size(
 	return sizeof(long);
 }
 
+static inline bool unwind_user_at_function_start(struct pt_regs *regs)
+{
+	return is_uprobe_at_func_entry(regs);
+}
+
 #endif /* _ASM_X86_UNWIND_USER_H */
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -62,4 +62,13 @@ struct arch_uprobe_task {
 	unsigned int			saved_tf;
 };
 
+#ifdef CONFIG_UPROBES
+extern bool is_uprobe_at_func_entry(struct pt_regs *regs);
+#else
+static inline bool is_uprobe_at_func_entry(struct pt_regs *regs)
+{
+	return false;
+}
+#endif /* CONFIG_UPROBES */
+
 #endif	/* _ASM_UPROBES_H */
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -1791,3 +1791,35 @@ bool arch_uretprobe_is_alive(struct retu
 	else
 		return regs->sp <= ret->stack;
 }
+
+/*
+ * Heuristic-based check if uprobe is installed at the function entry.
+ *
+ * Under assumption of user code being compiled with frame pointers,
+ * `push %rbp/%ebp` is a good indicator that we indeed are.
+ *
+ * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
+ * If we get this wrong, captured stack trace might have one extra bogus
+ * entry, but the rest of stack trace will still be meaningful.
+ */
+bool is_uprobe_at_func_entry(struct pt_regs *regs)
+{
+	struct arch_uprobe *auprobe;
+
+	if (!current->utask)
+		return false;
+
+	auprobe = current->utask->auprobe;
+	if (!auprobe)
+		return false;
+
+	/* push %rbp/%ebp */
+	if (auprobe->insn[0] == 0x55)
+		return true;
+
+	/* endbr64 (64-bit only) */
+	if (user_64bit_mode(regs) && is_endbr((u32 *)auprobe->insn))
+		return true;
+
+	return false;
+}
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -39,6 +39,7 @@ struct unwind_user_state {
 	unsigned int				ws;
 	enum unwind_user_type			current_type;
 	unsigned int				available_types;
+	bool					topmost;
 	bool					done;
 };
 
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -26,14 +26,12 @@ get_user_word(unsigned long *word, unsig
 	return get_user(*word, addr);
 }
 
-static int unwind_user_next_fp(struct unwind_user_state *state)
+static int unwind_user_next_common(struct unwind_user_state *state,
+				   const struct unwind_user_frame *frame)
 {
-	const struct unwind_user_frame frame = {
-		ARCH_INIT_USER_FP_FRAME(state->ws)
-	};
 	unsigned long cfa, fp, ra;
 
-	if (frame.use_fp) {
+	if (frame->use_fp) {
 		if (state->fp < state->sp)
 			return -EINVAL;
 		cfa = state->fp;
@@ -42,7 +40,7 @@ static int unwind_user_next_fp(struct un
 	}
 
 	/* Get the Canonical Frame Address (CFA) */
-	cfa += frame.cfa_off;
+	cfa += frame->cfa_off;
 
 	/* stack going in wrong direction? */
 	if (cfa <= state->sp)
@@ -53,19 +51,37 @@ static int unwind_user_next_fp(struct un
 		return -EINVAL;
 
 	/* Find the Return Address (RA) */
-	if (get_user_word(&ra, cfa, frame.ra_off, state->ws))
+	if (get_user_word(&ra, cfa, frame->ra_off, state->ws))
 		return -EINVAL;
 
-	if (frame.fp_off && get_user_word(&fp, cfa, frame.fp_off, state->ws))
+	if (frame->fp_off && get_user_word(&fp, cfa, frame->fp_off, state->ws))
 		return -EINVAL;
 
 	state->ip = ra;
 	state->sp = cfa;
-	if (frame.fp_off)
+	if (frame->fp_off)
 		state->fp = fp;
+	state->topmost = false;
 	return 0;
 }
 
+static int unwind_user_next_fp(struct unwind_user_state *state)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+
+	const struct unwind_user_frame fp_frame = {
+		ARCH_INIT_USER_FP_FRAME(state->ws)
+	};
+	const struct unwind_user_frame fp_entry_frame = {
+		ARCH_INIT_USER_FP_ENTRY_FRAME(state->ws)
+	};
+
+	if (state->topmost && unwind_user_at_function_start(regs))
+		return unwind_user_next_common(state, &fp_entry_frame);
+
+	return unwind_user_next_common(state, &fp_frame);
+}
+
 static int unwind_user_next(struct unwind_user_state *state)
 {
 	unsigned long iter_mask = state->available_types;
@@ -118,6 +134,7 @@ static int unwind_user_start(struct unwi
 		state->done = true;
 		return -EINVAL;
 	}
+	state->topmost = true;
 
 	return 0;
 }
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Florian Weimer 3 months ago
* Peter Zijlstra:

> +/*
> + * Heuristic-based check if uprobe is installed at the function entry.
> + *
> + * Under assumption of user code being compiled with frame pointers,
> + * `push %rbp/%ebp` is a good indicator that we indeed are.
> + *
> + * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
> + * If we get this wrong, captured stack trace might have one extra bogus
> + * entry, but the rest of stack trace will still be meaningful.
> + */
> +bool is_uprobe_at_func_entry(struct pt_regs *regs)

Is this specifically for uprobes?  Wouldn't it make sense to tell the
kernel when the uprobe is installed whether the frame pointer has been
set up at this point?  Userspace can typically figure this out easily
enough (it's not much more difficult to find the address of the
function).

Thanks,
Florian
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 3 months ago
On Tue, Nov 04, 2025 at 12:22:01PM +0100, Florian Weimer wrote:
> * Peter Zijlstra:
> 
> > +/*
> > + * Heuristic-based check if uprobe is installed at the function entry.
> > + *
> > + * Under assumption of user code being compiled with frame pointers,
> > + * `push %rbp/%ebp` is a good indicator that we indeed are.
> > + *
> > + * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
> > + * If we get this wrong, captured stack trace might have one extra bogus
> > + * entry, but the rest of stack trace will still be meaningful.
> > + */
> > +bool is_uprobe_at_func_entry(struct pt_regs *regs)
> 
> Is this specifically for uprobes?  Wouldn't it make sense to tell the
> kernel when the uprobe is installed whether the frame pointer has been
> set up at this point?  Userspace can typically figure this out easily
> enough (it's not much more difficult to find the address of the
> function).

Yeah, I suppose so. Not sure the actual user interface for this allows
for that. Someone would have to dig into that a bit.
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Jens Remus 3 months, 2 weeks ago
Hello Peter,

very nice!

On 10/24/2025 4:51 PM, Peter Zijlstra wrote:

> Subject: unwind_user/x86: Teach FP unwind about start of function
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri Oct 24 12:31:10 CEST 2025
> 
> When userspace is interrupted at the start of a function, before we
> get a chance to complete the frame, unwind will miss one caller.
> 
> X86 has a uprobe specific fixup for this, add bits to the generic
> unwinder to support this.
> 
> Suggested-by: Jens Remus <jremus@linux.ibm.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> +++ b/kernel/unwind/user.c

> +static int unwind_user_next_fp(struct unwind_user_state *state)
> +{
> +	struct pt_regs *regs = task_pt_regs(current);
> +
> +	const struct unwind_user_frame fp_frame = {
> +		ARCH_INIT_USER_FP_FRAME(state->ws)
> +	};
> +	const struct unwind_user_frame fp_entry_frame = {
> +		ARCH_INIT_USER_FP_ENTRY_FRAME(state->ws)
> +	};
> +
> +	if (state->topmost && unwind_user_at_function_start(regs))
> +		return unwind_user_next_common(state, &fp_entry_frame);

IIUC this will cause kernel/unwind/user.c to fail to compile on
architectures that will support HAVE_UNWIND_USER_SFRAME but not
HAVE_UNWIND_USER_FP (such as s390), and thus do not need to implement
unwind_user_at_function_start().

Either s390 would need to supply a dummy unwind_user_at_function_start()
or the unwind user sframe series needs to address this and supply
a dummy one if FP is not enabled, so that the code compiles with only
SFRAME enabled.

What do you think?

> +
> +	return unwind_user_next_common(state, &fp_frame);
> +}
Thanks and regards,
Jens

Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 05:09:02PM +0200, Jens Remus wrote:
> Hello Peter,
> 
> very nice!
> 
> On 10/24/2025 4:51 PM, Peter Zijlstra wrote:
> 
> > Subject: unwind_user/x86: Teach FP unwind about start of function
> > From: Peter Zijlstra <peterz@infradead.org>
> > Date: Fri Oct 24 12:31:10 CEST 2025
> > 
> > When userspace is interrupted at the start of a function, before we
> > get a chance to complete the frame, unwind will miss one caller.
> > 
> > X86 has a uprobe specific fixup for this, add bits to the generic
> > unwinder to support this.
> > 
> > Suggested-by: Jens Remus <jremus@linux.ibm.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> > +++ b/kernel/unwind/user.c
> 
> > +static int unwind_user_next_fp(struct unwind_user_state *state)
> > +{
> > +	struct pt_regs *regs = task_pt_regs(current);
> > +
> > +	const struct unwind_user_frame fp_frame = {
> > +		ARCH_INIT_USER_FP_FRAME(state->ws)
> > +	};
> > +	const struct unwind_user_frame fp_entry_frame = {
> > +		ARCH_INIT_USER_FP_ENTRY_FRAME(state->ws)
> > +	};
> > +
> > +	if (state->topmost && unwind_user_at_function_start(regs))
> > +		return unwind_user_next_common(state, &fp_entry_frame);
> 
> IIUC this will cause kernel/unwind/user.c to fail compile on
> architectures that will support HAVE_UNWIND_USER_SFRAME but not
> HAVE_UNWIND_USER_FP (such as s390), and thus do not need to implement
> unwind_user_at_function_start().
> 
> Either s390 would need to supply a dummy unwind_user_at_function_start()
> or the unwind user sframe series needs to address this and supply
> a dummy one if FP is not enabled, so that the code compiles with only
> SFRAME enabled.
> 
> What do you think?

I'll make it conditional on HAVE_UNWIND_USER_FP -- but tomorrow or so.
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 04:51:56PM +0200, Peter Zijlstra wrote:

> --- a/arch/x86/include/asm/unwind_user.h
> +++ b/arch/x86/include/asm/unwind_user.h
> @@ -3,6 +3,7 @@
>  #define _ASM_X86_UNWIND_USER_H
>  
>  #include <asm/ptrace.h>
> +#include <asm/uprobes.h>
>  
>  #define ARCH_INIT_USER_FP_FRAME(ws)			\
>  	.cfa_off	=  2*(ws),			\
> @@ -10,6 +11,12 @@
>  	.fp_off		= -2*(ws),			\
>  	.use_fp		= true,
>  
> +#define ARCH_INIT_USER_FP_ENTRY_FRAME(ws)		\
> +	.cfa_off	=  1*(ws),			\
> +	.ra_off		= -1*(ws),			\
> +	.fp_off		= 0,				\
> +	.use_fp		= false,
> +
>  static inline int unwind_user_word_size(struct pt_regs *regs)
>  {
>  	/* We can't unwind VM86 stacks */
> @@ -22,4 +29,9 @@ static inline int unwind_user_word_size(
>  	return sizeof(long);
>  }
>  
> +static inline bool unwind_user_at_function_start(struct pt_regs *regs)
> +{
> +	return is_uprobe_at_func_entry(regs);
> +}
> +
>  #endif /* _ASM_X86_UNWIND_USER_H */

> --- a/include/linux/unwind_user_types.h
> +++ b/include/linux/unwind_user_types.h
> @@ -39,6 +39,7 @@ struct unwind_user_state {
>  	unsigned int				ws;
>  	enum unwind_user_type			current_type;
>  	unsigned int				available_types;
> +	bool					topmost;
>  	bool					done;
>  };
>  
> --- a/kernel/unwind/user.c
> +++ b/kernel/unwind/user.c

>  
> +static int unwind_user_next_fp(struct unwind_user_state *state)
> +{
> +	struct pt_regs *regs = task_pt_regs(current);
> +
> +	const struct unwind_user_frame fp_frame = {
> +		ARCH_INIT_USER_FP_FRAME(state->ws)
> +	};
> +	const struct unwind_user_frame fp_entry_frame = {
> +		ARCH_INIT_USER_FP_ENTRY_FRAME(state->ws)
> +	};
> +
> +	if (state->topmost && unwind_user_at_function_start(regs))
> +		return unwind_user_next_common(state, &fp_entry_frame);
> +
> +	return unwind_user_next_common(state, &fp_frame);
> +}
> +
>  static int unwind_user_next(struct unwind_user_state *state)
>  {
>  	unsigned long iter_mask = state->available_types;
> @@ -118,6 +134,7 @@ static int unwind_user_start(struct unwi
>  		state->done = true;
>  		return -EINVAL;
>  	}
> +	state->topmost = true;
>  
>  	return 0;
>  }

And right before sending this, I realized we could do the
unwind_user_at_function_start() in unwind_user_start() and set something
like state->entry = true instead of topmost.

That saves having to do task_pt_regs() in unwind_user_next_fp().

Does that make sense?
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 04:54:02PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 24, 2025 at 04:51:56PM +0200, Peter Zijlstra wrote:
> 
> > --- a/arch/x86/include/asm/unwind_user.h
> > +++ b/arch/x86/include/asm/unwind_user.h
> > @@ -3,6 +3,7 @@
> >  #define _ASM_X86_UNWIND_USER_H
> >  
> >  #include <asm/ptrace.h>
> > +#include <asm/uprobes.h>
> >  
> >  #define ARCH_INIT_USER_FP_FRAME(ws)			\
> >  	.cfa_off	=  2*(ws),			\
> > @@ -10,6 +11,12 @@
> >  	.fp_off		= -2*(ws),			\
> >  	.use_fp		= true,
> >  
> > +#define ARCH_INIT_USER_FP_ENTRY_FRAME(ws)		\
> > +	.cfa_off	=  1*(ws),			\
> > +	.ra_off		= -1*(ws),			\
> > +	.fp_off		= 0,				\
> > +	.use_fp		= false,
> > +
> >  static inline int unwind_user_word_size(struct pt_regs *regs)
> >  {
> >  	/* We can't unwind VM86 stacks */
> > @@ -22,4 +29,9 @@ static inline int unwind_user_word_size(
> >  	return sizeof(long);
> >  }
> >  
> > +static inline bool unwind_user_at_function_start(struct pt_regs *regs)
> > +{
> > +	return is_uprobe_at_func_entry(regs);
> > +}
> > +
> >  #endif /* _ASM_X86_UNWIND_USER_H */
> 
> > --- a/include/linux/unwind_user_types.h
> > +++ b/include/linux/unwind_user_types.h
> > @@ -39,6 +39,7 @@ struct unwind_user_state {
> >  	unsigned int				ws;
> >  	enum unwind_user_type			current_type;
> >  	unsigned int				available_types;
> > +	bool					topmost;
> >  	bool					done;
> >  };
> >  
> > --- a/kernel/unwind/user.c
> > +++ b/kernel/unwind/user.c
> 
> >  
> > +static int unwind_user_next_fp(struct unwind_user_state *state)
> > +{
> > +	struct pt_regs *regs = task_pt_regs(current);
> > +
> > +	const struct unwind_user_frame fp_frame = {
> > +		ARCH_INIT_USER_FP_FRAME(state->ws)
> > +	};
> > +	const struct unwind_user_frame fp_entry_frame = {
> > +		ARCH_INIT_USER_FP_ENTRY_FRAME(state->ws)
> > +	};
> > +
> > +	if (state->topmost && unwind_user_at_function_start(regs))
> > +		return unwind_user_next_common(state, &fp_entry_frame);
> > +
> > +	return unwind_user_next_common(state, &fp_frame);
> > +}
> > +
> >  static int unwind_user_next(struct unwind_user_state *state)
> >  {
> >  	unsigned long iter_mask = state->available_types;
> > @@ -118,6 +134,7 @@ static int unwind_user_start(struct unwi
> >  		state->done = true;
> >  		return -EINVAL;
> >  	}
> > +	state->topmost = true;
> >  
> >  	return 0;
> >  }
> 
> And right before sending this, I realized we could do the
> unwind_user_at_function_start() in unwind_user_start() and set something
> like state->entry = true instead of topmost.
> 
> That saves having to do task_pt_regs() in unwind_user_next_fp().
> 
> Does that make sense?

Urgh, that makes us call that weird hack for sframe too, which isn't
needed. Oh well, ignore this.
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 04:57:35PM +0200, Peter Zijlstra wrote:

> Urgh, that makes us call that weird hack for sframe too, which isn't
> needed. Oh well, ignore this.

I've decided to stop tinkering for today and pushed out the lot into:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core

It seems to build and work on the one test build I did, so fingers
crossed.

If there is anything you want changed, please holler, I'll not push to
tip until at least Monday anyway.
[tip: perf/core] unwind_user/x86: Teach FP unwind about start of function
Posted by tip-bot2 for Peter Zijlstra 3 months, 1 week ago
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     ae25884ad749e7f6e0c3565513bdc8aa2554a425
Gitweb:        https://git.kernel.org/tip/ae25884ad749e7f6e0c3565513bdc8aa2554a425
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 24 Oct 2025 12:31:10 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Oct 2025 10:29:58 +01:00

unwind_user/x86: Teach FP unwind about start of function

When userspace is interrupted at the start of a function, before we
get a chance to complete the frame, unwind will miss one caller.

X86 has a uprobe specific fixup for this, add bits to the generic
unwinder to support this.

Suggested-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251024145156.GM4068168@noisy.programming.kicks-ass.net
---
 arch/x86/events/core.c             | 40 +-----------------------------
 arch/x86/include/asm/unwind_user.h | 12 +++++++++-
 arch/x86/include/asm/uprobes.h     |  9 +++++++-
 arch/x86/kernel/uprobes.c          | 32 +++++++++++++++++++++++-
 include/linux/unwind_user_types.h  |  1 +-
 kernel/unwind/user.c               | 39 +++++++++++++++++++++-------
 6 files changed, 84 insertions(+), 49 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 745caa6..0cf68ad 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2845,46 +2845,6 @@ static unsigned long get_segment_base(unsigned int segment)
 	return get_desc_base(desc);
 }
 
-#ifdef CONFIG_UPROBES
-/*
- * Heuristic-based check if uprobe is installed at the function entry.
- *
- * Under assumption of user code being compiled with frame pointers,
- * `push %rbp/%ebp` is a good indicator that we indeed are.
- *
- * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
- * If we get this wrong, captured stack trace might have one extra bogus
- * entry, but the rest of stack trace will still be meaningful.
- */
-static bool is_uprobe_at_func_entry(struct pt_regs *regs)
-{
-	struct arch_uprobe *auprobe;
-
-	if (!current->utask)
-		return false;
-
-	auprobe = current->utask->auprobe;
-	if (!auprobe)
-		return false;
-
-	/* push %rbp/%ebp */
-	if (auprobe->insn[0] == 0x55)
-		return true;
-
-	/* endbr64 (64-bit only) */
-	if (user_64bit_mode(regs) && is_endbr((u32 *)auprobe->insn))
-		return true;
-
-	return false;
-}
-
-#else
-static bool is_uprobe_at_func_entry(struct pt_regs *regs)
-{
-	return false;
-}
-#endif /* CONFIG_UPROBES */
-
 #ifdef CONFIG_IA32_EMULATION
 
 #include <linux/compat.h>
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index b166e10..c4f1ff8 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -3,6 +3,7 @@
 #define _ASM_X86_UNWIND_USER_H
 
 #include <asm/ptrace.h>
+#include <asm/uprobes.h>
 
 #define ARCH_INIT_USER_FP_FRAME(ws)			\
 	.cfa_off	=  2*(ws),			\
@@ -10,6 +11,12 @@
 	.fp_off		= -2*(ws),			\
 	.use_fp		= true,
 
+#define ARCH_INIT_USER_FP_ENTRY_FRAME(ws)		\
+	.cfa_off	=  1*(ws),			\
+	.ra_off		= -1*(ws),			\
+	.fp_off		= 0,				\
+	.use_fp		= false,
+
 static inline int unwind_user_word_size(struct pt_regs *regs)
 {
 	/* We can't unwind VM86 stacks */
@@ -22,4 +29,9 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
 	return sizeof(long);
 }
 
+static inline bool unwind_user_at_function_start(struct pt_regs *regs)
+{
+	return is_uprobe_at_func_entry(regs);
+}
+
 #endif /* _ASM_X86_UNWIND_USER_H */
diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 1ee2e51..362210c 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -62,4 +62,13 @@ struct arch_uprobe_task {
 	unsigned int			saved_tf;
 };
 
+#ifdef CONFIG_UPROBES
+extern bool is_uprobe_at_func_entry(struct pt_regs *regs);
+#else
+static inline bool is_uprobe_at_func_entry(struct pt_regs *regs)
+{
+	return false;
+}
+#endif /* CONFIG_UPROBES */
+
 #endif	/* _ASM_UPROBES_H */
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index a563e90..7be8e36 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -1791,3 +1791,35 @@ bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check ctx,
 	else
 		return regs->sp <= ret->stack;
 }
+
+/*
+ * Heuristic-based check if uprobe is installed at the function entry.
+ *
+ * Under assumption of user code being compiled with frame pointers,
+ * `push %rbp/%ebp` is a good indicator that we indeed are.
+ *
+ * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
+ * If we get this wrong, captured stack trace might have one extra bogus
+ * entry, but the rest of stack trace will still be meaningful.
+ */
+bool is_uprobe_at_func_entry(struct pt_regs *regs)
+{
+	struct arch_uprobe *auprobe;
+
+	if (!current->utask)
+		return false;
+
+	auprobe = current->utask->auprobe;
+	if (!auprobe)
+		return false;
+
+	/* push %rbp/%ebp */
+	if (auprobe->insn[0] == 0x55)
+		return true;
+
+	/* endbr64 (64-bit only) */
+	if (user_64bit_mode(regs) && is_endbr((u32 *)auprobe->insn))
+		return true;
+
+	return false;
+}
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 938f7e6..412729a 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -39,6 +39,7 @@ struct unwind_user_state {
 	unsigned int				ws;
 	enum unwind_user_type			current_type;
 	unsigned int				available_types;
+	bool					topmost;
 	bool					done;
 };
 
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 6428715..39e2707 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -26,14 +26,12 @@ get_user_word(unsigned long *word, unsigned long base, int off, unsigned int ws)
 	return get_user(*word, addr);
 }
 
-static int unwind_user_next_fp(struct unwind_user_state *state)
+static int unwind_user_next_common(struct unwind_user_state *state,
+				   const struct unwind_user_frame *frame)
 {
-	const struct unwind_user_frame frame = {
-		ARCH_INIT_USER_FP_FRAME(state->ws)
-	};
 	unsigned long cfa, fp, ra;
 
-	if (frame.use_fp) {
+	if (frame->use_fp) {
 		if (state->fp < state->sp)
 			return -EINVAL;
 		cfa = state->fp;
@@ -42,7 +40,7 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
 	}
 
 	/* Get the Canonical Frame Address (CFA) */
-	cfa += frame.cfa_off;
+	cfa += frame->cfa_off;
 
 	/* stack going in wrong direction? */
 	if (cfa <= state->sp)
@@ -53,19 +51,41 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
 		return -EINVAL;
 
 	/* Find the Return Address (RA) */
-	if (get_user_word(&ra, cfa, frame.ra_off, state->ws))
+	if (get_user_word(&ra, cfa, frame->ra_off, state->ws))
 		return -EINVAL;
 
-	if (frame.fp_off && get_user_word(&fp, cfa, frame.fp_off, state->ws))
+	if (frame->fp_off && get_user_word(&fp, cfa, frame->fp_off, state->ws))
 		return -EINVAL;
 
 	state->ip = ra;
 	state->sp = cfa;
-	if (frame.fp_off)
+	if (frame->fp_off)
 		state->fp = fp;
+	state->topmost = false;
 	return 0;
 }
 
+static int unwind_user_next_fp(struct unwind_user_state *state)
+{
+#ifdef CONFIG_HAVE_UNWIND_USER_FP
+	struct pt_regs *regs = task_pt_regs(current);
+
+	if (state->topmost && unwind_user_at_function_start(regs)) {
+		const struct unwind_user_frame fp_entry_frame = {
+			ARCH_INIT_USER_FP_ENTRY_FRAME(state->ws)
+		};
+		return unwind_user_next_common(state, &fp_entry_frame);
+	}
+
+	const struct unwind_user_frame fp_frame = {
+		ARCH_INIT_USER_FP_FRAME(state->ws)
+	};
+	return unwind_user_next_common(state, &fp_frame);
+#else
+	return -EINVAL;
+#endif
+}
+
 static int unwind_user_next(struct unwind_user_state *state)
 {
 	unsigned long iter_mask = state->available_types;
@@ -118,6 +138,7 @@ static int unwind_user_start(struct unwind_user_state *state)
 		state->done = true;
 		return -EINVAL;
 	}
+	state->topmost = true;
 
 	return 0;
 }
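
For clarity, a rough sketch of the arithmetic the two frame descriptions
above resolve to on 64-bit x86 (ws == 8); the variable names are
illustrative and mirror unwind_user_next_common():

	/* Mid-function (ARCH_INIT_USER_FP_FRAME, use_fp == true):
	 * the frame pointer chain is established, so walk it. */
	cfa = fp + 2*8;			/* above saved RBP and RA */
	ra  = *(u64 *)(cfa - 1*8);	/* return address */
	fp  = *(u64 *)(cfa - 2*8);	/* caller's saved RBP */

	/* Function entry (ARCH_INIT_USER_FP_ENTRY_FRAME, use_fp == false):
	 * the uprobe fired before `push %rbp`, so only the return address
	 * is on the stack and the caller's RBP is still live in the
	 * register. */
	cfa = sp + 1*8;
	ra  = *(u64 *)(cfa - 1*8);
	/* fp_off == 0: state->fp is left untouched */

As the comment on is_uprobe_at_func_entry() notes, getting the entry
heuristic wrong costs at most one bogus entry at the top of the trace;
the rest of the walk is unaffected.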
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Steven Rostedt 3 months, 2 weeks ago
On Thu, 23 Oct 2025 17:00:02 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> +/* Deferred unwinding callback for task specific events */
> +static void perf_unwind_deferred_callback(struct unwind_work *work,
> +					 struct unwind_stacktrace *trace, u64 cookie)
> +{
> +	struct perf_callchain_deferred_event deferred_event = {
> +		.trace = trace,
> +		.event = {
> +			.header = {
> +				.type = PERF_RECORD_CALLCHAIN_DEFERRED,
> +				.misc = PERF_RECORD_MISC_USER,
> +				.size = sizeof(deferred_event.event) +
> +					(trace->nr * sizeof(u64)),
> +			},
> +			.cookie = cookie,
> +			.nr = trace->nr,
> +		},
> +	};
> +
> +	perf_iterate_sb(perf_callchain_deferred_output, &deferred_event, NULL);
> +}
> +

So "perf_iterate_sb()" was the key point I was missing. I'm guessing it's
basically a demultiplexer that distributes events to all the requestors?

If I had known this, I would have done it completely differently.

-- Steve
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 3 months, 2 weeks ago
On Thu, Oct 23, 2025 at 12:40:57PM -0400, Steven Rostedt wrote:
> On Thu, 23 Oct 2025 17:00:02 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > +/* Deferred unwinding callback for task specific events */
> > +static void perf_unwind_deferred_callback(struct unwind_work *work,
> > +					 struct unwind_stacktrace *trace, u64 cookie)
> > +{
> > +	struct perf_callchain_deferred_event deferred_event = {
> > +		.trace = trace,
> > +		.event = {
> > +			.header = {
> > +				.type = PERF_RECORD_CALLCHAIN_DEFERRED,
> > +				.misc = PERF_RECORD_MISC_USER,
> > +				.size = sizeof(deferred_event.event) +
> > +					(trace->nr * sizeof(u64)),
> > +			},
> > +			.cookie = cookie,
> > +			.nr = trace->nr,
> > +		},
> > +	};
> > +
> > +	perf_iterate_sb(perf_callchain_deferred_output, &deferred_event, NULL);
> > +}
> > +
> 
> So "perf_iterate_sb()" was the key point I was missing. I'm guessing it's
> basically a demultiplexer that distributes events to all the requestors?

A superset. Basically every event in the relevant context that 'wants'
it.

It is what we use for all traditional side-band events (hence the _sb
naming) like mmap, task creation/exit, etc.

I was under the impression the perf tool would create one software dummy
event to listen specifically for these events per buffer, but alas, when
I looked at the tool this does not appear to be the case.

As a result it is possible to receive these events multiple times. And
since that is a problem that needs to be solved anyway, I didn't think
it 'relevant' in this case.

> If I had known this, I would have done it completely differently.

I did mention it here:

  https://lkml.kernel.org/r/20250923103213.GD3419281@noisy.programming.kicks-ass.net

Anyway, no worries. Onwards to figuring out WTF the unwinder doesn't
seem to terminate properly.
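
Since nothing prevents two events in the same context from both having
defer_output set, a consumer has to be prepared to de-duplicate by
cookie when stitching. A minimal tool-side sketch, assuming a hash
table keyed on the cookie (hash_lookup()/hash_insert()/append_ips() are
hypothetical helpers, not actual perf-tool code):

	/* Hypothetical stitching sketch; not actual perf tooling. */
	struct deferred_chain {
		__u64	cookie;
		__u64	nr;
		__u64	ips[];
	};

	static void on_callchain_deferred(struct deferred_chain *rec)
	{
		/* A second record with the same cookie is a duplicate. */
		if (!hash_lookup(deferred_map, rec->cookie))
			hash_insert(deferred_map, rec->cookie, rec);
	}

	static void stitch_sample(__u64 *chain, __u64 nr)
	{
		for (__u64 i = 0; i + 1 < nr; i++) {
			if (chain[i] != PERF_CONTEXT_USER_DEFERRED)
				continue;
			/* The next entry is the cookie to match. */
			struct deferred_chain *d =
				hash_lookup(deferred_map, chain[i + 1]);
			if (d)
				append_ips(d->ips, d->nr);
			break;
		}
	}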
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 3 months, 2 weeks ago
Arnaldo, Namhyung,

On Fri, Oct 24, 2025 at 10:26:56AM +0200, Peter Zijlstra wrote:

> > So "perf_iterate_sb()" was the key point I was missing. I'm guessing it's
> > basically a demultiplexer that distributes events to all the requestors?
> 
> A superset. Basically every event in the relevant context that 'wants'
> it.
> 
> It is what we use for all traditional side-band events (hence the _sb
> naming) like mmap, task creation/exit, etc.
> 
> I was under the impression the perf tool would create one software dummy
> event to listen specifically for these events per buffer, but alas, when
> I looked at the tool this does not appear to be the case.
> 
> As a result it is possible to receive these events multiple times. And
> since that is a problem that needs to be solved anyway, I didn't think
> it 'relevant' in this case.

When I use:

  perf record -ag -e cycles -e instructions

I get:

# event : name = cycles, , id = { }, type = 0 (PERF_TYPE_HARDWARE), size = 136, config = 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq } = 2000, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1, defer_callchain = 1
# event : name = instructions, , id = { }, type = 0 (PERF_TYPE_HARDWARE), size = 136, config = 0x1 (PERF_COUNT_HW_INSTRUCTIONS), { sample_period, sample_freq } = 2000, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1, defer_callchain = 1
# event : name = dummy:u, , id = { }, type = 1 (PERF_TYPE_SOFTWARE), size = 136, config = 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CPU|IDENTIFIER, read_format = ID|LOST, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm = 1, task = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1, build_id = 1, defer_output = 1

And we have this dummy event I spoke of above; and it has defer_output
set, none of the others do. This is what I expected.

*However*, when I use:

  perf record -g -e cycles -e instructions

I get:

# event : name = cycles, , id = { }, type = 0 (PERF_TYPE_HARDWARE), size = 136, config = 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq } = 2000, sample_type = IP|TID|TIME|CALLCHAIN|ID|PERIOD, read_format = ID|LOST, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1, build_id = 1, defer_callchain = 1, defer_output = 1
# event : name = instructions, , id = { }, type = 0 (PERF_TYPE_HARDWARE), size = 136, config = 0x1 (PERF_COUNT_HW_INSTRUCTIONS), { sample_period, sample_freq } = 2000, sample_type = IP|TID|TIME|CALLCHAIN|ID|PERIOD, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, sample_id_all = 1, defer_callchain = 1

Which doesn't have a dummy event. Notably the first real event has
defer_output set (and all the other sideband stuff like mmap, comm,
etc.).

Is there a reason the !cpu mode doesn't have the dummy event? Anyway, it
should all work; it's just an unexpected inconsistency that confused me.
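
For reference, the two layouts above boil down to roughly the following
attr setup on the tool side (a sketch against the uapi additions in this
series; perf_event_open() and the mmap/read plumbing are omitted):

	#include <linux/perf_event.h>	/* patched uapi from this series */

	struct perf_event_attr cycles = {
		.size		 = sizeof(struct perf_event_attr),
		.type		 = PERF_TYPE_HARDWARE,
		.config		 = PERF_COUNT_HW_CPU_CYCLES,
		.sample_type	 = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
				   PERF_SAMPLE_TIME | PERF_SAMPLE_CALLCHAIN,
		.freq		 = 1,
		.sample_freq	 = 2000,
		.defer_callchain = 1,	/* kernel part now, user part deferred */
	};

	struct perf_event_attr dummy = {
		.size		 = sizeof(struct perf_event_attr),
		.type		 = PERF_TYPE_SOFTWARE,
		.config		 = PERF_COUNT_SW_DUMMY,
		.sample_period	 = 1,
		.exclude_kernel	 = 1,
		.mmap		 = 1,
		.comm		 = 1,
		.task		 = 1,
		.defer_output	 = 1,	/* CALLCHAIN_DEFERRED records land here */
	};

Keeping defer_output on a single (dummy) event per buffer is what avoids
the duplicated PERF_RECORD_CALLCHAIN_DEFERRED records mentioned above.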
Re: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure
Posted by Namhyung Kim 3 months, 2 weeks ago
Hi Peter,

On Fri, Oct 24, 2025 at 02:58:41PM +0200, Peter Zijlstra wrote:
> 
> Arnaldo, Namhyung,
> 
> On Fri, Oct 24, 2025 at 10:26:56AM +0200, Peter Zijlstra wrote:
> 
> > > So "perf_iterate_sb()" was the key point I was missing. I'm guessing it's
> > > basically a demultiplexer that distributes events to all the requestors?
> > 
> > A superset. Basically every event in the relevant context that 'wants'
> > it.
> > 
> > It is what we use for all traditional side-band events (hence the _sb
> > naming) like mmap, task creation/exit, etc.
> > 
> > I was under the impression the perf tool would create one software dummy
> > event to listen specifically for these events per buffer, but alas, when
> > I looked at the tool this does not appear to be the case.
> > 
> > As a result it is possible to receive these events multiple times. And
> > since that is a problem that needs to be solved anyway, I didn't think
> > it 'relevant' in this case.
> 
> When I use:
> 
>   perf record -ag -e cycles -e instructions
> 
> I get:
> 
> # event : name = cycles, , id = { }, type = 0 (PERF_TYPE_HARDWARE), size = 136, config = 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq } = 2000, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1, defer_callchain = 1
> # event : name = instructions, , id = { }, type = 0 (PERF_TYPE_HARDWARE), size = 136, config = 0x1 (PERF_COUNT_HW_INSTRUCTIONS), { sample_period, sample_freq } = 2000, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1, defer_callchain = 1
> # event : name = dummy:u, , id = { }, type = 1 (PERF_TYPE_SOFTWARE), size = 136, config = 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CPU|IDENTIFIER, read_format = ID|LOST, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm = 1, task = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1, build_id = 1, defer_output = 1
> 
> And we have this dummy event I spoke of above; and it has defer_output
> set, none of the others do. This is what I expected.
> 
> *However*, when I use:
> 
>   perf record -g -e cycles -e instructions
> 
> I get:
> 
> # event : name = cycles, , id = { }, type = 0 (PERF_TYPE_HARDWARE), size = 136, config = 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq } = 2000, sample_type = IP|TID|TIME|CALLCHAIN|ID|PERIOD, read_format = ID|LOST, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1, build_id = 1, defer_callchain = 1, defer_output = 1
> # event : name = instructions, , id = { }, type = 0 (PERF_TYPE_HARDWARE), size = 136, config = 0x1 (PERF_COUNT_HW_INSTRUCTIONS), { sample_period, sample_freq } = 2000, sample_type = IP|TID|TIME|CALLCHAIN|ID|PERIOD, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, sample_id_all = 1, defer_callchain = 1
> 
> Which doesn't have a dummy event. Notably the first real event has
> defer_output set (and all the other sideband stuff like mmap, comm,
> etc.).
> 
> Is there a reason the !cpu mode doesn't have the dummy event? Anyway, it
> should all work; it's just an unexpected inconsistency that confused me.

Right, I don't remember why.  I think there's no reason to do it for
system-wide mode only.

Adrian, do you have any idea?  I have a vague memory that you worked on
this in the past.

Thanks,
Namhyung
[tip: perf/core] perf: Support deferred user unwind
Posted by tip-bot2 for Peter Zijlstra 3 months, 1 week ago
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     c69993ecdd4dfde2b7da08b022052a33b203da07
Gitweb:        https://git.kernel.org/tip/c69993ecdd4dfde2b7da08b022052a33b203da07
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Thu, 23 Oct 2025 15:17:05 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Oct 2025 10:29:58 +01:00

perf: Support deferred user unwind

Add support for deferred userspace unwind to perf.

Perf currently relies on in-place stack unwinding, from NMI context and
all that. This moves the userspace part of the unwind to right before
the return to userspace.

This has two distinct benefits. The biggest is that it moves the
unwind to a faultable context: it becomes possible to fault in debug
info (.eh_frame, SFrame, etc.) that might not otherwise be readily
available. Secondly, it de-duplicates the user callchain when multiple
samples happen during the same kernel entry.

To facilitate this the perf interface is extended with a new record
type:

  PERF_RECORD_CALLCHAIN_DEFERRED

and two new attribute flags:

  perf_event_attr::defer_callchain - to request the user unwind be deferred
  perf_event_attr::defer_output    - to request PERF_RECORD_CALLCHAIN_DEFERRED records

The existing PERF_RECORD_SAMPLE callchain section gets a new
context type:

  PERF_CONTEXT_USER_DEFERRED

After this context entry comes a single entry denoting the 'cookie' of
the deferred callchain that should be attached here, matching the
'cookie' field of the above-mentioned PERF_RECORD_CALLCHAIN_DEFERRED.

The 'defer_callchain' flag is expected on all events with
PERF_SAMPLE_CALLCHAIN. The 'defer_output' flag is expected on the event
responsible for collecting side-band events (like mmap, comm, etc.).
Setting 'defer_output' on multiple events will get you duplicated
PERF_RECORD_CALLCHAIN_DEFERRED records.
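
Schematically, and purely as an illustration (kip*/uip* stand for
made-up kernel/user addresses), a deferred sample and its matching
record pair up like this:

	PERF_RECORD_SAMPLE callchain:
	  PERF_CONTEXT_KERNEL, kip0, kip1, PERF_CONTEXT_USER_DEFERRED, cookie

	PERF_RECORD_CALLCHAIN_DEFERRED:
	  cookie, nr = 2, ips[] = { uip0, uip1 }

	stitched view (one plausible tool-side result):
	  PERF_CONTEXT_KERNEL, kip0, kip1, PERF_CONTEXT_USER, uip0, uip1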

Based on earlier patches by Josh and Steven.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251023150002.GR4067720@noisy.programming.kicks-ass.net
---
 include/linux/perf_event.h            |  2 +-
 include/linux/unwind_deferred.h       | 12 +----
 include/linux/unwind_deferred_types.h | 13 ++++-
 include/uapi/linux/perf_event.h       | 21 ++++++-
 kernel/bpf/stackmap.c                 |  4 +-
 kernel/events/callchain.c             | 14 ++++-
 kernel/events/core.c                  | 78 +++++++++++++++++++++++++-
 tools/include/uapi/linux/perf_event.h | 21 ++++++-
 8 files changed, 145 insertions(+), 20 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index fd1d910..9870d76 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1720,7 +1720,7 @@ extern void perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct p
 extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs);
 extern struct perf_callchain_entry *
 get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
-		   u32 max_stack, bool crosstask, bool add_mark);
+		   u32 max_stack, bool crosstask, bool add_mark, u64 defer_cookie);
 extern int get_callchain_buffers(int max_stack);
 extern void put_callchain_buffers(void);
 extern struct perf_callchain_entry *get_callchain_entry(int *rctx);
diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index f4743c8..bc7ae7d 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -6,18 +6,6 @@
 #include <linux/unwind_user.h>
 #include <linux/unwind_deferred_types.h>
 
-struct unwind_work;
-
-typedef void (*unwind_callback_t)(struct unwind_work *work,
-				  struct unwind_stacktrace *trace,
-				  u64 cookie);
-
-struct unwind_work {
-	struct list_head		list;
-	unwind_callback_t		func;
-	int				bit;
-};
-
 #ifdef CONFIG_UNWIND_USER
 
 enum {
diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index 0a4c8dd..18fa393 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -39,4 +39,17 @@ struct unwind_task_info {
 	union unwind_task_id	id;
 };
 
+struct unwind_work;
+struct unwind_stacktrace;
+
+typedef void (*unwind_callback_t)(struct unwind_work *work,
+				  struct unwind_stacktrace *trace,
+				  u64 cookie);
+
+struct unwind_work {
+	struct list_head		list;
+	unwind_callback_t		func;
+	int				bit;
+};
+
 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 78a362b..d292f96 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -463,7 +463,9 @@ struct perf_event_attr {
 				inherit_thread :  1, /* children only inherit if cloned with CLONE_THREAD */
 				remove_on_exec :  1, /* event is removed from task on exec */
 				sigtrap        :  1, /* send synchronous SIGTRAP on event */
-				__reserved_1   : 26;
+				defer_callchain:  1, /* request PERF_RECORD_CALLCHAIN_DEFERRED records */
+				defer_output   :  1, /* output PERF_RECORD_CALLCHAIN_DEFERRED records */
+				__reserved_1   : 24;
 
 	union {
 		__u32		wakeup_events;	  /* wake up every n events */
@@ -1239,6 +1241,22 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_AUX_OUTPUT_HW_ID		= 21,
 
+	/*
+	 * This user callchain capture was deferred until shortly before
+	 * returning to user space.  Previous samples would have kernel
+	 * callchains only and they need to be stitched with this to make full
+	 * callchains.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				cookie;
+	 *	u64				nr;
+	 *	u64				ips[nr];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CALLCHAIN_DEFERRED		= 22,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
@@ -1269,6 +1287,7 @@ enum perf_callchain_context {
 	PERF_CONTEXT_HV				= (__u64)-32,
 	PERF_CONTEXT_KERNEL			= (__u64)-128,
 	PERF_CONTEXT_USER			= (__u64)-512,
+	PERF_CONTEXT_USER_DEFERRED		= (__u64)-640,
 
 	PERF_CONTEXT_GUEST			= (__u64)-2048,
 	PERF_CONTEXT_GUEST_KERNEL		= (__u64)-2176,
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 4d53cdd..8f1daca 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -315,7 +315,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 		max_depth = sysctl_perf_event_max_stack;
 
 	trace = get_perf_callchain(regs, kernel, user, max_depth,
-				   false, false);
+				   false, false, 0);
 
 	if (unlikely(!trace))
 		/* couldn't fetch the stack trace */
@@ -452,7 +452,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 		trace = get_callchain_entry_for_task(task, max_depth);
 	else
 		trace = get_perf_callchain(regs, kernel, user, max_depth,
-					   crosstask, false);
+					   crosstask, false, 0);
 
 	if (unlikely(!trace) || trace->nr < skip) {
 		if (may_fault)
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 808c0d7..b9c7e00 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -218,7 +218,7 @@ static void fixup_uretprobe_trampoline_entries(struct perf_callchain_entry *entr
 
 struct perf_callchain_entry *
 get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
-		   u32 max_stack, bool crosstask, bool add_mark)
+		   u32 max_stack, bool crosstask, bool add_mark, u64 defer_cookie)
 {
 	struct perf_callchain_entry *entry;
 	struct perf_callchain_entry_ctx ctx;
@@ -251,6 +251,18 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
 			regs = task_pt_regs(current);
 		}
 
+		if (defer_cookie) {
+			/*
+			 * Foretell the coming of PERF_RECORD_CALLCHAIN_DEFERRED
+			 * which can be stitched to this one, and add
+			 * the cookie after it (it will be cut off when the
+			 * user stack is copied to the callchain).
+			 */
+			perf_callchain_store_context(&ctx, PERF_CONTEXT_USER_DEFERRED);
+			perf_callchain_store_context(&ctx, defer_cookie);
+			goto exit_put;
+		}
+
 		if (add_mark)
 			perf_callchain_store_context(&ctx, PERF_CONTEXT_USER);
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7541f6f..f6a08c7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -56,6 +56,7 @@
 #include <linux/buildid.h>
 #include <linux/task_work.h>
 #include <linux/percpu-rwsem.h>
+#include <linux/unwind_deferred.h>
 
 #include "internal.h"
 
@@ -8200,6 +8201,8 @@ static u64 perf_get_page_size(unsigned long addr)
 
 static struct perf_callchain_entry __empty_callchain = { .nr = 0, };
 
+static struct unwind_work perf_unwind_work;
+
 struct perf_callchain_entry *
 perf_callchain(struct perf_event *event, struct pt_regs *regs)
 {
@@ -8208,8 +8211,11 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
 		!(current->flags & (PF_KTHREAD | PF_USER_WORKER));
 	/* Disallow cross-task user callchains. */
 	bool crosstask = event->ctx->task && event->ctx->task != current;
+	bool defer_user = IS_ENABLED(CONFIG_UNWIND_USER) && user &&
+			  event->attr.defer_callchain;
 	const u32 max_stack = event->attr.sample_max_stack;
 	struct perf_callchain_entry *callchain;
+	u64 defer_cookie;
 
 	if (!current->mm)
 		user = false;
@@ -8217,8 +8223,13 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
 	if (!kernel && !user)
 		return &__empty_callchain;
 
-	callchain = get_perf_callchain(regs, kernel, user,
-				       max_stack, crosstask, true);
+	if (!(user && defer_user && !crosstask &&
+	      unwind_deferred_request(&perf_unwind_work, &defer_cookie) >= 0))
+		defer_cookie = 0;
+
+	callchain = get_perf_callchain(regs, kernel, user, max_stack,
+				       crosstask, true, defer_cookie);
+
 	return callchain ?: &__empty_callchain;
 }
 
@@ -10003,6 +10014,66 @@ void perf_event_bpf_event(struct bpf_prog *prog,
 	perf_iterate_sb(perf_event_bpf_output, &bpf_event, NULL);
 }
 
+struct perf_callchain_deferred_event {
+	struct unwind_stacktrace *trace;
+	struct {
+		struct perf_event_header	header;
+		u64				cookie;
+		u64				nr;
+		u64				ips[];
+	} event;
+};
+
+static void perf_callchain_deferred_output(struct perf_event *event, void *data)
+{
+	struct perf_callchain_deferred_event *deferred_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	int ret, size = deferred_event->event.header.size;
+
+	if (!event->attr.defer_output)
+		return;
+
+	/* XXX do we really need sample_id_all for this ??? */
+	perf_event_header__init_id(&deferred_event->event.header, &sample, event);
+
+	ret = perf_output_begin(&handle, &sample, event,
+				deferred_event->event.header.size);
+	if (ret)
+		goto out;
+
+	perf_output_put(&handle, deferred_event->event);
+	for (int i = 0; i < deferred_event->trace->nr; i++) {
+		u64 entry = deferred_event->trace->entries[i];
+		perf_output_put(&handle, entry);
+	}
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+out:
+	deferred_event->event.header.size = size;
+}
+
+static void perf_unwind_deferred_callback(struct unwind_work *work,
+					 struct unwind_stacktrace *trace, u64 cookie)
+{
+	struct perf_callchain_deferred_event deferred_event = {
+		.trace = trace,
+		.event = {
+			.header = {
+				.type = PERF_RECORD_CALLCHAIN_DEFERRED,
+				.misc = PERF_RECORD_MISC_USER,
+				.size = sizeof(deferred_event.event) +
+					(trace->nr * sizeof(u64)),
+			},
+			.cookie = cookie,
+			.nr = trace->nr,
+		},
+	};
+
+	perf_iterate_sb(perf_callchain_deferred_output, &deferred_event, NULL);
+}
+
 struct perf_text_poke_event {
 	const void		*old_bytes;
 	const void		*new_bytes;
@@ -14799,6 +14870,9 @@ void __init perf_event_init(void)
 
 	idr_init(&pmu_idr);
 
+	unwind_deferred_init(&perf_unwind_work,
+			     perf_unwind_deferred_callback);
+
 	perf_event_init_all_cpus();
 	init_srcu_struct(&pmus_srcu);
 	perf_pmu_register(&perf_swevent, "software", PERF_TYPE_SOFTWARE);
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 78a362b..d292f96 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -463,7 +463,9 @@ struct perf_event_attr {
 				inherit_thread :  1, /* children only inherit if cloned with CLONE_THREAD */
 				remove_on_exec :  1, /* event is removed from task on exec */
 				sigtrap        :  1, /* send synchronous SIGTRAP on event */
-				__reserved_1   : 26;
+				defer_callchain:  1, /* request PERF_RECORD_CALLCHAIN_DEFERRED records */
+				defer_output   :  1, /* output PERF_RECORD_CALLCHAIN_DEFERRED records */
+				__reserved_1   : 24;
 
 	union {
 		__u32		wakeup_events;	  /* wake up every n events */
@@ -1239,6 +1241,22 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_AUX_OUTPUT_HW_ID		= 21,
 
+	/*
+	 * This user callchain capture was deferred until shortly before
+	 * returning to user space.  Previous samples would have kernel
+	 * callchains only and they need to be stitched with this to make full
+	 * callchains.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				cookie;
+	 *	u64				nr;
+	 *	u64				ips[nr];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CALLCHAIN_DEFERRED		= 22,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
@@ -1269,6 +1287,7 @@ enum perf_callchain_context {
 	PERF_CONTEXT_HV				= (__u64)-32,
 	PERF_CONTEXT_KERNEL			= (__u64)-128,
 	PERF_CONTEXT_USER			= (__u64)-512,
+	PERF_CONTEXT_USER_DEFERRED		= (__u64)-640,
 
 	PERF_CONTEXT_GUEST			= (__u64)-2048,
 	PERF_CONTEXT_GUEST_KERNEL		= (__u64)-2176,