[RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure

Steven Rostedt posted 4 patches 5 months ago
There is a newer version of this series
[RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure
Posted by Steven Rostedt 5 months ago
[
  This is simply a resend of version 15 of this patch series
  but with only the kernel changes. I'm separating out the user space
  changes to their own series.
  The original v15 is here:
    https://lore.kernel.org/linux-trace-kernel/20250825180638.877627656@kernel.org/
]

This patch set is based off of perf/core of the tip tree:
  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

To run this series, you can check out this repo, which contains this series as well as the above:

  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git  unwind/perf-test

This series implements the perf interface to use deferred user space stack
tracing.

Patch 1 adds a new API to the user unwinder logic that lets perf retrieve the
current context cookie for its per-task event tracing. Perf's per-task tracing
maps a single task to each perf event buffer and follows the task around, so
it only needs to implement its own task_work to do the deferred stack trace.
Because dropped events can still leave it unable to tell which user stack
trace belongs to which kernel stack, the cookie provides a unique identifier
for each user space stack trace that identifies the kernel stack it should be
appended to.

Patch 2 adds the per-task deferred stack traces to perf. It adds a new event
type called PERF_RECORD_CALLCHAIN_DEFERRED that is recorded when a task is
about to return to user space, at a location where pages may be faulted in. It
also adds a new callchain context called PERF_CONTEXT_USER_DEFERRED that is
used as a placeholder in a kernel callchain marking where the deferred user
space stack trace should be appended.

Patch 3 places the user stack trace context cookie in the kernel callchain
right after the PERF_CONTEXT_USER_DEFERRED context so that the user space side
can map the request to the deferred user space stack trace.

Patch 4 adds support for per-CPU perf events, allowing the kernel to associate
each set of per-CPU perf event buffers with a single application. This is
needed so that when a deferred stack trace is requested for a task that then
migrates to another CPU, the kernel knows which CPU buffer to record the stack
trace in. More than one perf tool may be running at once, and a request made
by one perf tool should have its deferred trace delivered to that same tool's
per-CPU event buffer. A global list of descriptors, one per perf tool using
deferred stack tracing, is created to manage this.


Josh Poimboeuf (1):
      perf: Support deferred user callchains

Steven Rostedt (3):
      unwind deferred: Add unwind_user_get_cookie() API
      perf: Have the deferred request record the user context cookie
      perf: Support deferred user callchains for per CPU events

----
 include/linux/perf_event.h            |  11 +-
 include/linux/unwind_deferred.h       |   5 +
 include/uapi/linux/perf_event.h       |  25 +-
 kernel/bpf/stackmap.c                 |   4 +-
 kernel/events/callchain.c             |  14 +-
 kernel/events/core.c                  | 421 +++++++++++++++++++++++++++++++++-
 kernel/unwind/deferred.c              |  21 ++
 tools/include/uapi/linux/perf_event.h |  25 +-
 8 files changed, 518 insertions(+), 8 deletions(-)
Re: [RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 4 months, 3 weeks ago

So I started looking at this, but given I've never seen the deferred unwind
bits that got merged, I have to look at that first.

Headers want something like so.. Let me read the rest.

---
 include/linux/unwind_deferred.h       | 38 +++++++++++++++++++----------------
 include/linux/unwind_deferred_types.h |  2 ++
 2 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index 26122d00708a..5d51a3f2f8ec 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -8,7 +8,8 @@
 
 struct unwind_work;
 
-typedef void (*unwind_callback_t)(struct unwind_work *work, struct unwind_stacktrace *trace, u64 cookie);
+typedef void (*unwind_callback_t)(struct unwind_work *work,
+				  struct unwind_stacktrace *trace, u64 cookie);
 
 struct unwind_work {
 	struct list_head		list;
@@ -44,22 +45,22 @@ void unwind_deferred_task_exit(struct task_struct *task);
 static __always_inline void unwind_reset_info(void)
 {
 	struct unwind_task_info *info = &current->unwind_info;
-	unsigned long bits;
+	unsigned long bits = info->unwind_mask;
 
 	/* Was there any unwinding? */
-	if (unlikely(info->unwind_mask)) {
-		bits = info->unwind_mask;
-		do {
-			/* Is a task_work going to run again before going back */
-			if (bits & UNWIND_PENDING)
-				return;
-		} while (!try_cmpxchg(&info->unwind_mask, &bits, 0UL));
-		current->unwind_info.id.id = 0;
+	if (likely(!bits))
+		return;
 
-		if (unlikely(info->cache)) {
-			info->cache->nr_entries = 0;
-			info->cache->unwind_completed = 0;
-		}
+	do {
+		/* Is a task_work going to run again before going back */
+		if (bits & UNWIND_PENDING)
+			return;
+	} while (!try_cmpxchg(&info->unwind_mask, &bits, 0UL));
+	current->unwind_info.id.id = 0;
+
+	if (unlikely(info->cache)) {
+		info->cache->nr_entries = 0;
+		info->cache->unwind_completed = 0;
 	}
 }
 
@@ -68,9 +69,12 @@ static __always_inline void unwind_reset_info(void)
 static inline void unwind_task_init(struct task_struct *task) {}
 static inline void unwind_task_free(struct task_struct *task) {}
 
-static inline int unwind_user_faultable(struct unwind_stacktrace *trace) { return -ENOSYS; }
-static inline int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func) { return -ENOSYS; }
-static inline int unwind_deferred_request(struct unwind_work *work, u64 *timestamp) { return -ENOSYS; }
+static inline int unwind_user_faultable(struct unwind_stacktrace *trace)
+{ return -ENOSYS; }
+static inline int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
+{ return -ENOSYS; }
+static inline int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
+{ return -ENOSYS; }
 static inline void unwind_deferred_cancel(struct unwind_work *work) {}
 
 static inline void unwind_deferred_task_exit(struct task_struct *task) {}
diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index 33b62ac25c86..29452ff49859 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -2,6 +2,8 @@
 #ifndef _LINUX_UNWIND_USER_DEFERRED_TYPES_H
 #define _LINUX_UNWIND_USER_DEFERRED_TYPES_H
 
+#include <linux/types.h>
+
 struct unwind_cache {
 	unsigned long		unwind_completed;
 	unsigned int		nr_entries;
Re: [RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure
Posted by Steven Rostedt 4 months, 3 weeks ago
On Thu, 18 Sep 2025 13:46:10 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> So I started looking at this, but given I never seen the deferred unwind
> bits that got merged I have to look at that first.
> 
> Headers want something like so.. Let me read the rest.
> 
> ---
>  include/linux/unwind_deferred.h       | 38 +++++++++++++++++++----------------
>  include/linux/unwind_deferred_types.h |  2 ++
>  2 files changed, 23 insertions(+), 17 deletions(-)

Would you like to send a formal patch with this? I'd actually break it into
two patches. One to clean up the long lines, and the other to change the
logic.

-- Steve


> 
> diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
> index 26122d00708a..5d51a3f2f8ec 100644
> --- a/include/linux/unwind_deferred.h
> +++ b/include/linux/unwind_deferred.h
> @@ -8,7 +8,8 @@
>  
>  struct unwind_work;
>  
> -typedef void (*unwind_callback_t)(struct unwind_work *work, struct unwind_stacktrace *trace, u64 cookie);
> +typedef void (*unwind_callback_t)(struct unwind_work *work,
> +				  struct unwind_stacktrace *trace, u64 cookie);
>  
>  struct unwind_work {
>  	struct list_head		list;
> @@ -44,22 +45,22 @@ void unwind_deferred_task_exit(struct task_struct *task);
>  static __always_inline void unwind_reset_info(void)
>  {
>  	struct unwind_task_info *info = &current->unwind_info;
> -	unsigned long bits;
> +	unsigned long bits = info->unwind_mask;
>  
>  	/* Was there any unwinding? */
> -	if (unlikely(info->unwind_mask)) {
> -		bits = info->unwind_mask;
> -		do {
> -			/* Is a task_work going to run again before going back */
> -			if (bits & UNWIND_PENDING)
> -				return;
> -		} while (!try_cmpxchg(&info->unwind_mask, &bits, 0UL));
> -		current->unwind_info.id.id = 0;
> +	if (likely(!bits))
> +		return;
>  
> -		if (unlikely(info->cache)) {
> -			info->cache->nr_entries = 0;
> -			info->cache->unwind_completed = 0;
> -		}
> +	do {
> +		/* Is a task_work going to run again before going back */
> +		if (bits & UNWIND_PENDING)
> +			return;
> +	} while (!try_cmpxchg(&info->unwind_mask, &bits, 0UL));
> +	current->unwind_info.id.id = 0;
> +
> +	if (unlikely(info->cache)) {
> +		info->cache->nr_entries = 0;
> +		info->cache->unwind_completed = 0;
>  	}
>  }
>  
> @@ -68,9 +69,12 @@ static __always_inline void unwind_reset_info(void)
>  static inline void unwind_task_init(struct task_struct *task) {}
>  static inline void unwind_task_free(struct task_struct *task) {}
>  
> -static inline int unwind_user_faultable(struct unwind_stacktrace *trace) { return -ENOSYS; }
> -static inline int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func) { return -ENOSYS; }
> -static inline int unwind_deferred_request(struct unwind_work *work, u64 *timestamp) { return -ENOSYS; }
> +static inline int unwind_user_faultable(struct unwind_stacktrace *trace)
> +{ return -ENOSYS; }
> +static inline int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
> +{ return -ENOSYS; }
> +static inline int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
> +{ return -ENOSYS; }
>  static inline void unwind_deferred_cancel(struct unwind_work *work) {}
>  
>  static inline void unwind_deferred_task_exit(struct task_struct *task) {}
> diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
> index 33b62ac25c86..29452ff49859 100644
> --- a/include/linux/unwind_deferred_types.h
> +++ b/include/linux/unwind_deferred_types.h
> @@ -2,6 +2,8 @@
>  #ifndef _LINUX_UNWIND_USER_DEFERRED_TYPES_H
>  #define _LINUX_UNWIND_USER_DEFERRED_TYPES_H
>  
> +#include <linux/types.h>
> +
>  struct unwind_cache {
>  	unsigned long		unwind_completed;
>  	unsigned int		nr_entries;
Re: [RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 4 months, 3 weeks ago
On Thu, Sep 18, 2025 at 11:18:53AM -0400, Steven Rostedt wrote:
> On Thu, 18 Sep 2025 13:46:10 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > So I started looking at this, but given I never seen the deferred unwind
> > bits that got merged I have to look at that first.
> > 
> > Headers want something like so.. Let me read the rest.
> > 
> > ---
> >  include/linux/unwind_deferred.h       | 38 +++++++++++++++++++----------------
> >  include/linux/unwind_deferred_types.h |  2 ++
> >  2 files changed, 23 insertions(+), 17 deletions(-)
> 
> Would you like to send a formal patch with this? I'd actually break it into
> two patches. One to clean up the long lines, and the other to change the
> logic.

Sure, I'll collect the lot while I go through it and whip something up
when I'm done. For now, I'll just shoot a few questions your way.


So we have:

do_syscall_64()
  ... do stuff ...
  syscall_exit_to_user_mode(regs)
    syscall_exit_to_user_mode_work(regs)
      syscall_exit_work()
      exit_to_user_mode_prepare()
        exit_to_user_mode_loop()
	  resume_user_mode_work()
	    task_work_run()
    exit_to_user_mode()
      unwind_reset_info();
      user_enter_irqoff();
      arch_exit_to_user_mode();
      lockdep_hardirqs_on();
  SYSRET/IRET


and

DEFINE_IDTENTRY*()
  irqentry_enter();
  ... stuff ...
  irqentry_exit()
    irqentry_exit_to_user_mode()
      exit_to_user_mode_prepare()
        exit_to_user_mode_loop();
	  resume_user_mode_work()
	    task_work_run()
      exit_to_user_mode()
        unwind_reset_info();
	...
  IRET

Now, task_work_run() is in the exit_to_user_mode_loop() which is notably
*before* exit_to_user_mode() which does the unwind_reset_info().

What happens if we get an NMI requesting an unwind after
unwind_reset_info() while still very much being in the kernel on the way
out?


What is the purpose of unwind_deferred_task_exit()? This is called from
do_exit(), only slightly before it does exit_task_work(), which runs all
pending task_work. Is there something that justifies the manual run and
cancel instead of just leaving it sit in task_work an having it run
naturally? If so, that most certainly deserves a comment.


A similar question for unwind_task_free(), where exactly is it relevant?
Where does it acquire a task_work that is not otherwise already run on
exit?
Re: [RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 4 months, 3 weeks ago
On Thu, Sep 18, 2025 at 07:24:14PM +0200, Peter Zijlstra wrote:

> So we have:
> 
> do_syscall_64()
>   ... do stuff ...
>   syscall_exit_to_user_mode(regs)
>     syscall_exit_to_user_mode_work(regs)
>       syscall_exit_work()
>       exit_to_user_mode_prepare()
>         exit_to_user_mode_loop()
> 	  resume_user_mode_work()
> 	    task_work_run()
>     exit_to_user_mode()
>       unwind_reset_info();
>       user_enter_irqoff();
>       arch_exit_to_user_mode();
>       lockdep_hardirqs_on();
>   SYSRET/IRET
> 
> 
> and
> 
> DEFINE_IDTENTRY*()
>   irqentry_enter();
>   ... stuff ...
>   irqentry_exit()
>     irqentry_exit_to_user_mode()
>       exit_to_user_mode_prepare()
>         exit_to_user_mode_loop();
> 	  resume_user_mode_work()
> 	    task_work_run()
>       exit_to_user_mode()
>         unwind_reset_info();
> 	...
>   IRET
> 
> Now, task_work_run() is in the exit_to_user_mode_loop() which is notably
> *before* exit_to_user_mode() which does the unwind_reset_info().
> 
> What happens if we get an NMI requesting an unwind after
> unwind_reset_info() while still very much being in the kernel on the way
> out?

AFAICT it will try and do a task_work_add(TWA_RESUME) from NMI context,
and this will fail horribly.

If you do something like:

	twa_mode = in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME;
	task_work_add(foo, twa_mode);

it might actually work.
Re: [RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure
Posted by Steven Rostedt 4 months, 3 weeks ago
On Thu, 18 Sep 2025 19:32:20 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> > Now, task_work_run() is in the exit_to_user_mode_loop() which is notably
> > *before* exit_to_user_mode() which does the unwind_reset_info().
> > 
> > What happens if we get an NMI requesting an unwind after
> > unwind_reset_info() while still very much being in the kernel on the way
> > out?  
> 
> AFAICT it will try and do a task_work_add(TWA_RESUME) from NMI context,
> and this will fail horribly.
> 
> If you do something like:
> 
> 	twa_mode = in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME;
> 	task_work_add(foo, twa_mode);
> 
> it might actually work.

Ah, the comment for TWA_RESUME didn't express this restriction.

That does look like it would work the way I expected task_work to
handle this case.

-- Steve
Re: [RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure
Posted by Josh Poimboeuf 4 months, 2 weeks ago
On Thu, Sep 18, 2025 at 03:10:18PM -0400, Steven Rostedt wrote:
> On Thu, 18 Sep 2025 19:32:20 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > Now, task_work_run() is in the exit_to_user_mode_loop() which is notably
> > > *before* exit_to_user_mode() which does the unwind_reset_info().
> > > 
> > > What happens if we get an NMI requesting an unwind after
> > > unwind_reset_info() while still very much being in the kernel on the way
> > > out?  
> > 
> > AFAICT it will try and do a task_work_add(TWA_RESUME) from NMI context,
> > and this will fail horribly.
> > 
> > If you do something like:
> > 
> > 	twa_mode = in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME;
> > 	task_work_add(foo, twa_mode);
> > 
> > it might actually work.
> 
> Ah, the comment for TWA_RESUME didn't express this restriction.
> 
> That does look like that would work as the way I expected task_work to
> handle this case.

BTW, I remember Peter had a fix for TWA_NMI_CURRENT, I guess it got lost
in the shuffle or did something else happen in the meantime?

  https://lore.kernel.org/20250122124228.GO7145@noisy.programming.kicks-ass.net

-- 
Josh
Re: [RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 4 months, 2 weeks ago
On Fri, Sep 19, 2025 at 04:34:02PM -0700, Josh Poimboeuf wrote:
> On Thu, Sep 18, 2025 at 03:10:18PM -0400, Steven Rostedt wrote:
> > On Thu, 18 Sep 2025 19:32:20 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > > Now, task_work_run() is in the exit_to_user_mode_loop() which is notably
> > > > *before* exit_to_user_mode() which does the unwind_reset_info().
> > > > 
> > > > What happens if we get an NMI requesting an unwind after
> > > > unwind_reset_info() while still very much being in the kernel on the way
> > > > out?  
> > > 
> > > AFAICT it will try and do a task_work_add(TWA_RESUME) from NMI context,
> > > and this will fail horribly.
> > > 
> > > If you do something like:
> > > 
> > > 	twa_mode = in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME;
> > > 	task_work_add(foo, twa_mode);
> > > 
> > > it might actually work.
> > 
> > Ah, the comment for TWA_RESUME didn't express this restriction.
> > 
> > That does look like that would work as the way I expected task_work to
> > handle this case.
> 
> BTW, I remember Peter had a fix for TWA_NMI_CURRENT, I guess it got lost
> in the shuffle or did something else happen in the meantime?
> 
>   https://lore.kernel.org/20250122124228.GO7145@noisy.programming.kicks-ass.net

Oh, yeah, I had completely forgotten about all that :-)

I'll go stick it in the pile. Thanks!
Re: [RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure
Posted by Peter Zijlstra 4 months, 2 weeks ago
On Mon, Sep 22, 2025 at 09:23:07AM +0200, Peter Zijlstra wrote:

> I'll go stick it in the pile. Thanks!

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git unwind/cleanup

I'll let the robot have a chew and post tomorrow or so.
Re: [RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure
Posted by Steven Rostedt 4 months, 2 weeks ago
On Fri, 19 Sep 2025 16:34:02 -0700
Josh Poimboeuf <jpoimboe@kernel.org> wrote:

> > That does look like that would work as the way I expected task_work to
> > handle this case.  
> 
> BTW, I remember Peter had a fix for TWA_NMI_CURRENT, I guess it got lost
> in the shuffle or did something else happen in the meantime?
> 
>   https://lore.kernel.org/20250122124228.GO7145@noisy.programming.kicks-ass.net

Yeah, it did get lost in the shuffle. I took the code from your git
tree, but missed the comments that were made to the patches you sent to
the mailing list. :-p

Thanks for pointing this out!

-- Steve
Re: [RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure
Posted by Steven Rostedt 5 months ago
Peter, can you take a look at these patches please. I believe you're the
only one that really maintains this code today.

-- Steve


On Mon, 08 Sep 2025 13:14:12 -0400
Steven Rostedt <rostedt@kernel.org> wrote:

> [
>   This is simply a resend of version 15 of this patch series
>   but with only the kernel changes. I'm separating out the user space
>   changes to their own series.
>   The original v15 is here:
>     https://lore.kernel.org/linux-trace-kernel/20250825180638.877627656@kernel.org/
> ]
> 
> This patch set is based off of perf/core of the tip tree:
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
> 
> To run this series, you can checkout this repo that has this series as well as the above:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git  unwind/perf-test
> 
> This series implements the perf interface to use deferred user space stack
> tracing.
> 
> Patch 1 adds a new API interface to the user unwinder logic to allow perf to
> get the current context cookie for it's task event tracing. Perf's task event
> tracing maps a single task per perf event buffer and it follows the task
> around, so it only needs to implement its own task_work to do the deferred
> stack trace. Because it can still suffer not knowing which user stack trace
> belongs to which kernel stack due to dropped events, having the cookie to
> create a unique identifier for each user space stack trace to know which
> kernel stack to append it to is useful.
> 
> Patch 2 adds the per task deferred stack traces to perf. It adds a new event
> type called PERF_RECORD_CALLCHAIN_DEFERRED that is recorded when a task is
> about to go back to user space and happens in a location that pages may be
> faulted in. It also adds a new callchain context called PERF_CONTEXT_USER_DEFERRED
> that is used as a place holder in a kernel callchain to append the deferred
> user space stack trace to.
> 
> Patch 3 adds the user stack trace context cookie in the kernel callchain right
> after the PERF_CONTEXT_USER_DEFERRED context so that the user space side can
> map the request to the deferred user space stack trace.
> 
> Patch 4 adds support for the per CPU perf events that will allow the kernel to
> associate each of the per CPU perf event buffers to a single application. This
> is needed so that when a request for a deferred stack trace happens on a task
> that then migrates to another CPU, it will know which CPU buffer to use to
> record the stack trace on. It is possible to have more than one perf user tool
> running and a request made by one perf tool should have the deferred trace go
> to the same perf tool's perf CPU event buffer. A global list of all the
> descriptors representing each perf tool that is using deferred stack tracing
> is created to manage this.
> 
> 
> Josh Poimboeuf (1):
>       perf: Support deferred user callchains
> 
> Steven Rostedt (3):
>       unwind deferred: Add unwind_user_get_cookie() API
>       perf: Have the deferred request record the user context cookie
>       perf: Support deferred user callchains for per CPU events
> 
> ----
>  include/linux/perf_event.h            |  11 +-
>  include/linux/unwind_deferred.h       |   5 +
>  include/uapi/linux/perf_event.h       |  25 +-
>  kernel/bpf/stackmap.c                 |   4 +-
>  kernel/events/callchain.c             |  14 +-
>  kernel/events/core.c                  | 421 +++++++++++++++++++++++++++++++++-
>  kernel/unwind/deferred.c              |  21 ++
>  tools/include/uapi/linux/perf_event.h |  25 +-
>  8 files changed, 518 insertions(+), 8 deletions(-)
Re: [RESEND][PATCH v15 0/4] perf: Support the deferred unwinding infrastructure
Posted by Steven Rostedt 4 months, 3 weeks ago
Peter,

It's been over three weeks since the original was sent. And last week I
broke it up to contain only the kernel changes. Can you please take a look at
it?

I have updated the user space side with Namhyung Kim's updates:

   https://lore.kernel.org/all/20250908175319.841517121@kernel.org/

Also, the two patches to enable deferred unwinding on x86 have been ignored
for almost three weeks as well:

  https://lore.kernel.org/linux-trace-kernel/20250827193644.527334838@kernel.org/

-- Steve


On Mon, 8 Sep 2025 13:21:06 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> Peter, can you take a look at these patches please. I believe you're the
> only one that really maintains this code today.
> 
> -- Steve
> 
> 
> On Mon, 08 Sep 2025 13:14:12 -0400
> Steven Rostedt <rostedt@kernel.org> wrote:
> 
> > [
> >   This is simply a resend of version 15 of this patch series
> >   but with only the kernel changes. I'm separating out the user space
> >   changes to their own series.
> >   The original v15 is here:
> >     https://lore.kernel.org/linux-trace-kernel/20250825180638.877627656@kernel.org/
> > ]
> > 
> > This patch set is based off of perf/core of the tip tree:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
> > 
> > To run this series, you can checkout this repo that has this series as well as the above:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git  unwind/perf-test
> > 
> > This series implements the perf interface to use deferred user space stack
> > tracing.
> > 
> > Patch 1 adds a new API interface to the user unwinder logic to allow perf to
> > get the current context cookie for it's task event tracing. Perf's task event
> > tracing maps a single task per perf event buffer and it follows the task
> > around, so it only needs to implement its own task_work to do the deferred
> > stack trace. Because it can still suffer not knowing which user stack trace
> > belongs to which kernel stack due to dropped events, having the cookie to
> > create a unique identifier for each user space stack trace to know which
> > kernel stack to append it to is useful.
> > 
> > Patch 2 adds the per task deferred stack traces to perf. It adds a new event
> > type called PERF_RECORD_CALLCHAIN_DEFERRED that is recorded when a task is
> > about to go back to user space and happens in a location that pages may be
> > faulted in. It also adds a new callchain context called PERF_CONTEXT_USER_DEFERRED
> > that is used as a place holder in a kernel callchain to append the deferred
> > user space stack trace to.
> > 
> > Patch 3 adds the user stack trace context cookie in the kernel callchain right
> > after the PERF_CONTEXT_USER_DEFERRED context so that the user space side can
> > map the request to the deferred user space stack trace.
> > 
> > Patch 4 adds support for the per CPU perf events that will allow the kernel to
> > associate each of the per CPU perf event buffers to a single application. This
> > is needed so that when a request for a deferred stack trace happens on a task
> > that then migrates to another CPU, it will know which CPU buffer to use to
> > record the stack trace on. It is possible to have more than one perf user tool
> > running and a request made by one perf tool should have the deferred trace go
> > to the same perf tool's perf CPU event buffer. A global list of all the
> > descriptors representing each perf tool that is using deferred stack tracing
> > is created to manage this.
> > 
> > 
> > Josh Poimboeuf (1):
> >       perf: Support deferred user callchains
> > 
> > Steven Rostedt (3):
> >       unwind deferred: Add unwind_user_get_cookie() API
> >       perf: Have the deferred request record the user context cookie
> >       perf: Support deferred user callchains for per CPU events
> > 
> > ----
> >  include/linux/perf_event.h            |  11 +-
> >  include/linux/unwind_deferred.h       |   5 +
> >  include/uapi/linux/perf_event.h       |  25 +-
> >  kernel/bpf/stackmap.c                 |   4 +-
> >  kernel/events/callchain.c             |  14 +-
> >  kernel/events/core.c                  | 421 +++++++++++++++++++++++++++++++++-
> >  kernel/unwind/deferred.c              |  21 ++
> >  tools/include/uapi/linux/perf_event.h |  25 +-
> >  8 files changed, 518 insertions(+), 8 deletions(-)  
>