From nobody Sun Apr 26 09:35:42 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0ABE9291169; Thu, 24 Apr 2025 19:24:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; cv=none; b=tX7v0/fvyA5W52CuOv/idWqxsYKpsfZ1qix0BzTJQN1MsWcpVHx5Bvyww2kLLefonl+RBGI4QYXrnbpsF5p//SNG948g0ksqfwUtsyj0Subvxp/yyE5VaK+h3aLWNoq0mK4Hrr8/kPtmdCnlGV9qmCrQ2TLtDe113b8PAR+Rlmg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; c=relaxed/simple; bh=4H8Z9MeLJNwl9dY9CuXoH9JBExAzmjxaiL9XMCbuDd8=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=ikcrJGrASbn8YMaHtDkXSXKT5+ZpyOk26eTfXCRmfadlN5tL2jW+KC/cmUs7KaFxAZr3XTafU1FrGNVhHCPkqVpPUf8jLMKWbESHyZrSzghcc2v3zt6RdmMGVwDzcQ7B5LQDo1bovupYqPY9Honjuvlc+VFYFTwRkJEBOUCQLpM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5BB89C4CEEA; Thu, 24 Apr 2025 19:24:16 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1u82D6-0000000H2Oh-2itF; Thu, 24 Apr 2025 15:26:12 -0400 Message-ID: <20250424192612.505622711@goodmis.org> User-Agent: quilt/0.68 Date: Thu, 24 Apr 2025 15:24:57 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Josh Poimboeuf , x86@kernel.org, Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Indu Bhagat , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , 
Sam James , Andrii Nakryiko , Jens Remus , Florian Weimer , Andy Lutomirski , Weinan Liu , Blake Jones , Beau Belgrave , "Jose E. Marchesi" , Alexander Aring
Subject: [PATCH v5 1/9] unwind_user/deferred: Add deferred unwinding interface
References: <20250424192456.851953422@goodmis.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Josh Poimboeuf

Add an interface for scheduling task work to unwind the user space stack
before returning to user space. This solves several problems for its
callers:

  - Ensure the unwind happens in task context even if the caller may be
    running in NMI or interrupt context.

  - Avoid duplicate unwinds, whether called multiple times by the same
    caller or by different callers.

  - Create a "context cookie" which allows trace post-processing to
    correlate kernel unwinds/traces with the user unwind.

The cookie is used to detect when the stacktrace is the same. A cookie is
generated the first time a user space stacktrace is requested after the
task enters the kernel. Because the stacktrace is saved on the task_struct
while the task is in the kernel, a later request that sees the same cookie
reuses the saved stacktrace instead of regenerating it. The cookie is
passed to the caller on request, and when the stacktrace is generated upon
returning to user space, the requester's callback is called with the
cookie as well as the stacktrace.

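To make the cookie layout concrete, here is a minimal userspace sketch of the encoding that ctx_to_cookie() in this patch performs. It is illustrative only: the helpers cookie_cpu() and cookie_ctr() are hypothetical names for post-processing and are not part of the patch, and the sketch uses a 64-bit constant instead of 1UL so that the shift by 48 stays well defined where long is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low 48 bits of the cookie. */
#define COOKIE_LOW_MASK ((UINT64_C(1) << 48) - 1)

/*
 * Userspace sketch of the cookie layout: bits 63..48 hold the CPU id,
 * bits 47..1 hold the per-CPU entry counter, and bit 0 is always set so
 * that a valid cookie is never zero.
 */
static uint64_t ctx_to_cookie(uint64_t cpu, uint64_t ctx)
{
	/* The kernel checks this at build time with BUILD_BUG_ON(NR_CPUS > 65535). */
	assert(cpu <= 0xffff);
	return ((ctx << 1) & COOKIE_LOW_MASK) | (cpu << 48) | 1;
}

/* Hypothetical helper (not in the patch): recover the CPU id. */
static uint64_t cookie_cpu(uint64_t cookie)
{
	return cookie >> 48;
}

/* Hypothetical helper (not in the patch): recover the entry counter. */
static uint64_t cookie_ctr(uint64_t cookie)
{
	return (cookie & COOKIE_LOW_MASK) >> 1;
}
```

Because bit 0 is always set, a zero value in info->cookie can safely mean "no cookie has been generated for this entry context yet".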
Co-developed-by: Steven Rostedt (Google) Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Google) --- Changes since v4: https://lore.kernel.org/all/6052e8487746603bdb29b65f4033e= 739092d9925.1737511963.git.jpoimboe@kernel.org/ - Fixed comment where it said 12 bits but it should have been 16 (Peter Zijlstra) - Made the cookie LSB always set to 1 to make sure it never returns zero (Peter Zijlstra) - Updated comment about unwind_ctx_ctr only being generated as needed. (Josh Poimboeuf) include/linux/entry-common.h | 2 +- include/linux/unwind_deferred.h | 22 +++- include/linux/unwind_deferred_types.h | 3 + kernel/unwind/deferred.c | 163 +++++++++++++++++++++++++- 4 files changed, 186 insertions(+), 4 deletions(-) diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index fb2b27154fee..725ec0e87cdd 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -112,7 +112,6 @@ static __always_inline void enter_from_user_mode(struct= pt_regs *regs) =20 CT_WARN_ON(__ct_state() !=3D CT_STATE_USER); user_exit_irqoff(); - unwind_enter_from_user_mode(); =20 instrumentation_begin(); kmsan_unpoison_entry_regs(regs); @@ -363,6 +362,7 @@ static __always_inline void exit_to_user_mode(void) lockdep_hardirqs_on_prepare(); instrumentation_end(); =20 + unwind_exit_to_user_mode(); user_enter_irqoff(); arch_exit_to_user_mode(); lockdep_hardirqs_on(CALLER_ADDR0); diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferre= d.h index 54f1aa6caf29..d36784cae658 100644 --- a/include/linux/unwind_deferred.h +++ b/include/linux/unwind_deferred.h @@ -2,9 +2,19 @@ #ifndef _LINUX_UNWIND_USER_DEFERRED_H #define _LINUX_UNWIND_USER_DEFERRED_H =20 +#include #include #include =20 +struct unwind_work; + +typedef void (*unwind_callback_t)(struct unwind_work *work, struct unwind_= stacktrace *trace, u64 cookie); + +struct unwind_work { + struct list_head list; + unwind_callback_t func; +}; + #ifdef CONFIG_UNWIND_USER =20 void 
unwind_task_init(struct task_struct *task); @@ -12,9 +22,14 @@ void unwind_task_free(struct task_struct *task); =20 int unwind_deferred_trace(struct unwind_stacktrace *trace); =20 -static __always_inline void unwind_enter_from_user_mode(void) +int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func); +int unwind_deferred_request(struct unwind_work *work, u64 *cookie); +void unwind_deferred_cancel(struct unwind_work *work); + +static __always_inline void unwind_exit_to_user_mode(void) { current->unwind_info.cache.nr_entries =3D 0; + current->unwind_info.cookie =3D 0; } =20 #else /* !CONFIG_UNWIND_USER */ @@ -23,8 +38,11 @@ static inline void unwind_task_init(struct task_struct *= task) {} static inline void unwind_task_free(struct task_struct *task) {} =20 static inline int unwind_deferred_trace(struct unwind_stacktrace *trace) {= return -ENOSYS; } +static inline int unwind_deferred_init(struct unwind_work *work, unwind_ca= llback_t func) { return -ENOSYS; } +static inline int unwind_deferred_request(struct unwind_work *work, u64 *c= ookie) { return -ENOSYS; } +static inline void unwind_deferred_cancel(struct unwind_work *work) {} =20 -static inline void unwind_enter_from_user_mode(void) {} +static inline void unwind_exit_to_user_mode(void) {} =20 #endif /* !CONFIG_UNWIND_USER */ =20 diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_d= eferred_types.h index b3b7389ee6eb..33373c32c221 100644 --- a/include/linux/unwind_deferred_types.h +++ b/include/linux/unwind_deferred_types.h @@ -9,6 +9,9 @@ struct unwind_cache { =20 struct unwind_task_info { struct unwind_cache cache; + u64 cookie; + struct callback_head work; + int pending; }; =20 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */ diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c index 99d4d9e049cd..dc438c5f6618 100644 --- a/kernel/unwind/deferred.c +++ b/kernel/unwind/deferred.c @@ -2,13 +2,72 @@ /* * Deferred user space unwinding */ +#include +#include 
+#include #include #include #include -#include +#include =20 #define UNWIND_MAX_ENTRIES 512 =20 +/* + * This is a unique percpu identifier for a given task entry context. + * Conceptually, it's incremented every time the CPU enters the kernel from + * user space, so that each "entry context" on the CPU gets a unique ID. = In + * reality, as an optimization, it's only incremented on demand for the fi= rst + * deferred unwind request after a given entry-from-user. + * + * It's combined with the CPU id to make a systemwide-unique "context cook= ie". + */ +static DEFINE_PER_CPU(u64, unwind_ctx_ctr); + +/* Guards adding to and reading the list of callbacks */ +static DEFINE_MUTEX(callback_mutex); +static LIST_HEAD(callbacks); + +/* + * The context cookie is a unique identifier that is assigned to a user + * space stacktrace. As the user space stacktrace remains the same while + * the task is in the kernel, the cookie is an identifier for the stacktra= ce. + * Although it is possible for the stacktrace to get another cookie if ano= ther + * request is made after the cookie was cleared and before reentering user + * space. + * + * The high 16 bits are the CPU id; the lower 48 bits are a per-CPU entry + * counter shifted left by one and or'd with 1 (to prevent it from ever be= ing + * zero). + */ +static u64 ctx_to_cookie(u64 cpu, u64 ctx) +{ + BUILD_BUG_ON(NR_CPUS > 65535); + return ((ctx << 1) & ((1UL << 48) - 1)) | (cpu << 48) | 1; +} + +/* + * Read the task context cookie, first initializing it if this is the first + * call to get_cookie() since the most recent entry from user. 
+ */
+static u64 get_cookie(struct unwind_task_info *info)
+{
+	u64 ctx_ctr;
+	u64 cookie;
+	u64 cpu;
+
+	guard(irqsave)();
+
+	cookie = info->cookie;
+	if (cookie)
+		return cookie;
+
+	cpu = raw_smp_processor_id();
+	ctx_ctr = __this_cpu_inc_return(unwind_ctx_ctr);
+	info->cookie = ctx_to_cookie(cpu, ctx_ctr);
+
+	return info->cookie;
+}
+
 int unwind_deferred_trace(struct unwind_stacktrace *trace)
 {
 	struct unwind_task_info *info = &current->unwind_info;
@@ -47,11 +106,112 @@ int unwind_deferred_trace(struct unwind_stacktrace *trace)
 	return 0;
 }
 
+static void unwind_deferred_task_work(struct callback_head *head)
+{
+	struct unwind_task_info *info = container_of(head, struct unwind_task_info, work);
+	struct unwind_stacktrace trace;
+	struct unwind_work *work;
+	u64 cookie;
+
+	if (WARN_ON_ONCE(!info->pending))
+		return;
+
+	/* Allow work to come in again */
+	WRITE_ONCE(info->pending, 0);
+
+	/*
+	 * From here on out, the callback must always be called, even if it's
+	 * just an empty trace.
+	 */
+	trace.nr = 0;
+	trace.entries = NULL;
+
+	unwind_deferred_trace(&trace);
+
+	cookie = get_cookie(info);
+
+	guard(mutex)(&callback_mutex);
+	list_for_each_entry(work, &callbacks, list) {
+		work->func(work, &trace, cookie);
+	}
+	barrier();
+	/* If another task work is pending, reuse the cookie and stack trace */
+	if (!READ_ONCE(info->pending))
+		WRITE_ONCE(info->cookie, 0);
+}
+
+/*
+ * Schedule a user space unwind to be done in task work before exiting the
+ * kernel.
+ *
+ * The returned cookie output is a unique identifier for the current task entry
+ * context. Its value will also be passed to the callback function. It can be
+ * used to stitch kernel and user stack traces together in post-processing.
+ *
+ * It's valid to call this function multiple times for the same @work within
+ * the same task entry context. Each call will return the same cookie.
+ * If the callback is not pending because it has already been previously called
+ * for the same entry context, it will be called again with the same stacktrace
+ * and cookie.
+ *
+ * Returns 0 if the callback will be called when the task returns to user space.
+ * Negative if there's an error.
+ */
+int unwind_deferred_request(struct unwind_work *work, u64 *cookie)
+{
+	struct unwind_task_info *info = &current->unwind_info;
+	int ret;
+
+	*cookie = 0;
+
+	if (WARN_ON_ONCE(in_nmi()))
+		return -EINVAL;
+
+	if ((current->flags & PF_KTHREAD) || !user_mode(task_pt_regs(current)))
+		return -EINVAL;
+
+	guard(irqsave)();
+
+	*cookie = get_cookie(info);
+
+	/* callback already pending? */
+	if (info->pending)
+		return 0;
+
+	/* The work has been claimed, now schedule it. */
+	ret = task_work_add(current, &info->work, TWA_RESUME);
+	if (WARN_ON_ONCE(ret))
+		return ret;
+
+	info->pending = 1;
+	return 0;
+}
+
+void unwind_deferred_cancel(struct unwind_work *work)
+{
+	if (!work)
+		return;
+
+	guard(mutex)(&callback_mutex);
+	list_del(&work->list);
+}
+
+int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
+{
+	memset(work, 0, sizeof(*work));
+
+	guard(mutex)(&callback_mutex);
+	list_add(&work->list, &callbacks);
+	work->func = func;
+	return 0;
+}
+
 void unwind_task_init(struct task_struct *task)
 {
 	struct unwind_task_info *info = &task->unwind_info;
 
 	memset(info, 0, sizeof(*info));
+	init_task_work(&info->work, unwind_deferred_task_work);
 }
 
 void unwind_task_free(struct task_struct *task)
@@ -59,4 +219,5 @@ void unwind_task_free(struct task_struct *task)
 	struct unwind_task_info *info = &task->unwind_info;
 
 	kfree(info->cache.entries);
+	task_work_cancel(task, &info->work);
 }
-- 
2.47.2

From nobody Sun Apr 26 09:35:42 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by
smtp.subspace.kernel.org (Postfix) with ESMTPS id 2FB05291178; Thu, 24 Apr 2025 19:24:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; cv=none; b=mTZZDCS2Nae+iClgVDXqoRSlYg4soVJg+CkcRT6yCrS278GKSe4B/ew16oXA/3dNHzNKKIzI6QLJL5JTA1t1iCrZWyP9xkJgxY4ObDr7QhHmHGTz/NVnLPVnfCUU6YH/eapFLhuErZ957ONhm1+lyj5Y56e7L54nR3bcR8BRgic= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; c=relaxed/simple; bh=adf8R1vDZBe5ipd0Rwj2pZZlPcb/S91fp6Z+GgTsmwo=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=A6bUxdknsABbJaxSvb7HfrSjArk8huWhzQDjoxUEyrCI1wC/6kr0Z93xGDWrRyErIFIPxQsLcEAwEG8XUvXCotTF9KRy2jJKf4IftRlcJLNyip1/Wp8Wmw+HcBjtUbqBxiTx41qp7EXE4PXHfKGOcZ7ayaAFdTLbNM0ogpDqEtU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id AA349C4CEF8; Thu, 24 Apr 2025 19:24:16 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1u82D6-0000000H2PD-3S17; Thu, 24 Apr 2025 15:26:12 -0400 Message-ID: <20250424192612.669992559@goodmis.org> User-Agent: quilt/0.68 Date: Thu, 24 Apr 2025 15:24:58 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Josh Poimboeuf , x86@kernel.org, Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Indu Bhagat , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , Andrii Nakryiko , Jens Remus , Florian Weimer , Andy Lutomirski , Weinan Liu , Blake Jones , Beau Belgrave , "Jose E. 
Marchesi" , Alexander Aring Subject: [PATCH v5 2/9] unwind_user/deferred: Make unwind deferral requests NMI-safe References: <20250424192456.851953422@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Josh Poimboeuf Make unwind_deferred_request() NMI-safe so tracers in NMI context can call it to get the cookie immediately rather than have to do the fragile "schedule irq work and then call unwind_deferred_request()" dance. Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Google) --- include/linux/unwind_deferred_types.h | 1 + kernel/unwind/deferred.c | 100 ++++++++++++++++++++++---- 2 files changed, 89 insertions(+), 12 deletions(-) diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_d= eferred_types.h index 33373c32c221..8f47d77ddda0 100644 --- a/include/linux/unwind_deferred_types.h +++ b/include/linux/unwind_deferred_types.h @@ -10,6 +10,7 @@ struct unwind_cache { struct unwind_task_info { struct unwind_cache cache; u64 cookie; + u64 nmi_cookie; struct callback_head work; int pending; }; diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c index dc438c5f6618..2afd197da2ef 100644 --- a/kernel/unwind/deferred.c +++ b/kernel/unwind/deferred.c @@ -47,23 +47,47 @@ static u64 ctx_to_cookie(u64 cpu, u64 ctx) =20 /* * Read the task context cookie, first initializing it if this is the first - * call to get_cookie() since the most recent entry from user. + * call to get_cookie() since the most recent entry from user. This has t= o be + * done carefully to coordinate with unwind_deferred_request_nmi(). 
  */
 static u64 get_cookie(struct unwind_task_info *info)
 {
 	u64 ctx_ctr;
 	u64 cookie;
-	u64 cpu;
 
 	guard(irqsave)();
 
-	cookie = info->cookie;
+	cookie = READ_ONCE(info->cookie);
 	if (cookie)
 		return cookie;
 
-	cpu = raw_smp_processor_id();
-	ctx_ctr = __this_cpu_inc_return(unwind_ctx_ctr);
-	info->cookie = ctx_to_cookie(cpu, ctx_ctr);
+	ctx_ctr = __this_cpu_read(unwind_ctx_ctr);
+
+	/* Read ctx_ctr before info->nmi_cookie */
+	barrier();
+
+	cookie = READ_ONCE(info->nmi_cookie);
+	if (cookie) {
+		/*
+		 * This is the first call to get_cookie() since an NMI handler
+		 * first wrote it to info->nmi_cookie. Sync it.
+		 */
+		WRITE_ONCE(info->cookie, cookie);
+		WRITE_ONCE(info->nmi_cookie, 0);
+		return cookie;
+	}
+
+	/*
+	 * Write info->cookie. It's ok to race with an NMI here. The value of
+	 * the cookie is based on ctx_ctr from before the NMI could have
+	 * incremented it. The result will be the same even if cookie or
+	 * ctx_ctr end up getting written twice.
+	 */
+	cookie = ctx_to_cookie(raw_smp_processor_id(), ctx_ctr + 1);
+	WRITE_ONCE(info->cookie, cookie);
+	WRITE_ONCE(info->nmi_cookie, 0);
+	barrier();
+	__this_cpu_write(unwind_ctx_ctr, ctx_ctr + 1);
 
 	return info->cookie;
 }
@@ -140,6 +164,51 @@ static void unwind_deferred_task_work(struct callback_head *head)
 		WRITE_ONCE(info->cookie, 0);
 }
 
+static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *cookie)
+{
+	struct unwind_task_info *info = &current->unwind_info;
+	bool inited_cookie = false;
+	int ret;
+
+	*cookie = info->cookie;
+	if (!*cookie) {
+		/*
+		 * This is the first unwind request since the most recent entry
+		 * from user. Initialize the task cookie.
+		 *
+		 * Don't write to info->cookie directly, otherwise it may get
+		 * cleared if the NMI occurred in the kernel during early entry
+		 * or late exit before the task work gets to run. Instead, use
+		 * info->nmi_cookie which gets synced later by get_cookie().
+		 */
+		if (!info->nmi_cookie) {
+			u64 cpu = raw_smp_processor_id();
+			u64 ctx_ctr;
+
+			ctx_ctr = __this_cpu_inc_return(unwind_ctx_ctr);
+			info->nmi_cookie = ctx_to_cookie(cpu, ctx_ctr);
+
+			inited_cookie = true;
+		}
+
+		*cookie = info->nmi_cookie;
+	}
+
+	if (info->pending)
+		return 0;
+
+	ret = task_work_add(current, &info->work, TWA_NMI_CURRENT);
+	if (ret) {
+		if (inited_cookie)
+			info->nmi_cookie = 0;
+		return ret;
+	}
+
+	info->pending = 1;
+
+	return 0;
+}
+
 /*
  * Schedule a user space unwind to be done in task work before exiting the
  * kernel.
@@ -160,30 +229,37 @@ static void unwind_deferred_task_work(struct callback_head *head)
 int unwind_deferred_request(struct unwind_work *work, u64 *cookie)
 {
 	struct unwind_task_info *info = &current->unwind_info;
+	int pending;
 	int ret;
 
 	*cookie = 0;
 
-	if (WARN_ON_ONCE(in_nmi()))
-		return -EINVAL;
-
 	if ((current->flags & PF_KTHREAD) || !user_mode(task_pt_regs(current)))
 		return -EINVAL;
 
+	if (in_nmi())
+		return unwind_deferred_request_nmi(work, cookie);
+
 	guard(irqsave)();
 
 	*cookie = get_cookie(info);
 
 	/* callback already pending? */
-	if (info->pending)
+	pending = READ_ONCE(info->pending);
+	if (pending)
+		return 0;
+
+	/* Claim the work unless an NMI just now swooped in to do so. */
+	if (!try_cmpxchg(&info->pending, &pending, 1))
 		return 0;
 
 	/* The work has been claimed, now schedule it.
*/ ret =3D task_work_add(current, &info->work, TWA_RESUME); - if (WARN_ON_ONCE(ret)) + if (WARN_ON_ONCE(ret)) { + WRITE_ONCE(info->pending, 0); return ret; + } =20 - info->pending =3D 1; return 0; } =20 --=20 2.47.2 From nobody Sun Apr 26 09:35:42 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4EC2E2918C5; Thu, 24 Apr 2025 19:24:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; cv=none; b=J/BfzmGeTwxUMK98pAlROF7MBpt39u5JWUUAsdyR0Z6QjU5QAcHjzrUIiFuReMuq/C3Hhoc2pNUvoA8K54UjZ5Ds0yInPwjwMj6/anRLlMdOkWZMMfoXdvW+Z/AbqUB4bnR2ePD04/AxaSJ2hCyqFtVft0xj96TYnXvjhSHkQ4c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; c=relaxed/simple; bh=y7auUpMLFJPKjEnzj7HiaQ3abA+PL0AyQVUPG2xZYmQ=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=qyrAU83irXcoOh5uP4joskAd2NpvhyV1A1vJS9v/O8M3Th738ryQKuwXOh5CbL2W2vBfxydPYDdghqXBu5Cfxk1PWLxJRm6yFIRFkRvKHLb3pPLmLLJvUqGv8NaWBsLAzVJTvxh90zOg4yrbtCLbJJcPDIcX0iWSGPXtK2K6VDI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id B7E7BC4CEFA; Thu, 24 Apr 2025 19:24:16 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1u82D6-0000000H2Ph-4AmK; Thu, 24 Apr 2025 15:26:12 -0400 Message-ID: <20250424192612.844558089@goodmis.org> User-Agent: quilt/0.68 Date: Thu, 24 Apr 2025 15:24:59 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Josh Poimboeuf , x86@kernel.org, Peter Zijlstra , Ingo Molnar , 
Arnaldo Carvalho de Melo , Indu Bhagat , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , Andrii Nakryiko , Jens Remus , Florian Weimer , Andy Lutomirski , Weinan Liu , Blake Jones , Beau Belgrave , "Jose E. Marchesi" , Alexander Aring
Subject: [PATCH v5 3/9] unwind deferred: Use bitmask to determine which callbacks to call
References: <20250424192456.851953422@goodmis.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Steven Rostedt

In order to know which registered callback requested a stacktrace when the
task goes back to user space, add a bitmask for all registered tracers.
The bitmask is the size of a long, which means that on a 32 bit machine
there can be at most 32 registered tracers, and on 64 bit at most 64. This
should not be an issue as there should not be more than 10 (unless BPF can
abuse this?).

When a tracer registers with unwind_deferred_init() it gets a bit number
assigned to it. When a tracer requests a stacktrace, its bit is set within
the task_struct. When the task returns to user space, the callbacks are
called for all the registered tracers whose bits are set in the task's
mask.

When a tracer is removed by unwind_deferred_cancel(), the associated bit
is cleared in all current tasks, just in case another tracer gets
registered immediately afterward and then gets its callback called
unexpectedly.

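The bit bookkeeping described above can be sketched in plain userspace C. This is only an illustration of the idea, not the kernel code: registered_mask stands in for the global unwind_mask, the loop is a portable stand-in for the kernel's ffz(), and tracer_register(), tracer_unregister(), task_request() and task_wants() are hypothetical names. Locking and the walk over all threads on unregister are left out.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Stands in for the global unwind_mask: one bit per registered tracer. */
static unsigned long registered_mask;

/* Claim the lowest clear bit; return -1 when every bit is already taken. */
static int tracer_register(void)
{
	unsigned long bit;

	if (registered_mask == ~0UL)
		return -1;	/* the patch returns -EBUSY in this case */

	/* Portable stand-in for ffz(): find the first zero bit. */
	for (bit = 0; bit < BITS_PER_LONG; bit++)
		if (!(registered_mask & (1UL << bit)))
			break;

	registered_mask |= 1UL << bit;
	return (int)bit;
}

/* Release a tracer's bit so a later registration can reuse it. */
static void tracer_unregister(int bit)
{
	registered_mask &= ~(1UL << bit);
}

/* Record in the per-task mask that this tracer wants a stacktrace. */
static void task_request(unsigned long *task_mask, int bit)
{
	*task_mask |= 1UL << bit;
}

/* Should this tracer's callback run when the task returns to user space? */
static int task_wants(unsigned long task_mask, int bit)
{
	return (task_mask >> bit) & 1;
}
```

Since a freed bit is handed out again by the next registration, a stale bit left in some task's mask would be attributed to the new tracer, which is why the patch clears the bit in every thread on cancel.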
Signed-off-by: Steven Rostedt (Google) --- include/linux/sched.h | 1 + include/linux/unwind_deferred.h | 1 + kernel/unwind/deferred.c | 44 ++++++++++++++++++++++++++++++--- 3 files changed, 42 insertions(+), 4 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index a1e1c07cadfb..d3ee0c5405d6 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1649,6 +1649,7 @@ struct task_struct { =20 #ifdef CONFIG_UNWIND_USER struct unwind_task_info unwind_info; + unsigned long unwind_mask; #endif =20 /* CPU-specific state of this task: */ diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferre= d.h index d36784cae658..719a7cfb3164 100644 --- a/include/linux/unwind_deferred.h +++ b/include/linux/unwind_deferred.h @@ -13,6 +13,7 @@ typedef void (*unwind_callback_t)(struct unwind_work *wor= k, struct unwind_stackt struct unwind_work { struct list_head list; unwind_callback_t func; + int bit; }; =20 #ifdef CONFIG_UNWIND_USER diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c index 2afd197da2ef..f505cb1766de 100644 --- a/kernel/unwind/deferred.c +++ b/kernel/unwind/deferred.c @@ -26,6 +26,7 @@ static DEFINE_PER_CPU(u64, unwind_ctx_ctr); /* Guards adding to and reading the list of callbacks */ static DEFINE_MUTEX(callback_mutex); static LIST_HEAD(callbacks); +static unsigned long unwind_mask; =20 /* * The context cookie is a unique identifier that is assigned to a user @@ -135,6 +136,7 @@ static void unwind_deferred_task_work(struct callback_h= ead *head) struct unwind_task_info *info =3D container_of(head, struct unwind_task_i= nfo, work); struct unwind_stacktrace trace; struct unwind_work *work; + struct task_struct *task =3D current; u64 cookie; =20 if (WARN_ON_ONCE(!info->pending)) @@ -156,7 +158,10 @@ static void unwind_deferred_task_work(struct callback_= head *head) =20 guard(mutex)(&callback_mutex); list_for_each_entry(work, &callbacks, list) { - work->func(work, &trace, cookie); + if (task->unwind_mask & 
(1UL << work->bit)) {
+			work->func(work, &trace, cookie);
+			clear_bit(work->bit, &current->unwind_mask);
+		}
 	}
 	barrier();
 	/* If another task work is pending, reuse the cookie and stack trace */
@@ -194,9 +199,12 @@ static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *cookie)
 		*cookie = info->nmi_cookie;
 	}
 
-	if (info->pending)
+	if (current->unwind_mask & (1UL << work->bit))
 		return 0;
 
+	if (info->pending)
+		goto out;
+
 	ret = task_work_add(current, &info->work, TWA_NMI_CURRENT);
 	if (ret) {
 		if (inited_cookie)
@@ -205,6 +213,8 @@ static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *cookie)
 	}
 
 	info->pending = 1;
+ out:
+	set_bit(work->bit, &current->unwind_mask);
 
 	return 0;
 }
@@ -244,14 +254,18 @@ int unwind_deferred_request(struct unwind_work *work, u64 *cookie)
 
 	*cookie = get_cookie(info);
 
+	/* This is already queued */
+	if (current->unwind_mask & (1UL << work->bit))
+		return 0;
+
 	/* callback already pending? */
 	pending = READ_ONCE(info->pending);
 	if (pending)
-		return 0;
+		goto out;
 
 	/* Claim the work unless an NMI just now swooped in to do so. */
 	if (!try_cmpxchg(&info->pending, &pending, 1))
-		return 0;
+		goto out;
 
 	/* The work has been claimed, now schedule it.
*/
 	ret = task_work_add(current, &info->work, TWA_RESUME);
@@ -260,16 +274,29 @@ int unwind_deferred_request(struct unwind_work *work, u64 *cookie)
 		return ret;
 	}
 
+ out:
+	set_bit(work->bit, &current->unwind_mask);
+
 	return 0;
 }
 
 void unwind_deferred_cancel(struct unwind_work *work)
 {
+	struct task_struct *g, *t;
+
 	if (!work)
 		return;
 
 	guard(mutex)(&callback_mutex);
 	list_del(&work->list);
+
+	clear_bit(work->bit, &unwind_mask);
+
+	guard(rcu)();
+	/* Clear this bit from all threads */
+	for_each_process_thread(g, t) {
+		clear_bit(work->bit, &t->unwind_mask);
+	}
 }
 
 int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
@@ -277,6 +304,14 @@ int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
 	memset(work, 0, sizeof(*work));
 
 	guard(mutex)(&callback_mutex);
+
+	/* See if there's a bit in the mask available */
+	if (unwind_mask == ~0UL)
+		return -EBUSY;
+
+	work->bit = ffz(unwind_mask);
+	unwind_mask |= 1UL << work->bit;
+
 	list_add(&work->list, &callbacks);
 	work->func = func;
 	return 0;
@@ -288,6 +323,7 @@ void unwind_task_init(struct task_struct *task)
 
 	memset(info, 0, sizeof(*info));
 	init_task_work(&info->work, unwind_deferred_task_work);
+	task->unwind_mask = 0;
 }
 
 void unwind_task_free(struct task_struct *task)
-- 
2.47.2

From nobody Sun Apr 26 09:35:42 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4EBB12918C2; Thu, 24 Apr 2025 19:24:17 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; cv=none;
b=TgU+KQmFhRfdYz62N1WwaUY1KnAMuPwl8Tl0NpgTIqBASxlyaeYKF9cTmeIgdJCwXUZ95iJcLsxVZ07mkKTF5WVlvL6RQZj1Wug3cagQI3N4BL5W+W3HrZvQZQUIyF6h7HPoPw5evVkSDXaCkG8jF0tv5wHySTCwVZAvA39gxR8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; c=relaxed/simple; bh=dywbzU05uYdtOAPyIPR7737vlANMjFHBgRmD6TWEHjs=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=A5YhmHz6Qy9c/BjUS+GP3XLmJcl5nEr1v5wgMnAvmzbuGk5aqg8KmDFxSTZX8HzJayjti+YDQ6jJuyzpQBY5hXqkY//CcS2iT8zLOMkgFJypyGxZyNa7jcX9sjY3yky9E35Dz8Efz3Jq2rWcHs4bL8H7U6P5M0DJgv3yunFSruc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id DA519C4CEED; Thu, 24 Apr 2025 19:24:16 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1u82D7-0000000H2QC-0gGT; Thu, 24 Apr 2025 15:26:13 -0400 Message-ID: <20250424192613.014380756@goodmis.org> User-Agent: quilt/0.68 Date: Thu, 24 Apr 2025 15:25:00 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Josh Poimboeuf , x86@kernel.org, Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Indu Bhagat , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , Andrii Nakryiko , Jens Remus , Florian Weimer , Andy Lutomirski , Weinan Liu , Blake Jones , Beau Belgrave , "Jose E. 
Marchesi" , Alexander Aring
Subject: [PATCH v5 4/9] tracing: Do not bother getting user space stacktraces for kernel threads
References: <20250424192456.851953422@goodmis.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Steven Rostedt

If a user space stacktrace is requested while running a kernel thread,
just return. A kernel thread has no user space, so there is no user space
stack to trace.

Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/trace.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8ddf6b17215c..523e98cd121d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3087,6 +3087,10 @@ ftrace_trace_userstack(struct trace_array *tr,
 	if (!(tr->trace_flags & TRACE_ITER_USERSTACKTRACE))
 		return;
 
+	/* No point doing user space stacktraces on kernel threads */
+	if (current->flags & PF_KTHREAD)
+		return;
+
 	/*
 	 * NMIs can not handle page faults, even with fix ups.
 	 * The save user stack can (and often does) fault.
--=20 2.47.2 From nobody Sun Apr 26 09:35:42 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 832652918F6; Thu, 24 Apr 2025 19:24:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; cv=none; b=czeV8mIpHueatw0yQUgO0bYnvQ5YEcLbLdDcmRaQOzTinhLu5vAoUhvjD+/dM/O03iqpMpTEZ2489DuDzBcgnFlVjgDO7qBCkfHzmW1eDaHDQvx9m7UhzLTP4Si6Wt6Hd8hjlp9bDFoqCi4S5g8Nm9xNCSM9DyvzIpjXvTjL71o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; c=relaxed/simple; bh=ajvnsllmRpQFIX6Qgf3TWPxF1BZqvJ/hp9YJ40O7mK8=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=Q2spTwjADK6K7Pn0kYyrHY+RIA57wbMuMcgJutEvpr+f/aGW+XzxkIDtMlMm8cvzNoQFL6sMZ8JzprTw+pfFQZe0TElQa9sLnvjTRxYbXEe95Q5mGCNApbpJ5qU0Ub0udmL5Vr1fcOwYdXJwpEYddErcYQWi0CgacZx+UYI77qc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 145B5C2BCB1; Thu, 24 Apr 2025 19:24:17 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1u82D7-0000000H2Qg-1P4M; Thu, 24 Apr 2025 15:26:13 -0400 Message-ID: <20250424192613.183589150@goodmis.org> User-Agent: quilt/0.68 Date: Thu, 24 Apr 2025 15:25:01 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Josh Poimboeuf , x86@kernel.org, Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Indu Bhagat , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, 
Jordan Rome , Sam James , Andrii Nakryiko , Jens Remus , Florian Weimer , Andy Lutomirski , Weinan Liu , Blake Jones , Beau Belgrave , "Jose E. Marchesi" , Alexander Aring
Subject: [PATCH v5 5/9] tracing: Rename __dynamic_array() to __dynamic_field() for ftrace events
References: <20250424192456.851953422@goodmis.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Steven Rostedt

The ftrace events (like function, trace_print, etc.) are created somewhat
manually and not via TRACE_EVENT() or the tracepoint magic macros. They
have their own macros. Their dynamic fields were created with
__dynamic_array(), but the resulting field differs from the one created by
the __dynamic_array() used by TRACE_EVENT().

The TRACE_EVENT() __dynamic_array() creates the field like:

	field:__data_loc u8[] v_data; offset:120; size:4; signed:0;

Whereas the ftrace event is created as:

	field:char buf[]; offset:12; size:0; signed:0;

The difference is that the ftrace field covers the rest of the event saved
in the ring buffer. TRACE_EVENT() doesn't have such a dynamic field; its
version saves a word that holds the offset into the event where the field
is stored, as well as the size of the field.

For consistency, rename the ftrace event macro to __dynamic_field(). This
way the ftrace event can later also include a __dynamic_array() that works
the same as the TRACE_EVENT() dynamic array.
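As a rough illustration of the "word that holds the offset ... as well as
the size", the sketch below packs and unpacks such a word in plain user
space C. It assumes the layout used later in this series (offset in the low
16 bits, length in the high 16 bits). The helper names data_loc_pack(),
data_loc_offset() and data_loc_len() are made up for this example and are
not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only (not kernel APIs). */

/* Pack a field's offset (low 16 bits) and length (high 16 bits). */
static inline uint32_t data_loc_pack(uint32_t offset, uint32_t len)
{
	return (offset & 0xffff) | (len << 16);
}

/* Offset of the field's data from the start of the event. */
static inline uint32_t data_loc_offset(uint32_t loc)
{
	return loc & 0xffff;
}

/* Length of the field's data in bytes. */
static inline uint32_t data_loc_len(uint32_t loc)
{
	return loc >> 16;
}
```

For example, data_loc_pack(16, 80) gives 0x00500010, that is, 80 bytes of
data starting 16 bytes into the event.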
Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace.h | 4 ++-- kernel/trace/trace_entries.h | 10 +++++----- kernel/trace/trace_export.c | 12 ++++++------ 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 79be1995db44..3c733b9e7b32 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -92,8 +92,8 @@ enum trace_type { #undef __array_desc #define __array_desc(type, container, item, size) =20 -#undef __dynamic_array -#define __dynamic_array(type, item) type item[]; +#undef __dynamic_field +#define __dynamic_field(type, item) type item[]; =20 #undef __rel_dynamic_array #define __rel_dynamic_array(type, item) type item[]; diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h index 4ef4df6623a8..7100d8f86011 100644 --- a/kernel/trace/trace_entries.h +++ b/kernel/trace/trace_entries.h @@ -63,7 +63,7 @@ FTRACE_ENTRY_REG(function, ftrace_entry, F_STRUCT( __field_fn( unsigned long, ip ) __field_fn( unsigned long, parent_ip ) - __dynamic_array( unsigned long, args ) + __dynamic_field( unsigned long, args ) ), =20 F_printk(" %ps <-- %ps", @@ -81,7 +81,7 @@ FTRACE_ENTRY(funcgraph_entry, ftrace_graph_ent_entry, __field_struct( struct ftrace_graph_ent, graph_ent ) __field_packed( unsigned long, graph_ent, func ) __field_packed( unsigned int, graph_ent, depth ) - __dynamic_array(unsigned long, args ) + __dynamic_field(unsigned long, args ) ), =20 F_printk("--> %ps (%u)", (void *)__entry->func, __entry->depth) @@ -259,7 +259,7 @@ FTRACE_ENTRY(bprint, bprint_entry, F_STRUCT( __field( unsigned long, ip ) __field( const char *, fmt ) - __dynamic_array( u32, buf ) + __dynamic_field( u32, buf ) ), =20 F_printk("%ps: %s", @@ -272,7 +272,7 @@ FTRACE_ENTRY_REG(print, print_entry, =20 F_STRUCT( __field( unsigned long, ip ) - __dynamic_array( char, buf ) + __dynamic_field( char, buf ) ), =20 F_printk("%ps: %s", @@ -287,7 +287,7 @@ FTRACE_ENTRY(raw_data, raw_data_entry, =20 F_STRUCT( __field( 
unsigned int, id ) - __dynamic_array( char, buf ) + __dynamic_field( char, buf ) ), =20 F_printk("id:%04x %08x", diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c index 1698fc22afa0..d9d41e3ba379 100644 --- a/kernel/trace/trace_export.c +++ b/kernel/trace/trace_export.c @@ -57,8 +57,8 @@ static int ftrace_event_register(struct trace_event_call = *call, #undef __array_desc #define __array_desc(type, container, item, size) type item[size]; =20 -#undef __dynamic_array -#define __dynamic_array(type, item) type item[]; +#undef __dynamic_field +#define __dynamic_field(type, item) type item[]; =20 #undef F_STRUCT #define F_STRUCT(args...) args @@ -123,8 +123,8 @@ static void __always_unused ____ftrace_check_##name(voi= d) \ #undef __array_desc #define __array_desc(_type, _container, _item, _len) __array(_type, _item,= _len) =20 -#undef __dynamic_array -#define __dynamic_array(_type, _item) { \ +#undef __dynamic_field +#define __dynamic_field(_type, _item) { \ .type =3D #_type "[]", .name =3D #_item, \ .size =3D 0, .align =3D __alignof__(_type), \ is_signed_type(_type), .filter_type =3D FILTER_OTHER }, @@ -161,8 +161,8 @@ static struct trace_event_fields ftrace_event_fields_##= name[] =3D { \ #undef __array_desc #define __array_desc(type, container, item, len) =20 -#undef __dynamic_array -#define __dynamic_array(type, item) +#undef __dynamic_field +#define __dynamic_field(type, item) =20 #undef F_printk #define F_printk(fmt, args...) 
__stringify(fmt) ", " __stringify(args) --=20 2.47.2 From nobody Sun Apr 26 09:35:42 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B331B292915; Thu, 24 Apr 2025 19:24:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; cv=none; b=XM2NeN+rMQBRVzdUVnoksv7GIKHZl2vk5/0353I7guQ1u+yf3ewHXvEfYYvyS9aipFRSNi7zCh6AJTNZ6pdVeU6C3eJdLaZ5lpmoABtW0xVRzmAqYoBzakwL3ieMcGR2rdP2lanzUnnI1O4za67mZoPWdHtDnmHhwx06eekBYEk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; c=relaxed/simple; bh=Qc7t8Ng1SzoxBcFSz25R4/q3zeN6YKq+kpv22PEF44k=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=h89x6WcwXYDR9n9MbTZ4fbRiN5HxyfOdVKTrOUF/MAli+TjzhxW3fJcVbge/vFMp2KTypQa86jd6o7W4DoT8fvlTq7JIDA1wiF0/u/ZHnXV/gSfNOOFD3CIO0Y7EnsrYFHR0VrXgZEBm4fgmQuyTKbkPm8YqrgZuG8P6A02LMac= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 38491C4CEF1; Thu, 24 Apr 2025 19:24:17 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1u82D7-0000000H2RA-27eU; Thu, 24 Apr 2025 15:26:13 -0400 Message-ID: <20250424192613.356969984@goodmis.org> User-Agent: quilt/0.68 Date: Thu, 24 Apr 2025 15:25:02 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Josh Poimboeuf , x86@kernel.org, Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Indu Bhagat , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown 
, linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , Andrii Nakryiko , Jens Remus , Florian Weimer , Andy Lutomirski , Weinan Liu , Blake Jones , Beau Belgrave , "Jose E. Marchesi" , Alexander Aring Subject: [PATCH v5 6/9] tracing: Implement deferred user space stacktracing References: <20250424192456.851953422@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Use the unwind_deferred_*() interface to be able to trace deferred user space stacks. This creates two new ftrace events: user_unwind_cookie user_unwind_stack The user_unwind_cookie will record into the ring buffer the cookie given from unwind_deferred_request(), and the user_unwind_stack will record into the ring buffer the user space stack as well as the cookie associated with it. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace.c | 93 ++++++++++++++++++++++++++++++++++++ kernel/trace/trace.h | 12 +++++ kernel/trace/trace_entries.h | 24 ++++++++++ kernel/trace/trace_export.c | 23 +++++++++ kernel/trace/trace_output.c | 72 ++++++++++++++++++++++++++++ 5 files changed, 224 insertions(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 523e98cd121d..71340207321e 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -3077,6 +3077,66 @@ EXPORT_SYMBOL_GPL(trace_dump_stack); #ifdef CONFIG_USER_STACKTRACE_SUPPORT static DEFINE_PER_CPU(int, user_stack_count); =20 +static void trace_user_unwind_callback(struct unwind_work *unwind, + struct unwind_stacktrace *trace, + u64 ctx_cookie) +{ + struct trace_array *tr =3D container_of(unwind, struct trace_array, unwin= der); + struct trace_buffer *buffer =3D tr->array_buffer.buffer; + struct userunwind_stack_entry *entry; + struct ring_buffer_event *event; + unsigned int trace_ctx; + unsigned long *caller; + unsigned int offset; + int len; 
+ int i; + + if (!(tr->trace_flags & TRACE_ITER_USERSTACKTRACE_DELAY)) + return; + + len =3D trace->nr * sizeof(unsigned long) + sizeof(*entry); + + trace_ctx =3D tracing_gen_ctx(); + event =3D __trace_buffer_lock_reserve(buffer, TRACE_USER_UNWIND_STACK, + len, trace_ctx); + if (!event) + return; + + entry =3D ring_buffer_event_data(event); + + entry->cookie =3D ctx_cookie; + + offset =3D sizeof(*entry); + len =3D sizeof(unsigned long) * trace->nr; + + entry->__data_loc_stack =3D offset | (len << 16); + caller =3D (void *)entry + offset; + + for (i =3D 0; i < trace->nr; i++) { + caller[i] =3D trace->entries[i]; + } + + __buffer_unlock_commit(buffer, event); +} + +static void +ftrace_trace_userstack_delay(struct trace_array *tr, + struct trace_buffer *buffer, unsigned int trace_ctx) +{ + struct userunwind_cookie_entry *entry; + struct ring_buffer_event *event; + + event =3D __trace_buffer_lock_reserve(buffer, TRACE_USER_UNWIND_COOKIE, + sizeof(*entry), trace_ctx); + if (!event) + return; + entry =3D ring_buffer_event_data(event); + + unwind_deferred_request(&tr->unwinder, &entry->cookie); + + __buffer_unlock_commit(buffer, event); +} + static void ftrace_trace_userstack(struct trace_array *tr, struct trace_buffer *buffer, unsigned int trace_ctx) @@ -3091,6 +3151,11 @@ ftrace_trace_userstack(struct trace_array *tr, if (current->flags & PF_KTHREAD) return; =20 + if (tr->trace_flags & TRACE_ITER_USERSTACKTRACE_DELAY) { + ftrace_trace_userstack_delay(tr, buffer, trace_ctx); + return; + } + /* * NMIs can not handle page faults, even with fix ups. * The save user stack can (and often does) fault. 
@@ -5189,6 +5254,17 @@ int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set)
 	return 0;
 }
 
+static int update_unwind_deferred(struct trace_array *tr, int enabled)
+{
+	if (enabled) {
+		return unwind_deferred_init(&tr->unwinder,
+					    trace_user_unwind_callback);
+	} else {
+		unwind_deferred_cancel(&tr->unwinder);
+		return 0;
+	}
+}
+
 int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
 {
 	if ((mask == TRACE_ITER_RECORD_TGID) ||
@@ -5224,6 +5300,19 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
 		}
 	}
 
+	if (mask == TRACE_ITER_USERSTACKTRACE) {
+		if (tr->trace_flags & TRACE_ITER_USERSTACKTRACE_DELAY) {
+			int ret = update_unwind_deferred(tr, enabled);
+			if (ret < 0)
+				return ret;
+		}
+	}
+
+	if (mask == TRACE_ITER_USERSTACKTRACE_DELAY) {
+		if (tr->trace_flags & TRACE_ITER_USERSTACKTRACE)
+			update_unwind_deferred(tr, enabled);
+	}
+
 	if (enabled)
 		tr->trace_flags |= mask;
 	else
@@ -9890,6 +9979,10 @@ static int __remove_instance(struct trace_array *tr)
 	if (tr->ref > 1 || (tr->current_trace && tr->trace_ref))
 		return -EBUSY;
 
+	if ((tr->trace_flags & (TRACE_ITER_USERSTACKTRACE | TRACE_ITER_USERSTACKTRACE_DELAY)) ==
+	    (TRACE_ITER_USERSTACKTRACE | TRACE_ITER_USERSTACKTRACE_DELAY))
+		unwind_deferred_cancel(&tr->unwinder);
+
 	list_del(&tr->list);
 
 	/* Disable all the flags that were enabled coming in */
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3c733b9e7b32..3f0941c9215c 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -49,7 +50,10 @@ enum trace_type {
 	TRACE_GRAPH_ENT,
 	TRACE_GRAPH_RETADDR_ENT,
 	TRACE_USER_STACK,
+	/* trace-cmd manually adds blktrace after USER_STACK */
 	TRACE_BLK,
+	TRACE_USER_UNWIND_STACK,
+	TRACE_USER_UNWIND_COOKIE,
 	TRACE_BPUTS,
 	TRACE_HWLAT,
 	TRACE_OSNOISE,
@@ -92,6 +96,9 @@ enum trace_type {
 #undef __array_desc
 #define __array_desc(type, container, item, size)
=20 +#undef __dynamic_array +#define __dynamic_array(type, item) u32 __data_loc_##item; + #undef __dynamic_field #define __dynamic_field(type, item) type item[]; =20 @@ -435,6 +442,7 @@ struct trace_array { struct cond_snapshot *cond_snapshot; #endif struct trace_func_repeats __percpu *last_func_repeats; + struct unwind_work unwinder; /* * On boot up, the ring buffer is set to the minimum size, so that * we do not waste memory on systems that are not using tracing. @@ -526,6 +534,9 @@ extern void __ftrace_bad_type(void); IF_ASSIGN(var, ent, struct ctx_switch_entry, 0); \ IF_ASSIGN(var, ent, struct stack_entry, TRACE_STACK); \ IF_ASSIGN(var, ent, struct userstack_entry, TRACE_USER_STACK);\ + IF_ASSIGN(var, ent, struct userunwind_stack_entry, TRACE_USER_UNWIND_STA= CK);\ + IF_ASSIGN(var, ent, struct userunwind_cookie_entry, TRACE_USER_UNWIND_CO= OKIE);\ + IF_ASSIGN(var, ent, struct userstack_entry, TRACE_USER_STACK);\ IF_ASSIGN(var, ent, struct print_entry, TRACE_PRINT); \ IF_ASSIGN(var, ent, struct bprint_entry, TRACE_BPRINT); \ IF_ASSIGN(var, ent, struct bputs_entry, TRACE_BPUTS); \ @@ -1356,6 +1367,7 @@ extern int trace_get_user(struct trace_parser *parser= , const char __user *ubuf, C(PRINTK, "trace_printk"), \ C(ANNOTATE, "annotate"), \ C(USERSTACKTRACE, "userstacktrace"), \ + C(USERSTACKTRACE_DELAY, "userstacktrace_delay"),\ C(SYM_USEROBJ, "sym-userobj"), \ C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h index 7100d8f86011..752a99296c95 100644 --- a/kernel/trace/trace_entries.h +++ b/kernel/trace/trace_entries.h @@ -249,6 +249,30 @@ FTRACE_ENTRY(user_stack, userstack_entry, (void *)__entry->caller[6], (void *)__entry->caller[7]) ); =20 +FTRACE_ENTRY(user_unwind_stack, userunwind_stack_entry, + + TRACE_USER_UNWIND_STACK, + + F_STRUCT( + __field( u64, cookie ) + __dynamic_array( unsigned long, stack ) + ), + + F_printk("cookie=3D%lld\n%s", 
__entry->cookie, + __print_dynamic_array(stack, sizeof(unsigned long))) +); + +FTRACE_ENTRY(user_unwind_cookie, userunwind_cookie_entry, + + TRACE_USER_UNWIND_COOKIE, + + F_STRUCT( + __field( u64, cookie ) + ), + + F_printk("cookie=3D%lld", __entry->cookie) +); + /* * trace_printk entry: */ diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c index d9d41e3ba379..831999f84e2c 100644 --- a/kernel/trace/trace_export.c +++ b/kernel/trace/trace_export.c @@ -57,6 +57,9 @@ static int ftrace_event_register(struct trace_event_call = *call, #undef __array_desc #define __array_desc(type, container, item, size) type item[size]; =20 +#undef __dynamic_array +#define __dynamic_array(type, item) u32 __data_loc_##item; + #undef __dynamic_field #define __dynamic_field(type, item) type item[]; =20 @@ -66,6 +69,16 @@ static int ftrace_event_register(struct trace_event_call= *call, #undef F_printk #define F_printk(fmt, args...) fmt, args =20 +/* Only used for ftrace event format output */ +static inline char * __print_dynamic_array(int array, size_t size) +{ + return NULL; +} + +#undef __print_dynamic_array +#define __print_dynamic_array(array, el_size) \ + __print_dynamic_array(__entry->__data_loc_##array, el_size) + #undef FTRACE_ENTRY #define FTRACE_ENTRY(name, struct_name, id, tstruct, print) \ struct ____ftrace_##name { \ @@ -74,6 +87,7 @@ struct ____ftrace_##name { \ static void __always_unused ____ftrace_check_##name(void) \ { \ struct ____ftrace_##name *__entry =3D NULL; \ + struct trace_seq __maybe_unused *p =3D NULL; \ \ /* force compile-time check on F_printk() */ \ printk(print); \ @@ -123,6 +137,12 @@ static void __always_unused ____ftrace_check_##name(vo= id) \ #undef __array_desc #define __array_desc(_type, _container, _item, _len) __array(_type, _item,= _len) =20 +#undef __dynamic_array +#define __dynamic_array(_type, _item) { \ + .type =3D "__data_loc " #_type "[]", .name =3D #_item, \ + .size =3D 4, .align =3D __alignof__(4), \ + 
is_signed_type(_type), .filter_type =3D FILTER_OTHER }, + #undef __dynamic_field #define __dynamic_field(_type, _item) { \ .type =3D #_type "[]", .name =3D #_item, \ @@ -161,6 +181,9 @@ static struct trace_event_fields ftrace_event_fields_##= name[] =3D { \ #undef __array_desc #define __array_desc(type, container, item, len) =20 +#undef __dynamic_array +#define __dynamic_array(type, item) + #undef __dynamic_field #define __dynamic_field(type, item) =20 diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index fee40ffbd490..e11911e5f7d0 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -1374,6 +1374,58 @@ static struct trace_event trace_stack_event =3D { }; =20 /* TRACE_USER_STACK */ +static enum print_line_t trace_user_unwind_stack_print(struct trace_iterat= or *iter, + int flags, struct trace_event *event) +{ + struct userunwind_stack_entry *field; + struct trace_seq *s =3D &iter->seq; + unsigned long *caller; + unsigned int offset; + unsigned int len; + unsigned int caller_cnt; + unsigned int i; + + trace_assign_type(field, iter->ent); + + trace_seq_puts(s, "\n"); + + trace_seq_printf(s, "cookie=3D%llx\n", field->cookie); + + /* The stack field is a dynamic pointer */ + offset =3D field->__data_loc_stack; + len =3D offset >> 16; + offset =3D offset & 0xffff; + caller_cnt =3D len / sizeof(*caller); + + caller =3D (void *)iter->ent + offset; + + for (i =3D 0; i < caller_cnt; i++) { + unsigned long ip =3D caller[i]; + + if (!ip || trace_seq_has_overflowed(s)) + break; + + trace_seq_puts(s, " =3D> "); + seq_print_user_ip(s, NULL, ip, flags); + trace_seq_putc(s, '\n'); + } + + return trace_handle_return(s); +} + +static enum print_line_t trace_user_unwind_cookie_print(struct trace_itera= tor *iter, + int flags, struct trace_event *event) +{ + struct userunwind_cookie_entry *field; + struct trace_seq *s =3D &iter->seq; + + trace_assign_type(field, iter->ent); + + trace_seq_printf(s, "cookie=3D%llx\n", field->cookie); + + 
return trace_handle_return(s); +} + static enum print_line_t trace_user_stack_print(struct trace_iterator *ite= r, int flags, struct trace_event *event) { @@ -1417,6 +1469,24 @@ static enum print_line_t trace_user_stack_print(stru= ct trace_iterator *iter, return trace_handle_return(s); } =20 +static struct trace_event_functions trace_userunwind_stack_funcs =3D { + .trace =3D trace_user_unwind_stack_print, +}; + +static struct trace_event trace_userunwind_stack_event =3D { + .type =3D TRACE_USER_UNWIND_STACK, + .funcs =3D &trace_userunwind_stack_funcs, +}; + +static struct trace_event_functions trace_userunwind_cookie_funcs =3D { + .trace =3D trace_user_unwind_cookie_print, +}; + +static struct trace_event trace_userunwind_cookie_event =3D { + .type =3D TRACE_USER_UNWIND_COOKIE, + .funcs =3D &trace_userunwind_cookie_funcs, +}; + static struct trace_event_functions trace_user_stack_funcs =3D { .trace =3D trace_user_stack_print, }; @@ -1816,6 +1886,8 @@ static struct trace_event *events[] __initdata =3D { &trace_ctx_event, &trace_wake_event, &trace_stack_event, + &trace_userunwind_cookie_event, + &trace_userunwind_stack_event, &trace_user_stack_event, &trace_bputs_event, &trace_bprint_event, --=20 2.47.2 From nobody Sun Apr 26 09:35:42 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D539B292935; Thu, 24 Apr 2025 19:24:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522657; cv=none; b=TuJ8Px38LwT0i6C6vvfkI/b51KsoWUrnJp5KjHb83LDEa6fq9jA5WRCnAkL4CXO3LA4igf1w6K1ABf8tKQq9PyNTH0M5NJyNzOM+J+7tdk25sefdWB5muu3pI94sGCnnMHtqTpU8RIzKDIhl0SlQcL31/IEWaAtPZLeYVI0HrEA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1745522657; c=relaxed/simple; bh=dfMNrL0VGe5/RSdy86NJBFdTFayMrVUvzrtLKBAtJ7I=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=BvCL8AG9RMQlNewtse/Z6Dmt9f1Yi+PQqhYnajea7FmouIe49kUVyLChgyqUVloZ0aQQyoO8ijrngVGPxvyOPS1/jhIwf3g/ohLNo9EMBtL1BP3MZhZ4w0vPp4jEiHhlqbxKO5T55YeWy76YfXwrExgSPHxPr2CD7bcCt7pA+fo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 66496C4AF12; Thu, 24 Apr 2025 19:24:17 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1u82D7-0000000H2Rf-2peF; Thu, 24 Apr 2025 15:26:13 -0400 Message-ID: <20250424192613.526636593@goodmis.org> User-Agent: quilt/0.68 Date: Thu, 24 Apr 2025 15:25:03 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Josh Poimboeuf , x86@kernel.org, Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Indu Bhagat , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , Andrii Nakryiko , Jens Remus , Florian Weimer , Andy Lutomirski , Weinan Liu , Blake Jones , Beau Belgrave , "Jose E. Marchesi" , Alexander Aring Subject: [PATCH v5 7/9] mm: Add guard for mmap_read_lock References: <20250424192456.851953422@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Josh Poimboeuf This is the new way of doing things. 
Converting all existing mmap_read_lock users is an exercise left for the reader ;-) Suggested-by: Peter Zijlstra Signed-off-by: Josh Poimboeuf --- include/linux/mmap_lock.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index 4706c6769902..cfa5ab84054a 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -222,4 +222,6 @@ static inline int mmap_lock_is_contended(struct mm_stru= ct *mm) return rwsem_is_contended(&mm->mmap_lock); } =20 +DEFINE_GUARD(mmap_read_lock, struct mm_struct *, mmap_read_lock(_T), mmap_= read_unlock(_T)) + #endif /* _LINUX_MMAP_LOCK_H */ --=20 2.47.2 From nobody Sun Apr 26 09:35:42 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 19679293B50; Thu, 24 Apr 2025 19:24:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522658; cv=none; b=KbTrl/0UK3JAXjIyovO3sDACwNK77D+E7mtfY1pzDDiezmbROw4UB56h3Wol4uz0AgZ8Pk2N7jWe5AoxYzfjy9n2M04sGHIycIf3mm4ZjHXmvVvCEMISOqAjMFRhKQr2FEPAb3tkVgccZ1gziI/jD1nlzgUx73fpWFCK3SVTLI4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522658; c=relaxed/simple; bh=BG0zUEBPIwP0oa+bn506OOY7CMuENYzp2nHa7OnXsQA=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=S8LOnWO58LUtkGi8yV6EM4efNQbGA5ohVFPy5i6Ph59i50XbiKCGuYXOe8Ve9a/DYZdD0lT8mnl3iJ4nYSwpD1MgwGiT97pd5zsV1+faixInrvqKhE/toVmWp22DF5yizCdTw9MCUdZSQBNYCLw+EoSbvSxCwhIw3OL7jh7JMf0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 91BD7C4AF13; Thu, 24 Apr 2025 19:24:17 +0000 (UTC) Received: from rostedt by 
gandalf with local (Exim 4.98) (envelope-from ) id 1u82D7-0000000H2SA-3Yej; Thu, 24 Apr 2025 15:26:13 -0400
Message-ID: <20250424192613.695828192@goodmis.org>
User-Agent: quilt/0.68
Date: Thu, 24 Apr 2025 15:25:04 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Josh Poimboeuf , x86@kernel.org, Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Indu Bhagat , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , Andrii Nakryiko , Jens Remus , Florian Weimer , Andy Lutomirski , Weinan Liu , Blake Jones , Beau Belgrave , "Jose E. Marchesi" , Alexander Aring
Subject: [PATCH v5 8/9] tracing: Have deferred user space stacktrace show file offsets
References: <20250424192456.851953422@goodmis.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Steven Rostedt

Instead of showing the instruction pointer addresses of the user space
stack trace, which are wherever the kernel happened to map the code, show
the offsets of where they would be in the file.

Instead of:

  trace-cmd-1066 [007] ..... 67.770256:
  cookie=7000000000009
   => <00007fdbd0d421ca>
   => <00007fdbd0f3be27>
   => <00005635ece557e7>
   => <00005635ece559d3>
   => <00005635ece56523>
   => <00005635ece6479d>
   => <00005635ece64b01>
   => <00005635ece64bc0>
   => <00005635ece53b7e>
   => <00007fdbd0c6bca8>

which are the addresses of the functions in the virtual address space of
the process, have it record:

  trace-cmd-1090 [003] ..... 180.779876:
  cookie=3000000000009
   => <00000000001001ca>
   => <000000000000ae27>
   => <00000000000107e7>
   => <00000000000109d3>
   => <0000000000011523>
   => <000000000001f79d>
   => <000000000001fb01>
   => <000000000001fbc0>
   => <000000000000eb7e>
   => <0000000000029ca8>

which are the offsets into the files that the code was mapped from.

To find these offsets, the mmap_read_lock is taken and the vma is looked up
for each address. Then what is recorded is simply:

  (addr - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT);

Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/trace.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 71340207321e..f9eb0f7d649c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3085,18 +3085,27 @@ static void trace_user_unwind_callback(struct unwind_work *unwind,
 	struct trace_buffer *buffer = tr->array_buffer.buffer;
 	struct userunwind_stack_entry *entry;
 	struct ring_buffer_event *event;
+	struct mm_struct *mm = current->mm;
 	unsigned int trace_ctx;
+	struct vm_area_struct *vma = NULL;
 	unsigned long *caller;
 	unsigned int offset;
 	int len;
 	int i;
 
+	/* This should never happen */
+	if (!mm)
+		return;
+
 	if (!(tr->trace_flags & TRACE_ITER_USERSTACKTRACE_DELAY))
 		return;
 
 	len = trace->nr * sizeof(unsigned long) + sizeof(*entry);
 
 	trace_ctx = tracing_gen_ctx();
+
+	guard(mmap_read_lock)(mm);
+
 	event = __trace_buffer_lock_reserve(buffer, TRACE_USER_UNWIND_STACK,
 					    len, trace_ctx);
 	if (!event)
@@ -3113,7 +3122,16 @@ static void trace_user_unwind_callback(struct unwind_work *unwind,
 	caller = (void *)entry + offset;
 
 	for (i = 0; i < trace->nr; i++) {
-		caller[i] = trace->entries[i];
+		unsigned long addr = trace->entries[i];
+
+		if (!vma || addr < vma->vm_start || addr >= vma->vm_end)
+			vma = vma_lookup(mm, addr);
+
+		if (!vma) {
+			caller[i] = addr;
+			continue;
+		}
+		caller[i] = (addr - vma->vm_start) + (vma->vm_pgoff <<
PAGE_SHIFT); } =20 __buffer_unlock_commit(buffer, event); --=20 2.47.2 From nobody Sun Apr 26 09:35:42 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2E5C3293B5A; Thu, 24 Apr 2025 19:24:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522658; cv=none; b=SBOeinVxqnjUCyQodjtwx6rE8oOfVaPGZbMq7p2zoBR+T+5MHZlY9u/HlQwAr8tS/cvJ8zSs256trV3Qih5gLgCXOSAjf82IhfSsoDqRdTjiw4x5HeR6FIQExLpWKru5slHlXg5b+E5m9lBuvj4d47VtkjiNBBaSZ966+msPSAw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745522658; c=relaxed/simple; bh=kJdHyyr5g9cwZZEEx8rTPCm7ZWJNtMxA7FZ5ZyNruPY=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=K+5RmGcVv4awthZqKsS4ExcwdH4wZYvOVyHWJUkvf6sT7cRTOZSUJZjwwUoSMrFxgDd/27djBMNOLZq5O2uHMEKvNRzx11TBnwkYsVsDwoN50rTHgKL3537bc7oeS8KLPeulcIkI0vOHYxocU4QLZskW+GgP+aqA25hSQ2wm44Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id C660DC4AF14; Thu, 24 Apr 2025 19:24:17 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1u82D8-0000000H2Sf-04ZF; Thu, 24 Apr 2025 15:26:14 -0400 Message-ID: <20250424192613.869730948@goodmis.org> User-Agent: quilt/0.68 Date: Thu, 24 Apr 2025 15:25:05 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Josh Poimboeuf , x86@kernel.org, Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Indu Bhagat , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , 
linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , Andrii Nakryiko , Jens Remus , Florian Weimer , Andy Lutomirski , Weinan Liu , Blake Jones , Beau Belgrave , "Jose E. Marchesi" , Alexander Aring
Subject: [PATCH v5 9/9] tracing: Show inode and device major:minor in deferred user space stacktrace
References: <20250424192456.851953422@goodmis.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Steven Rostedt

The deferred user space stacktrace event already does a lookup of the vma
for each address in the trace to get the file offset for those addresses.
Since it has the vma, it can also report the file itself.

Add two more arrays to the user space stacktrace event: one for the inode
number, and the other for the device major:minor number.

Now the output looks like this:

  trace-cmd-1108 [007] ..... 240.253487:
  cookie=7000000000009
   => <00000000001001ca> : 1340007 : 254:3
   => <000000000000ae27> : 1308548 : 254:3
   => <00000000000107e7> : 1440347 : 254:3
   => <00000000000109d3> : 1440347 : 254:3
   => <0000000000011523> : 1440347 : 254:3
   => <000000000001f79d> : 1440347 : 254:3
   => <000000000001fb01> : 1440347 : 254:3
   => <000000000001fbc0> : 1440347 : 254:3
   => <000000000000eb7e> : 1440347 : 254:3
   => <0000000000029ca8> : 1340007 : 254:3

User space tooling can use this information to get the actual functions
from the files.
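A minimal sketch of how a user space tool might split a recorded device
number back into major:minor when reading the raw event. It assumes the
value uses the kernel-internal dev_t layout (20 minor bits, as in the
kernel's MAJOR()/MINOR() macros that the print path of this patch uses).
kdev_major() and kdev_minor() are hypothetical names for this example.

```c
#include <stdint.h>

/* Assumption: kernel-internal dev_t layout, 12 major bits and 20 minor bits. */
#define KDEV_MINORBITS	20
#define KDEV_MINORMASK	((1u << KDEV_MINORBITS) - 1)

/* Hypothetical helper: major number of a recorded device value. */
static inline unsigned int kdev_major(uint32_t dev)
{
	return dev >> KDEV_MINORBITS;
}

/* Hypothetical helper: minor number of a recorded device value. */
static inline unsigned int kdev_minor(uint32_t dev)
{
	return dev & KDEV_MINORMASK;
}
```

A recorded value of (254 << 20) | 3 decodes to 254:3, as in the output
above. A tool could then match the inode and major:minor against the files
it knows about and resolve the file offset to a symbol in that file.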

Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/trace.c         | 25 ++++++++++++++++++++++++-
 kernel/trace/trace_entries.h |  8 ++++++--
 kernel/trace/trace_output.c  | 27 +++++++++++++++++++++++++++
 3 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f9eb0f7d649c..cb0255106b7f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3089,6 +3089,8 @@ static void trace_user_unwind_callback(struct unwind_work *unwind,
 	unsigned int trace_ctx;
 	struct vm_area_struct *vma = NULL;
 	unsigned long *caller;
+	unsigned long *inodes;
+	unsigned int *devs;
 	unsigned int offset;
 	int len;
 	int i;
@@ -3100,7 +3102,8 @@ static void trace_user_unwind_callback(struct unwind_work *unwind,
 	if (!(tr->trace_flags & TRACE_ITER_USERSTACKTRACE_DELAY))
 		return;
 
-	len = trace->nr * sizeof(unsigned long) + sizeof(*entry);
+	len = trace->nr * (sizeof(unsigned long) * 2 + sizeof(unsigned int))
+			+ sizeof(*entry);
 
 	trace_ctx = tracing_gen_ctx();
 
@@ -3121,6 +3124,15 @@ static void trace_user_unwind_callback(struct unwind_work *unwind,
 	entry->__data_loc_stack = offset | (len << 16);
 	caller = (void *)entry + offset;
 
+	offset += len;
+	entry->__data_loc_inodes = offset | (len << 16);
+	inodes = (void *)entry + offset;
+
+	offset += len;
+	len = sizeof(unsigned int) * trace->nr;
+	entry->__data_loc_dev = offset | (len << 16);
+	devs = (void *)entry + offset;
+
 	for (i = 0; i < trace->nr; i++) {
 		unsigned long addr = trace->entries[i];
 
@@ -3129,9 +3141,20 @@ static void trace_user_unwind_callback(struct unwind_work *unwind,
 
 		if (!vma) {
 			caller[i] = addr;
+			inodes[i] = 0;
+			devs[i] = 0;
 			continue;
 		}
+
 		caller[i] = (addr - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT);
+
+		if (vma->vm_file && vma->vm_file->f_inode) {
+			inodes[i] = vma->vm_file->f_inode->i_ino;
+			devs[i] = vma->vm_file->f_inode->i_sb->s_dev;
+		} else {
+			inodes[i] = 0;
+			devs[i] = 0;
+		}
 	}
 
 	__buffer_unlock_commit(buffer, event);
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index 752a99296c95..f2dc09405128 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -256,10 +256,14 @@ FTRACE_ENTRY(user_unwind_stack, userunwind_stack_entry,
 	F_STRUCT(
 		__field( u64, cookie )
 		__dynamic_array( unsigned long, stack )
+		__dynamic_array( unsigned long, inodes )
+		__dynamic_array( unsigned int, dev )
 	),
 
-	F_printk("cookie=%lld\n%s", __entry->cookie,
-		 __print_dynamic_array(stack, sizeof(unsigned long)))
+	F_printk("cookie=%lld\n%s%s%s", __entry->cookie,
+		 __print_dynamic_array(stack, sizeof(unsigned long)),
+		 __print_dynamic_array(inodes, sizeof(unsigned long)),
+		 __print_dynamic_array(dev, sizeof(unsigned int)))
 );
 
 FTRACE_ENTRY(user_unwind_cookie, userunwind_cookie_entry,
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index e11911e5f7d0..4bdbc6c48cdb 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -1380,9 +1380,13 @@ static enum print_line_t trace_user_unwind_stack_print(struct trace_iterator *it
 	struct userunwind_stack_entry *field;
 	struct trace_seq *s = &iter->seq;
 	unsigned long *caller;
+	unsigned long *inodes;
+	unsigned int *devs;
 	unsigned int offset;
 	unsigned int len;
 	unsigned int caller_cnt;
+	unsigned int inode_cnt;
+	unsigned int dev_cnt;
 	unsigned int i;
 
 	trace_assign_type(field, iter->ent);
@@ -1399,6 +1403,21 @@ static enum print_line_t trace_user_unwind_stack_print(struct trace_iterator *it
 
 	caller = (void *)iter->ent + offset;
 
+	/* The inodes and devices are also dynamic pointers */
+	offset = field->__data_loc_inodes;
+	len = offset >> 16;
+	offset = offset & 0xffff;
+	inode_cnt = len / sizeof(*inodes);
+
+	inodes = (void *)iter->ent + offset;
+
+	offset = field->__data_loc_dev;
+	len = offset >> 16;
+	offset = offset & 0xffff;
+	dev_cnt = len / sizeof(*devs);
+
+	devs = (void *)iter->ent + offset;
+
 	for (i = 0; i < caller_cnt; i++) {
 		unsigned long ip = caller[i];
 
@@ -1407,6 +1426,14 @@ static enum print_line_t trace_user_unwind_stack_print(struct trace_iterator *it
 
 		trace_seq_puts(s, " => ");
 		seq_print_user_ip(s, NULL, ip, flags);
+
+		if (i < inode_cnt) {
+			trace_seq_printf(s, " : %ld", inodes[i]);
+			if (i < dev_cnt) {
+				trace_seq_printf(s, " : %d:%d",
+						 MAJOR(devs[i]), MINOR(devs[i]));
+			}
+		}
 		trace_seq_putc(s, '\n');
 	}
 
-- 
2.47.2