From nobody Sat Oct 11 08:27:36 2025
Message-ID: <20250611010427.923519889@goodmis.org>
User-Agent: quilt/0.68
Date: Tue, 10 Jun 2025 20:54:22 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton
Subject: [PATCH v10 01/14] unwind_user: Add user space unwinding API
References: <20250611005421.144238328@goodmis.org>

From: Josh Poimboeuf

Introduce a generic API for unwinding user space stacks.

In order to expand user space unwinding to handle more complex scenarios,
such as deferred unwinding and reading user space information, create a
generic interface that all architectures supporting the various unwinding
methods can use.

This is an alternative to the simple stack_trace_save_user() API for
handling user space stack traces.
This does not replace that interface; rather, this interface will be used
to expand the functionality of user space stack walking.

None of the structures introduced here are exposed to user space tooling.

Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 MAINTAINERS                       |  8 +++++
 arch/Kconfig                      |  3 ++
 include/linux/unwind_user.h       | 15 +++++++++
 include/linux/unwind_user_types.h | 31 +++++++++++++++++
 kernel/Makefile                   |  1 +
 kernel/unwind/Makefile            |  1 +
 kernel/unwind/user.c              | 55 +++++++++++++++++++++++++++++++
 7 files changed, 114 insertions(+)
 create mode 100644 include/linux/unwind_user.h
 create mode 100644 include/linux/unwind_user_types.h
 create mode 100644 kernel/unwind/Makefile
 create mode 100644 kernel/unwind/user.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a92290fffa16..8617f87bceed 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -25885,6 +25885,14 @@ F:	Documentation/driver-api/uio-howto.rst
 F:	drivers/uio/
 F:	include/linux/uio_driver.h
 
+USERSPACE STACK UNWINDING
+M:	Josh Poimboeuf
+M:	Steven Rostedt
+S:	Maintained
+F:	include/linux/unwind*.h
+F:	kernel/unwind/
+
+
 UTIL-LINUX PACKAGE
 M:	Karel Zak
 L:	util-linux@vger.kernel.org
diff --git a/arch/Kconfig b/arch/Kconfig
index a3308a220f86..ea59e5d7cc69 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -435,6 +435,9 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH
 	  It uses the same command line parameters, and sysctl interface,
 	  as the generic hardlockup detectors.
 
+config UNWIND_USER
+	bool
+
 config HAVE_PERF_REGS
 	bool
 	help
diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
new file mode 100644
index 000000000000..aa7923c1384f
--- /dev/null
+++ b/include/linux/unwind_user.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNWIND_USER_H
+#define _LINUX_UNWIND_USER_H
+
+#include <linux/unwind_user_types.h>
+
+int unwind_user_start(struct unwind_user_state *state);
+int unwind_user_next(struct unwind_user_state *state);
+
+int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries);
+
+#define for_each_user_frame(state) \
+	for (unwind_user_start((state)); !(state)->done; unwind_user_next((state)))
+
+#endif /* _LINUX_UNWIND_USER_H */
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
new file mode 100644
index 000000000000..6ed1b4ae74e1
--- /dev/null
+++ b/include/linux/unwind_user_types.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNWIND_USER_TYPES_H
+#define _LINUX_UNWIND_USER_TYPES_H
+
+#include <linux/types.h>
+
+enum unwind_user_type {
+	UNWIND_USER_TYPE_NONE,
+};
+
+struct unwind_stacktrace {
+	unsigned int nr;
+	unsigned long *entries;
+};
+
+struct unwind_user_frame {
+	s32 cfa_off;
+	s32 ra_off;
+	s32 fp_off;
+	bool use_fp;
+};
+
+struct unwind_user_state {
+	unsigned long ip;
+	unsigned long sp;
+	unsigned long fp;
+	enum unwind_user_type type;
+	bool done;
+};
+
+#endif /* _LINUX_UNWIND_USER_TYPES_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 32e80dd626af..541186050251 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -55,6 +55,7 @@ obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
 obj-y += entry/
+obj-y += unwind/
 obj-$(CONFIG_MODULES) += module/
 
 obj-$(CONFIG_KCMP) += kcmp.o
diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
new file mode 100644
index 000000000000..349ce3677526
--- /dev/null
+++ b/kernel/unwind/Makefile
@@ -0,0 +1 @@
+ obj-$(CONFIG_UNWIND_USER) += user.o
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
new file mode 100644
index 000000000000..d30449328981
--- /dev/null
+++ b/kernel/unwind/user.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+* Generic interfaces for unwinding user space
+*/
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/sched/task_stack.h>
+#include <linux/unwind_user.h>
+
+int unwind_user_next(struct unwind_user_state *state)
+{
+	/* no implementation yet */
+	return -EINVAL;
+}
+
+int unwind_user_start(struct unwind_user_state *state)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+
+	memset(state, 0, sizeof(*state));
+
+	if ((current->flags & PF_KTHREAD) || !user_mode(regs)) {
+		state->done = true;
+		return -EINVAL;
+	}
+
+	state->type = UNWIND_USER_TYPE_NONE;
+
+	state->ip = instruction_pointer(regs);
+	state->sp = user_stack_pointer(regs);
+	state->fp = frame_pointer(regs);
+
+	return 0;
+}
+
+int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries)
+{
+	struct unwind_user_state state;
+
+	trace->nr = 0;
+
+	if (!max_entries)
+		return -EINVAL;
+
+	if (current->flags & PF_KTHREAD)
+		return 0;
+
+	for_each_user_frame(&state) {
+		trace->entries[trace->nr++] = state.ip;
+		if (trace->nr >= max_entries)
+			break;
+	}
+
+	return 0;
+}
-- 
2.47.2

From nobody Sat Oct 11 08:27:36 2025
Message-ID: <20250611010428.092934995@goodmis.org>
User-Agent: quilt/0.68
Date: Tue, 10 Jun 2025 20:54:23 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton
Subject: [PATCH v10 02/14] unwind_user: Add frame pointer support
References: <20250611005421.144238328@goodmis.org>

From: Josh Poimboeuf

Add optional support for user space frame pointer unwinding. If
supported, the architecture needs to enable CONFIG_HAVE_UNWIND_USER_FP
and define ARCH_INIT_USER_FP_FRAME.

By encoding the frame offsets in struct unwind_user_frame, much of this
code can also be reused for future unwinder implementations such as
sframe.

Signed-off-by: Josh Poimboeuf
Co-developed-by: Steven Rostedt (Google)
Signed-off-by: Steven Rostedt (Google)
---
Changes since v9: https://lore.kernel.org/linux-trace-kernel/20250513223551.290698040@goodmis.org/

The asm-generic header is not included when an architecture provides its
own version of the header. Because of that, spreading the checks across
more than one #ifndef block in the asm-generic header breaks when the
architecture header does not define all of the values.
- Move ARCH_INIT_USER_FP_FRAME check to linux/user_unwind.h
- Have linux/user_unwind.h include asm/user_unwind.h and not have C files
  have to call the asm header directly
- Remove unnecessary frame initialization
- Added unwind_user.h to asm-generic/Kbuild

 arch/Kconfig                      |  4 +++
 include/asm-generic/Kbuild        |  1 +
 include/asm-generic/unwind_user.h |  5 ++++
 include/linux/unwind_user.h       |  5 ++++
 include/linux/unwind_user_types.h |  1 +
 kernel/unwind/user.c              | 49 +++++++++++++++++++++++++++++--
 6 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 include/asm-generic/unwind_user.h

diff --git a/arch/Kconfig b/arch/Kconfig
index ea59e5d7cc69..8e3fd723bd74 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -438,6 +438,10 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH
 config UNWIND_USER
 	bool
 
+config HAVE_UNWIND_USER_FP
+	bool
+	select UNWIND_USER
+
 config HAVE_PERF_REGS
 	bool
 	help
diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 8675b7b4ad23..295c94a3ccc1 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -59,6 +59,7 @@ mandatory-y += tlbflush.h
 mandatory-y += topology.h
 mandatory-y += trace_clock.h
 mandatory-y += uaccess.h
+mandatory-y += unwind_user.h
 mandatory-y += vermagic.h
 mandatory-y += vga.h
 mandatory-y += video.h
diff --git a/include/asm-generic/unwind_user.h b/include/asm-generic/unwind_user.h
new file mode 100644
index 000000000000..b8882b909944
--- /dev/null
+++ b/include/asm-generic/unwind_user.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_UNWIND_USER_H
+#define _ASM_GENERIC_UNWIND_USER_H
+
+#endif /* _ASM_GENERIC_UNWIND_USER_H */
diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
index aa7923c1384f..a405111c41b0 100644
--- a/include/linux/unwind_user.h
+++ b/include/linux/unwind_user.h
@@ -3,6 +3,11 @@
 #define _LINUX_UNWIND_USER_H
 
 #include <linux/unwind_user_types.h>
+#include <asm/unwind_user.h>
+
+#ifndef ARCH_INIT_USER_FP_FRAME
+ #define ARCH_INIT_USER_FP_FRAME
+#endif
 
 int unwind_user_start(struct unwind_user_state *state);
 int unwind_user_next(struct unwind_user_state *state);
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 6ed1b4ae74e1..65bd070eb6b0 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -6,6 +6,7 @@
 
 enum unwind_user_type {
 	UNWIND_USER_TYPE_NONE,
+	UNWIND_USER_TYPE_FP,
 };
 
 struct unwind_stacktrace {
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index d30449328981..4fc550356b33 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -6,10 +6,52 @@
 #include <linux/sched.h>
 #include <linux/sched/task_stack.h>
 #include <linux/unwind_user.h>
+#include <linux/uaccess.h>
+
+static struct unwind_user_frame fp_frame = {
+	ARCH_INIT_USER_FP_FRAME
+};
+
+static inline bool fp_state(struct unwind_user_state *state)
+{
+	return IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP) &&
+	       state->type == UNWIND_USER_TYPE_FP;
+}
 
 int unwind_user_next(struct unwind_user_state *state)
 {
-	/* no implementation yet */
+	struct unwind_user_frame *frame;
+	unsigned long cfa = 0, fp, ra = 0;
+
+	if (state->done)
+		return -EINVAL;
+
+	if (fp_state(state))
+		frame = &fp_frame;
+	else
+		goto the_end;
+
+	cfa = (frame->use_fp ? state->fp : state->sp) + frame->cfa_off;
+
+	/* stack going in wrong direction? */
+	if (cfa <= state->sp)
+		goto the_end;
+
+	if (get_user(ra, (unsigned long *)(cfa + frame->ra_off)))
+		goto the_end;
+
+	if (frame->fp_off && get_user(fp, (unsigned long __user *)(cfa + frame->fp_off)))
+		goto the_end;
+
+	state->ip = ra;
+	state->sp = cfa;
+	if (frame->fp_off)
+		state->fp = fp;
+
+	return 0;
+
+the_end:
+	state->done = true;
 	return -EINVAL;
 }
 
@@ -24,7 +66,10 @@ int unwind_user_start(struct unwind_user_state *state)
 		return -EINVAL;
 	}
 
-	state->type = UNWIND_USER_TYPE_NONE;
+	if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
+		state->type = UNWIND_USER_TYPE_FP;
+	else
+		state->type = UNWIND_USER_TYPE_NONE;
 
 	state->ip = instruction_pointer(regs);
 	state->sp = user_stack_pointer(regs);
-- 
2.47.2

From nobody Sat Oct 11 08:27:36 2025
Message-ID: <20250611010428.261095906@goodmis.org>
User-Agent: quilt/0.68
Date: Tue, 10 Jun 2025 20:54:24 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton
Subject: [PATCH v10 03/14] unwind_user: Add compat mode frame pointer support
References: <20250611005421.144238328@goodmis.org>

From: Josh Poimboeuf

Add optional support for frame pointer unwinding of user space compat
mode tasks. If supported, the architecture needs to enable
CONFIG_HAVE_UNWIND_USER_COMPAT_FP and define
ARCH_INIT_USER_COMPAT_FP_FRAME.

Signed-off-by: Josh Poimboeuf
Co-developed-by: Steven Rostedt (Google)
Signed-off-by: Steven Rostedt (Google)
---
Changes since v9: https://lore.kernel.org/linux-trace-kernel/20250513223551.459986355@goodmis.org/

The asm-generic header is not included when an architecture provides its
own version of the header. Because of that, spreading the checks across
more than one #ifndef block in the asm-generic header breaks when the
architecture header does not define all of the values.
- Move #ifndef arch_unwind_user_state to linux/user_unwind_types.h
- Move the following to linux/unwind_user.h:
    #ifndef ARCH_INIT_USER_COMPAT_FP_FRAME
    #ifndef arch_unwind_user_init
    #ifndef arch_unwind_user_next
- Changed UNWIND_GET_USER_LONG() to use "unsigned long" instead of u64,
  as this can be called on 32 bit architectures, and just because
  "compat_state()" returns false doesn't mean that the value is 64 bit.

 arch/Kconfig                            |  4 +++
 include/asm-generic/Kbuild              |  1 +
 include/asm-generic/unwind_user_types.h |  5 ++++
 include/linux/unwind_user.h             | 13 +++++++++
 include/linux/unwind_user_types.h       |  7 +++++
 kernel/unwind/user.c                    | 36 ++++++++++++++++++++++---
 6 files changed, 62 insertions(+), 4 deletions(-)
 create mode 100644 include/asm-generic/unwind_user_types.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 8e3fd723bd74..2c41d3072910 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -442,6 +442,10 @@ config HAVE_UNWIND_USER_FP
 	bool
 	select UNWIND_USER
 
+config HAVE_UNWIND_USER_COMPAT_FP
+	bool
+	depends on HAVE_UNWIND_USER_FP
+
 config HAVE_PERF_REGS
 	bool
 	help
diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 295c94a3ccc1..b797a2434396 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -60,6 +60,7 @@ mandatory-y += topology.h
 mandatory-y += trace_clock.h
 mandatory-y += uaccess.h
 mandatory-y += unwind_user.h
+mandatory-y += unwind_user_types.h
 mandatory-y += vermagic.h
 mandatory-y += vga.h
 mandatory-y += video.h
diff --git a/include/asm-generic/unwind_user_types.h b/include/asm-generic/unwind_user_types.h
new file mode 100644
index 000000000000..f568b82e52cd
--- /dev/null
+++ b/include/asm-generic/unwind_user_types.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_UNWIND_USER_TYPES_H
+#define _ASM_GENERIC_UNWIND_USER_TYPES_H
+
+#endif /* _ASM_GENERIC_UNWIND_USER_TYPES_H */
diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
index a405111c41b0..c70da8f7e54c 100644
--- a/include/linux/unwind_user.h
+++ b/include/linux/unwind_user.h
@@ -9,6 +9,19 @@
 #define ARCH_INIT_USER_FP_FRAME
 #endif
 
+#ifndef ARCH_INIT_USER_COMPAT_FP_FRAME
+ #define ARCH_INIT_USER_COMPAT_FP_FRAME
+ #define in_compat_mode(regs) false
+#endif
+
+#ifndef arch_unwind_user_init
+static inline void arch_unwind_user_init(struct unwind_user_state *state, struct pt_regs *reg) {}
+#endif
+
+#ifndef arch_unwind_user_next
+static inline void arch_unwind_user_next(struct unwind_user_state *state) {}
+#endif
+
 int unwind_user_start(struct unwind_user_state *state);
 int unwind_user_next(struct unwind_user_state *state);
 
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 65bd070eb6b0..0b6563951ca4 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -3,10 +3,16 @@
 #define _LINUX_UNWIND_USER_TYPES_H
 
 #include <linux/types.h>
+#include <asm/unwind_user_types.h>
+
+#ifndef arch_unwind_user_state
+struct arch_unwind_user_state {};
+#endif
 
 enum unwind_user_type {
 	UNWIND_USER_TYPE_NONE,
 	UNWIND_USER_TYPE_FP,
+	UNWIND_USER_TYPE_COMPAT_FP,
 };
 
 struct unwind_stacktrace {
@@ -25,6 +31,7 @@ struct unwind_user_state {
 	unsigned long ip;
 	unsigned long sp;
 	unsigned long fp;
+	struct arch_unwind_user_state arch;
 	enum unwind_user_type type;
 	bool done;
 };
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 4fc550356b33..29e1f497a26e 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -12,12 +12,32 @@ static struct unwind_user_frame fp_frame = {
 	ARCH_INIT_USER_FP_FRAME
 };
 
+static struct unwind_user_frame compat_fp_frame = {
+	ARCH_INIT_USER_COMPAT_FP_FRAME
+};
+
 static inline bool fp_state(struct unwind_user_state *state)
 {
 	return IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP) &&
 	       state->type == UNWIND_USER_TYPE_FP;
 }
 
+static inline bool compat_state(struct unwind_user_state *state)
+{
+	return IS_ENABLED(CONFIG_HAVE_UNWIND_USER_COMPAT_FP) &&
+	       state->type == UNWIND_USER_TYPE_COMPAT_FP;
+}
+
+#define UNWIND_GET_USER_LONG(to, from, state)				\
+({									\
+	int __ret;							\
+	if (compat_state(state))					\
+		__ret = get_user(to, (u32 __user *)(from));		\
+	else								\
+		__ret = get_user(to, (unsigned long __user *)(from));	\
+	__ret;								\
+})
+
 int unwind_user_next(struct unwind_user_state *state)
 {
 	struct unwind_user_frame *frame;
@@ -26,7 +46,9 @@ int unwind_user_next(struct unwind_user_state *state)
 	if (state->done)
 		return -EINVAL;
 
-	if (fp_state(state))
+	if (compat_state(state))
+		frame = &compat_fp_frame;
+	else if (fp_state(state))
 		frame = &fp_frame;
 	else
 		goto the_end;
@@ -37,10 +59,10 @@ int unwind_user_next(struct unwind_user_state *state)
 	if (cfa <= state->sp)
 		goto the_end;
 
-	if (get_user(ra, (unsigned long *)(cfa + frame->ra_off)))
+	if (UNWIND_GET_USER_LONG(ra, cfa + frame->ra_off, state))
 		goto the_end;
 
-	if (frame->fp_off && get_user(fp, (unsigned long __user *)(cfa + frame->fp_off)))
+	if (frame->fp_off && UNWIND_GET_USER_LONG(fp, cfa + frame->fp_off, state))
 		goto the_end;
 
 	state->ip = ra;
@@ -48,6 +70,8 @@ int unwind_user_next(struct unwind_user_state *state)
 	if (frame->fp_off)
 		state->fp = fp;
 
+	arch_unwind_user_next(state);
+
 	return 0;
 
 the_end:
@@ -66,7 +90,9 @@ int unwind_user_start(struct unwind_user_state *state)
 		return -EINVAL;
 	}
 
-	if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
+	if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_COMPAT_FP) && in_compat_mode(regs))
+		state->type = UNWIND_USER_TYPE_COMPAT_FP;
+	else if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
 		state->type = UNWIND_USER_TYPE_FP;
 	else
 		state->type = UNWIND_USER_TYPE_NONE;
@@ -75,6 +101,8 @@ int unwind_user_start(struct unwind_user_state *state)
 	state->sp = user_stack_pointer(regs);
 	state->fp = frame_pointer(regs);
 
+	arch_unwind_user_init(state, regs);
+
 	return 0;
 }
-- 
2.47.2

From nobody Sat Oct 11 08:27:36 2025
Message-ID: <20250611010428.433111891@goodmis.org>
User-Agent: quilt/0.68
Date: Tue, 10 Jun 2025 20:54:25 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton
Subject: [PATCH v10 04/14] unwind_user/deferred: Add unwind_deferred_trace()
References: <20250611005421.144238328@goodmis.org>

From: Steven Rostedt

Add a function, unwind_deferred_trace(), that retrieves a user space
stack trace and must be called from a faultable context. A tracer can
call it when a task is about to enter user space, or when the task has
just come back from user space and has interrupts enabled.
This code is based on work by Josh Poimboeuf's deferred unwinding code: Link: https://lore.kernel.org/all/6052e8487746603bdb29b65f4033e739092d9925.= 1737511963.git.jpoimboe@kernel.org/ Signed-off-by: Steven Rostedt (Google) --- include/linux/sched.h | 5 +++ include/linux/unwind_deferred.h | 24 +++++++++++ include/linux/unwind_deferred_types.h | 9 ++++ kernel/fork.c | 4 ++ kernel/unwind/Makefile | 2 +- kernel/unwind/deferred.c | 60 +++++++++++++++++++++++++++ 6 files changed, 103 insertions(+), 1 deletion(-) create mode 100644 include/linux/unwind_deferred.h create mode 100644 include/linux/unwind_deferred_types.h create mode 100644 kernel/unwind/deferred.c diff --git a/include/linux/sched.h b/include/linux/sched.h index 4f78a64beb52..59fdf7d9bb1e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -46,6 +46,7 @@ #include #include #include +#include #include =20 /* task_struct member predeclarations (sorted alphabetically): */ @@ -1654,6 +1655,10 @@ struct task_struct { struct user_event_mm *user_event_mm; #endif =20 +#ifdef CONFIG_UNWIND_USER + struct unwind_task_info unwind_info; +#endif + /* CPU-specific state of this task: */ struct thread_struct thread; =20 diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferre= d.h new file mode 100644 index 000000000000..5064ebe38c4f --- /dev/null +++ b/include/linux/unwind_deferred.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_UNWIND_USER_DEFERRED_H +#define _LINUX_UNWIND_USER_DEFERRED_H + +#include +#include + +#ifdef CONFIG_UNWIND_USER + +void unwind_task_init(struct task_struct *task); +void unwind_task_free(struct task_struct *task); + +int unwind_deferred_trace(struct unwind_stacktrace *trace); + +#else /* !CONFIG_UNWIND_USER */ + +static inline void unwind_task_init(struct task_struct *task) {} +static inline void unwind_task_free(struct task_struct *task) {} + +static inline int unwind_deferred_trace(struct unwind_stacktrace *trace) {= return -ENOSYS; } + 
+#endif /* !CONFIG_UNWIND_USER */
+
+#endif /* _LINUX_UNWIND_USER_DEFERRED_H */
diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
new file mode 100644
index 000000000000..aa32db574e43
--- /dev/null
+++ b/include/linux/unwind_deferred_types.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNWIND_USER_DEFERRED_TYPES_H
+#define _LINUX_UNWIND_USER_DEFERRED_TYPES_H
+
+struct unwind_task_info {
+	unsigned long	*entries;
+};
+
+#endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 1ee8eb11f38b..3341d50c61f2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -105,6 +105,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -732,6 +733,7 @@ void __put_task_struct(struct task_struct *tsk)
 	WARN_ON(refcount_read(&tsk->usage));
 	WARN_ON(tsk == current);

+	unwind_task_free(tsk);
 	sched_ext_free(tsk);
 	io_uring_free(tsk);
 	cgroup_free(tsk);
@@ -2135,6 +2137,8 @@ __latent_entropy struct task_struct *copy_process(
 	p->bpf_ctx = NULL;
 #endif

+	unwind_task_init(p);
+
 	/* Perform scheduler related setup. Assign this task to a CPU. */
 	retval = sched_fork(clone_flags, p);
 	if (retval)
diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
index 349ce3677526..6752ac96d7e2 100644
--- a/kernel/unwind/Makefile
+++ b/kernel/unwind/Makefile
@@ -1 +1 @@
- obj-$(CONFIG_UNWIND_USER) += user.o
+ obj-$(CONFIG_UNWIND_USER) += user.o deferred.o
diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
new file mode 100644
index 000000000000..0bafb95e6336
--- /dev/null
+++ b/kernel/unwind/deferred.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Deferred user space unwinding
+ */
+#include
+#include
+#include
+#include
+
+#define UNWIND_MAX_ENTRIES	512
+
+/**
+ * unwind_deferred_trace - Produce a user stacktrace in faultable context
+ * @trace: The descriptor that will store the user stacktrace
+ *
+ * This must be called in a known faultable context (usually when entering
+ * or exiting user space). Depending on the available implementations
+ * the @trace will be loaded with the addresses of the user space stacktrace
+ * if it can be found.
+ *
+ * Return: 0 on success and negative on error
+ *         On success @trace will contain the user space stacktrace
+ */
+int unwind_deferred_trace(struct unwind_stacktrace *trace)
+{
+	struct unwind_task_info *info = &current->unwind_info;
+
+	/* Should always be called from faultable context */
+	might_fault();
+
+	if (current->flags & PF_EXITING)
+		return -EINVAL;
+
+	if (!info->entries) {
+		info->entries = kmalloc_array(UNWIND_MAX_ENTRIES, sizeof(long),
+					      GFP_KERNEL);
+		if (!info->entries)
+			return -ENOMEM;
+	}
+
+	trace->nr = 0;
+	trace->entries = info->entries;
+	unwind_user(trace, UNWIND_MAX_ENTRIES);
+
+	return 0;
+}
+
+void unwind_task_init(struct task_struct *task)
+{
+	struct unwind_task_info *info = &task->unwind_info;
+
+	memset(info, 0, sizeof(*info));
+}
+
+void unwind_task_free(struct task_struct *task)
+{
+	struct unwind_task_info *info = &task->unwind_info;
+
+	kfree(info->entries);
+}
--
2.47.2
Message-ID: <20250611010428.603778772@goodmis.org>
Date: Tue, 10 Jun 2025 20:54:26 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Subject: [PATCH v10 05/14] unwind_user/deferred: Add unwind cache
References: <20250611005421.144238328@goodmis.org>

From: Josh Poimboeuf

Cache the results of the unwind so that the unwind is performed only once,
even when requested by multiple tracers.

The cache's nr_entries gets cleared every time the task exits the kernel.
When a stacktrace is requested, nr_entries gets set to the number of
entries in the stacktrace. If another stacktrace is requested while
nr_entries is non-zero, the cache already holds the stacktrace that would
be retrieved, so the unwind is not performed again and the cached entries
are handed to the caller.
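The caching scheme above (unwind once per kernel entry, then hand out the cached entries, and invalidate on return to user space) can be modeled in a few lines of userspace C. This is an illustrative sketch, not the kernel code: `fake_unwind_user()`, `cached_trace()`, and `exit_to_user_mode()` are hypothetical stand-ins for `unwind_user()`, `unwind_deferred_trace()`, and the exit-to-user-mode hook in the patch below.

```c
#define UNWIND_MAX_ENTRIES 512

/* Simplified userspace stand-ins for the kernel structures in this patch. */
struct unwind_cache {
	unsigned int nr_entries;
	unsigned long entries[UNWIND_MAX_ENTRIES];
};

struct unwind_stacktrace {
	unsigned int nr;
	unsigned long *entries;
};

static int unwinds_performed;	/* counts how often the real unwind runs */

/* Hypothetical stand-in for unwind_user(): pretend three frames were found. */
static void fake_unwind_user(struct unwind_stacktrace *trace)
{
	trace->entries[0] = 0x1000;
	trace->entries[1] = 0x2000;
	trace->entries[2] = 0x3000;
	trace->nr = 3;
	unwinds_performed++;
}

/* Mirrors the cache-hit logic of unwind_deferred_trace(). */
static int cached_trace(struct unwind_cache *cache, struct unwind_stacktrace *trace)
{
	trace->entries = cache->entries;

	if (cache->nr_entries) {
		/* Already unwound in this entry context: reuse the cache. */
		trace->nr = cache->nr_entries;
		return 0;
	}

	trace->nr = 0;
	fake_unwind_user(trace);
	cache->nr_entries = trace->nr;	/* populate the cache for later callers */
	return 0;
}

/* Models unwind_exit_to_user_mode(): invalidate the cache on kernel exit. */
static void exit_to_user_mode(struct unwind_cache *cache)
{
	cache->nr_entries = 0;
}
```

With two back-to-back requests in the same entry context, only the first one performs a real unwind; after `exit_to_user_mode()` the next request unwinds again.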
Co-developed-by: Steven Rostedt (Google)
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 include/linux/entry-common.h          |  2 ++
 include/linux/unwind_deferred.h       |  8 ++++++++
 include/linux/unwind_deferred_types.h |  7 ++++++-
 kernel/unwind/deferred.c              | 26 ++++++++++++++++++++------
 4 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index f94f3fdf15fc..6e850c9d3f0c 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -362,6 +363,7 @@ static __always_inline void exit_to_user_mode(void)
 	lockdep_hardirqs_on_prepare();
 	instrumentation_end();

+	unwind_exit_to_user_mode();
 	user_enter_irqoff();
 	arch_exit_to_user_mode();
 	lockdep_hardirqs_on(CALLER_ADDR0);
diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index 5064ebe38c4f..7d6cb2ffd084 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -12,6 +12,12 @@ void unwind_task_free(struct task_struct *task);

 int unwind_deferred_trace(struct unwind_stacktrace *trace);

+static __always_inline void unwind_exit_to_user_mode(void)
+{
+	if (unlikely(current->unwind_info.cache))
+		current->unwind_info.cache->nr_entries = 0;
+}
+
 #else /* !CONFIG_UNWIND_USER */

 static inline void unwind_task_init(struct task_struct *task) {}
@@ -19,6 +25,8 @@ static inline void unwind_task_free(struct task_struct *task) {}

 static inline int unwind_deferred_trace(struct unwind_stacktrace *trace) { return -ENOSYS; }

+static inline void unwind_exit_to_user_mode(void) {}
+
 #endif /* !CONFIG_UNWIND_USER */

 #endif /* _LINUX_UNWIND_USER_DEFERRED_H */
diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index aa32db574e43..db5b54b18828 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -2,8 +2,13 @@
 #ifndef _LINUX_UNWIND_USER_DEFERRED_TYPES_H
 #define _LINUX_UNWIND_USER_DEFERRED_TYPES_H

+struct unwind_cache {
+	unsigned int	nr_entries;
+	unsigned long	entries[];
+};
+
 struct unwind_task_info {
-	unsigned long	*entries;
+	struct unwind_cache	*cache;
 };

 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */
diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index 0bafb95e6336..e3913781c8c6 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -24,6 +24,7 @@
 int unwind_deferred_trace(struct unwind_stacktrace *trace)
 {
 	struct unwind_task_info *info = &current->unwind_info;
+	struct unwind_cache *cache;

 	/* Should always be called from faultable context */
 	might_fault();
@@ -31,17 +32,30 @@ int unwind_deferred_trace(struct unwind_stacktrace *trace)
 	if (current->flags & PF_EXITING)
 		return -EINVAL;

-	if (!info->entries) {
-		info->entries = kmalloc_array(UNWIND_MAX_ENTRIES, sizeof(long),
-					      GFP_KERNEL);
-		if (!info->entries)
+	if (!info->cache) {
+		info->cache = kzalloc(struct_size(cache, entries, UNWIND_MAX_ENTRIES),
+				      GFP_KERNEL);
+		if (!info->cache)
 			return -ENOMEM;
 	}

+	cache = info->cache;
+	trace->entries = cache->entries;
+
+	if (cache->nr_entries) {
+		/*
+		 * The user stack has already been previously unwound in this
+		 * entry context. Skip the unwind and use the cache.
+		 */
+		trace->nr = cache->nr_entries;
+		return 0;
+	}
+
 	trace->nr = 0;
-	trace->entries = info->entries;
 	unwind_user(trace, UNWIND_MAX_ENTRIES);

+	cache->nr_entries = trace->nr;
+
 	return 0;
 }

@@ -56,5 +70,5 @@ void unwind_task_free(struct task_struct *task)
 {
 	struct unwind_task_info *info = &task->unwind_info;

-	kfree(info->entries);
+	kfree(info->cache);
 }
--
2.47.2
Message-ID: <20250611010428.770214773@goodmis.org>
Date: Tue, 10 Jun 2025 20:54:27 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Subject: [PATCH v10 06/14] unwind_user/deferred: Add deferred unwinding interface
References: <20250611005421.144238328@goodmis.org>

From: Josh Poimboeuf

Add an interface for scheduling task work to unwind the user space stack
before returning to user space.
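The request/flush flow can be modeled in userspace C, assuming a single-threaded task and ignoring the NMI and locking concerns that later patches address. `deferred_request()`, `flush_on_exit()`, and `fake_clock()` are hypothetical stand-ins for `unwind_deferred_request()`, the task-work/exit path, and `local_clock()`.

```c
/* Single-threaded model of the per-task deferral state in this patch. */
struct deferred_state {
	int pending;			/* task work already queued? */
	unsigned long long timestamp;	/* first request since kernel entry */
};

static unsigned long long clock_ticks;

/* Hypothetical monotonic clock standing in for local_clock(). */
static unsigned long long fake_clock(void)
{
	return ++clock_ticks;
}

/*
 * Models unwind_deferred_request(): returns 0 if this call queued the
 * task work, 1 if it was already pending. Either way, *timestamp is the
 * timestamp of the first request in this entry context.
 */
static int deferred_request(struct deferred_state *s, unsigned long long *timestamp)
{
	if (!s->timestamp)
		s->timestamp = fake_clock();
	*timestamp = s->timestamp;

	if (s->pending)
		return 1;

	s->pending = 1;	/* models task_work_add(current, ...) */
	return 0;
}

/* Models the return to user space: the work runs and the context resets. */
static void flush_on_exit(struct deferred_state *s)
{
	s->pending = 0;
	s->timestamp = 0;
}
```

Repeated requests within one entry context share a timestamp and queue the work only once; after the flush, a new request starts a fresh context.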
This solves several problems for its callers:

- Ensure the unwind happens in task context even if the caller may be
  running in interrupt context.

- Avoid duplicate unwinds, whether called multiple times by the same
  caller or by different callers.

- Take a timestamp when the first request since the task entered the
  kernel comes in. This will be returned to the calling function along
  with the stack trace when the task leaves the kernel. This timestamp
  can be used to correlate kernel unwinds/traces with the user unwind.

The timestamp is used to detect when the stacktrace is the same. It is
generated the first time a user space stacktrace is requested after the
task enters the kernel. The timestamp is passed to the caller on request,
and when the stacktrace is generated upon returning to user space, the
requester's callback is called with the timestamp as well as the
stacktrace.

Co-developed-by: Steven Rostedt (Google)
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 include/linux/unwind_deferred.h       |  18 ++++
 include/linux/unwind_deferred_types.h |   3 +
 kernel/unwind/deferred.c              | 131 +++++++++++++++++++++++++-
 3 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index 7d6cb2ffd084..a384eef719a3 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -2,9 +2,19 @@
 #ifndef _LINUX_UNWIND_USER_DEFERRED_H
 #define _LINUX_UNWIND_USER_DEFERRED_H

+#include
 #include
 #include

+struct unwind_work;
+
+typedef void (*unwind_callback_t)(struct unwind_work *work, struct unwind_stacktrace *trace, u64 timestamp);
+
+struct unwind_work {
+	struct list_head	list;
+	unwind_callback_t	func;
+};
+
 #ifdef CONFIG_UNWIND_USER

 void unwind_task_init(struct task_struct *task);
@@ -12,10 +22,15 @@ void unwind_task_free(struct task_struct *task);

 int unwind_deferred_trace(struct unwind_stacktrace *trace);

+int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func);
+int unwind_deferred_request(struct unwind_work *work, u64 *timestamp);
+void unwind_deferred_cancel(struct unwind_work *work);
+
 static __always_inline void unwind_exit_to_user_mode(void)
 {
 	if (unlikely(current->unwind_info.cache))
 		current->unwind_info.cache->nr_entries = 0;
+	current->unwind_info.timestamp = 0;
 }

 #else /* !CONFIG_UNWIND_USER */
@@ -24,6 +39,9 @@ static inline void unwind_task_init(struct task_struct *task) {}
 static inline void unwind_task_free(struct task_struct *task) {}

 static inline int unwind_deferred_trace(struct unwind_stacktrace *trace) { return -ENOSYS; }
+static inline int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func) { return -ENOSYS; }
+static inline int unwind_deferred_request(struct unwind_work *work, u64 *timestamp) { return -ENOSYS; }
+static inline void unwind_deferred_cancel(struct unwind_work *work) {}

 static inline void unwind_exit_to_user_mode(void) {}

diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index db5b54b18828..5df264cf81ad 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -9,6 +9,9 @@ struct unwind_cache {

 struct unwind_task_info {
 	struct unwind_cache	*cache;
+	struct callback_head	work;
+	u64			timestamp;
+	int			pending;
 };

 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */
diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index e3913781c8c6..b76c704ddc6d 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -2,13 +2,35 @@
 /*
  * Deferred user space unwinding
  */
+#include
+#include
+#include
+#include
 #include
 #include
 #include
-#include
+#include

 #define UNWIND_MAX_ENTRIES	512

+/* Guards adding to and reading the list of callbacks */
+static DEFINE_MUTEX(callback_mutex);
+static LIST_HEAD(callbacks);
+
+/*
+ * Read the task context timestamp; if this is the first caller then
+ * it will set the timestamp.
+ */
+static u64 get_timestamp(struct unwind_task_info *info)
+{
+	lockdep_assert_irqs_disabled();
+
+	if (!info->timestamp)
+		info->timestamp = local_clock();
+
+	return info->timestamp;
+}
+
 /**
  * unwind_deferred_trace - Produce a user stacktrace in faultable context
  * @trace: The descriptor that will store the user stacktrace
@@ -59,11 +81,117 @@ int unwind_deferred_trace(struct unwind_stacktrace *trace)
 	return 0;
 }

+static void unwind_deferred_task_work(struct callback_head *head)
+{
+	struct unwind_task_info *info = container_of(head, struct unwind_task_info, work);
+	struct unwind_stacktrace trace;
+	struct unwind_work *work;
+	u64 timestamp;
+
+	if (WARN_ON_ONCE(!info->pending))
+		return;
+
+	/* Allow work to come in again */
+	WRITE_ONCE(info->pending, 0);
+
+	/*
+	 * From here on out, the callback must always be called, even if it's
+	 * just an empty trace.
+	 */
+	trace.nr = 0;
+	trace.entries = NULL;
+
+	unwind_deferred_trace(&trace);
+
+	timestamp = info->timestamp;
+
+	guard(mutex)(&callback_mutex);
+	list_for_each_entry(work, &callbacks, list) {
+		work->func(work, &trace, timestamp);
+	}
+}
+
+/**
+ * unwind_deferred_request - Request a user stacktrace on task exit
+ * @work: Unwind descriptor requesting the trace
+ * @timestamp: The time stamp of the first request made for this task
+ *
+ * Schedule a user space unwind to be done in task work before exiting the
+ * kernel.
+ *
+ * The returned @timestamp output is the timestamp of the very first request
+ * for a user space stacktrace for this task since it entered the kernel.
+ * It can be from a request by any caller of this infrastructure.
+ * Its value will also be passed to the callback function. It can be
+ * used to stitch kernel and user stack traces together in post-processing.
+ *
+ * It's valid to call this function multiple times for the same @work within
+ * the same task entry context.
+ * Each call will return the same timestamp while the task hasn't left the
+ * kernel. If the callback is not pending because it has already been
+ * previously called for the same entry context, it will be called again
+ * with the same stack trace and timestamp.
+ *
+ * Return: 1 if the callback was already queued.
+ *         0 if the callback was successfully queued.
+ *         Negative if there's an error.
+ *         @timestamp holds the timestamp of the first request by any user
+ */
+int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
+{
+	struct unwind_task_info *info = &current->unwind_info;
+	int ret;
+
+	*timestamp = 0;
+
+	if (WARN_ON_ONCE(in_nmi()))
+		return -EINVAL;
+
+	if ((current->flags & (PF_KTHREAD | PF_EXITING)) ||
+	    !user_mode(task_pt_regs(current)))
+		return -EINVAL;
+
+	guard(irqsave)();
+
+	*timestamp = get_timestamp(info);
+
+	/* callback already pending? */
+	if (info->pending)
+		return 1;
+
+	/* The work has been claimed, now schedule it. */
+	ret = task_work_add(current, &info->work, TWA_RESUME);
+	if (WARN_ON_ONCE(ret))
+		return ret;
+
+	info->pending = 1;
+	return 0;
+}
+
+void unwind_deferred_cancel(struct unwind_work *work)
+{
+	if (!work)
+		return;
+
+	guard(mutex)(&callback_mutex);
+	list_del(&work->list);
+}
+
+int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
+{
+	memset(work, 0, sizeof(*work));
+
+	guard(mutex)(&callback_mutex);
+	list_add(&work->list, &callbacks);
+	work->func = func;
+	return 0;
+}
+
 void unwind_task_init(struct task_struct *task)
 {
 	struct unwind_task_info *info = &task->unwind_info;

 	memset(info, 0, sizeof(*info));
+	init_task_work(&info->work, unwind_deferred_task_work);
 }

 void unwind_task_free(struct task_struct *task)
@@ -71,4 +199,5 @@ void unwind_task_free(struct task_struct *task)
 	struct unwind_task_info *info = &task->unwind_info;

 	kfree(info->cache);
+	task_work_cancel(task, &info->work);
 }
--
2.47.2
Message-ID: <20250611010428.938845449@goodmis.org>
Date: Tue, 10 Jun 2025 20:54:28 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Subject: [PATCH v10 07/14] unwind_user/deferred: Make unwind deferral requests NMI-safe
References: <20250611005421.144238328@goodmis.org>

From: Josh Poimboeuf

Make unwind_deferred_request() NMI-safe so tracers in NMI context can call
it and safely request a user space stacktrace when the task exits.

A "nmi_timestamp" is added to the unwind_task_info that gets updated by
NMIs so as not to race with setting the info->timestamp.

Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
Changes since v9: https://lore.kernel.org/linux-trace-kernel/20250513223552.636076711@goodmis.org/

- Check for ret < 0 instead of just ret != 0 from the return code of
  task_work_add().
  Don't want to just assume it's less than zero, as it needs to return
  a negative value on error.

 include/linux/unwind_deferred_types.h |  1 +
 kernel/unwind/deferred.c              | 91 ++++++++++++++++++++++++---
 2 files changed, 84 insertions(+), 8 deletions(-)

diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index 5df264cf81ad..ae27a02234b8 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -11,6 +11,7 @@ struct unwind_task_info {
 	struct unwind_cache	*cache;
 	struct callback_head	work;
 	u64			timestamp;
+	u64			nmi_timestamp;
 	int			pending;
 };

diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index b76c704ddc6d..88c867c32c01 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -25,8 +25,27 @@ static u64 get_timestamp(struct unwind_task_info *info)
 {
 	lockdep_assert_irqs_disabled();

-	if (!info->timestamp)
-		info->timestamp = local_clock();
+	/*
+	 * Note, the timestamp is generated on the first request.
+	 * If it exists here, then the timestamp is earlier than
+	 * this request and it means that this request will be
+	 * valid for the stacktrace.
+	 */
+	if (!info->timestamp) {
+		WRITE_ONCE(info->timestamp, local_clock());
+		barrier();
+		/*
+		 * If an NMI came in and set a timestamp, it means that
+		 * it happened before this timestamp was set (otherwise
+		 * the NMI would have used this one). Use the NMI timestamp
+		 * instead.
+		 */
+		if (unlikely(info->nmi_timestamp)) {
+			WRITE_ONCE(info->timestamp, info->nmi_timestamp);
+			barrier();
+			WRITE_ONCE(info->nmi_timestamp, 0);
+		}
+	}

 	return info->timestamp;
 }
@@ -103,6 +122,13 @@ static void unwind_deferred_task_work(struct callback_head *head)

 	unwind_deferred_trace(&trace);

+	/* Check if the timestamp was only set by NMI */
+	if (info->nmi_timestamp) {
+		WRITE_ONCE(info->timestamp, info->nmi_timestamp);
+		barrier();
+		WRITE_ONCE(info->nmi_timestamp, 0);
+	}
+
 	timestamp = info->timestamp;

 	guard(mutex)(&callback_mutex);
@@ -111,6 +137,48 @@ static void unwind_deferred_task_work(struct callback_head *head)
 	}
 }

+static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *timestamp)
+{
+	struct unwind_task_info *info = &current->unwind_info;
+	bool inited_timestamp = false;
+	int ret;
+
+	/* Always use the nmi_timestamp first */
+	*timestamp = info->nmi_timestamp ? : info->timestamp;
+
+	if (!*timestamp) {
+		/*
+		 * This is the first unwind request since the most recent entry
+		 * from user space. Initialize the task timestamp.
+		 *
+		 * Don't write to info->timestamp directly, otherwise it may race
+		 * with an interruption of get_timestamp().
+		 */
+		info->nmi_timestamp = local_clock();
+		*timestamp = info->nmi_timestamp;
+		inited_timestamp = true;
+	}
+
+	if (info->pending)
+		return 1;
+
+	ret = task_work_add(current, &info->work, TWA_NMI_CURRENT);
+	if (ret < 0) {
+		/*
+		 * If this set nmi_timestamp and is not using it,
+		 * there's no guarantee that it will be used.
+		 * Set it back to zero.
+		 */
+		if (inited_timestamp)
+			info->nmi_timestamp = 0;
+		return ret;
+	}
+
+	info->pending = 1;
+
+	return 0;
+}
+
 /**
  * unwind_deferred_request - Request a user stacktrace on task exit
  * @work: Unwind descriptor requesting the trace
@@ -139,31 +207,38 @@ static void unwind_deferred_task_work(struct callback_head *head)
 int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 {
 	struct unwind_task_info *info = &current->unwind_info;
+	int pending;
 	int ret;

 	*timestamp = 0;

-	if (WARN_ON_ONCE(in_nmi()))
-		return -EINVAL;
-
 	if ((current->flags & (PF_KTHREAD | PF_EXITING)) ||
 	    !user_mode(task_pt_regs(current)))
 		return -EINVAL;

+	if (in_nmi())
+		return unwind_deferred_request_nmi(work, timestamp);
+
 	guard(irqsave)();

 	*timestamp = get_timestamp(info);

 	/* callback already pending? */
-	if (info->pending)
+	pending = READ_ONCE(info->pending);
+	if (pending)
+		return 1;
+
+	/* Claim the work unless an NMI just now swooped in to do so. */
+	if (!try_cmpxchg(&info->pending, &pending, 1))
 		return 1;

 	/* The work has been claimed, now schedule it. */
 	ret = task_work_add(current, &info->work, TWA_RESUME);
-	if (WARN_ON_ONCE(ret))
+	if (WARN_ON_ONCE(ret)) {
+		WRITE_ONCE(info->pending, 0);
 		return ret;
+	}

-	info->pending = 1;
 	return 0;
 }

--
2.47.2
Message-ID: <20250611010429.105907436@goodmis.org>
Date: Tue, 10 Jun 2025 20:54:29 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton
Subject: [PATCH v10 08/14] unwind deferred: Use bitmask to determine which callbacks to call
References: <20250611005421.144238328@goodmis.org>

From: Steven Rostedt

In order to know which registered callback requested a stacktrace for when the task goes back to user space, add a bitmask to keep track of all registered tracers.
The bitmask is the size of a long, which means that on a 32-bit machine it can track at most 32 registered tracers, and on a 64-bit machine at most 64. This should not be an issue, as there should not be more than 10 (unless BPF can abuse this?).

When a tracer registers with unwind_deferred_init() it is assigned a bit number. When a tracer requests a stacktrace, its bit is set within the task_struct. When the task returns to user space, the callbacks are called for all registered tracers whose bits are set in the task's mask.

When a tracer is removed via unwind_deferred_cancel(), all current tasks clear the associated bit, in case another tracer gets registered immediately afterward and would otherwise have its callback called unexpectedly.

Signed-off-by: Steven Rostedt (Google)
---
Changes since v9: https://lore.kernel.org/linux-trace-kernel/20250513223552.804390728@goodmis.org/
- Use BIT() macro for bit setting and testing.
- Moved the "unwind_mask" from the task_struct into the task->unwind_info structure.
 include/linux/unwind_deferred.h       |  1 +
 include/linux/unwind_deferred_types.h |  1 +
 kernel/unwind/deferred.c              | 45 ++++++++++++++++++++++-----
 3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index a384eef719a3..1789c3624723 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -13,6 +13,7 @@ typedef void (*unwind_callback_t)(struct unwind_work *work, struct unwind_stackt
 struct unwind_work {
 	struct list_head	list;
 	unwind_callback_t	func;
+	int			bit;
 };
 
 #ifdef CONFIG_UNWIND_USER
diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index ae27a02234b8..780b00c07208 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -10,6 +10,7 @@ struct unwind_cache {
 struct unwind_task_info {
 	struct unwind_cache	*cache;
 	struct callback_head	work;
+	unsigned long		unwind_mask;
 	u64			timestamp;
 	u64			nmi_timestamp;
 	int			pending;
diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index 88c867c32c01..268afae31ba4 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -16,6 +16,7 @@
 /* Guards adding to and reading the list of callbacks */
 static DEFINE_MUTEX(callback_mutex);
 static LIST_HEAD(callbacks);
+static unsigned long unwind_mask;
 
 /*
  * Read the task context timestamp, if this is the first caller then
@@ -133,7 +134,10 @@ static void unwind_deferred_task_work(struct callback_head *head)
 
 	guard(mutex)(&callback_mutex);
 	list_for_each_entry(work, &callbacks, list) {
-		work->func(work, &trace, timestamp);
+		if (info->unwind_mask & BIT(work->bit)) {
+			work->func(work, &trace, timestamp);
+			clear_bit(work->bit, &info->unwind_mask);
+		}
 	}
 }
 
@@ -159,9 +163,12 @@ static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *timestamp)
 		inited_timestamp = true;
 	}
 
-	if (info->pending)
+	if (info->unwind_mask & BIT(work->bit))
 		return 1;
 
+	if (info->pending)
+		goto out;
+
 	ret = task_work_add(current, &info->work, TWA_NMI_CURRENT);
 	if (ret < 0) {
 		/*
@@ -175,8 +182,8 @@ static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *timestamp)
 	}
 
 	info->pending = 1;
-
-	return 0;
+out:
+	return test_and_set_bit(work->bit, &info->unwind_mask);
 }
 
 /**
@@ -223,14 +230,18 @@ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 
 	*timestamp = get_timestamp(info);
 
+	/* This is already queued */
+	if (info->unwind_mask & BIT(work->bit))
+		return 1;
+
 	/* callback already pending? */
 	pending = READ_ONCE(info->pending);
 	if (pending)
-		return 1;
+		goto out;
 
 	/* Claim the work unless an NMI just now swooped in to do so. */
 	if (!try_cmpxchg(&info->pending, &pending, 1))
-		return 1;
+		goto out;
 
 	/* The work has been claimed, now schedule it. */
 	ret = task_work_add(current, &info->work, TWA_RESUME);
@@ -239,16 +250,27 @@ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 		return ret;
 	}
 
-	return 0;
+out:
+	return test_and_set_bit(work->bit, &info->unwind_mask);
 }
 
 void unwind_deferred_cancel(struct unwind_work *work)
 {
+	struct task_struct *g, *t;
+
 	if (!work)
 		return;
 
 	guard(mutex)(&callback_mutex);
 	list_del(&work->list);
+
+	clear_bit(work->bit, &unwind_mask);
+
+	guard(rcu)();
+	/* Clear this bit from all threads */
+	for_each_process_thread(g, t) {
+		clear_bit(work->bit, &t->unwind_info.unwind_mask);
+	}
 }
 
 int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
@@ -256,6 +278,14 @@ int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
 	memset(work, 0, sizeof(*work));
 
 	guard(mutex)(&callback_mutex);
+
+	/* See if there's a bit in the mask available */
+	if (unwind_mask == ~0UL)
+		return -EBUSY;
+
+	work->bit = ffz(unwind_mask);
+	unwind_mask |= BIT(work->bit);
+
 	list_add(&work->list, &callbacks);
 	work->func = func;
 	return 0;
@@ -267,6 +297,7 @@ void unwind_task_init(struct task_struct *task)
 
 	memset(info, 0, sizeof(*info));
 	init_task_work(&info->work, unwind_deferred_task_work);
+	info->unwind_mask = 0;
 }
 
 void unwind_task_free(struct task_struct *task)
-- 
2.47.2

From nobody Sat Oct 11 08:27:36 2025
Message-ID: <20250611010429.274682576@goodmis.org>
Date: Tue, 10 Jun 2025 20:54:30 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton
Subject: [PATCH v10 09/14] unwind deferred: Use SRCU unwind_deferred_task_work()
References: <20250611005421.144238328@goodmis.org>

From: Steven Rostedt

Instead of using the callback_mutex to protect the linked list of callbacks in unwind_deferred_task_work(), use SRCU instead. This gets called every time a task that has to record a requested stack trace exits.
This can happen for many tasks on several CPUs at the same time. A mutex is a bottleneck that can cause contention and slow down performance.

As the callbacks themselves are allowed to sleep, regular RCU cannot be used to protect the list. Instead use SRCU, as that still allows the callbacks to sleep while the list can be read without needing to hold the callback_mutex.

Link: https://lore.kernel.org/all/ca9bd83a-6c80-4ee0-a83c-224b9d60b755@efficios.com/

Suggested-by: Mathieu Desnoyers
Signed-off-by: Steven Rostedt (Google)
---
 kernel/unwind/deferred.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index 268afae31ba4..e44538a2be6c 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -13,10 +13,11 @@
 
 #define UNWIND_MAX_ENTRIES 512
 
-/* Guards adding to and reading the list of callbacks */
+/* Guards adding to or removing from the list of callbacks */
 static DEFINE_MUTEX(callback_mutex);
 static LIST_HEAD(callbacks);
 static unsigned long unwind_mask;
+DEFINE_STATIC_SRCU(unwind_srcu);
 
 /*
  * Read the task context timestamp, if this is the first caller then
@@ -107,6 +108,7 @@ static void unwind_deferred_task_work(struct callback_head *head)
 	struct unwind_stacktrace trace;
 	struct unwind_work *work;
 	u64 timestamp;
+	int idx;
 
 	if (WARN_ON_ONCE(!info->pending))
 		return;
@@ -132,13 +134,15 @@ static void unwind_deferred_task_work(struct callback_head *head)
 
 	timestamp = info->timestamp;
 
-	guard(mutex)(&callback_mutex);
-	list_for_each_entry(work, &callbacks, list) {
+	idx = srcu_read_lock(&unwind_srcu);
+	list_for_each_entry_srcu(work, &callbacks, list,
+				 srcu_read_lock_held(&unwind_srcu)) {
 		if (info->unwind_mask & BIT(work->bit)) {
 			work->func(work, &trace, timestamp);
 			clear_bit(work->bit, &info->unwind_mask);
 		}
 	}
+	srcu_read_unlock(&unwind_srcu, idx);
 }
 
 static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *timestamp)
@@ -215,6 +219,7 @@ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 {
 	struct unwind_task_info *info = &current->unwind_info;
 	int pending;
+	int bit;
 	int ret;
 
 	*timestamp = 0;
@@ -226,12 +231,17 @@ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 	if (in_nmi())
 		return unwind_deferred_request_nmi(work, timestamp);
 
+	/* Do not allow cancelled works to request again */
+	bit = READ_ONCE(work->bit);
+	if (WARN_ON_ONCE(bit < 0))
+		return -EINVAL;
+
 	guard(irqsave)();
 
 	*timestamp = get_timestamp(info);
 
 	/* This is already queued */
-	if (info->unwind_mask & BIT(work->bit))
+	if (info->unwind_mask & BIT(bit))
 		return 1;
 
 	/* callback already pending? */
@@ -257,19 +267,26 @@ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 void unwind_deferred_cancel(struct unwind_work *work)
 {
 	struct task_struct *g, *t;
+	int bit;
 
 	if (!work)
 		return;
 
 	guard(mutex)(&callback_mutex);
-	list_del(&work->list);
+	list_del_rcu(&work->list);
+	bit = work->bit;
+
+	/* Do not allow any more requests and prevent callbacks */
+	work->bit = -1;
+
+	clear_bit(bit, &unwind_mask);
 
-	clear_bit(work->bit, &unwind_mask);
+	synchronize_srcu(&unwind_srcu);
 
 	guard(rcu)();
 	/* Clear this bit from all threads */
 	for_each_process_thread(g, t) {
-		clear_bit(work->bit, &t->unwind_info.unwind_mask);
+		clear_bit(bit, &t->unwind_info.unwind_mask);
 	}
 }
 
@@ -286,7 +303,7 @@ int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
 	work->bit = ffz(unwind_mask);
 	unwind_mask |= BIT(work->bit);
 
-	list_add(&work->list, &callbacks);
+	list_add_rcu(&work->list, &callbacks);
 	work->func = func;
 	return 0;
 }
-- 
2.47.2

From nobody Sat Oct 11 08:27:36 2025
Message-ID: <20250611010429.444947502@goodmis.org>
Date: Tue, 10 Jun 2025 20:54:31 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton
Subject: [PATCH v10 10/14] unwind: Clear unwind_mask on exit back to user space
References: <20250611005421.144238328@goodmis.org>

From: Steven Rostedt

When testing the deferred unwinder by attaching deferred user space stacktraces to events, a live lock happened. This occurred when the deferred unwinding was added to the irqs_disabled event, which fires after the task_work callbacks are called and before the task goes back to user space.

The event callback would be registered when irqs were disabled, the task_work would trigger, call the callback for this work, and clear the work's bit. Then, before getting back to user space, irqs would be disabled again, the event would trigger again, and a new task_work would be registered. This caused an infinite loop and the system hung.

To prevent this, clear the bits at the very last moment before going back to user space, when instrumentation is disabled.
That is in unwind_exit_to_user_mode().

Move the pending bit from a value on the task_struct to the most significant bit of the unwind_mask (this saves space on the task_struct). This allows modifying the pending bit along with the work bits atomically.

Instead of clearing a work's bit after its callback is called, the clearing is delayed until exit. If the work is requested again, the task_work is not queued again and the work is notified that the task has already been called (via the UNWIND_ALREADY_EXECUTED return value).

The pending bit is cleared before calling the callback functions, but the current work bits remain. If one of the called works registers again, it will not trigger a task_work if its bit is still present in the task's unwind_mask. If a new work registers, it will set both the pending bit and its own bit, but clear the other work bits so that their callbacks do not get called again.

Signed-off-by: Steven Rostedt (Google)
---
Changes since v9: https://lore.kernel.org/all/20250513223553.143567998@goodmis.org/
- Fix compare with ~UNWIND_PENDING_BIT to be ~UNWIND_PENDING
- Use BIT() macro for bit setting and testing

 include/linux/unwind_deferred.h       | 28 +++++++-
 include/linux/unwind_deferred_types.h |  1 -
 kernel/unwind/deferred.c              | 96 +++++++++++++++++++--------
 3 files changed, 93 insertions(+), 32 deletions(-)

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index 1789c3624723..426e21457606 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -18,6 +18,14 @@ struct unwind_work {
 
 #ifdef CONFIG_UNWIND_USER
 
+#define UNWIND_PENDING_BIT	(BITS_PER_LONG - 1)
+#define UNWIND_PENDING		BIT(UNWIND_PENDING_BIT)
+
+enum {
+	UNWIND_ALREADY_PENDING	= 1,
+	UNWIND_ALREADY_EXECUTED	= 2,
+};
+
 void unwind_task_init(struct task_struct *task);
 void unwind_task_free(struct task_struct *task);
 
@@ -29,9 +37,23 @@ void unwind_deferred_cancel(struct unwind_work *work);
 
 static __always_inline void unwind_exit_to_user_mode(void)
 {
-	if (unlikely(current->unwind_info.cache))
-		current->unwind_info.cache->nr_entries = 0;
-	current->unwind_info.timestamp = 0;
+	struct unwind_task_info *info = &current->unwind_info;
+	unsigned long bits;
+
+	/* Was there any unwinding? */
+	if (likely(!info->unwind_mask))
+		return;
+
+	bits = info->unwind_mask;
+	do {
+		/* Is a task_work going to run again before going back */
+		if (bits & UNWIND_PENDING)
+			return;
+	} while (!try_cmpxchg(&info->unwind_mask, &bits, 0UL));
+
+	if (likely(info->cache))
+		info->cache->nr_entries = 0;
+	info->timestamp = 0;
 }
 
 #else /* !CONFIG_UNWIND_USER */
diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index 780b00c07208..f384e7f45783 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -13,7 +13,6 @@ struct unwind_task_info {
 	unsigned long		unwind_mask;
 	u64			timestamp;
 	u64			nmi_timestamp;
-	int			pending;
 };
 
 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */
diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index e44538a2be6c..8a6caaae04d3 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -19,6 +19,11 @@ static LIST_HEAD(callbacks);
 static unsigned long unwind_mask;
 DEFINE_STATIC_SRCU(unwind_srcu);
 
+static inline bool unwind_pending(struct unwind_task_info *info)
+{
+	return test_bit(UNWIND_PENDING_BIT, &info->unwind_mask);
+}
+
 /*
  * Read the task context timestamp, if this is the first caller then
  * it will set the timestamp.
@@ -107,14 +112,17 @@ static void unwind_deferred_task_work(struct callback_head *head)
 	struct unwind_task_info *info = container_of(head, struct unwind_task_info, work);
 	struct unwind_stacktrace trace;
 	struct unwind_work *work;
+	unsigned long bits;
 	u64 timestamp;
 	int idx;
 
-	if (WARN_ON_ONCE(!info->pending))
+	if (WARN_ON_ONCE(!unwind_pending(info)))
 		return;
 
-	/* Allow work to come in again */
-	WRITE_ONCE(info->pending, 0);
+	/* Clear pending bit but make sure to have the current bits */
+	bits = READ_ONCE(info->unwind_mask);
+	while (!try_cmpxchg(&info->unwind_mask, &bits, bits & ~UNWIND_PENDING))
+		;
 
 	/*
 	 * From here on out, the callback must always be called, even if it's
@@ -137,10 +145,8 @@ static void unwind_deferred_task_work(struct callback_head *head)
 	idx = srcu_read_lock(&unwind_srcu);
 	list_for_each_entry_srcu(work, &callbacks, list,
 				 srcu_read_lock_held(&unwind_srcu)) {
-		if (info->unwind_mask & BIT(work->bit)) {
+		if (bits & BIT(work->bit))
 			work->func(work, &trace, timestamp);
-			clear_bit(work->bit, &info->unwind_mask);
-		}
 	}
 	srcu_read_unlock(&unwind_srcu, idx);
 }
@@ -167,10 +173,13 @@ static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *timestamp)
 		inited_timestamp = true;
 	}
 
-	if (info->unwind_mask & BIT(work->bit))
-		return 1;
+	/* Is this already queued */
+	if (info->unwind_mask & BIT(work->bit)) {
+		return unwind_pending(info) ? UNWIND_ALREADY_PENDING :
+			UNWIND_ALREADY_EXECUTED;
+	}
 
-	if (info->pending)
+	if (unwind_pending(info))
 		goto out;
 
 	ret = task_work_add(current, &info->work, TWA_NMI_CURRENT);
@@ -185,9 +194,17 @@ static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *timestamp)
 		return ret;
 	}
 
-	info->pending = 1;
+	/*
+	 * This is the first to set the PENDING_BIT, clear all others
+	 * as any other bit has already had their callback called, and
+	 * those callbacks should not be called again because of this
+	 * new callback. If they request another callback, then they
+	 * will get a new one.
+	 */
+	info->unwind_mask = UNWIND_PENDING;
 out:
-	return test_and_set_bit(work->bit, &info->unwind_mask);
+	return test_and_set_bit(work->bit, &info->unwind_mask) ?
+		UNWIND_ALREADY_PENDING : 0;
 }
 
 /**
@@ -210,15 +227,17 @@ static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *timestamp)
  * it has already been previously called for the same entry context, it will be
  * called again with the same stack trace and timestamp.
  *
- * Return: 1 if the the callback was already queued.
- *         0 if the callback successfully was queued.
+ * Return: 0 if the callback successfully was queued.
+ *         UNWIND_ALREADY_PENDING if the the callback was already queued.
+ *         UNWIND_ALREADY_EXECUTED if the callback was already called
+ *                 (and will not be called again)
 *         Negative if there's an error.
 *         @timestamp holds the timestamp of the first request by any user
 */
 int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 {
 	struct unwind_task_info *info = &current->unwind_info;
-	int pending;
+	unsigned long old, bits;
 	int bit;
 	int ret;
 
@@ -240,28 +259,49 @@ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 
 	*timestamp = get_timestamp(info);
 
-	/* This is already queued */
-	if (info->unwind_mask & BIT(bit))
-		return 1;
+	old = READ_ONCE(info->unwind_mask);
+
+	/* Is this already queued */
+	if (old & BIT(bit)) {
+		/*
+		 * If pending is not set, it means this work's callback
+		 * was already called.
+		 */
+		return old & UNWIND_PENDING ? UNWIND_ALREADY_PENDING :
+			UNWIND_ALREADY_EXECUTED;
+	}
 
-	/* callback already pending? */
-	pending = READ_ONCE(info->pending);
-	if (pending)
+	if (unwind_pending(info))
 		goto out;
 
-	/* Claim the work unless an NMI just now swooped in to do so. */
-	if (!try_cmpxchg(&info->pending, &pending, 1))
+	/*
+	 * This is the first to enable another task_work for this task since
+	 * the task entered the kernel, or had already called the callbacks.
+	 * Set only the bit for this work and clear all others as they have
+	 * already had their callbacks called, and do not need to call them
+	 * again because of this work.
+	 */
+	bits = UNWIND_PENDING | BIT(bit);
+
+	/*
+	 * If the cmpxchg() fails, it means that an NMI came in and set
+	 * the pending bit as well as cleared the other bits. Just
+	 * jump to setting the bit for this work.
+	 */
+	if (!try_cmpxchg(&info->unwind_mask, &old, bits))
 		goto out;
 
 	/* The work has been claimed, now schedule it. */
 	ret = task_work_add(current, &info->work, TWA_RESUME);
-	if (WARN_ON_ONCE(ret)) {
-		WRITE_ONCE(info->pending, 0);
-		return ret;
-	}
+
+	if (WARN_ON_ONCE(ret))
+		WRITE_ONCE(info->unwind_mask, 0);
+
+	return ret;
 
 out:
	return test_and_set_bit(work->bit, &info->unwind_mask) ?
		UNWIND_ALREADY_PENDING : 0;
 }
 
 void unwind_deferred_cancel(struct unwind_work *work)
@@ -297,7 +337,7 @@ int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
 	guard(mutex)(&callback_mutex);
 
 	/* See if there's a bit in the mask available */
-	if (unwind_mask == ~0UL)
+	if (unwind_mask == ~(UNWIND_PENDING))
 		return -EBUSY;
 
 	work->bit = ffz(unwind_mask);
-- 
2.47.2

From nobody Sat Oct 11 08:27:36 2025
Message-ID: <20250611010429.615420889@goodmis.org>
Date: Tue, 10 Jun 2025 20:54:32 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton
Subject: [PATCH v10 11/14] unwind: Finish up unwind when a task exits
References: <20250611005421.144238328@goodmis.org>

From: Steven Rostedt

In do_exit(), when a task is exiting, if an unwind is requested and the deferred user
stacktrace is deferred via the task_work, the task_work callback is called after exit_mm() is called in do_exit(). This means that the user stack trace will not be retrieved and an empty stack is created. Instead, add a function unwind_deferred_task_exit() and call it just before exit_mm() so that the unwinder can call the requested callbacks with the user space stack. Signed-off-by: Steven Rostedt (Google) --- include/linux/unwind_deferred.h | 4 +++- kernel/exit.c | 2 ++ kernel/unwind/deferred.c | 23 ++++++++++++++++++++--- 3 files changed, 25 insertions(+), 4 deletions(-) diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferre= d.h index 426e21457606..bf0cc0477b2e 100644 --- a/include/linux/unwind_deferred.h +++ b/include/linux/unwind_deferred.h @@ -35,6 +35,8 @@ int unwind_deferred_init(struct unwind_work *work, unwind= _callback_t func); int unwind_deferred_request(struct unwind_work *work, u64 *timestamp); void unwind_deferred_cancel(struct unwind_work *work); =20 +void unwind_deferred_task_exit(struct task_struct *task); + static __always_inline void unwind_exit_to_user_mode(void) { struct unwind_task_info *info =3D ¤t->unwind_info; @@ -65,7 +67,7 @@ static inline int unwind_deferred_trace(struct unwind_sta= cktrace *trace) { retur static inline int unwind_deferred_init(struct unwind_work *work, unwind_ca= llback_t func) { return -ENOSYS; } static inline int unwind_deferred_request(struct unwind_work *work, u64 *t= imestamp) { return -ENOSYS; } static inline void unwind_deferred_cancel(struct unwind_work *work) {} - +static inline void unwind_deferred_task_exit(struct task_struct *task) {} static inline void unwind_exit_to_user_mode(void) {} =20 #endif /* !CONFIG_UNWIND_USER */ diff --git a/kernel/exit.c b/kernel/exit.c index bd743900354c..6599f9518436 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -68,6 +68,7 @@ #include #include #include +#include #include #include =20 @@ -938,6 +939,7 @@ void __noreturn do_exit(long code) =20 
 	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
+	unwind_deferred_task_exit(tsk);
 	trace_sched_process_exit(tsk, group_dead);
 
 	exit_mm();
diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index 8a6caaae04d3..6c95f484568e 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -77,7 +77,7 @@ int unwind_deferred_trace(struct unwind_stacktrace *trace)
 	/* Should always be called from faultable context */
 	might_fault();
 
-	if (current->flags & PF_EXITING)
+	if (!current->mm)
 		return -EINVAL;
 
 	if (!info->cache) {
@@ -107,9 +107,9 @@ int unwind_deferred_trace(struct unwind_stacktrace *trace)
 	return 0;
 }
 
-static void unwind_deferred_task_work(struct callback_head *head)
+static void process_unwind_deferred(struct task_struct *task)
 {
-	struct unwind_task_info *info = container_of(head, struct unwind_task_info, work);
+	struct unwind_task_info *info = &task->unwind_info;
 	struct unwind_stacktrace trace;
 	struct unwind_work *work;
 	unsigned long bits;
@@ -151,6 +151,23 @@ static void unwind_deferred_task_work(struct callback_head *head)
 	srcu_read_unlock(&unwind_srcu, idx);
 }
 
+static void unwind_deferred_task_work(struct callback_head *head)
+{
+	process_unwind_deferred(current);
+}
+
+void unwind_deferred_task_exit(struct task_struct *task)
+{
+	struct unwind_task_info *info = &current->unwind_info;
+
+	if (!unwind_pending(info))
+		return;
+
+	process_unwind_deferred(task);
+
+	task_work_cancel(task, &info->work);
+}
+
 static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *timestamp)
 {
 	struct unwind_task_info *info = &current->unwind_info;
-- 
2.47.2

From: Steven Rostedt
Date: Tue, 10 Jun 2025 20:54:33 -0400
Message-ID: <20250611010429.786324495@goodmis.org>
Subject: [PATCH v10 12/14] unwind_user/x86: Enable frame pointer unwinding on x86
References: <20250611005421.144238328@goodmis.org>

From: Josh Poimboeuf

Use ARCH_INIT_USER_FP_FRAME to describe how frame pointers are unwound on x86, and enable CONFIG_HAVE_UNWIND_USER_FP accordingly so that the unwind_user interfaces can be used.
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 arch/x86/Kconfig                   |  1 +
 arch/x86/include/asm/unwind_user.h | 11 +++++++++++
 2 files changed, 12 insertions(+)
 create mode 100644 arch/x86/include/asm/unwind_user.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 340e5468980e..2cdb5cf91541 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -302,6 +302,7 @@ config X86
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UACCESS_VALIDATION		if HAVE_OBJTOOL
 	select HAVE_UNSTABLE_SCHED_CLOCK
+	select HAVE_UNWIND_USER_FP		if X86_64
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_GENERIC_VDSO
 	select VDSO_GETRANDOM			if X86_64
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
new file mode 100644
index 000000000000..8597857bf896
--- /dev/null
+++ b/arch/x86/include/asm/unwind_user.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_UNWIND_USER_H
+#define _ASM_X86_UNWIND_USER_H
+
+#define ARCH_INIT_USER_FP_FRAME				\
+	.cfa_off	= (s32)sizeof(long) *  2,	\
+	.ra_off		= (s32)sizeof(long) * -1,	\
+	.fp_off		= (s32)sizeof(long) * -2,	\
+	.use_fp		= true,
+
+#endif /* _ASM_X86_UNWIND_USER_H */
-- 
2.47.2

From: Steven Rostedt
Date: Tue, 10 Jun 2025 20:54:34 -0400
Message-ID: <20250611010429.957013350@goodmis.org>
Subject: [PATCH v10 13/14] perf/x86: Rename and move get_segment_base() and make it global
References: <20250611005421.144238328@goodmis.org>

From: Josh Poimboeuf

get_segment_base() will be used by the unwind_user code, so make it global, and rename it to segment_base_address() so that it does not conflict with a KVM function of the same name. As the function is no longer specific to perf, move it to ptrace.c, which is a better location for a generic helper like this. Also add a lockdep_assert_irqs_disabled() to make sure it is always called with interrupts disabled.
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 arch/x86/events/core.c        | 44 ++++------------------------------
 arch/x86/include/asm/ptrace.h |  2 ++
 arch/x86/kernel/ptrace.c      | 38 ++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 39 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7610f26dfbd9..2f2ec84f2a14 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include "perf_event.h"
@@ -2808,41 +2809,6 @@ valid_user_frame(const void __user *fp, unsigned long size)
 	return __access_ok(fp, size);
 }
 
-static unsigned long get_segment_base(unsigned int segment)
-{
-	struct desc_struct *desc;
-	unsigned int idx = segment >> 3;
-
-	if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
-#ifdef CONFIG_MODIFY_LDT_SYSCALL
-		struct ldt_struct *ldt;
-
-		/*
-		 * If we're not in a valid context with a real (not just lazy)
-		 * user mm, then don't even try.
-		 */
-		if (!nmi_uaccess_okay())
-			return 0;
-
-		/* IRQs are off, so this synchronizes with smp_store_release */
-		ldt = smp_load_acquire(&current->mm->context.ldt);
-		if (!ldt || idx >= ldt->nr_entries)
-			return 0;
-
-		desc = &ldt->entries[idx];
-#else
-		return 0;
-#endif
-	} else {
-		if (idx >= GDT_ENTRIES)
-			return 0;
-
-		desc = raw_cpu_ptr(gdt_page.gdt) + idx;
-	}
-
-	return get_desc_base(desc);
-}
-
 #ifdef CONFIG_UPROBES
 /*
  * Heuristic-based check if uprobe is installed at the function entry.
@@ -2899,8 +2865,8 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
 	if (user_64bit_mode(regs))
 		return 0;
 
-	cs_base = get_segment_base(regs->cs);
-	ss_base = get_segment_base(regs->ss);
+	cs_base = segment_base_address(regs->cs);
+	ss_base = segment_base_address(regs->ss);
 
 	fp = compat_ptr(ss_base + regs->bp);
 	pagefault_disable();
@@ -3019,11 +2985,11 @@ static unsigned long code_segment_base(struct pt_regs *regs)
 		return 0x10 * regs->cs;
 
 	if (user_mode(regs) && regs->cs != __USER_CS)
-		return get_segment_base(regs->cs);
+		return segment_base_address(regs->cs);
#else
 	if (user_mode(regs) && !user_64bit_mode(regs) &&
 	    regs->cs != __USER32_CS)
-		return get_segment_base(regs->cs);
+		return segment_base_address(regs->cs);
#endif
 	return 0;
 }
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 50f75467f73d..59357ec98e52 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -314,6 +314,8 @@ static __always_inline bool regs_irqs_disabled(struct pt_regs *regs)
 	return !(regs->flags & X86_EFLAGS_IF);
 }
 
+unsigned long segment_base_address(unsigned int segment);
+
 /* Query offset/name of register from its name/offset */
 extern int regs_query_register_offset(const char *name);
 extern const char *regs_query_register_name(unsigned int offset);
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 095f04bdabdc..81353a09701b 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -41,6 +41,7 @@
 #include
 #include
 #include
+#include
 
 #include "tls.h"
 
@@ -339,6 +340,43 @@ static int set_segment_reg(struct task_struct *task,
 
 #endif /* CONFIG_X86_32 */
 
+unsigned long segment_base_address(unsigned int segment)
+{
+	struct desc_struct *desc;
+	unsigned int idx = segment >> 3;
+
+	lockdep_assert_irqs_disabled();
+
+	if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+		struct ldt_struct *ldt;
+
+		/*
+		 * If we're not in a valid context with a real (not just lazy)
+		 * user mm, then don't even try.
+		 */
+		if (!nmi_uaccess_okay())
+			return 0;
+
+		/* IRQs are off, so this synchronizes with smp_store_release */
+		ldt = smp_load_acquire(&current->mm->context.ldt);
+		if (!ldt || idx >= ldt->nr_entries)
+			return 0;
+
+		desc = &ldt->entries[idx];
+#else
+		return 0;
+#endif
+	} else {
+		if (idx >= GDT_ENTRIES)
+			return 0;
+
+		desc = raw_cpu_ptr(gdt_page.gdt) + idx;
+	}
+
+	return get_desc_base(desc);
+}
+
 static unsigned long get_flags(struct task_struct *task)
 {
 	unsigned long retval = task_pt_regs(task)->flags;
-- 
2.47.2

From: Steven Rostedt
Date: Tue, 10 Jun 2025 20:54:35 -0400
Message-ID: <20250611010430.123232579@goodmis.org>
Subject: [PATCH v10 14/14] unwind_user/x86: Enable compat mode frame pointer unwinding on x86
References: <20250611005421.144238328@goodmis.org>

From: Josh Poimboeuf

Use ARCH_INIT_USER_COMPAT_FP_FRAME to describe how compat mode frame pointers are unwound on x86, and implement the hooks needed to add the segment base addresses. Enable HAVE_UNWIND_USER_COMPAT_FP if the system has compat mode compiled in.
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
Changes since v9: https://lore.kernel.org/linux-trace-kernel/20250513223551.966925463@goodmis.org/
- Remove unneeded include of perf_event.h

 arch/x86/Kconfig                         |  1 +
 arch/x86/include/asm/unwind_user.h       | 49 ++++++++++++++++++++++++
 arch/x86/include/asm/unwind_user_types.h | 17 ++++++++
 3 files changed, 67 insertions(+)
 create mode 100644 arch/x86/include/asm/unwind_user_types.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2cdb5cf91541..3f7bdc9e3cec 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -302,6 +302,7 @@ config X86
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UACCESS_VALIDATION		if HAVE_OBJTOOL
 	select HAVE_UNSTABLE_SCHED_CLOCK
+	select HAVE_UNWIND_USER_COMPAT_FP	if IA32_EMULATION
 	select HAVE_UNWIND_USER_FP		if X86_64
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_GENERIC_VDSO
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index 8597857bf896..43f8554c1d70 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -2,10 +2,59 @@
 #ifndef _ASM_X86_UNWIND_USER_H
 #define _ASM_X86_UNWIND_USER_H
 
+#include
+#include
+
 #define ARCH_INIT_USER_FP_FRAME				\
 	.cfa_off	= (s32)sizeof(long) *  2,	\
 	.ra_off		= (s32)sizeof(long) * -1,	\
 	.fp_off		= (s32)sizeof(long) * -2,	\
 	.use_fp		= true,
 
+#ifdef CONFIG_IA32_EMULATION
+
+#define ARCH_INIT_USER_COMPAT_FP_FRAME			\
+	.cfa_off	= (s32)sizeof(u32) *  2,	\
+	.ra_off		= (s32)sizeof(u32) * -1,	\
+	.fp_off		= (s32)sizeof(u32) * -2,	\
+	.use_fp		= true,
+
+#define in_compat_mode(regs) !user_64bit_mode(regs)
+
+static inline void arch_unwind_user_init(struct unwind_user_state *state,
+					 struct pt_regs *regs)
+{
+	unsigned long cs_base, ss_base;
+
+	if (state->type != UNWIND_USER_TYPE_COMPAT_FP)
+		return;
+
+	scoped_guard(irqsave) {
+		cs_base = segment_base_address(regs->cs);
+		ss_base = segment_base_address(regs->ss);
+	}
+
+	state->arch.cs_base = cs_base;
+	state->arch.ss_base = ss_base;
+
+	state->ip += cs_base;
+	state->sp += ss_base;
+	state->fp += ss_base;
+}
+#define arch_unwind_user_init arch_unwind_user_init
+
+static inline void arch_unwind_user_next(struct unwind_user_state *state)
+{
+	if (state->type != UNWIND_USER_TYPE_COMPAT_FP)
+		return;
+
+	state->ip += state->arch.cs_base;
+	state->fp += state->arch.ss_base;
+}
+#define arch_unwind_user_next arch_unwind_user_next
+
+#endif /* CONFIG_IA32_EMULATION */
+
+#include
+
 #endif /* _ASM_X86_UNWIND_USER_H */
diff --git a/arch/x86/include/asm/unwind_user_types.h b/arch/x86/include/asm/unwind_user_types.h
new file mode 100644
index 000000000000..d7074dc5f0ce
--- /dev/null
+++ b/arch/x86/include/asm/unwind_user_types.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_UNWIND_USER_TYPES_H
+#define _ASM_UNWIND_USER_TYPES_H
+
+#ifdef CONFIG_IA32_EMULATION
+
+struct arch_unwind_user_state {
+	unsigned long ss_base;
+	unsigned long cs_base;
+};
+#define arch_unwind_user_state arch_unwind_user_state
+
+#endif /* CONFIG_IA32_EMULATION */
+
+#include
+
+#endif /* _ASM_UNWIND_USER_TYPES_H */
-- 
2.47.2