From nobody Wed Oct 8 07:23:47 2025
Message-ID: <20250701005450.721228270@goodmis.org>
User-Agent: quilt/0.68
Date: Mon, 30 Jun 2025 20:53:22 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton, Jens Axboe, Florian Weimer
Subject: [PATCH v12 01/14] unwind_user: Add user space unwinding API
References: <20250701005321.942306427@goodmis.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Josh Poimboeuf

Introduce a generic API for unwinding user stacks.

To expand user space unwinding to handle more complex scenarios, such as deferred unwinding and reading user space information, create a generic interface that all architectures supporting the various unwinding methods can use.

This is an alternative to the simple stack_trace_save_user() API.
It does not replace that interface; rather, it will be used to expand the functionality of user space stack walking.

None of the structures introduced will be exposed to user space tooling.

Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 MAINTAINERS                       |  8 +++++
 arch/Kconfig                      |  3 ++
 include/linux/unwind_user.h       | 15 +++++++++
 include/linux/unwind_user_types.h | 31 +++++++++++++++++
 kernel/Makefile                   |  1 +
 kernel/unwind/Makefile            |  1 +
 kernel/unwind/user.c              | 55 +++++++++++++++++++++++++++++++
 7 files changed, 114 insertions(+)
 create mode 100644 include/linux/unwind_user.h
 create mode 100644 include/linux/unwind_user_types.h
 create mode 100644 kernel/unwind/Makefile
 create mode 100644 kernel/unwind/user.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 4bac4ea21b64..ed5705c4f7d9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -25924,6 +25924,14 @@ F:	Documentation/driver-api/uio-howto.rst
 F:	drivers/uio/
 F:	include/linux/uio_driver.h
 
+USERSPACE STACK UNWINDING
+M:	Josh Poimboeuf
+M:	Steven Rostedt
+S:	Maintained
+F:	include/linux/unwind*.h
+F:	kernel/unwind/
+
+
 UTIL-LINUX PACKAGE
 M:	Karel Zak
 L:	util-linux@vger.kernel.org

diff --git a/arch/Kconfig b/arch/Kconfig
index a3308a220f86..ea59e5d7cc69 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -435,6 +435,9 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH
	  It uses the same command line parameters, and sysctl interface,
	  as the generic hardlockup detectors.
 
+config UNWIND_USER
+	bool
+
 config HAVE_PERF_REGS
	bool
	help

diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
new file mode 100644
index 000000000000..aa7923c1384f
--- /dev/null
+++ b/include/linux/unwind_user.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNWIND_USER_H
+#define _LINUX_UNWIND_USER_H
+
+#include <linux/unwind_user_types.h>
+
+int unwind_user_start(struct unwind_user_state *state);
+int unwind_user_next(struct unwind_user_state *state);
+
+int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries);
+
+#define for_each_user_frame(state) \
+	for (unwind_user_start((state)); !(state)->done; unwind_user_next((state)))
+
+#endif /* _LINUX_UNWIND_USER_H */

diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
new file mode 100644
index 000000000000..6ed1b4ae74e1
--- /dev/null
+++ b/include/linux/unwind_user_types.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNWIND_USER_TYPES_H
+#define _LINUX_UNWIND_USER_TYPES_H
+
+#include <linux/types.h>
+
+enum unwind_user_type {
+	UNWIND_USER_TYPE_NONE,
+};
+
+struct unwind_stacktrace {
+	unsigned int	nr;
+	unsigned long	*entries;
+};
+
+struct unwind_user_frame {
+	s32 cfa_off;
+	s32 ra_off;
+	s32 fp_off;
+	bool use_fp;
+};
+
+struct unwind_user_state {
+	unsigned long ip;
+	unsigned long sp;
+	unsigned long fp;
+	enum unwind_user_type type;
+	bool done;
+};
+
+#endif /* _LINUX_UNWIND_USER_TYPES_H */

diff --git a/kernel/Makefile b/kernel/Makefile
index 32e80dd626af..541186050251 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -55,6 +55,7 @@ obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
 obj-y += entry/
+obj-y += unwind/
 obj-$(CONFIG_MODULES) += module/
 
 obj-$(CONFIG_KCMP) += kcmp.o

diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
new file mode 100644
index 000000000000..349ce3677526
--- /dev/null
+++ b/kernel/unwind/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_UNWIND_USER) += user.o

diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
new file mode 100644
index 000000000000..d30449328981
--- /dev/null
+++ b/kernel/unwind/user.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generic interfaces for unwinding user space
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/sched/task_stack.h>
+#include <linux/unwind_user.h>
+
+int unwind_user_next(struct unwind_user_state *state)
+{
+	/* no implementation yet */
+	return -EINVAL;
+}
+
+int unwind_user_start(struct unwind_user_state *state)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+
+	memset(state, 0, sizeof(*state));
+
+	if ((current->flags & PF_KTHREAD) || !user_mode(regs)) {
+		state->done = true;
+		return -EINVAL;
+	}
+
+	state->type = UNWIND_USER_TYPE_NONE;
+
+	state->ip = instruction_pointer(regs);
+	state->sp = user_stack_pointer(regs);
+	state->fp = frame_pointer(regs);
+
+	return 0;
+}
+
+int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries)
+{
+	struct unwind_user_state state;
+
+	trace->nr = 0;
+
+	if (!max_entries)
+		return -EINVAL;
+
+	if (current->flags & PF_KTHREAD)
+		return 0;
+
+	for_each_user_frame(&state) {
+		trace->entries[trace->nr++] = state.ip;
+		if (trace->nr >= max_entries)
+			break;
+	}

+	return 0;
+}
-- 
2.47.2
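The iterator contract of the API above (unwind_user_start() seeds the state, unwind_user_next() advances it, and for_each_user_frame() loops until state->done is set) can be sketched in plain user space C. Everything below is hypothetical scaffolding for illustration, not kernel code: the demo instruction pointers and the stubbed start/next functions stand in for a real architecture backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified mirror of struct unwind_user_state. */
struct demo_state {
	unsigned long ip;
	bool done;
};

/* Hypothetical frames, as if recovered from a user stack. */
static const unsigned long demo_ips[] = { 0x401000, 0x401abc, 0x402f00 };
static size_t demo_idx;

static int demo_start(struct demo_state *state)
{
	demo_idx = 0;
	state->done = false;
	state->ip = demo_ips[0];
	return 0;
}

static int demo_next(struct demo_state *state)
{
	if (++demo_idx >= sizeof(demo_ips) / sizeof(demo_ips[0])) {
		state->done = true;	/* terminates the for_each loop */
		return -1;
	}
	state->ip = demo_ips[demo_idx];
	return 0;
}

/* Same shape as the kernel's for_each_user_frame() macro. */
#define for_each_demo_frame(state) \
	for (demo_start(state); !(state)->done; demo_next(state))

/* Collect every frame's IP, as unwind_user() does; returns frame count. */
static unsigned int demo_unwind(unsigned long *entries, unsigned int max)
{
	struct demo_state state;
	unsigned int nr = 0;

	for_each_demo_frame(&state) {
		entries[nr++] = state.ip;
		if (nr >= max)
			break;
	}
	return nr;
}
```

With an 8-entry buffer, demo_unwind() yields the three demo IPs in innermost-to-outermost order, matching how unwind_user() fills trace->entries.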
From nobody Wed Oct 8 07:23:47 2025
Message-ID: <20250701005450.888492528@goodmis.org>
User-Agent: quilt/0.68
Date: Mon, 30 Jun 2025 20:53:23 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton, Jens Axboe, Florian Weimer
Subject: [PATCH v12 02/14] unwind_user: Add frame pointer support
References: <20250701005321.942306427@goodmis.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Josh Poimboeuf

Add optional support for user space frame pointer unwinding. If supported, the arch needs to enable CONFIG_HAVE_UNWIND_USER_FP and define ARCH_INIT_USER_FP_FRAME.

By encoding the frame offsets in struct unwind_user_frame, much of this code can also be reused for future unwinder implementations such as sframe.
Signed-off-by: Josh Poimboeuf
Co-developed-by: Steven Rostedt (Google)
Signed-off-by: Steven Rostedt (Google)
---
 arch/Kconfig                      |  4 +++
 include/asm-generic/Kbuild        |  1 +
 include/asm-generic/unwind_user.h |  5 +++
 include/linux/unwind_user.h       |  5 +++
 include/linux/unwind_user_types.h |  1 +
 kernel/unwind/user.c              | 51 +++++++++++++++++++++++++++++--
 6 files changed, 65 insertions(+), 2 deletions(-)
 create mode 100644 include/asm-generic/unwind_user.h

diff --git a/arch/Kconfig b/arch/Kconfig
index ea59e5d7cc69..8e3fd723bd74 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -438,6 +438,10 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH
 config UNWIND_USER
	bool
 
+config HAVE_UNWIND_USER_FP
+	bool
+	select UNWIND_USER
+
 config HAVE_PERF_REGS
	bool
	help

diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 8675b7b4ad23..295c94a3ccc1 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -59,6 +59,7 @@ mandatory-y += tlbflush.h
 mandatory-y += topology.h
 mandatory-y += trace_clock.h
 mandatory-y += uaccess.h
+mandatory-y += unwind_user.h
 mandatory-y += vermagic.h
 mandatory-y += vga.h
 mandatory-y += video.h

diff --git a/include/asm-generic/unwind_user.h b/include/asm-generic/unwind_user.h
new file mode 100644
index 000000000000..b8882b909944
--- /dev/null
+++ b/include/asm-generic/unwind_user.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_UNWIND_USER_H
+#define _ASM_GENERIC_UNWIND_USER_H
+
+#endif /* _ASM_GENERIC_UNWIND_USER_H */

diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
index aa7923c1384f..a405111c41b0 100644
--- a/include/linux/unwind_user.h
+++ b/include/linux/unwind_user.h
@@ -3,6 +3,11 @@
 #define _LINUX_UNWIND_USER_H
 
 #include <linux/unwind_user_types.h>
+#include <asm/unwind_user.h>
+
+#ifndef ARCH_INIT_USER_FP_FRAME
+ #define ARCH_INIT_USER_FP_FRAME
+#endif
 
 int unwind_user_start(struct unwind_user_state *state);
 int unwind_user_next(struct unwind_user_state *state);

diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 6ed1b4ae74e1..65bd070eb6b0 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -6,6 +6,7 @@
 enum unwind_user_type {
	UNWIND_USER_TYPE_NONE,
+	UNWIND_USER_TYPE_FP,
 };
 
 struct unwind_stacktrace {

diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index d30449328981..1201d655654a 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -6,10 +6,54 @@
 #include <linux/sched.h>
 #include <linux/sched/task_stack.h>
 #include <linux/unwind_user.h>
+#include <linux/uaccess.h>
+
+static struct unwind_user_frame fp_frame = {
+	ARCH_INIT_USER_FP_FRAME
+};
+
+static inline bool fp_state(struct unwind_user_state *state)
+{
+	return IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP) &&
+	       state->type == UNWIND_USER_TYPE_FP;
+}
 
 int unwind_user_next(struct unwind_user_state *state)
 {
-	/* no implementation yet */
+	struct unwind_user_frame *frame;
+	unsigned long cfa = 0, fp, ra = 0;
+
+	if (state->done)
+		return -EINVAL;
+
+	if (fp_state(state))
+		frame = &fp_frame;
+	else
+		goto done;
+
+	/* Get the Canonical Frame Address (CFA) */
+	cfa = (frame->use_fp ? state->fp : state->sp) + frame->cfa_off;
+
+	/* stack going in wrong direction? */
+	if (cfa <= state->sp)
+		goto done;
+
+	/* Find the Return Address (RA) */
+	if (get_user(ra, (unsigned long *)(cfa + frame->ra_off)))
+		goto done;
+
+	if (frame->fp_off && get_user(fp, (unsigned long __user *)(cfa + frame->fp_off)))
+		goto done;
+
+	state->ip = ra;
+	state->sp = cfa;
+	if (frame->fp_off)
+		state->fp = fp;
+
+	return 0;
+
+done:
+	state->done = true;
	return -EINVAL;
 }
 
@@ -24,7 +68,10 @@ int unwind_user_start(struct unwind_user_state *state)
		return -EINVAL;
	}
 
-	state->type = UNWIND_USER_TYPE_NONE;
+	if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
+		state->type = UNWIND_USER_TYPE_FP;
+	else
+		state->type = UNWIND_USER_TYPE_NONE;
 
	state->ip = instruction_pointer(regs);
	state->sp = user_stack_pointer(regs);
-- 
2.47.2
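The CFA/RA arithmetic in unwind_user_next() can be exercised in user space against a simulated stack image. The offsets below are the x86-64-style values (CFA = FP + 16, RA saved at CFA-8, caller's FP at CFA-16) used here purely for illustration; each architecture supplies its own via ARCH_INIT_USER_FP_FRAME, and memcpy() stands in for get_user().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirror of struct unwind_user_frame, filled with x86-64-style offsets. */
struct frame_desc { int cfa_off, ra_off, fp_off; bool use_fp; };
static const struct frame_desc fp_frame = { 16, -8, -16, true };

/*
 * One unwind_user_next()-style step over a simulated stack image.
 * Returns 0 on success, or -1 when the walk should stop.
 */
static int demo_step(const unsigned char *stack, size_t size,
		     unsigned long *sp, unsigned long *fp, unsigned long *ip)
{
	unsigned long cfa = (fp_frame.use_fp ? *fp : *sp) + fp_frame.cfa_off;

	/* stack going in the wrong direction, or outside the image? */
	if (cfa <= *sp || cfa < 16 || cfa > size)
		return -1;

	memcpy(ip, stack + cfa + fp_frame.ra_off, sizeof(*ip)); /* RA */
	memcpy(fp, stack + cfa + fp_frame.fp_off, sizeof(*fp)); /* caller FP */
	*sp = cfa;
	return 0;
}

/* Build a two-frame stack image and walk it; returns the number of IPs. */
static unsigned int demo_walk(unsigned long *out)
{
	unsigned char stack[256] = { 0 };
	unsigned long v, sp = 32, fp = 64, ip = 0x400000;
	unsigned int nr = 0;

	v = 128;      memcpy(stack +  64, &v, sizeof(v)); /* frame 1: saved FP */
	v = 0x401111; memcpy(stack +  72, &v, sizeof(v)); /* frame 1: saved RA */
	v = 0;        memcpy(stack + 128, &v, sizeof(v)); /* frame 2: FP of 0 ends walk */
	v = 0x402222; memcpy(stack + 136, &v, sizeof(v)); /* frame 2: saved RA */

	out[nr++] = ip;	/* current IP, as unwind_user_start() records it */
	while (demo_step(stack, sizeof(stack), &sp, &fp, &ip) == 0)
		out[nr++] = ip;
	return nr;
}
```

The walk recovers the starting IP plus both saved return addresses, then stops when the bogus caller FP makes the next CFA fall at or below the current SP, the same "stack going in wrong direction" check as the patch.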
From nobody Wed Oct 8 07:23:47 2025
Message-ID: <20250701005451.055982038@goodmis.org>
User-Agent: quilt/0.68
Date: Mon, 30 Jun 2025 20:53:24 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton, Jens Axboe, Florian Weimer
Subject: [PATCH v12 03/14] unwind_user: Add compat mode frame pointer support
References: <20250701005321.942306427@goodmis.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Josh Poimboeuf

Add optional support for user space compat mode frame pointer unwinding. If supported, the arch needs to enable CONFIG_HAVE_UNWIND_USER_COMPAT_FP and define ARCH_INIT_USER_COMPAT_FP_FRAME.
Signed-off-by: Josh Poimboeuf
Co-developed-by: Steven Rostedt (Google)
Signed-off-by: Steven Rostedt (Google)
---
 arch/Kconfig                            |  4 ++++
 include/asm-generic/Kbuild              |  1 +
 include/asm-generic/unwind_user_types.h |  5 ++++
 include/linux/unwind_user.h             |  5 ++++
 include/linux/unwind_user_types.h       |  7 ++++++
 kernel/unwind/user.c                    | 32 +++++++++++++++++++++----
 6 files changed, 50 insertions(+), 4 deletions(-)
 create mode 100644 include/asm-generic/unwind_user_types.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 8e3fd723bd74..2c41d3072910 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -442,6 +442,10 @@ config HAVE_UNWIND_USER_FP
	bool
	select UNWIND_USER
 
+config HAVE_UNWIND_USER_COMPAT_FP
+	bool
+	depends on HAVE_UNWIND_USER_FP
+
 config HAVE_PERF_REGS
	bool
	help

diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 295c94a3ccc1..b797a2434396 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -60,6 +60,7 @@ mandatory-y += topology.h
 mandatory-y += trace_clock.h
 mandatory-y += uaccess.h
 mandatory-y += unwind_user.h
+mandatory-y += unwind_user_types.h
 mandatory-y += vermagic.h
 mandatory-y += vga.h
 mandatory-y += video.h

diff --git a/include/asm-generic/unwind_user_types.h b/include/asm-generic/unwind_user_types.h
new file mode 100644
index 000000000000..f568b82e52cd
--- /dev/null
+++ b/include/asm-generic/unwind_user_types.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_UNWIND_USER_TYPES_H
+#define _ASM_GENERIC_UNWIND_USER_TYPES_H
+
+#endif /* _ASM_GENERIC_UNWIND_USER_TYPES_H */

diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
index a405111c41b0..ac007363820a 100644
--- a/include/linux/unwind_user.h
+++ b/include/linux/unwind_user.h
@@ -9,6 +9,11 @@
 #define ARCH_INIT_USER_FP_FRAME
 #endif
 
+#ifndef ARCH_INIT_USER_COMPAT_FP_FRAME
+ #define ARCH_INIT_USER_COMPAT_FP_FRAME
+ #define in_compat_mode(regs) false
+#endif
+
 int unwind_user_start(struct unwind_user_state *state);
 int unwind_user_next(struct unwind_user_state *state);

diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 65bd070eb6b0..0b6563951ca4 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -3,10 +3,16 @@
 #define _LINUX_UNWIND_USER_TYPES_H
 
 #include <linux/types.h>
+#include <asm/unwind_user_types.h>
+
+#ifndef arch_unwind_user_state
+struct arch_unwind_user_state {};
+#endif
 
 enum unwind_user_type {
	UNWIND_USER_TYPE_NONE,
	UNWIND_USER_TYPE_FP,
+	UNWIND_USER_TYPE_COMPAT_FP,
 };
 
 struct unwind_stacktrace {
@@ -25,6 +31,7 @@ struct unwind_user_state {
	unsigned long ip;
	unsigned long sp;
	unsigned long fp;
+	struct arch_unwind_user_state arch;
	enum unwind_user_type type;
	bool done;
 };

diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 1201d655654a..3a0ac4346f5b 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -12,12 +12,32 @@ static struct unwind_user_frame fp_frame = {
	ARCH_INIT_USER_FP_FRAME
 };
 
+static struct unwind_user_frame compat_fp_frame = {
+	ARCH_INIT_USER_COMPAT_FP_FRAME
+};
+
 static inline bool fp_state(struct unwind_user_state *state)
 {
	return IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP) &&
	       state->type == UNWIND_USER_TYPE_FP;
 }
 
+static inline bool compat_fp_state(struct unwind_user_state *state)
+{
+	return IS_ENABLED(CONFIG_HAVE_UNWIND_USER_COMPAT_FP) &&
+	       state->type == UNWIND_USER_TYPE_COMPAT_FP;
+}
+
+#define unwind_get_user_long(to, from, state)			\
+({								\
+	int __ret;						\
+	if (compat_fp_state(state))				\
+		__ret = get_user(to, (u32 __user *)(from));	\
+	else							\
+		__ret = get_user(to, (unsigned long __user *)(from)); \
+	__ret;							\
+})
+
 int unwind_user_next(struct unwind_user_state *state)
 {
	struct unwind_user_frame *frame;
@@ -26,7 +46,9 @@ int unwind_user_next(struct unwind_user_state *state)
	if (state->done)
		return -EINVAL;
 
-	if (fp_state(state))
+	if (compat_fp_state(state))
+		frame = &compat_fp_frame;
+	else if (fp_state(state))
		frame = &fp_frame;
	else
		goto done;
@@ -39,10 +61,10 @@ int unwind_user_next(struct unwind_user_state *state)
		goto done;
 
	/* Find the Return Address (RA) */
-	if (get_user(ra, (unsigned long *)(cfa + frame->ra_off)))
+	if (unwind_get_user_long(ra, cfa + frame->ra_off, state))
		goto done;
 
-	if (frame->fp_off && get_user(fp, (unsigned long __user *)(cfa + frame->fp_off)))
+	if (frame->fp_off && unwind_get_user_long(fp, cfa + frame->fp_off, state))
		goto done;
 
	state->ip = ra;
@@ -68,7 +90,9 @@ int unwind_user_start(struct unwind_user_state *state)
		return -EINVAL;
	}
 
-	if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
+	if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_COMPAT_FP) && in_compat_mode(regs))
+		state->type = UNWIND_USER_TYPE_COMPAT_FP;
+	else if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
		state->type = UNWIND_USER_TYPE_FP;
	else
		state->type = UNWIND_USER_TYPE_NONE;
-- 
2.47.2
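The point of the unwind_get_user_long() macro above is that a compat (32-bit) task saves 4-byte frame words while a native 64-bit task saves 8-byte ones, so the unwinder must size each read accordingly. A userspace analogue, with memcpy() standing in for get_user() and a little-endian host assumed for the compat value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read one saved frame word from a simulated user stack: 4 bytes
 * (zero-extended, like get_user() on a u32) in compat mode, 8 bytes
 * natively.  Little-endian byte order is assumed in the demo values.
 */
static uint64_t demo_get_long(const unsigned char *mem, size_t off, bool compat)
{
	if (compat) {
		uint32_t v32;
		memcpy(&v32, mem + off, sizeof(v32));
		return v32;
	} else {
		uint64_t v64;
		memcpy(&v64, mem + off, sizeof(v64));
		return v64;
	}
}
```

Given the 8 bytes of 0x1122334455667788 in memory, the native read returns the whole value while the compat read returns only the low 32-bit word, which is exactly why a single fixed-width get_user() would mis-walk a compat stack.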
From nobody Wed Oct 8 07:23:47 2025
Message-ID: <20250701005451.227323466@goodmis.org>
User-Agent: quilt/0.68
Date: Mon, 30 Jun 2025 20:53:25 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton, Jens Axboe, Florian Weimer
Subject: [PATCH v12 04/14] unwind_user/deferred: Add unwind_user_faultable()
References: <20250701005321.942306427@goodmis.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Steven Rostedt

Add a new API, unwind_user_faultable(), to retrieve a user space callstack. Unlike the existing user space stack tracers, it must be called from a faultable context, as it may use routines that access user space data that needs to be faulted in.

It can safely be called when entering or exiting a system call, where user pages can still be faulted in.
This code is based on Josh Poimboeuf's deferred unwinding work:

Link: https://lore.kernel.org/all/6052e8487746603bdb29b65f4033e739092d9925.1737511963.git.jpoimboe@kernel.org/

Signed-off-by: Steven Rostedt (Google)
---
 include/linux/sched.h                 |  5 +++
 include/linux/unwind_deferred.h       | 24 +++++++++++
 include/linux/unwind_deferred_types.h |  9 ++++
 kernel/fork.c                         |  4 ++
 kernel/unwind/Makefile                |  2 +-
 kernel/unwind/deferred.c              | 60 +++++++++++++++++++++++++++
 6 files changed, 103 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/unwind_deferred.h
 create mode 100644 include/linux/unwind_deferred_types.h
 create mode 100644 kernel/unwind/deferred.c

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4f78a64beb52..59fdf7d9bb1e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -46,6 +46,7 @@
 #include
 #include
 #include
+#include <linux/unwind_deferred_types.h>
 #include
 
 /* task_struct member predeclarations (sorted alphabetically): */
@@ -1654,6 +1655,10 @@ struct task_struct {
	struct user_event_mm *user_event_mm;
 #endif
 
+#ifdef CONFIG_UNWIND_USER
+	struct unwind_task_info unwind_info;
+#endif
+
	/* CPU-specific state of this task: */
	struct thread_struct thread;

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
new file mode 100644
index 000000000000..a5f6e8f8a1a2
--- /dev/null
+++ b/include/linux/unwind_deferred.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNWIND_USER_DEFERRED_H
+#define _LINUX_UNWIND_USER_DEFERRED_H
+
+#include <linux/unwind_user.h>
+#include <linux/unwind_deferred_types.h>
+
+#ifdef CONFIG_UNWIND_USER
+
+void unwind_task_init(struct task_struct *task);
+void unwind_task_free(struct task_struct *task);
+
+int unwind_user_faultable(struct unwind_stacktrace *trace);
+
+#else /* !CONFIG_UNWIND_USER */
+
+static inline void unwind_task_init(struct task_struct *task) {}
+static inline void unwind_task_free(struct task_struct *task) {}
+
+static inline int unwind_user_faultable(struct unwind_stacktrace *trace) { return -ENOSYS; }
+
+#endif /* !CONFIG_UNWIND_USER */
+
+#endif /* _LINUX_UNWIND_USER_DEFERRED_H */

diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
new file mode 100644
index 000000000000..aa32db574e43
--- /dev/null
+++ b/include/linux/unwind_deferred_types.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNWIND_USER_DEFERRED_TYPES_H
+#define _LINUX_UNWIND_USER_DEFERRED_TYPES_H
+
+struct unwind_task_info {
+	unsigned long *entries;
+};
+
+#endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */

diff --git a/kernel/fork.c b/kernel/fork.c
index 1ee8eb11f38b..3341d50c61f2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -105,6 +105,7 @@
 #include
 #include
 #include
+#include <linux/unwind_deferred.h>
 
 #include
 #include
@@ -732,6 +733,7 @@ void __put_task_struct(struct task_struct *tsk)
	WARN_ON(refcount_read(&tsk->usage));
	WARN_ON(tsk == current);
 
+	unwind_task_free(tsk);
	sched_ext_free(tsk);
	io_uring_free(tsk);
	cgroup_free(tsk);
@@ -2135,6 +2137,8 @@ __latent_entropy struct task_struct *copy_process(
	p->bpf_ctx = NULL;
 #endif
 
+	unwind_task_init(p);
+
	/* Perform scheduler related setup. Assign this task to a CPU. */
	retval = sched_fork(clone_flags, p);
	if (retval)

diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
index 349ce3677526..eae37bea54fd 100644
--- a/kernel/unwind/Makefile
+++ b/kernel/unwind/Makefile
@@ -1 +1 @@
-obj-$(CONFIG_UNWIND_USER) += user.o
+obj-$(CONFIG_UNWIND_USER) += user.o deferred.o

diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
new file mode 100644
index 000000000000..a0badbeb3cc1
--- /dev/null
+++ b/kernel/unwind/deferred.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Deferred user space unwinding
+ */
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/mm.h>
+#include <linux/unwind_deferred.h>
+
+#define UNWIND_MAX_ENTRIES 512
+
+/**
+ * unwind_user_faultable - Produce a user stacktrace in faultable context
+ * @trace: The descriptor that will store the user stacktrace
+ *
+ * This must be called in a known faultable context (usually when entering
+ * or exiting user space). Depending on the available implementations
+ * the @trace will be loaded with the addresses of the user space stacktrace
+ * if it can be found.
+ *
+ * Return: 0 on success and negative on error
+ *         On success @trace will contain the user space stacktrace
+ */
+int unwind_user_faultable(struct unwind_stacktrace *trace)
+{
+	struct unwind_task_info *info = &current->unwind_info;
+
+	/* Should always be called from faultable context */
+	might_fault();
+
+	if (current->flags & PF_EXITING)
+		return -EINVAL;
+
+	if (!info->entries) {
+		info->entries = kmalloc_array(UNWIND_MAX_ENTRIES, sizeof(long),
+					      GFP_KERNEL);
+		if (!info->entries)
+			return -ENOMEM;
+	}
+
+	trace->nr = 0;
+	trace->entries = info->entries;
+	unwind_user(trace, UNWIND_MAX_ENTRIES);
+
+	return 0;
+}
+
+void unwind_task_init(struct task_struct *task)
+{
+	struct unwind_task_info *info = &task->unwind_info;
+
+	memset(info, 0, sizeof(*info));
+}
+
+void unwind_task_free(struct task_struct *task)
+{
+	struct unwind_task_info *info = &task->unwind_info;
+
+	kfree(info->entries);
+}
-- 
2.47.2
Message-ID: <20250701005451.398606540@goodmis.org>
Date: Mon, 30 Jun 2025 20:53:26 -0400
From: Steven Rostedt
Subject: [PATCH v12 05/14] unwind_user/deferred: Add unwind cache
References: <20250701005321.942306427@goodmis.org>

From: Josh Poimboeuf

Cache the results of the unwind to ensure the unwind is only performed
once, even when called by multiple tracers.

The cache nr_entries gets cleared every time the task exits the kernel.
When a stacktrace is requested, nr_entries gets set to the number of
entries in the stacktrace. If another stacktrace is requested while
nr_entries is not zero, the cache already holds the same stacktrace that
would be retrieved, so the unwind is not performed again and the cached
entries are handed to the caller.
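The cache-once-per-entry behavior described above can be modeled in plain user-space C. This is an illustrative sketch, not the kernel code: `get_stacktrace()`, `fake_unwind_user()`, and `struct trace` are hypothetical stand-ins for `unwind_user_faultable()`, `unwind_user()`, and `struct unwind_stacktrace`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct unwind_cache {
	unsigned int nr_entries;	/* 0 means "nothing cached yet" */
	unsigned long entries[];	/* flexible array of return addresses */
};

struct trace {
	unsigned int nr;
	unsigned long *entries;
};

/* Sized so the whole cache fits in one 4K page, as in the patch. */
#define MAX_ENTRIES ((4096 - sizeof(struct unwind_cache)) / sizeof(unsigned long))

static int unwinds_performed;	/* counts real unwind walks, for illustration */

/* Stand-in for the real unwinder: pretend the user stack had 3 frames. */
static void fake_unwind_user(struct trace *t)
{
	t->nr = 3;
	unwinds_performed++;
}

/* Mirrors the cached path of unwind_user_faultable(). */
static int get_stacktrace(struct unwind_cache **cachep, struct trace *t)
{
	struct unwind_cache *cache = *cachep;

	if (!cache) {	/* allocated once per task, on first use */
		cache = calloc(1, sizeof(*cache) + MAX_ENTRIES * sizeof(unsigned long));
		if (!cache)
			return -1;
		*cachep = cache;
	}

	t->entries = cache->entries;
	if (cache->nr_entries) {	/* another tracer in the same entry context */
		t->nr = cache->nr_entries;
		return 0;
	}

	t->nr = 0;
	fake_unwind_user(t);
	cache->nr_entries = t->nr;	/* cleared again on exit to user space */
	return 0;
}
```

Two back-to-back requests in the same "entry context" perform only one real unwind; zeroing nr_entries (what unwind_reset_info() does on exit to user space) makes the next request unwind again.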
Co-developed-by: Steven Rostedt (Google)
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 include/linux/entry-common.h          |  2 ++
 include/linux/unwind_deferred.h       |  8 +++++++
 include/linux/unwind_deferred_types.h |  7 +++++-
 kernel/unwind/deferred.c              | 31 +++++++++++++++++++++------
 4 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index f94f3fdf15fc..8908b8eeb99b 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -362,6 +363,7 @@ static __always_inline void exit_to_user_mode(void)
 	lockdep_hardirqs_on_prepare();
 	instrumentation_end();
 
+	unwind_reset_info();
 	user_enter_irqoff();
 	arch_exit_to_user_mode();
 	lockdep_hardirqs_on(CALLER_ADDR0);
diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index a5f6e8f8a1a2..baacf4a1eb4c 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -12,6 +12,12 @@ void unwind_task_free(struct task_struct *task);
 
 int unwind_user_faultable(struct unwind_stacktrace *trace);
 
+static __always_inline void unwind_reset_info(void)
+{
+	if (unlikely(current->unwind_info.cache))
+		current->unwind_info.cache->nr_entries = 0;
+}
+
 #else /* !CONFIG_UNWIND_USER */
 
 static inline void unwind_task_init(struct task_struct *task) {}
@@ -19,6 +25,8 @@ static inline void unwind_task_free(struct task_struct *task) {}
 
 static inline int unwind_user_faultable(struct unwind_stacktrace *trace) { return -ENOSYS; }
 
+static inline void unwind_reset_info(void) {}
+
 #endif /* !CONFIG_UNWIND_USER */
 
 #endif /* _LINUX_UNWIND_USER_DEFERRED_H */
diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index aa32db574e43..db5b54b18828 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -2,8 +2,13 @@
 #ifndef _LINUX_UNWIND_USER_DEFERRED_TYPES_H
 #define _LINUX_UNWIND_USER_DEFERRED_TYPES_H
 
+struct unwind_cache {
+	unsigned int		nr_entries;
+	unsigned long		entries[];
+};
+
 struct unwind_task_info {
-	unsigned long		*entries;
+	struct unwind_cache	*cache;
 };
 
 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */
diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index a0badbeb3cc1..96368a5aa522 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -4,10 +4,13 @@
  */
 #include
 #include
+#include
 #include
 #include
 
-#define UNWIND_MAX_ENTRIES	512
+/* Make the cache fit in a 4K page */
+#define UNWIND_MAX_ENTRIES					\
+	((SZ_4K - sizeof(struct unwind_cache)) / sizeof(long))
 
 /**
  * unwind_user_faultable - Produce a user stacktrace in faultable context
@@ -24,6 +27,7 @@
 int unwind_user_faultable(struct unwind_stacktrace *trace)
 {
 	struct unwind_task_info *info = &current->unwind_info;
+	struct unwind_cache *cache;
 
 	/* Should always be called from faultable context */
 	might_fault();
@@ -31,17 +35,30 @@ int unwind_user_faultable(struct unwind_stacktrace *trace)
 	if (current->flags & PF_EXITING)
 		return -EINVAL;
 
-	if (!info->entries) {
-		info->entries = kmalloc_array(UNWIND_MAX_ENTRIES, sizeof(long),
-					      GFP_KERNEL);
-		if (!info->entries)
+	if (!info->cache) {
+		info->cache = kzalloc(struct_size(cache, entries, UNWIND_MAX_ENTRIES),
+				      GFP_KERNEL);
+		if (!info->cache)
 			return -ENOMEM;
 	}
 
+	cache = info->cache;
+	trace->entries = cache->entries;
+
+	if (cache->nr_entries) {
+		/*
+		 * The user stack has already been previously unwound in this
+		 * entry context. Skip the unwind and use the cache.
+		 */
+		trace->nr = cache->nr_entries;
+		return 0;
+	}
+
 	trace->nr = 0;
-	trace->entries = info->entries;
 	unwind_user(trace, UNWIND_MAX_ENTRIES);
 
+	cache->nr_entries = trace->nr;
+
 	return 0;
 }
 
@@ -56,5 +73,5 @@ void unwind_task_free(struct task_struct *task)
 {
 	struct unwind_task_info *info = &task->unwind_info;
 
-	kfree(info->entries);
+	kfree(info->cache);
 }
-- 
2.47.2

From nobody Wed Oct 8 07:23:47 2025
Message-ID: <20250701005451.571473750@goodmis.org>
Date: Mon, 30 Jun 2025 20:53:27 -0400
From: Steven Rostedt
Subject: [PATCH v12 06/14] unwind_user/deferred: Add deferred unwinding interface
References: <20250701005321.942306427@goodmis.org>

From: Josh Poimboeuf

Add an interface for scheduling task work to unwind the user space stack
before returning to user space. This solves several problems for its
callers:

 - Ensure the unwind happens in task context even if the caller may be
   running in interrupt context.

 - Avoid duplicate unwinds, whether called multiple times by the same
   caller or by different callers.

 - Take a timestamp when the first request comes in since the task
   entered the kernel. This will be returned to the calling function
   along with the stack trace when the task leaves the kernel. The
   timestamp can be used to correlate kernel unwinds/traces with the
   user unwind.

For this to work properly, the architecture must have a local_clock()
resolution that guarantees a different timestamp per task system call.

The timestamp is created to detect when the stacktrace is the same. It is
generated the first time a user space stacktrace is requested after the
task enters the kernel. The timestamp is passed to the caller on request,
and when the stacktrace is generated upon returning to user space, the
requester's callback is invoked with the timestamp as well as the
stacktrace. The timestamp is cleared when the task goes back to user
space.

Note, this currently adds another conditional to the unwind_reset_info()
path that is always called when returning to user space, but future
changes will put this back to a single conditional.

A global list is created and protected by a global mutex that holds
tracers that register with the unwind infrastructure. The number of
registered tracers will be limited in future changes. Each perf program
or ftrace instance will register its own descriptor to use for deferred
unwind stack traces.

Note, the function unwind_deferred_task_work() that gets called when
returning to user space uses a global mutex for synchronization, which
will be a big bottleneck. It will be replaced by SRCU, but that change
adds some complex synchronization that deserves its own commit.
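The request/pending/timestamp flow above can be sketched as a toy user-space model. This is illustrative only: `struct task_state`, `deferred_request()`, `reset_on_exit()`, and `fake_clock` are hypothetical stand-ins for the per-task `unwind_task_info`, `unwind_deferred_request()`, `unwind_reset_info()`, and `local_clock()`.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the per-task state used by the deferred unwinder. */
struct task_state {
	uint64_t timestamp;	/* set on first request after kernel entry */
	int pending;		/* task_work already scheduled? */
};

static uint64_t fake_clock = 100;	/* stand-in for local_clock() */

/* The first caller in a kernel-entry window stamps the state;
 * later callers in the same window observe the same value. */
static uint64_t get_timestamp(struct task_state *s)
{
	if (!s->timestamp)
		s->timestamp = fake_clock++;
	return s->timestamp;
}

/* Mirrors unwind_deferred_request(): 0 = queued, 1 = already pending. */
static int deferred_request(struct task_state *s, uint64_t *timestamp)
{
	*timestamp = get_timestamp(s);
	if (s->pending)
		return 1;
	s->pending = 1;	/* the real code calls task_work_add() here */
	return 0;
}

/* Mirrors unwind_reset_info() on return to user space. */
static void reset_on_exit(struct task_state *s)
{
	s->timestamp = 0;
}
```

Two requests in the same entry context share one timestamp and schedule the work only once; after the task returns to user space, the next request gets a fresh timestamp.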
Co-developed-by: Steven Rostedt (Google)
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
Changes since v11: https://lore.kernel.org/20250625225715.825831885@goodmis.org

- Still need to clear cache->nr_entries in unwind_reset_info() even if
  the timestamp is zero. This is because unwind_user_faultable() can be
  called directly, and it requires nr_entries to be zeroed, but it does
  not touch the timestamp.

 include/linux/unwind_deferred.h       |  24 +++++
 include/linux/unwind_deferred_types.h |   3 +
 kernel/unwind/deferred.c              | 139 +++++++++++++++++++++++++-
 3 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index baacf4a1eb4c..c6548e8d64d1 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -2,9 +2,19 @@
 #ifndef _LINUX_UNWIND_USER_DEFERRED_H
 #define _LINUX_UNWIND_USER_DEFERRED_H
 
+#include
 #include
 #include
 
+struct unwind_work;
+
+typedef void (*unwind_callback_t)(struct unwind_work *work, struct unwind_stacktrace *trace, u64 timestamp);
+
+struct unwind_work {
+	struct list_head	list;
+	unwind_callback_t	func;
+};
+
 #ifdef CONFIG_UNWIND_USER
 
 void unwind_task_init(struct task_struct *task);
@@ -12,8 +22,19 @@ void unwind_task_free(struct task_struct *task);
 
 int unwind_user_faultable(struct unwind_stacktrace *trace);
 
+int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func);
+int unwind_deferred_request(struct unwind_work *work, u64 *timestamp);
+void unwind_deferred_cancel(struct unwind_work *work);
+
 static __always_inline void unwind_reset_info(void)
 {
+	if (unlikely(current->unwind_info.timestamp))
+		current->unwind_info.timestamp = 0;
+	/*
+	 * As unwind_user_faultable() can be called directly and
+	 * depends on nr_entries being cleared on exit to user,
+	 * this needs to be a separate conditional.
+	 */
 	if (unlikely(current->unwind_info.cache))
 		current->unwind_info.cache->nr_entries = 0;
 }
@@ -24,6 +45,9 @@ static inline void unwind_task_init(struct task_struct *task) {}
 static inline void unwind_task_free(struct task_struct *task) {}
 
 static inline int unwind_user_faultable(struct unwind_stacktrace *trace) { return -ENOSYS; }
+static inline int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func) { return -ENOSYS; }
+static inline int unwind_deferred_request(struct unwind_work *work, u64 *timestamp) { return -ENOSYS; }
+static inline void unwind_deferred_cancel(struct unwind_work *work) {}
 
 static inline void unwind_reset_info(void) {}
 
diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index db5b54b18828..5df264cf81ad 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -9,6 +9,9 @@ struct unwind_cache {
 
 struct unwind_task_info {
 	struct unwind_cache	*cache;
+	struct callback_head	work;
+	u64			timestamp;
+	int			pending;
 };
 
 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */
diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index 96368a5aa522..d5f2c004a5b0 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -2,16 +2,43 @@
 /*
  * Deferred user space unwinding
  */
+#include
+#include
+#include
+#include
 #include
 #include
 #include
 #include
-#include
+#include
 
 /* Make the cache fit in a 4K page */
 #define UNWIND_MAX_ENTRIES					\
 	((SZ_4K - sizeof(struct unwind_cache)) / sizeof(long))
 
+/* Guards adding to and reading the list of callbacks */
+static DEFINE_MUTEX(callback_mutex);
+static LIST_HEAD(callbacks);
+
+/*
+ * Read the task context timestamp, if this is the first caller then
+ * it will set the timestamp.
+ *
+ * For this to work properly, the timestamp (local_clock()) must
+ * have a resolution that will guarantee a different timestamp
+ * every time a task makes a system call. That is, two short
+ * system calls back to back must have a different timestamp.
+ */
+static u64 get_timestamp(struct unwind_task_info *info)
+{
+	lockdep_assert_irqs_disabled();
+
+	if (!info->timestamp)
+		info->timestamp = local_clock();
+
+	return info->timestamp;
+}
+
 /**
  * unwind_user_faultable - Produce a user stacktrace in faultable context
  * @trace: The descriptor that will store the user stacktrace
@@ -62,11 +89,120 @@ int unwind_user_faultable(struct unwind_stacktrace *trace)
 	return 0;
 }
 
+static void unwind_deferred_task_work(struct callback_head *head)
+{
+	struct unwind_task_info *info = container_of(head, struct unwind_task_info, work);
+	struct unwind_stacktrace trace;
+	struct unwind_work *work;
+	u64 timestamp;
+
+	if (WARN_ON_ONCE(!info->pending))
+		return;
+
+	/* Allow work to come in again */
+	WRITE_ONCE(info->pending, 0);
+
+	/*
+	 * From here on out, the callback must always be called, even if it's
+	 * just an empty trace.
+	 */
+	trace.nr = 0;
+	trace.entries = NULL;
+
+	unwind_user_faultable(&trace);
+
+	timestamp = info->timestamp;
+
+	guard(mutex)(&callback_mutex);
+	list_for_each_entry(work, &callbacks, list) {
+		work->func(work, &trace, timestamp);
+	}
+}
+
+/**
+ * unwind_deferred_request - Request a user stacktrace on task exit
+ * @work: Unwind descriptor requesting the trace
+ * @timestamp: The time stamp of the first request made for this task
+ *
+ * Schedule a user space unwind to be done in task work before exiting the
+ * kernel.
+ *
+ * The returned @timestamp output is the timestamp of the very first request
+ * for a user space stacktrace for this task since it entered the kernel.
+ * It can be from a request by any caller of this infrastructure.
+ * Its value will also be passed to the callback function. It can be
+ * used to stitch kernel and user stack traces together in post-processing.
+ *
+ * Note, the architecture must have a local_clock() implementation that
+ * guarantees a different timestamp per task system call.
+ *
+ * It's valid to call this function multiple times for the same @work within
+ * the same task entry context. Each call will return the same timestamp
+ * while the task hasn't left the kernel. If the callback is not pending because
+ * it has already been previously called for the same entry context, it will be
+ * called again with the same stack trace and timestamp.
+ *
+ * Return: 1 if the callback was already queued.
+ *         0 if the callback was successfully queued.
+ *         Negative if there's an error.
+ *         @timestamp holds the timestamp of the first request by any user
+ */
+int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
+{
+	struct unwind_task_info *info = &current->unwind_info;
+	int ret;
+
+	*timestamp = 0;
+
+	if (WARN_ON_ONCE(in_nmi()))
+		return -EINVAL;
+
+	if ((current->flags & (PF_KTHREAD | PF_EXITING)) ||
+	    !user_mode(task_pt_regs(current)))
+		return -EINVAL;
+
+	guard(irqsave)();
+
+	*timestamp = get_timestamp(info);
+
+	/* callback already pending? */
+	if (info->pending)
+		return 1;
+
+	/* The work has been claimed, now schedule it. */
+	ret = task_work_add(current, &info->work, TWA_RESUME);
+	if (WARN_ON_ONCE(ret))
+		return ret;
+
+	info->pending = 1;
+	return 0;
+}
+
+void unwind_deferred_cancel(struct unwind_work *work)
+{
+	if (!work)
+		return;
+
+	guard(mutex)(&callback_mutex);
+	list_del(&work->list);
+}
+
+int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
+{
+	memset(work, 0, sizeof(*work));
+
+	guard(mutex)(&callback_mutex);
+	list_add(&work->list, &callbacks);
+	work->func = func;
+	return 0;
+}
+
 void unwind_task_init(struct task_struct *task)
{
	struct unwind_task_info *info = &task->unwind_info;
 
 	memset(info, 0, sizeof(*info));
+	init_task_work(&info->work, unwind_deferred_task_work);
 }
 
 void unwind_task_free(struct task_struct *task)
@@ -74,4 +210,5 @@ void unwind_task_free(struct task_struct *task)
 	struct unwind_task_info *info = &task->unwind_info;
 
 	kfree(info->cache);
+	task_work_cancel(task, &info->work);
 }
-- 
2.47.2

From nobody Wed Oct 8 07:23:47 2025
Message-ID: <20250701005451.737614486@goodmis.org>
Date: Mon, 30 Jun 2025 20:53:28 -0400
From: Steven Rostedt
Subject: [PATCH v12 07/14] unwind_user/deferred: Make unwind deferral requests NMI-safe
References: <20250701005321.942306427@goodmis.org>

From: Steven Rostedt

Make unwind_deferred_request() NMI-safe so tracers in NMI context can
call it and safely request a user space stacktrace when the task exits.

Note, this is only allowed for architectures that implement a safe 64 bit
cmpxchg, which rules out some 32 bit architectures and even some 64 bit
ones. If an architecture that does not support a safe NMI 64 bit cmpxchg
requests a deferred stack trace from NMI context, it will get an -EINVAL.
Such architectures would need another method (perhaps an irqwork) to
request a deferred user space stack trace. That can be dealt with later
if one of these architectures requires this feature.
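The two compare-and-swap patterns this patch relies on can be sketched in user-space C, with C11 atomics standing in for the kernel's local64_try_cmpxchg()/local_try_cmpxchg(). The names `assign_timestamp` and `try_claim` mirror the patch; everything else here is an illustrative simplification, not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Toy per-task state: both fields may be raced on by task context
 * and by an NMI that interrupts it. */
static _Atomic uint64_t timestamp;
static atomic_long pending;

/* The first caller installs its clock reading. A racing caller (even an
 * NMI that interrupted the task mid-update) loses the cmpxchg and adopts
 * the winner's value, so every requester sees one consistent stamp. */
static uint64_t assign_timestamp(uint64_t now)
{
	uint64_t old = 0;

	if (!atomic_compare_exchange_strong(&timestamp, &old, now))
		now = old;	/* lost the race: reuse the earlier stamp */
	return now;
}

/* Claim the right to schedule the task work: only the single caller that
 * flips pending from 0 to 1 may call task_work_add() in the real code. */
static int try_claim(void)
{
	long expected = 0;

	return atomic_compare_exchange_strong(&pending, &expected, 1);
}
```

Exactly one claimant wins, and the timestamp observed by all callers is whichever store happened first, which is the property the NMI-safe 64-bit cmpxchg requirement guarantees.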
Suggested-by: Peter Zijlstra
Signed-off-by: Steven Rostedt (Google)
---
 include/linux/unwind_deferred.h       |  4 +-
 include/linux/unwind_deferred_types.h |  7 ++-
 kernel/unwind/deferred.c              | 74 ++++++++++++++++++++++-----
 3 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index c6548e8d64d1..73f6cac53530 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -28,8 +28,8 @@ void unwind_deferred_cancel(struct unwind_work *work);
 
 static __always_inline void unwind_reset_info(void)
 {
-	if (unlikely(current->unwind_info.timestamp))
-		current->unwind_info.timestamp = 0;
+	if (unlikely(local64_read(&current->unwind_info.timestamp)))
+		local64_set(&current->unwind_info.timestamp, 0);
 	/*
 	 * As unwind_user_faultable() can be called directly and
 	 * depends on nr_entries being cleared on exit to user,
diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index 5df264cf81ad..0d722e877473 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -2,6 +2,9 @@
 #ifndef _LINUX_UNWIND_USER_DEFERRED_TYPES_H
 #define _LINUX_UNWIND_USER_DEFERRED_TYPES_H
 
+#include
+#include
+
 struct unwind_cache {
 	unsigned int		nr_entries;
 	unsigned long		entries[];
@@ -10,8 +13,8 @@ struct unwind_cache {
 struct unwind_task_info {
 	struct unwind_cache	*cache;
 	struct callback_head	work;
-	u64			timestamp;
-	int			pending;
+	local64_t		timestamp;
+	local_t			pending;
 };
 
 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */
diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index d5f2c004a5b0..dd36e58c8cad 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -12,6 +12,35 @@
 #include
 #include
 
+/*
+ * For requesting a deferred user space stack trace from NMI context
+ * the architecture must support a 64bit safe cmpxchg in NMI context.
+ * For those architectures that do not have that, then it cannot ask
+ * for a deferred user space stack trace from an NMI context. If it
+ * does, then it will get -EINVAL.
+ */
+#if defined(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) && \
+	!defined(CONFIG_GENERIC_ATOMIC64)
+# define CAN_USE_IN_NMI		1
+static inline u64 assign_timestamp(struct unwind_task_info *info,
+				   u64 timestamp)
+{
+	u64 old = 0;
+
+	if (!local64_try_cmpxchg(&info->timestamp, &old, timestamp))
+		timestamp = old;
+	return timestamp;
+}
+#else
+# define CAN_USE_IN_NMI		0
+static inline u64 assign_timestamp(struct unwind_task_info *info,
+				   u64 timestamp)
+{
+	/* For archs that do not allow NMI here */
+	local64_set(&info->timestamp, timestamp);
+	return timestamp;
+}
+#endif
+
 /* Make the cache fit in a 4K page */
 #define UNWIND_MAX_ENTRIES					\
 	((SZ_4K - sizeof(struct unwind_cache)) / sizeof(long))
@@ -31,12 +60,21 @@ static LIST_HEAD(callbacks);
  */
 static u64 get_timestamp(struct unwind_task_info *info)
 {
+	u64 timestamp;
+
 	lockdep_assert_irqs_disabled();
 
-	if (!info->timestamp)
-		info->timestamp = local_clock();
+	/*
+	 * Note, the timestamp is generated on the first request.
+	 * If it exists here, then the timestamp is earlier than
+	 * this request and it means that this request will be
+	 * valid for the stacktrace.
+	 */
+	timestamp = local64_read(&info->timestamp);
+	if (timestamp)
+		return timestamp;
 
-	return info->timestamp;
+	return assign_timestamp(info, local_clock());
 }
 
 /**
@@ -96,11 +134,11 @@ static void unwind_deferred_task_work(struct callback_head *head)
 	struct unwind_work *work;
 	u64 timestamp;
 
-	if (WARN_ON_ONCE(!info->pending))
+	if (WARN_ON_ONCE(!local_read(&info->pending)))
 		return;
 
 	/* Allow work to come in again */
-	WRITE_ONCE(info->pending, 0);
+	local_set(&info->pending, 0);
 
 	/*
	 * From here on out, the callback must always be called, even if it's
@@ -111,7 +149,7 @@ static void unwind_deferred_task_work(struct callback_head *head)
 
 	unwind_user_faultable(&trace);
 
-	timestamp = info->timestamp;
+	timestamp = local64_read(&info->timestamp);
 
 	guard(mutex)(&callback_mutex);
 	list_for_each_entry(work, &callbacks, list) {
@@ -150,31 +188,43 @@ static void unwind_deferred_task_work(struct callback_head *head)
 int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 {
 	struct unwind_task_info *info = &current->unwind_info;
+	long pending;
 	int ret;
 
 	*timestamp = 0;
 
-	if (WARN_ON_ONCE(in_nmi()))
-		return -EINVAL;
-
 	if ((current->flags & (PF_KTHREAD | PF_EXITING)) ||
 	    !user_mode(task_pt_regs(current)))
 		return -EINVAL;
 
+	/* NMI requires having safe 64 bit cmpxchg operations */
+	if (!CAN_USE_IN_NMI && in_nmi())
+		return -EINVAL;
+
 	guard(irqsave)();
 
 	*timestamp = get_timestamp(info);
 
 	/* callback already pending? */
-	if (info->pending)
+	pending = local_read(&info->pending);
+	if (pending)
 		return 1;
 
+	if (CAN_USE_IN_NMI) {
+		/* Claim the work unless an NMI just now swooped in to do so. */
+		if (!local_try_cmpxchg(&info->pending, &pending, 1))
+			return 1;
+	} else {
+		local_set(&info->pending, 1);
+	}
+
 	/* The work has been claimed, now schedule it. */
 	ret = task_work_add(current, &info->work, TWA_RESUME);
-	if (WARN_ON_ONCE(ret))
+	if (WARN_ON_ONCE(ret)) {
+		local_set(&info->pending, 0);
 		return ret;
+	}
 
-	info->pending = 1;
 	return 0;
 }
 
-- 
2.47.2

From nobody Wed Oct 8 07:23:47 2025
2025 00:54:15 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1uWPGu-00000007NiB-0E4s; Mon, 30 Jun 2025 20:54:52 -0400 Message-ID: <20250701005451.904934515@goodmis.org> User-Agent: quilt/0.68 Date: Mon, 30 Jun 2025 20:53:29 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org Cc: Masami Hiramatsu , Mathieu Desnoyers , Josh Poimboeuf , Peter Zijlstra , Ingo Molnar , Jiri Olsa , Namhyung Kim , Thomas Gleixner , Andrii Nakryiko , Indu Bhagat , "Jose E. Marchesi" , Beau Belgrave , Jens Remus , Linus Torvalds , Andrew Morton , Jens Axboe , Florian Weimer Subject: [PATCH v12 08/14] unwind deferred: Use bitmask to determine which callbacks to call References: <20250701005321.942306427@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Stat-Signature: 9ifrdh5icewbw7s7trrhzmzm387i9rdm X-Rspamd-Server: rspamout01 X-Rspamd-Queue-Id: 3F6D960010 X-Session-Marker: 6E657665747340676F6F646D69732E6F7267 X-Session-ID: U2FsdGVkX18esSv+0gwwWJavHZlucJYwjGOft7mZe3U= X-HE-Tag: 1751331255-866704 X-HE-Meta: U2FsdGVkX1+EWsySnhoXhrhG5zoXr+f7sRWBw4wD6pDbOgkQG1sSpHflWL6MMP0nv3H2C2Rhce0kO0GrLrgD5k/TIbtOlyxnQCa6BUF600/4zvSK59qqxxA503QagMv9txsNtX/2nlOSvlTZRY63Yq7tqbaLFi0EPTrozSgApUU78izqYj8zpnkubIY7dzMt8722t+ANdD7+z6JbTzgheaFyODC0495uHKTbPSVKYn94xXubYzpHQ63T9RO9xkU26hfvapX90yKxh5+G1u3zzEx+X1RGnlRaCzBFWLK2WMqel97IYnLbG7YEG/ZV5FQ/Jw46tBMdroisiQGu10uEo6MUEUHjPPppORvpQQ6mUO5rjIpdfoEoc3ZTSxGcma+kE0G8yoPAht3e98AuUw9hlA== Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt In order to know which registered callback requested a stacktrace for when the task goes back to user space, add a bitmask to keep track of all registered tracers. 
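The per-tracer bit allocation this patch describes can be sketched in plain user space C. This is a hedged model, not the kernel code: the `model_*` names are hypothetical stand-ins for `unwind_deferred_init()` and `ffz()`, and plain long operations stand in for the kernel's bitops taken under `callback_mutex`.

```c
#include <assert.h>
#include <limits.h>

/* Model of ffz(): index of the first zero bit, or -1 if none free. */
static int model_ffz(unsigned long mask)
{
    for (int i = 0; i < (int)(sizeof(mask) * CHAR_BIT); i++)
        if (!(mask & (1UL << i)))
            return i;
    return -1;
}

/* Register a tracer: claim the first free bit in the global mask
 * (the kernel returns -EBUSY when every bit is taken). */
static int model_register(unsigned long *global_mask)
{
    int bit;

    if (*global_mask == ~0UL)
        return -1;
    bit = model_ffz(*global_mask);
    *global_mask |= 1UL << bit;
    return bit;
}
```

Each registered tracer thus owns one stable bit index, which is what later gets set in a task's mask when that tracer requests a deferred trace.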
The bitmask is the size of a long, which means that on a 32-bit machine it can have at most 32 registered tracers, and on a 64-bit machine at most 64. This should not be an issue, as there should not be more than 10 (unless BPF can abuse this?).

When a tracer registers with unwind_deferred_init() it will get a bit number assigned to it. When a tracer requests a stacktrace, it will have its bit set within the task_struct. When the task returns back to user space, it will call the callbacks for all the registered tracers whose bits are set in the task's mask.

When a tracer is removed by unwind_deferred_cancel(), all current tasks will clear the associated bit, just in case another tracer gets registered immediately afterward and would otherwise get its callback called unexpectedly.

Signed-off-by: Steven Rostedt (Google)
---
 include/linux/unwind_deferred.h       |  1 +
 include/linux/unwind_deferred_types.h |  1 +
 kernel/unwind/deferred.c              | 36 ++++++++++++++++++++++++---
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index 73f6cac53530..538b4b7968dc 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -13,6 +13,7 @@ typedef void (*unwind_callback_t)(struct unwind_work *work, struct unwind_stackt
 struct unwind_work {
     struct list_head	list;
     unwind_callback_t	func;
+    int			bit;
 };
 
 #ifdef CONFIG_UNWIND_USER
diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index 0d722e877473..5863bf4eb436 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -13,6 +13,7 @@ struct unwind_cache {
 struct unwind_task_info {
     struct unwind_cache	*cache;
     struct callback_head	work;
+    unsigned long		unwind_mask;
     local64_t		timestamp;
     local_t			pending;
 };
diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index dd36e58c8cad..6c558d00ff41 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -48,6 +48,7 @@ static inline u64 assign_timestamp(struct unwind_task_info *info,
 /* Guards adding to and reading the list of callbacks */
 static DEFINE_MUTEX(callback_mutex);
 static LIST_HEAD(callbacks);
+static unsigned long unwind_mask;
 
 /*
  * Read the task context timestamp, if this is the first caller then
@@ -153,7 +154,10 @@ static void unwind_deferred_task_work(struct callback_head *head)
 
     guard(mutex)(&callback_mutex);
     list_for_each_entry(work, &callbacks, list) {
-        work->func(work, &trace, timestamp);
+        if (test_bit(work->bit, &info->unwind_mask)) {
+            work->func(work, &trace, timestamp);
+            clear_bit(work->bit, &info->unwind_mask);
+        }
     }
 }
 
@@ -205,15 +209,19 @@ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 
     *timestamp = get_timestamp(info);
 
+    /* This is already queued */
+    if (test_bit(work->bit, &info->unwind_mask))
+        return 1;
+
     /* callback already pending? */
     pending = local_read(&info->pending);
     if (pending)
-        return 1;
+        goto out;
 
     if (CAN_USE_IN_NMI) {
         /* Claim the work unless an NMI just now swooped in to do so. */
         if (!local_try_cmpxchg(&info->pending, &pending, 1))
-            return 1;
+            goto out;
     } else {
         local_set(&info->pending, 1);
     }
@@ -225,16 +233,27 @@ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
         return ret;
     }
 
-    return 0;
+ out:
+    return test_and_set_bit(work->bit, &info->unwind_mask);
 }
 
 void unwind_deferred_cancel(struct unwind_work *work)
 {
+    struct task_struct *g, *t;
+
     if (!work)
         return;
 
     guard(mutex)(&callback_mutex);
     list_del(&work->list);
+
+    __clear_bit(work->bit, &unwind_mask);
+
+    guard(rcu)();
+    /* Clear this bit from all threads */
+    for_each_process_thread(g, t) {
+        clear_bit(work->bit, &t->unwind_info.unwind_mask);
+    }
 }
 
 int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
@@ -242,6 +261,14 @@ int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
     memset(work, 0, sizeof(*work));
 
     guard(mutex)(&callback_mutex);
+
+    /* See if there's a bit in the mask available */
+    if (unwind_mask == ~0UL)
+        return -EBUSY;
+
+    work->bit = ffz(unwind_mask);
+    __set_bit(work->bit, &unwind_mask);
+
     list_add(&work->list, &callbacks);
     work->func = func;
     return 0;
@@ -253,6 +280,7 @@ void unwind_task_init(struct task_struct *task)
 
     memset(info, 0, sizeof(*info));
     init_task_work(&info->work, unwind_deferred_task_work);
+    info->unwind_mask = 0;
 }
 
 void unwind_task_free(struct task_struct *task)
-- 
2.47.2

From nobody Wed Oct 8 07:23:47 2025
Message-ID: <20250701005452.075382262@goodmis.org>
Date: Mon, 30 Jun 2025 20:53:30 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Subject: [PATCH v12 09/14] unwind deferred: Use SRCU in unwind_deferred_task_work()
References: <20250701005321.942306427@goodmis.org>

From: Steven Rostedt

Instead of using the callback_mutex to protect the linked list of callbacks in unwind_deferred_task_work(), use SRCU. This function gets called every time a task exits that has to record a stack trace that was requested. This can happen for many tasks on several CPUs at the same time. A mutex is a bottleneck and can cause contention that slows down performance.

As the callbacks themselves are allowed to sleep, regular RCU cannot be used to protect the list. Instead use SRCU, as it still allows the callbacks to sleep and the list can be read without needing to hold the callback_mutex.
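The cancellation path this patch introduces (setting `work->bit` to -1 so a cancelled work can never request again) can be modeled in plain user space C. This is a sketch under stated assumptions: there is no real SRCU here, the `model_*` names are hypothetical, and single-threaded plain stores stand in for `READ_ONCE()`/`synchronize_srcu()`.

```c
#include <assert.h>

struct model_work { int bit; };

/* Model of unwind_deferred_request(): reject cancelled works first. */
static int model_request(struct model_work *w, unsigned long *task_mask)
{
    int bit = w->bit;            /* READ_ONCE() in the kernel */

    if (bit < 0)
        return -22;              /* -EINVAL: work was cancelled */
    if (*task_mask & (1UL << bit))
        return 1;                /* already queued */
    *task_mask |= 1UL << bit;
    return 0;
}

/* Model of unwind_deferred_cancel(): invalidate the bit, then scrub it
 * from the (single, here) task. */
static void model_cancel(struct model_work *w, unsigned long *task_mask)
{
    int bit = w->bit;

    w->bit = -1;                 /* no more requests, no more callbacks */
    *task_mask &= ~(1UL << bit);
}
```

In the kernel the scrub of task masks only happens after `synchronize_srcu()`, which guarantees no reader is still walking the callback list with the stale entry.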
Link: https://lore.kernel.org/all/ca9bd83a-6c80-4ee0-a83c-224b9d60b755@efficios.com/

Suggested-by: Mathieu Desnoyers
Signed-off-by: Steven Rostedt (Google)
---
 kernel/unwind/deferred.c | 35 ++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index 6c558d00ff41..7309c9e0e57a 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -45,10 +45,11 @@ static inline u64 assign_timestamp(struct unwind_task_info *info,
 #define UNWIND_MAX_ENTRIES			\
     ((SZ_4K - sizeof(struct unwind_cache)) / sizeof(long))
 
-/* Guards adding to and reading the list of callbacks */
+/* Guards adding to or removing from the list of callbacks */
 static DEFINE_MUTEX(callback_mutex);
 static LIST_HEAD(callbacks);
 static unsigned long unwind_mask;
+DEFINE_STATIC_SRCU(unwind_srcu);
 
 /*
  * Read the task context timestamp, if this is the first caller then
@@ -134,6 +135,7 @@ static void unwind_deferred_task_work(struct callback_head *head)
     struct unwind_stacktrace trace;
     struct unwind_work *work;
     u64 timestamp;
+    int idx;
 
     if (WARN_ON_ONCE(!local_read(&info->pending)))
         return;
@@ -152,13 +154,15 @@ static void unwind_deferred_task_work(struct callback_head *head)
 
     timestamp = local64_read(&info->timestamp);
 
-    guard(mutex)(&callback_mutex);
-    list_for_each_entry(work, &callbacks, list) {
+    idx = srcu_read_lock(&unwind_srcu);
+    list_for_each_entry_srcu(work, &callbacks, list,
+                             srcu_read_lock_held(&unwind_srcu)) {
         if (test_bit(work->bit, &info->unwind_mask)) {
             work->func(work, &trace, timestamp);
             clear_bit(work->bit, &info->unwind_mask);
         }
     }
+    srcu_read_unlock(&unwind_srcu, idx);
 }
 
 /**
@@ -193,6 +197,7 @@ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 {
     struct unwind_task_info *info = &current->unwind_info;
     long pending;
+    int bit;
     int ret;
 
     *timestamp = 0;
@@ -205,12 +210,17 @@ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
     if (!CAN_USE_IN_NMI && in_nmi())
         return -EINVAL;
 
+    /* Do not allow cancelled works to request again */
+    bit = READ_ONCE(work->bit);
+    if (WARN_ON_ONCE(bit < 0))
+        return -EINVAL;
+
     guard(irqsave)();
 
     *timestamp = get_timestamp(info);
 
     /* This is already queued */
-    if (test_bit(work->bit, &info->unwind_mask))
+    if (test_bit(bit, &info->unwind_mask))
         return 1;
 
     /* callback already pending? */
@@ -234,25 +244,32 @@ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
     }
 
  out:
-    return test_and_set_bit(work->bit, &info->unwind_mask);
+    return test_and_set_bit(bit, &info->unwind_mask);
 }
 
 void unwind_deferred_cancel(struct unwind_work *work)
 {
     struct task_struct *g, *t;
+    int bit;
 
     if (!work)
         return;
 
     guard(mutex)(&callback_mutex);
-    list_del(&work->list);
+    list_del_rcu(&work->list);
+    bit = work->bit;
+
+    /* Do not allow any more requests and prevent callbacks */
+    work->bit = -1;
+
+    __clear_bit(bit, &unwind_mask);
 
-    __clear_bit(work->bit, &unwind_mask);
+    synchronize_srcu(&unwind_srcu);
 
     guard(rcu)();
     /* Clear this bit from all threads */
     for_each_process_thread(g, t) {
-        clear_bit(work->bit, &t->unwind_info.unwind_mask);
+        clear_bit(bit, &t->unwind_info.unwind_mask);
     }
 }
 
@@ -269,7 +286,7 @@ int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
     work->bit = ffz(unwind_mask);
     __set_bit(work->bit, &unwind_mask);
 
-    list_add(&work->list, &callbacks);
+    list_add_rcu(&work->list, &callbacks);
     work->func = func;
     return 0;
 }
-- 
2.47.2

From nobody Wed Oct 8 07:23:47 2025
Message-ID: <20250701005452.242933931@goodmis.org>
Date: Mon, 30 Jun 2025 20:53:31 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Subject: [PATCH v12 10/14] unwind: Clear unwind_mask on exit back to user space
References: <20250701005321.942306427@goodmis.org>

From: Steven Rostedt

When testing the deferred unwinder by attaching deferred user space stacktraces to events, a livelock happened. This occurred when the deferred unwinding was added to the irqs_disabled event, which fires after the task_work callbacks are called and before the task goes back to user space. The event callback would be registered while irqs were disabled, the task_work would trigger, call the callback for this work, and clear the work's bit. Then, before getting back to user space, irqs would be disabled again, the event would trigger again, and a new task_work would be registered. This caused an infinite loop and the system hung.

To prevent this, clear the bits at the very last moment before going back to user space, when instrumentation is disabled. That is in unwind_exit_to_user_mode().
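The claim logic this patch arrives at can be sketched as a user space model. This is a hedged sketch, not the kernel code: `model_request()` is hypothetical, plain stores stand in for `try_cmpxchg()`, and the return values 1 and 2 mirror the patch's UNWIND_ALREADY_PENDING and UNWIND_ALREADY_EXECUTED.

```c
#include <assert.h>
#include <limits.h>

#define M_PENDING_BIT (sizeof(unsigned long) * CHAR_BIT - 1)
#define M_PENDING     (1UL << M_PENDING_BIT)

/* First requester installs PENDING plus its own bit and wipes the stale
 * bits of already-executed works; later requesters only OR in their bit. */
static int model_request(int bit, unsigned long *mask)
{
    if (*mask & (1UL << bit))
        return (*mask & M_PENDING) ? 1 /* already pending */
                                   : 2 /* already executed */;
    if (*mask & M_PENDING) {
        *mask |= 1UL << bit;     /* piggyback on the queued task_work */
        return 1;
    }
    *mask = M_PENDING | (1UL << bit);  /* claim + clear stale work bits */
    return 0;
}
```

A stale bit without PENDING set means the callback already ran in this entry context, which is exactly why the model reports "executed" rather than queueing again.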
Move the pending bit from a value on the task_struct to the most significant bit of the unwind_mask (this saves space on the task_struct). It also allows the pending bit to be modified atomically along with the work bits.

Instead of clearing a work's bit after its callback is called, the clearing is delayed until exit. If the work is requested again, the task_work is not queued again and the work is notified that the callback has already been called (via the UNWIND_ALREADY_EXECUTED return value).

The pending bit is cleared before calling the callback functions, but the current work bits remain. If one of the called works registers again, it will not trigger a task_work if its bit is still present in the task's unwind_mask. If a new work registers, it will set both the pending bit and its own bit, but clear the other work bits so that their callbacks do not get called again.

Signed-off-by: Steven Rostedt (Google)
---
Changes since v11: https://lore.kernel.org/20250625225716.505389511@goodmis.org

- Still clear info->cache->nr_entries in unwind_reset_info()

  If unwind_user_faultable() is called directly (like perf will do), it still needs to clear nr_entries, as calling the function directly does not set the bit. Later code will change this so there is only one conditional to check when not enabled.
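The exit-path reset described above can be modeled in a few lines of user space C. This is a sketch under stated assumptions: `model_reset()` is a hypothetical stand-in for `unwind_reset_info()`, and a plain store replaces the kernel's `try_cmpxchg()` retry loop.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define M_PENDING (1UL << (sizeof(unsigned long) * CHAR_BIT - 1))

/* On exit to user space the whole mask is cleared in one step, unless a
 * task_work is still PENDING: it will run before the actual return and
 * must still see its work bits. Returns true if the mask was cleared. */
static bool model_reset(unsigned long *mask)
{
    if (*mask & M_PENDING)
        return false;   /* task_work still queued: leave bits alone */
    *mask = 0;          /* kernel loops on try_cmpxchg() here */
    return true;
}
```

Bailing out while PENDING is set is what breaks the livelock: the bits are only dropped once no further task_work can run in this entry context.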
 include/linux/unwind_deferred.h       | 25 +++++++--
 include/linux/unwind_deferred_types.h |  1 -
 kernel/unwind/deferred.c              | 76 ++++++++++++++++++---------
 3 files changed, 74 insertions(+), 28 deletions(-)

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index 538b4b7968dc..d25a72fb21ef 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -18,6 +18,14 @@ struct unwind_work {
 
 #ifdef CONFIG_UNWIND_USER
 
+#define UNWIND_PENDING_BIT	(BITS_PER_LONG - 1)
+#define UNWIND_PENDING		BIT(UNWIND_PENDING_BIT)
+
+enum {
+    UNWIND_ALREADY_PENDING	= 1,
+    UNWIND_ALREADY_EXECUTED	= 2,
+};
+
 void unwind_task_init(struct task_struct *task);
 void unwind_task_free(struct task_struct *task);
 
@@ -29,15 +37,26 @@ void unwind_deferred_cancel(struct unwind_work *work);
 
 static __always_inline void unwind_reset_info(void)
 {
-    if (unlikely(local64_read(&current->unwind_info.timestamp)))
+    struct unwind_task_info *info = &current->unwind_info;
+    unsigned long bits;
+
+    /* Was there any unwinding? */
+    if (unlikely(info->unwind_mask)) {
+        bits = info->unwind_mask;
+        do {
+            /* Is a task_work going to run again before going back */
+            if (bits & UNWIND_PENDING)
+                return;
+        } while (!try_cmpxchg(&info->unwind_mask, &bits, 0UL));
         local64_set(&current->unwind_info.timestamp, 0);
+    }
     /*
      * As unwind_user_faultable() can be called directly and
      * depends on nr_entries being cleared on exit to user,
      * this needs to be a separate conditional.
      */
-    if (unlikely(current->unwind_info.cache))
-        current->unwind_info.cache->nr_entries = 0;
+    if (unlikely(info->cache))
+        info->cache->nr_entries = 0;
 }
 
 #else /* !CONFIG_UNWIND_USER */
diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index 5863bf4eb436..4308367f1887 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -15,7 +15,6 @@ struct unwind_task_info {
     struct callback_head	work;
     unsigned long		unwind_mask;
     local64_t		timestamp;
-    local_t			pending;
 };
 
 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */
diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index 7309c9e0e57a..e7e4442926d3 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -51,6 +51,11 @@ static LIST_HEAD(callbacks);
 static unsigned long unwind_mask;
 DEFINE_STATIC_SRCU(unwind_srcu);
 
+static inline bool unwind_pending(struct unwind_task_info *info)
+{
+    return test_bit(UNWIND_PENDING_BIT, &info->unwind_mask);
+}
+
 /*
  * Read the task context timestamp, if this is the first caller then
  * it will set the timestamp.
@@ -134,14 +139,17 @@ static void unwind_deferred_task_work(struct callback_head *head)
     struct unwind_task_info *info = container_of(head, struct unwind_task_info, work);
     struct unwind_stacktrace trace;
     struct unwind_work *work;
+    unsigned long bits;
     u64 timestamp;
     int idx;
 
-    if (WARN_ON_ONCE(!local_read(&info->pending)))
+    if (WARN_ON_ONCE(!unwind_pending(info)))
         return;
 
-    /* Allow work to come in again */
-    local_set(&info->pending, 0);
+    /* Clear pending bit but make sure to have the current bits */
+    bits = READ_ONCE(info->unwind_mask);
+    while (!try_cmpxchg(&info->unwind_mask, &bits, bits & ~UNWIND_PENDING))
+        ;
 
     /*
      * From here on out, the callback must always be called, even if it's
@@ -157,10 +165,8 @@ static void unwind_deferred_task_work(struct callback_head *head)
     idx = srcu_read_lock(&unwind_srcu);
     list_for_each_entry_srcu(work, &callbacks, list,
                              srcu_read_lock_held(&unwind_srcu)) {
-        if (test_bit(work->bit, &info->unwind_mask)) {
+        if (test_bit(work->bit, &bits))
             work->func(work, &trace, timestamp);
-            clear_bit(work->bit, &info->unwind_mask);
-        }
     }
     srcu_read_unlock(&unwind_srcu, idx);
 }
@@ -188,15 +194,17 @@
  * it has already been previously called for the same entry context, it will be
  * called again with the same stack trace and timestamp.
  *
- * Return: 1 if the the callback was already queued.
- *         0 if the callback successfully was queued.
+ * Return: 0 if the callback successfully was queued.
+ *         UNWIND_ALREADY_PENDING if the callback was already queued.
+ *         UNWIND_ALREADY_EXECUTED if the callback was already called
+ *                (and will not be called again)
  *         Negative if there's an error.
  *         @timestamp holds the timestamp of the first request by any user
  */
 int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 {
     struct unwind_task_info *info = &current->unwind_info;
-    long pending;
+    unsigned long old, bits;
     int bit;
     int ret;
 
@@ -219,32 +227,52 @@ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp)
 
     *timestamp = get_timestamp(info);
 
-    /* This is already queued */
-    if (test_bit(bit, &info->unwind_mask))
-        return 1;
+    old = READ_ONCE(info->unwind_mask);
+
+    /* Is this already queued */
+    if (test_bit(bit, &old)) {
+        /*
+         * If pending is not set, it means this work's callback
+         * was already called.
+         */
+        return old & UNWIND_PENDING ? UNWIND_ALREADY_PENDING :
+                                      UNWIND_ALREADY_EXECUTED;
+    }
 
-    /* callback already pending? */
-    pending = local_read(&info->pending);
-    if (pending)
+    if (unwind_pending(info))
         goto out;
 
+    /*
+     * This is the first to enable another task_work for this task since
+     * the task entered the kernel, or had already called the callbacks.
+     * Set only the bit for this work and clear all others as they have
+     * already had their callbacks called, and do not need to call them
+     * again because of this work.
+     */
+    bits = UNWIND_PENDING | BIT(bit);
+
+    /*
+     * If the cmpxchg() fails, it means that an NMI came in and set
+     * the pending bit as well as cleared the other bits. Just
+     * jump to setting the bit for this work.
+     */
     if (CAN_USE_IN_NMI) {
-        /* Claim the work unless an NMI just now swooped in to do so. */
-        if (!local_try_cmpxchg(&info->pending, &pending, 1))
+        if (!try_cmpxchg(&info->unwind_mask, &old, bits))
             goto out;
     } else {
-        local_set(&info->pending, 1);
+        info->unwind_mask = bits;
     }
 
     /* The work has been claimed, now schedule it. */
     ret = task_work_add(current, &info->work, TWA_RESUME);
-    if (WARN_ON_ONCE(ret)) {
-        local_set(&info->pending, 0);
-        return ret;
-    }
 
+    if (WARN_ON_ONCE(ret))
+        WRITE_ONCE(info->unwind_mask, 0);
+
+    return ret;
  out:
-    return test_and_set_bit(bit, &info->unwind_mask);
+    return test_and_set_bit(bit, &info->unwind_mask) ?
+        UNWIND_ALREADY_PENDING : 0;
 }
 
 void unwind_deferred_cancel(struct unwind_work *work)
@@ -280,7 +308,7 @@ int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
     guard(mutex)(&callback_mutex);
 
     /* See if there's a bit in the mask available */
-    if (unwind_mask == ~0UL)
+    if (unwind_mask == ~(UNWIND_PENDING))
         return -EBUSY;
 
     work->bit = ffz(unwind_mask);
-- 
2.47.2

From nobody Wed Oct 8 07:23:47 2025
Message-ID: <20250701005452.410928589@goodmis.org>
Date: Mon, 30 Jun 2025 20:53:32 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org
Subject: [PATCH v12 11/14] unwind: Add USED bit to only have one conditional on way back to user space
References: <20250701005321.942306427@goodmis.org>

From: Steven Rostedt

On the way back to user space, the function unwind_reset_info() is called unconditionally (but always inlined). It currently has two conditionals. One checks the unwind_mask, which is set whenever a deferred trace is requested and is used to know that the mask needs to be cleared. The other checks if the cache has been allocated and, if so, resets nr_entries so that the unwinder knows it needs to do the work to get a new user space stack trace again (it only does the unwind once per entry into the kernel).

Use one of the bits in the unwind mask as a "USED" bit that gets set whenever a trace is created.
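The single-conditional exit path this patch aims for can be modeled in user space C. A hedged sketch: the `model_*` names and the `nr_entries = 8` value are hypothetical, and plain stores stand in for the kernel's atomic bitops.

```c
#include <assert.h>
#include <limits.h>

#define M_PENDING_BIT (sizeof(unsigned long) * CHAR_BIT - 1)
#define M_USED_BIT    (M_PENDING_BIT - 1)
#define M_USED        (1UL << M_USED_BIT)

struct model_info { unsigned long mask; int nr_entries; };

/* Taking a trace caches entries and flags the mask as USED. */
static void model_trace(struct model_info *info)
{
    info->nr_entries = 8;        /* pretend we unwound 8 frames */
    info->mask |= M_USED;
}

/* Exit path: one test of the mask covers both the work bits and the
 * cached-trace cleanup, so the common no-trace case does nothing. */
static void model_reset(struct model_info *info)
{
    if (!info->mask)
        return;                  /* common case: nothing to do */
    if (info->mask & M_USED)
        info->nr_entries = 0;    /* invalidate the cached trace */
    info->mask = 0;
}
```

Folding the cache check under the USED bit is what lets the usual return-to-user path get away with a single `if (!mask)` test.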
This will make it possible to only check the unwind_mask in the unwind_reset_info() to know if it needs to do work or not and eliminates a conditional that happens every time the task goes back to user space. Signed-off-by: Steven Rostedt (Google) --- include/linux/unwind_deferred.h | 14 +++++++------- kernel/unwind/deferred.c | 5 ++++- 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferre= d.h index d25a72fb21ef..a1c62097f142 100644 --- a/include/linux/unwind_deferred.h +++ b/include/linux/unwind_deferred.h @@ -21,6 +21,10 @@ struct unwind_work { #define UNWIND_PENDING_BIT (BITS_PER_LONG - 1) #define UNWIND_PENDING BIT(UNWIND_PENDING_BIT) =20 +/* Set if the unwinding was used (directly or deferred) */ +#define UNWIND_USED_BIT (UNWIND_PENDING_BIT - 1) +#define UNWIND_USED BIT(UNWIND_USED_BIT) + enum { UNWIND_ALREADY_PENDING =3D 1, UNWIND_ALREADY_EXECUTED =3D 2, @@ -49,14 +53,10 @@ static __always_inline void unwind_reset_info(void) return; } while (!try_cmpxchg(&info->unwind_mask, &bits, 0UL)); local64_set(¤t->unwind_info.timestamp, 0); + + if (unlikely(info->cache)) + info->cache->nr_entries =3D 0; } - /* - * As unwind_user_faultable() can be called directly and - * depends on nr_entries being cleared on exit to user, - * this needs to be a separate conditional. 
- */ - if (unlikely(info->cache)) - info->cache->nr_entries =3D 0; } =20 #else /* !CONFIG_UNWIND_USER */ diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c index e7e4442926d3..5ab9b9045ae5 100644 --- a/kernel/unwind/deferred.c +++ b/kernel/unwind/deferred.c @@ -131,6 +131,9 @@ int unwind_user_faultable(struct unwind_stacktrace *tra= ce) =20 cache->nr_entries =3D trace->nr; =20 + /* Clear nr_entries on way back to user space */ + set_bit(UNWIND_USED_BIT, &info->unwind_mask); + return 0; } =20 @@ -308,7 +311,7 @@ int unwind_deferred_init(struct unwind_work *work, unwi= nd_callback_t func) guard(mutex)(&callback_mutex); =20 /* See if there's a bit in the mask available */ - if (unwind_mask =3D=3D ~(UNWIND_PENDING)) + if (unwind_mask =3D=3D ~(UNWIND_PENDING|UNWIND_USED)) return -EBUSY; =20 work->bit =3D ffz(unwind_mask); --=20 2.47.2 From nobody Wed Oct 8 07:23:47 2025 Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BD4AC156C6A; Tue, 1 Jul 2025 00:54:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=216.40.44.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751331264; cv=none; b=rBDaOoc+OAgXgllVoBKXgThKJG/wP92GRWUNR1IQf38ATNjtBqDRcHDqVN2qNu2uPlsjSskxju4cj8sRhbgELxbyb9fgUl58J5fL4xCRFEgAR0dZTtzIjcH0Taoq7b+s7D0yha6a1+Ty/t5JNd9sBhoNWB6/0eW86zuyHury9b8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751331264; c=relaxed/simple; bh=vzxhBtsz4jECiJNxO1EaySWJmuQOxKUvlb4C8TsO7rI=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=r/t2AsYGleVhUs/dMCMW1E2u+mXWNnArRGeMZ/ju5Y3R0Byaj58Jp9pGlK6t1silYIhq9OV40EqtaU8uj3zMdTUBeyYcG7B5yZ6maUeqL7oJLw0JOO4n5uz+O7YskPdDunommAlBXDLcRd2s0/Xryl8EkBRZsRYuqNt29+KRf+Y= ARC-Authentication-Results: i=1; 
From nobody Wed Oct 8 07:23:47 2025
Message-ID: <20250701005452.580618259@goodmis.org>
User-Agent: quilt/0.68
Date: Mon, 30 Jun 2025 20:53:33 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra,
 Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko,
 Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus,
 Linus Torvalds, Andrew Morton, Jens Axboe, Florian Weimer
Subject: [PATCH v12 12/14] unwind: Finish up unwind when a task exits
References: <20250701005321.942306427@goodmis.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Steven Rostedt

In do_exit(), when a task is exiting, if an unwind is requested and the
user stacktrace is deferred via task_work, the task_work callback is
called after exit_mm() has run. This means that the user stack trace
cannot be retrieved and an empty stack is created.

Instead, add a function unwind_deferred_task_exit() and call it just
before exit_mm() so that the unwinder can call the requested callbacks
with the user space stack.

Signed-off-by: Steven Rostedt (Google)
---
 include/linux/unwind_deferred.h |  3 +++
 kernel/exit.c                   |  2 ++
 kernel/unwind/deferred.c        | 23 ++++++++++++++++++++---
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index a1c62097f142..44654e6149ec 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -39,6 +39,8 @@ int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func);
 int unwind_deferred_request(struct unwind_work *work, u64 *timestamp);
 void unwind_deferred_cancel(struct unwind_work *work);
 
+void unwind_deferred_task_exit(struct task_struct *task);
+
 static __always_inline void unwind_reset_info(void)
 {
 	struct unwind_task_info *info = &current->unwind_info;
@@ -69,6 +71,7 @@ static inline int unwind_deferred_init(struct unwind_work *work, unwind_callback
 static inline int unwind_deferred_request(struct unwind_work *work, u64 *timestamp) { return -ENOSYS; }
 static inline void unwind_deferred_cancel(struct unwind_work *work) {}
 
+static inline void unwind_deferred_task_exit(struct task_struct *task) {}
 static inline void unwind_reset_info(void) {}
 
 #endif /* !CONFIG_UNWIND_USER */
diff --git a/kernel/exit.c b/kernel/exit.c
index bb184a67ac73..1d8c8ac33c4f 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -68,6 +68,7 @@
 #include
 #include
 #include
+#include <linux/unwind_deferred.h>
 #include
 #include
 
@@ -938,6 +939,7 @@ void __noreturn do_exit(long code)
 
 	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
+	unwind_deferred_task_exit(tsk);
 	trace_sched_process_exit(tsk, group_dead);
 
 	/*
diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index 5ab9b9045ae5..9ec1e74c6469 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -104,7 +104,7 @@ int unwind_user_faultable(struct unwind_stacktrace *trace)
 	/* Should always be called from faultable context */
 	might_fault();
 
-	if (current->flags & PF_EXITING)
+	if (!current->mm)
 		return -EINVAL;
 
 	if (!info->cache) {
@@ -137,9 +137,9 @@ int unwind_user_faultable(struct unwind_stacktrace *trace)
 	return 0;
 }
 
-static void unwind_deferred_task_work(struct callback_head *head)
+static void process_unwind_deferred(struct task_struct *task)
 {
-	struct unwind_task_info *info = container_of(head, struct unwind_task_info, work);
+	struct unwind_task_info *info = &task->unwind_info;
 	struct unwind_stacktrace trace;
 	struct unwind_work *work;
 	unsigned long bits;
@@ -174,6 +174,23 @@ static void unwind_deferred_task_work(struct callback_head *head)
 	srcu_read_unlock(&unwind_srcu, idx);
 }
 
+static void unwind_deferred_task_work(struct callback_head *head)
+{
+	process_unwind_deferred(current);
+}
+
+void unwind_deferred_task_exit(struct task_struct *task)
+{
+	struct unwind_task_info *info = &current->unwind_info;
+
+	if (!unwind_pending(info))
+		return;
+
+	process_unwind_deferred(task);
+
+	task_work_cancel(task, &info->work);
+}
+
 /**
  * unwind_deferred_request - Request a user stacktrace on task exit
  * @work:	Unwind descriptor requesting the trace
-- 
2.47.2
From nobody Wed Oct 8 07:23:47 2025
Message-ID: <20250701005452.754512359@goodmis.org>
User-Agent: quilt/0.68
Date: Mon, 30 Jun 2025 20:53:34 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra,
 Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko,
 Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus,
 Linus Torvalds, Andrew Morton, Jens Axboe, Florian Weimer
Subject: [PATCH v12 13/14] unwind_user/x86: Enable frame pointer unwinding on x86
References: <20250701005321.942306427@goodmis.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Josh Poimboeuf

Use ARCH_INIT_USER_FP_FRAME to describe how frame pointers are unwound
on x86, and enable CONFIG_HAVE_UNWIND_USER_FP accordingly so the
unwind_user interfaces can be used.

Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 arch/x86/Kconfig                   |  1 +
 arch/x86/include/asm/unwind_user.h | 11 +++++++++++
 2 files changed, 12 insertions(+)
 create mode 100644 arch/x86/include/asm/unwind_user.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 71019b3b54ea..5862433c81e1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -302,6 +302,7 @@ config X86
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UACCESS_VALIDATION		if HAVE_OBJTOOL
 	select HAVE_UNSTABLE_SCHED_CLOCK
+	select HAVE_UNWIND_USER_FP		if X86_64
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_GENERIC_VDSO
 	select VDSO_GETRANDOM			if X86_64
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
new file mode 100644
index 000000000000..8597857bf896
--- /dev/null
+++ b/arch/x86/include/asm/unwind_user.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_UNWIND_USER_H
+#define _ASM_X86_UNWIND_USER_H
+
+#define ARCH_INIT_USER_FP_FRAME				\
+	.cfa_off	= (s32)sizeof(long) *  2,	\
+	.ra_off		= (s32)sizeof(long) * -1,	\
+	.fp_off		= (s32)sizeof(long) * -2,	\
+	.use_fp		= true,
+
+#endif /* _ASM_X86_UNWIND_USER_H */
-- 
2.47.2
From nobody Wed Oct 8 07:23:47 2025
Message-ID: <20250701005452.929734016@goodmis.org>
User-Agent: quilt/0.68
Date: Mon, 30 Jun 2025 20:53:35 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra,
 Ingo Molnar, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andrii Nakryiko,
 Indu Bhagat, "Jose E. Marchesi", Beau Belgrave, Jens Remus,
 Linus Torvalds, Andrew Morton, Jens Axboe, Florian Weimer
Subject: [PATCH v12 14/14] unwind_user/x86: Enable compat mode frame pointer unwinding on x86
References: <20250701005321.942306427@goodmis.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Josh Poimboeuf

Use ARCH_INIT_USER_COMPAT_FP_FRAME to describe how frame pointers are
unwound on x86, and implement the hooks needed to add the segment base
addresses. Enable HAVE_UNWIND_USER_COMPAT_FP if the system has compat
mode compiled in.

Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
Changes since v11: https://lore.kernel.org/20250625225717.187191105@goodmis.org

- Fix header macro protection name to include X86 (Ingo Molnar)

- Use insn_get_seg_base() to get segment registers instead of using the
  function perf uses and making it global. As that function does not
  appear to require interrupts to be disabled, the scoped_guard(irqsave)
  is also removed.

- Check the return code of insn_get_seg_base() for the unlikely event
  that it returns invalid (-1).

- Moved arch_unwind_user_init() into stacktrace.c because, to use
  insn_get_seg_base(), it must include insn-eval.h, which defines
  pt_regs_offset(); that name is also used in the generic perf code as
  an array, and including it in the header file causes a build conflict.

- Update the comments that explain arch_unwind_user_init/next: a macro
  with those names needs to be defined if they are going to be used.

 arch/x86/Kconfig                         |  1 +
 arch/x86/include/asm/unwind_user.h       | 31 ++++++++++++++++++++++++
 arch/x86/include/asm/unwind_user_types.h | 17 +++++++++++++
 arch/x86/kernel/stacktrace.c             | 28 +++++++++++++++++++++
 include/linux/unwind_user.h              | 20 +++++++++++++++
 kernel/unwind/user.c                     |  4 +++
 6 files changed, 101 insertions(+)
 create mode 100644 arch/x86/include/asm/unwind_user_types.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5862433c81e1..17d4094c821b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -302,6 +302,7 @@ config X86
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UACCESS_VALIDATION		if HAVE_OBJTOOL
 	select HAVE_UNSTABLE_SCHED_CLOCK
+	select HAVE_UNWIND_USER_COMPAT_FP	if IA32_EMULATION
 	select HAVE_UNWIND_USER_FP		if X86_64
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_GENERIC_VDSO
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index 8597857bf896..19634a73612d 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -2,10 +2,41 @@
 #ifndef _ASM_X86_UNWIND_USER_H
 #define _ASM_X86_UNWIND_USER_H
 
+#include
+
 #define ARCH_INIT_USER_FP_FRAME				\
 	.cfa_off	= (s32)sizeof(long) *  2,	\
 	.ra_off		= (s32)sizeof(long) * -1,	\
 	.fp_off		= (s32)sizeof(long) * -2,	\
 	.use_fp		= true,
 
+#ifdef CONFIG_IA32_EMULATION
+
+#define ARCH_INIT_USER_COMPAT_FP_FRAME			\
+	.cfa_off	= (s32)sizeof(u32) *  2,	\
+	.ra_off		= (s32)sizeof(u32) * -1,	\
+	.fp_off		= (s32)sizeof(u32) * -2,	\
+	.use_fp		= true,
+
+#define in_compat_mode(regs) !user_64bit_mode(regs)
+
+void arch_unwind_user_init(struct unwind_user_state *state,
+			   struct pt_regs *regs);
+
+static inline void arch_unwind_user_next(struct unwind_user_state *state)
+{
+	if (state->type != UNWIND_USER_TYPE_COMPAT_FP)
+		return;
+
+	state->ip += state->arch.cs_base;
+	state->fp += state->arch.ss_base;
+}
+
+#define arch_unwind_user_init arch_unwind_user_init
+#define arch_unwind_user_next arch_unwind_user_next
+
+#endif /* CONFIG_IA32_EMULATION */
+
+#include
+
 #endif /* _ASM_X86_UNWIND_USER_H */
diff --git a/arch/x86/include/asm/unwind_user_types.h b/arch/x86/include/asm/unwind_user_types.h
new file mode 100644
index 000000000000..f93d535f900e
--- /dev/null
+++ b/arch/x86/include/asm/unwind_user_types.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_UNWIND_USER_TYPES_H
+#define _ASM_X86_UNWIND_USER_TYPES_H
+
+#ifdef CONFIG_IA32_EMULATION
+
+struct arch_unwind_user_state {
+	unsigned long ss_base;
+	unsigned long cs_base;
+};
+#define arch_unwind_user_state arch_unwind_user_state
+
+#endif /* CONFIG_IA32_EMULATION */
+
+#include
+
+#endif /* _ASM_UNWIND_USER_TYPES_H */
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index ee117fcf46ed..8ef9d8c71df9 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -9,7 +9,10 @@
 #include
 #include
 #include
+#include
 #include
+#include
+#include
 #include
 
 void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
@@ -128,3 +131,28 @@ void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie,
 	}
 }
 
+#ifdef CONFIG_IA32_EMULATION
+void arch_unwind_user_init(struct unwind_user_state *state,
+			   struct pt_regs *regs)
+{
+	unsigned long cs_base, ss_base;
+
+	if (state->type != UNWIND_USER_TYPE_COMPAT_FP)
+		return;
+
+	cs_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
+	ss_base = insn_get_seg_base(regs, INAT_SEG_REG_SS);
+
+	if (cs_base == -1)
+		cs_base = 0;
+	if (ss_base == -1)
+		ss_base = 0;
+
+	state->arch.cs_base = cs_base;
+	state->arch.ss_base = ss_base;
+
+	state->ip += cs_base;
+	state->sp += ss_base;
+	state->fp += ss_base;
+}
+#endif /* CONFIG_IA32_EMULATION */
diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
index ac007363820a..b57b68215c6f 100644
--- a/include/linux/unwind_user.h
+++ b/include/linux/unwind_user.h
@@ -14,6 +14,26 @@
 #define in_compat_mode(regs) false
 #endif
 
+/*
+ * If an architecture needs to initialize the state for a specific
+ * reason, for example, it may need to do something different
+ * in compat mode, it can define a macro named arch_unwind_user_init
+ * with the name of the function that will perform this initialization.
+ */
+#ifndef arch_unwind_user_init
+static inline void arch_unwind_user_init(struct unwind_user_state *state, struct pt_regs *reg) {}
+#endif
+
+/*
+ * If an architecture requires some more updates to the state between
+ * stack frames, it can define a macro named arch_unwind_user_next
+ * with the name of the function that will update the state between
+ * reading stack frames during the user space stack walk.
+ */
+#ifndef arch_unwind_user_next
+static inline void arch_unwind_user_next(struct unwind_user_state *state) {}
+#endif
+
 int unwind_user_start(struct unwind_user_state *state);
 int unwind_user_next(struct unwind_user_state *state);
 
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 3a0ac4346f5b..2bb7995c3f23 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -72,6 +72,8 @@ int unwind_user_next(struct unwind_user_state *state)
 	if (frame->fp_off)
 		state->fp = fp;
 
+	arch_unwind_user_next(state);
+
 	return 0;
 
 done:
@@ -101,6 +103,8 @@ int unwind_user_start(struct unwind_user_state *state)
 	state->sp = user_stack_pointer(regs);
 	state->fp = frame_pointer(regs);
 
+	arch_unwind_user_init(state, regs);
+
 	return 0;
 }
 
-- 
2.47.2