Subject: [PATCH v8 01/18] unwind_user: Add user space unwinding API
From: Steven Rostedt
Date: Fri, 09 May 2025 12:45:25 -0400
Message-ID: <20250509165153.784197961@goodmis.org>

From: Josh Poimboeuf

Introduce a generic API for unwinding user stacks.

In order to expand user space unwinding to handle more complex
scenarios, such as deferred unwinding and reading user space
information, create a generic interface that all architectures
supporting the various unwinding methods can use.

This is an alternative to the simple stack_trace_save_user() API. It
does not replace that interface; rather, it will be used to expand the
functionality of user space stack walking.

None of the structures introduced will be exposed to user space
tooling.
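[ Illustration, not part of the patch: a minimal sketch of how a
  hypothetical task-context caller (example_dump_user_stack() is an
  invented name) could use the new API, assuming CONFIG_UNWIND_USER
  and a caller-provided entry buffer: ]

  static void example_dump_user_stack(void)
  {
          /* unwind_user() fills a caller-provided buffer; it does not allocate. */
          static unsigned long entries[64];
          struct unwind_stacktrace trace = {
                  .nr      = 0,
                  .entries = entries,
          };
          unsigned int i;

          /* Must run in task context with valid user registers. */
          if (unwind_user(&trace, ARRAY_SIZE(entries)))
                  return;

          for (i = 0; i < trace.nr; i++)
                  pr_info("user frame %u: 0x%016lx\n", i, trace.entries[i]);
  }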
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 MAINTAINERS                       |  8 +++++
 arch/Kconfig                      |  3 ++
 include/linux/unwind_user.h       | 15 +++++++++
 include/linux/unwind_user_types.h | 31 +++++++++++++++++
 kernel/Makefile                   |  1 +
 kernel/unwind/Makefile            |  1 +
 kernel/unwind/user.c              | 55 +++++++++++++++++++++++++++++++
 7 files changed, 114 insertions(+)
 create mode 100644 include/linux/unwind_user.h
 create mode 100644 include/linux/unwind_user_types.h
 create mode 100644 kernel/unwind/Makefile
 create mode 100644 kernel/unwind/user.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 800680e11431..ff1af7d36e77 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -25316,6 +25316,14 @@ F: Documentation/driver-api/uio-howto.rst
 F: drivers/uio/
 F: include/linux/uio_driver.h
 
+USERSPACE STACK UNWINDING
+M: Josh Poimboeuf
+M: Steven Rostedt
+S: Maintained
+F: include/linux/unwind*.h
+F: kernel/unwind/
+
+
 UTIL-LINUX PACKAGE
 M: Karel Zak
 L: util-linux@vger.kernel.org

diff --git a/arch/Kconfig b/arch/Kconfig
index b0adb665041f..ccbcead9fac0 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -435,6 +435,9 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH
          It uses the same command line parameters, and sysctl
          interface, as the generic hardlockup detectors.
 
+config UNWIND_USER
+        bool
+
 config HAVE_PERF_REGS
        bool
        help

diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
new file mode 100644
index 000000000000..aa7923c1384f
--- /dev/null
+++ b/include/linux/unwind_user.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNWIND_USER_H
+#define _LINUX_UNWIND_USER_H
+
+#include
+
+int unwind_user_start(struct unwind_user_state *state);
+int unwind_user_next(struct unwind_user_state *state);
+
+int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries);
+
+#define for_each_user_frame(state) \
+        for (unwind_user_start((state)); !(state)->done; unwind_user_next((state)))
+
+#endif /* _LINUX_UNWIND_USER_H */

diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
new file mode 100644
index 000000000000..6ed1b4ae74e1
--- /dev/null
+++ b/include/linux/unwind_user_types.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNWIND_USER_TYPES_H
+#define _LINUX_UNWIND_USER_TYPES_H
+
+#include
+
+enum unwind_user_type {
+        UNWIND_USER_TYPE_NONE,
+};
+
+struct unwind_stacktrace {
+        unsigned int    nr;
+        unsigned long   *entries;
+};
+
+struct unwind_user_frame {
+        s32 cfa_off;
+        s32 ra_off;
+        s32 fp_off;
+        bool use_fp;
+};
+
+struct unwind_user_state {
+        unsigned long ip;
+        unsigned long sp;
+        unsigned long fp;
+        enum unwind_user_type type;
+        bool done;
+};
+
+#endif /* _LINUX_UNWIND_USER_TYPES_H */

diff --git a/kernel/Makefile b/kernel/Makefile
index 434929de17ef..5a2b2be2a32d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -55,6 +55,7 @@ obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
 obj-y += entry/
+obj-y += unwind/
 obj-$(CONFIG_MODULES) += module/
 
 obj-$(CONFIG_KCMP) += kcmp.o

diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
new file mode 100644
index 000000000000..349ce3677526
--- /dev/null
+++ b/kernel/unwind/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_UNWIND_USER) += user.o

diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
new file mode 100644
index 000000000000..d30449328981
--- /dev/null
+++ b/kernel/unwind/user.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generic interfaces for unwinding user space
+ */
+#include
+#include
+#include
+#include
+
+int unwind_user_next(struct unwind_user_state *state)
+{
+        /* no implementation yet */
+        return -EINVAL;
+}
+
+int unwind_user_start(struct unwind_user_state *state)
+{
+        struct pt_regs *regs = task_pt_regs(current);
+
+        memset(state, 0, sizeof(*state));
+
+        if ((current->flags & PF_KTHREAD) || !user_mode(regs)) {
+                state->done = true;
+                return -EINVAL;
+        }
+
+        state->type = UNWIND_USER_TYPE_NONE;
+
+        state->ip = instruction_pointer(regs);
+        state->sp = user_stack_pointer(regs);
+        state->fp = frame_pointer(regs);
+
+        return 0;
+}
+
+int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries)
+{
+        struct unwind_user_state state;
+
+        trace->nr = 0;
+
+        if (!max_entries)
+                return -EINVAL;
+
+        if (current->flags & PF_KTHREAD)
+                return 0;
+
+        for_each_user_frame(&state) {
+                trace->entries[trace->nr++] = state.ip;
+                if (trace->nr >= max_entries)
+                        break;
+        }
+
+        return 0;
+}
-- 
2.47.2

Subject: [PATCH v8 02/18] unwind_user: Add frame pointer support
From: Steven Rostedt
Date: Fri, 09 May 2025 12:45:26 -0400
Message-ID: <20250509165153.956076418@goodmis.org>

From: Josh Poimboeuf

Add optional support for user space frame pointer unwinding. If
supported, the arch needs to enable CONFIG_HAVE_UNWIND_USER_FP and
define ARCH_INIT_USER_FP_FRAME.

By encoding the frame offsets in struct unwind_user_frame, much of this
code can also be reused for future unwinder implementations like
sframe.
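[ Illustration, not part of the patch: one frame-pointer unwind step,
  written out as a standalone sketch. example_fp_step() is an invented
  name, and the offsets are the x86-64 values defined later in this
  series (CFA = FP + 16, RA at CFA - 8, previous FP at CFA - 16): ]

  static int example_fp_step(struct unwind_user_state *state)
  {
          unsigned long cfa = state->fp + 2 * sizeof(long);
          unsigned long ra, fp;

          /* The stack grows down, so each CFA must be above the last SP. */
          if (cfa <= state->sp)
                  return -EINVAL;
          if (get_user(ra, (unsigned long __user *)(cfa - sizeof(long))))
                  return -EFAULT;
          if (get_user(fp, (unsigned long __user *)(cfa - 2 * sizeof(long))))
                  return -EFAULT;

          /* Step to the caller's frame. */
          state->ip = ra;
          state->sp = cfa;
          state->fp = fp;
          return 0;
  }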
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 arch/Kconfig                      |  4 +++
 include/asm-generic/unwind_user.h |  9 ++++++
 include/linux/unwind_user_types.h |  1 +
 kernel/unwind/user.c              | 51 +++++++++++++++++++++++++++++--
 4 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 include/asm-generic/unwind_user.h

diff --git a/arch/Kconfig b/arch/Kconfig
index ccbcead9fac0..0e3844c0e200 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -438,6 +438,10 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH
 config UNWIND_USER
        bool
 
+config HAVE_UNWIND_USER_FP
+        bool
+        select UNWIND_USER
+
 config HAVE_PERF_REGS
        bool
        help

diff --git a/include/asm-generic/unwind_user.h b/include/asm-generic/unwind_user.h
new file mode 100644
index 000000000000..832425502fb3
--- /dev/null
+++ b/include/asm-generic/unwind_user.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_UNWIND_USER_H
+#define _ASM_GENERIC_UNWIND_USER_H
+
+#ifndef ARCH_INIT_USER_FP_FRAME
+ #define ARCH_INIT_USER_FP_FRAME
+#endif
+
+#endif /* _ASM_GENERIC_UNWIND_USER_H */

diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 6ed1b4ae74e1..65bd070eb6b0 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -6,6 +6,7 @@
 
 enum unwind_user_type {
        UNWIND_USER_TYPE_NONE,
+        UNWIND_USER_TYPE_FP,
 };
 
 struct unwind_stacktrace {

diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index d30449328981..0671a81494d3 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -6,10 +6,54 @@
 #include
 #include
 #include
+#include
+#include
+
+static struct unwind_user_frame fp_frame = {
+        ARCH_INIT_USER_FP_FRAME
+};
+
+static inline bool fp_state(struct unwind_user_state *state)
+{
+        return IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP) &&
+               state->type == UNWIND_USER_TYPE_FP;
+}
 
 int unwind_user_next(struct unwind_user_state *state)
 {
-        /* no implementation yet */
+        struct unwind_user_frame _frame;
+        struct unwind_user_frame *frame = &_frame;
+        unsigned long cfa = 0, fp, ra = 0;
+
+        if (state->done)
+                return -EINVAL;
+
+        if (fp_state(state))
+                frame = &fp_frame;
+        else
+                goto the_end;
+
+        cfa = (frame->use_fp ? state->fp : state->sp) + frame->cfa_off;
+
+        /* stack going in wrong direction? */
+        if (cfa <= state->sp)
+                goto the_end;
+
+        if (get_user(ra, (unsigned long *)(cfa + frame->ra_off)))
+                goto the_end;
+
+        if (frame->fp_off && get_user(fp, (unsigned long __user *)(cfa + frame->fp_off)))
+                goto the_end;
+
+        state->ip = ra;
+        state->sp = cfa;
+        if (frame->fp_off)
+                state->fp = fp;
+
+        return 0;
+
+the_end:
+        state->done = true;
         return -EINVAL;
 }
 
@@ -24,7 +68,10 @@ int unwind_user_start(struct unwind_user_state *state)
                 return -EINVAL;
         }
 
-        state->type = UNWIND_USER_TYPE_NONE;
+        if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
+                state->type = UNWIND_USER_TYPE_FP;
+        else
+                state->type = UNWIND_USER_TYPE_NONE;
 
         state->ip = instruction_pointer(regs);
         state->sp = user_stack_pointer(regs);
-- 
2.47.2

Subject: [PATCH v8 03/18] unwind_user/x86: Enable frame pointer unwinding on x86
From: Steven Rostedt
Date: Fri, 09 May 2025 12:45:27 -0400
Message-ID: <20250509165154.126466241@goodmis.org>

From: Josh Poimboeuf

Use ARCH_INIT_USER_FP_FRAME to describe how frame pointers are unwound
on x86, and enable CONFIG_HAVE_UNWIND_USER_FP accordingly so the
unwind_user interfaces can be used.
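[ Illustration, not from the patch: the offsets below follow from the
  standard x86-64 frame-pointer prologue (push %rbp; mov %rsp,%rbp).
  An annotated layout, assuming that prologue: ]

  /*
   * Higher addresses at the top; the stack grows down:
   *
   *   CFA      ->  caller's RSP right before the CALL
   *   CFA -  8 ->  return address       (.ra_off = -8)
   *   CFA - 16 ->  saved RBP            (.fp_off = -16; RBP points here)
   *   ...          callee's locals ...
   *
   * Hence with .use_fp, CFA = RBP + 16, i.e. .cfa_off = 2 * sizeof(long).
   */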
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 arch/x86/Kconfig                   |  1 +
 arch/x86/include/asm/unwind_user.h | 11 +++++++++++
 2 files changed, 12 insertions(+)
 create mode 100644 arch/x86/include/asm/unwind_user.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4c33c644b92d..a6e529dc4550 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -301,6 +301,7 @@ config X86
        select HAVE_SYSCALL_TRACEPOINTS
        select HAVE_UACCESS_VALIDATION if HAVE_OBJTOOL
        select HAVE_UNSTABLE_SCHED_CLOCK
+        select HAVE_UNWIND_USER_FP if X86_64
        select HAVE_USER_RETURN_NOTIFIER
        select HAVE_GENERIC_VDSO
        select VDSO_GETRANDOM if X86_64

diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
new file mode 100644
index 000000000000..8597857bf896
--- /dev/null
+++ b/arch/x86/include/asm/unwind_user.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_UNWIND_USER_H
+#define _ASM_X86_UNWIND_USER_H
+
+#define ARCH_INIT_USER_FP_FRAME \
+        .cfa_off = (s32)sizeof(long) *  2, \
+        .ra_off  = (s32)sizeof(long) * -1, \
+        .fp_off  = (s32)sizeof(long) * -2, \
+        .use_fp  = true,
+
+#endif /* _ASM_X86_UNWIND_USER_H */
-- 
2.47.2

Subject: [PATCH v8 04/18] perf/x86: Rename and move get_segment_base() and make it global
From: Steven Rostedt
Date: Fri, 09 May 2025 12:45:28 -0400
Message-ID: <20250509165154.287839165@goodmis.org>

From: Josh Poimboeuf

get_segment_base() will be used by the unwind_user code, so make it
global and rename it so it doesn't conflict with a KVM function of the
same name.
As the function is no longer specific to perf, move it to ptrace.c,
which seems a better location for a generic function like this.

Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 arch/x86/events/core.c        | 44 ++++-------------------------------
 arch/x86/include/asm/ptrace.h |  2 ++
 arch/x86/kernel/ptrace.c      | 38 ++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 39 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 2e10dcf897c5..cc6329235b68 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include "perf_event.h"
@@ -2809,41 +2810,6 @@ valid_user_frame(const void __user *fp, unsigned long size)
         return __access_ok(fp, size);
 }
 
-static unsigned long get_segment_base(unsigned int segment)
-{
-        struct desc_struct *desc;
-        unsigned int idx = segment >> 3;
-
-        if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
-#ifdef CONFIG_MODIFY_LDT_SYSCALL
-                struct ldt_struct *ldt;
-
-                /*
-                 * If we're not in a valid context with a real (not just lazy)
-                 * user mm, then don't even try.
-                 */
-                if (!nmi_uaccess_okay())
-                        return 0;
-
-                /* IRQs are off, so this synchronizes with smp_store_release */
-                ldt = smp_load_acquire(&current->mm->context.ldt);
-                if (!ldt || idx >= ldt->nr_entries)
-                        return 0;
-
-                desc = &ldt->entries[idx];
-#else
-                return 0;
-#endif
-        } else {
-                if (idx >= GDT_ENTRIES)
-                        return 0;
-
-                desc = raw_cpu_ptr(gdt_page.gdt) + idx;
-        }
-
-        return get_desc_base(desc);
-}
-
 #ifdef CONFIG_UPROBES
 /*
  * Heuristic-based check if uprobe is installed at the function entry.
@@ -2900,8 +2866,8 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
         if (user_64bit_mode(regs))
                 return 0;
 
-        cs_base = get_segment_base(regs->cs);
-        ss_base = get_segment_base(regs->ss);
+        cs_base = segment_base_address(regs->cs);
+        ss_base = segment_base_address(regs->ss);
 
         fp = compat_ptr(ss_base + regs->bp);
         pagefault_disable();
@@ -3020,11 +2986,11 @@ static unsigned long code_segment_base(struct pt_regs *regs)
                 return 0x10 * regs->cs;
 
         if (user_mode(regs) && regs->cs != __USER_CS)
-                return get_segment_base(regs->cs);
+                return segment_base_address(regs->cs);
 #else
         if (user_mode(regs) && !user_64bit_mode(regs) &&
             regs->cs != __USER32_CS)
-                return get_segment_base(regs->cs);
+                return segment_base_address(regs->cs);
 #endif
         return 0;
 }

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 50f75467f73d..59357ec98e52 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -314,6 +314,8 @@ static __always_inline bool regs_irqs_disabled(struct pt_regs *regs)
         return !(regs->flags & X86_EFLAGS_IF);
 }
 
+unsigned long segment_base_address(unsigned int segment);
+
 /* Query offset/name of register from its name/offset */
 extern int regs_query_register_offset(const char *name);
 extern const char *regs_query_register_name(unsigned int offset);

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 095f04bdabdc..81353a09701b 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -41,6 +41,7 @@
 #include
 #include
 #include
+#include
 
 #include "tls.h"
 
@@ -339,6 +340,43 @@ static int set_segment_reg(struct task_struct *task,
 
 #endif /* CONFIG_X86_32 */
 
+unsigned long segment_base_address(unsigned int segment)
+{
+        struct desc_struct *desc;
+        unsigned int idx = segment >> 3;
+
+        lockdep_assert_irqs_disabled();
+
+        if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+                struct ldt_struct *ldt;
+
+                /*
+                 * If we're not in a valid context with a real (not just lazy)
+                 * user mm, then don't even try.
+                 */
+                if (!nmi_uaccess_okay())
+                        return 0;
+
+                /* IRQs are off, so this synchronizes with smp_store_release */
+                ldt = smp_load_acquire(&current->mm->context.ldt);
+                if (!ldt || idx >= ldt->nr_entries)
+                        return 0;
+
+                desc = &ldt->entries[idx];
+#else
+                return 0;
+#endif
+        } else {
+                if (idx >= GDT_ENTRIES)
+                        return 0;
+
+                desc = raw_cpu_ptr(gdt_page.gdt) + idx;
+        }
+
+        return get_desc_base(desc);
+}
+
 static unsigned long get_flags(struct task_struct *task)
 {
         unsigned long retval = task_pt_regs(task)->flags;
-- 
2.47.2

Subject: [PATCH v8 05/18] unwind_user: Add compat mode frame pointer support
From: Steven Rostedt
Date: Fri, 09 May 2025 12:45:29 -0400
Message-ID: <20250509165154.456665131@goodmis.org>

From: Josh Poimboeuf

Add optional support for user space compat mode frame pointer
unwinding. If supported, the arch needs to enable
CONFIG_HAVE_UNWIND_USER_COMPAT_FP and define
ARCH_INIT_USER_COMPAT_FP_FRAME.
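[ Illustration, not part of the patch: a sketch of why compat frames
  need differently sized user reads. A 32-bit (compat) task pushes
  4-byte frame words while a 64-bit task pushes 8-byte words; the
  patch implements this selection as UNWIND_GET_USER_LONG().
  example_read_frame_word() is an invented name: ]

  static int example_read_frame_word(unsigned long *val,
                                     unsigned long addr, bool compat)
  {
          if (compat) {
                  u32 word32;

                  /* 32-bit read, zero-extended into the output */
                  if (get_user(word32, (u32 __user *)addr))
                          return -EFAULT;
                  *val = word32;
                  return 0;
          }
          /* native 64-bit read */
          return get_user(*val, (u64 __user *)addr);
  }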
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 arch/Kconfig                            |  4 +++
 include/asm-generic/Kbuild              |  2 ++
 include/asm-generic/unwind_user.h       | 15 +++++++++++
 include/asm-generic/unwind_user_types.h |  9 +++++++
 include/linux/unwind_user_types.h       |  3 +++
 kernel/unwind/user.c                    | 36 ++++++++++++++++++++++---
 6 files changed, 65 insertions(+), 4 deletions(-)
 create mode 100644 include/asm-generic/unwind_user_types.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 0e3844c0e200..dbb1cc89e040 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -442,6 +442,10 @@ config HAVE_UNWIND_USER_FP
        bool
        select UNWIND_USER
 
+config HAVE_UNWIND_USER_COMPAT_FP
+        bool
+        depends on HAVE_UNWIND_USER_FP
+
 config HAVE_PERF_REGS
        bool
        help

diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 8675b7b4ad23..b797a2434396 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -59,6 +59,8 @@ mandatory-y += tlbflush.h
 mandatory-y += topology.h
 mandatory-y += trace_clock.h
 mandatory-y += uaccess.h
+mandatory-y += unwind_user.h
+mandatory-y += unwind_user_types.h
 mandatory-y += vermagic.h
 mandatory-y += vga.h
 mandatory-y += video.h

diff --git a/include/asm-generic/unwind_user.h b/include/asm-generic/unwind_user.h
index 832425502fb3..385638ce4aec 100644
--- a/include/asm-generic/unwind_user.h
+++ b/include/asm-generic/unwind_user.h
@@ -2,8 +2,23 @@
 #ifndef _ASM_GENERIC_UNWIND_USER_H
 #define _ASM_GENERIC_UNWIND_USER_H
 
+#include
+
 #ifndef ARCH_INIT_USER_FP_FRAME
  #define ARCH_INIT_USER_FP_FRAME
 #endif
 
+#ifndef ARCH_INIT_USER_COMPAT_FP_FRAME
+ #define ARCH_INIT_USER_COMPAT_FP_FRAME
+ #define in_compat_mode(regs) false
+#endif
+
+#ifndef arch_unwind_user_init
+static inline void arch_unwind_user_init(struct unwind_user_state *state, struct pt_regs *reg) {}
+#endif
+
+#ifndef arch_unwind_user_next
+static inline void arch_unwind_user_next(struct unwind_user_state *state) {}
+#endif
+
 #endif /* _ASM_GENERIC_UNWIND_USER_H */

diff --git a/include/asm-generic/unwind_user_types.h b/include/asm-generic/unwind_user_types.h
new file mode 100644
index 000000000000..ee803de7c998
--- /dev/null
+++ b/include/asm-generic/unwind_user_types.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_UNWIND_USER_TYPES_H
+#define _ASM_GENERIC_UNWIND_USER_TYPES_H
+
+#ifndef arch_unwind_user_state
+struct arch_unwind_user_state {};
+#endif
+
+#endif /* _ASM_GENERIC_UNWIND_USER_TYPES_H */

diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 65bd070eb6b0..3ec4a097a3dd 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -3,10 +3,12 @@
 #define _LINUX_UNWIND_USER_TYPES_H
 
 #include
+#include
 
 enum unwind_user_type {
        UNWIND_USER_TYPE_NONE,
        UNWIND_USER_TYPE_FP,
+        UNWIND_USER_TYPE_COMPAT_FP,
 };
 
 struct unwind_stacktrace {
@@ -25,6 +27,7 @@ struct unwind_user_state {
        unsigned long ip;
        unsigned long sp;
        unsigned long fp;
+        struct arch_unwind_user_state arch;
        enum unwind_user_type type;
        bool done;
 };

diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 0671a81494d3..635cc04bb299 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -13,12 +13,32 @@ static struct unwind_user_frame fp_frame = {
        ARCH_INIT_USER_FP_FRAME
 };
 
+static struct unwind_user_frame compat_fp_frame = {
+        ARCH_INIT_USER_COMPAT_FP_FRAME
+};
+
 static inline bool fp_state(struct unwind_user_state *state)
 {
        return IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP) &&
               state->type == UNWIND_USER_TYPE_FP;
 }
 
+static inline bool compat_state(struct unwind_user_state *state)
+{
+        return IS_ENABLED(CONFIG_HAVE_UNWIND_USER_COMPAT_FP) &&
+               state->type == UNWIND_USER_TYPE_COMPAT_FP;
+}
+
+#define UNWIND_GET_USER_LONG(to, from, state)                   \
+({                                                              \
+        int __ret;                                              \
+        if (compat_state(state))                                \
+                __ret = get_user(to, (u32 __user *)(from));     \
+        else                                                    \
+                __ret = get_user(to, (u64 __user *)(from));     \
+        __ret;                                                  \
+})
+
 int unwind_user_next(struct unwind_user_state *state)
 {
        struct unwind_user_frame _frame;
@@ -28,7 +48,9 @@ int unwind_user_next(struct unwind_user_state *state)
        if (state->done)
                return -EINVAL;
 
-        if (fp_state(state))
+        if (compat_state(state))
+                frame = &compat_fp_frame;
+        else if (fp_state(state))
                frame = &fp_frame;
        else
                goto the_end;
@@ -39,10 +61,10 @@ int unwind_user_next(struct unwind_user_state *state)
        if (cfa <= state->sp)
                goto the_end;
 
-        if (get_user(ra, (unsigned long *)(cfa + frame->ra_off)))
+        if (UNWIND_GET_USER_LONG(ra, cfa + frame->ra_off, state))
                goto the_end;
 
-        if (frame->fp_off && get_user(fp, (unsigned long __user *)(cfa + frame->fp_off)))
+        if (frame->fp_off && UNWIND_GET_USER_LONG(fp, cfa + frame->fp_off, state))
                goto the_end;
 
        state->ip = ra;
@@ -50,6 +72,8 @@ int unwind_user_next(struct unwind_user_state *state)
        if (frame->fp_off)
                state->fp = fp;
 
+        arch_unwind_user_next(state);
+
        return 0;
 
 the_end:
@@ -68,7 +92,9 @@ int unwind_user_start(struct unwind_user_state *state)
                return -EINVAL;
        }
 
-        if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
+        if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_COMPAT_FP) && in_compat_mode(regs))
+                state->type = UNWIND_USER_TYPE_COMPAT_FP;
+        else if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
                state->type = UNWIND_USER_TYPE_FP;
        else
                state->type = UNWIND_USER_TYPE_NONE;
@@ -77,6 +103,8 @@ int unwind_user_start(struct unwind_user_state *state)
        state->sp = user_stack_pointer(regs);
        state->fp = frame_pointer(regs);
 
+        arch_unwind_user_init(state, regs);
+
        return 0;
 }
-- 
2.47.2

Subject: [PATCH v8 06/18] unwind_user/x86: Enable compat mode frame pointer unwinding on x86
From: Steven Rostedt
Date: Fri, 09 May 2025 12:45:30 -0400
Message-ID: <20250509165154.622716588@goodmis.org>

From: Josh Poimboeuf

Use ARCH_INIT_USER_COMPAT_FP_FRAME to describe how frame pointers are
unwound on x86, and implement the hooks needed to add the segment base
addresses. Enable HAVE_UNWIND_USER_COMPAT_FP if the system has compat
mode compiled in.

Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
 arch/x86/Kconfig                         |  1 +
 arch/x86/include/asm/unwind_user.h       | 50 ++++++++++++++++++++++++
 arch/x86/include/asm/unwind_user_types.h | 17 ++++++++
 3 files changed, 68 insertions(+)
 create mode 100644 arch/x86/include/asm/unwind_user_types.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a6e529dc4550..ee81e06cabca 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -301,6 +301,7 @@ config X86
        select HAVE_SYSCALL_TRACEPOINTS
        select HAVE_UACCESS_VALIDATION if HAVE_OBJTOOL
        select HAVE_UNSTABLE_SCHED_CLOCK
+        select HAVE_UNWIND_USER_COMPAT_FP if IA32_EMULATION
        select HAVE_UNWIND_USER_FP if X86_64
        select HAVE_USER_RETURN_NOTIFIER
        select HAVE_GENERIC_VDSO

diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index 8597857bf896..bb1148111259 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -2,10 +2,60 @@
 #ifndef _ASM_X86_UNWIND_USER_H
 #define _ASM_X86_UNWIND_USER_H
 
+#include
+#include
+#include
+
 #define ARCH_INIT_USER_FP_FRAME \
        .cfa_off = (s32)sizeof(long) *  2, \
        .ra_off  = (s32)sizeof(long) * -1, \
        .fp_off  = (s32)sizeof(long) * -2, \
        .use_fp  = true,
 
+#ifdef CONFIG_IA32_EMULATION
+
+#define ARCH_INIT_USER_COMPAT_FP_FRAME \
+        .cfa_off = (s32)sizeof(u32) *  2, \
+        .ra_off  = (s32)sizeof(u32) * -1, \
+        .fp_off  = (s32)sizeof(u32) * -2, \
+        .use_fp  = true,
+
+#define in_compat_mode(regs) !user_64bit_mode(regs)
+
+static inline void arch_unwind_user_init(struct unwind_user_state *state,
+                                         struct pt_regs *regs)
+{
+        unsigned long cs_base, ss_base;
+
+        if (state->type != UNWIND_USER_TYPE_COMPAT_FP)
+                return;
+
+        scoped_guard(irqsave) {
+                cs_base = segment_base_address(regs->cs);
+                ss_base = segment_base_address(regs->ss);
+        }
+
+        state->arch.cs_base = cs_base;
+        state->arch.ss_base = ss_base;
+
+        state->ip += cs_base;
+        state->sp += ss_base;
+        state->fp += ss_base;
+}
+#define arch_unwind_user_init arch_unwind_user_init
+
+static inline void arch_unwind_user_next(struct unwind_user_state *state)
+{
+        if (state->type != UNWIND_USER_TYPE_COMPAT_FP)
+                return;
+
+        state->ip += state->arch.cs_base;
+        state->fp += state->arch.ss_base;
+}
+#define arch_unwind_user_next arch_unwind_user_next
+
+#endif /* CONFIG_IA32_EMULATION */
+
+#include
+
 #endif /* _ASM_X86_UNWIND_USER_H */

diff --git a/arch/x86/include/asm/unwind_user_types.h b/arch/x86/include/asm/unwind_user_types.h
new file mode 100644
index 000000000000..d7074dc5f0ce
--- /dev/null
+++ b/arch/x86/include/asm/unwind_user_types.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_UNWIND_USER_TYPES_H
+#define _ASM_UNWIND_USER_TYPES_H
+
+#ifdef CONFIG_IA32_EMULATION
+
+struct arch_unwind_user_state {
+        unsigned long ss_base;
+        unsigned long cs_base;
+};
+#define arch_unwind_user_state arch_unwind_user_state
+
+#endif /* CONFIG_IA32_EMULATION */
+
+#include
+
+#endif /* _ASM_UNWIND_USER_TYPES_H */
-- 
2.47.2

Subject: [PATCH v8 07/18] unwind_user/deferred: Add unwind_deferred_trace()
From: Steven Rostedt
Date: Fri, 09 May 2025 12:45:31 -0400
Message-ID: <20250509165154.789938318@goodmis.org>

From: Steven Rostedt

Add a function, unwind_deferred_trace(), that must be called inside a
faultable context and that retrieves a user space stack trace. It can
be called by a tracer when a task is about to enter user space, or has
just come back from user space and has interrupts enabled.
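[ Illustration, not part of the patch: a hypothetical tracer hook
  (example_dump_deferred_trace() is an invented name) running in
  faultable task context, e.g. from task work just before returning
  to user space: ]

  static void example_dump_deferred_trace(void)
  {
          struct unwind_stacktrace trace;
          unsigned int i;

          /* Fills trace.entries/trace.nr from a per-task buffer. */
          if (unwind_deferred_trace(&trace))
                  return;

          for (i = 0; i < trace.nr; i++)
                  pr_info("  0x%016lx\n", trace.entries[i]);
  }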
This code is based on Josh Poimboeuf's deferred unwinding work:

Link: https://lore.kernel.org/all/6052e8487746603bdb29b65f4033e739092d9925.1737511963.git.jpoimboe@kernel.org/

Signed-off-by: Steven Rostedt (Google)
---
Changes since v7: https://lore.kernel.org/20250502165008.564172398@goodmis.org

- Fixed whitespace issues
- Added kerneldoc to unwind_deferred_trace()

 include/linux/sched.h                 |  5 +++
 include/linux/unwind_deferred.h       | 24 +++++++++++
 include/linux/unwind_deferred_types.h |  9 ++++
 kernel/fork.c                         |  4 ++
 kernel/unwind/Makefile                |  2 +-
 kernel/unwind/deferred.c              | 60 +++++++++++++++++++++++++++
 6 files changed, 103 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/unwind_deferred.h
 create mode 100644 include/linux/unwind_deferred_types.h
 create mode 100644 kernel/unwind/deferred.c

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4ecc0c6b1cb0..a1e1c07cadfb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -47,6 +47,7 @@
 #include
 #include
 #include
+#include
 #include
 
 /* task_struct member predeclarations (sorted alphabetically): */
@@ -1646,6 +1647,10 @@ struct task_struct {
        struct user_event_mm *user_event_mm;
 #endif
 
+#ifdef CONFIG_UNWIND_USER
+        struct unwind_task_info unwind_info;
+#endif
+
        /* CPU-specific state of this task: */
        struct thread_struct thread;
 

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
new file mode 100644
index 000000000000..5064ebe38c4f
--- /dev/null
+++ b/include/linux/unwind_deferred.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNWIND_USER_DEFERRED_H
+#define _LINUX_UNWIND_USER_DEFERRED_H
+
+#include
+#include
+
+#ifdef CONFIG_UNWIND_USER
+
+void unwind_task_init(struct task_struct *task);
+void unwind_task_free(struct task_struct *task);
+
+int unwind_deferred_trace(struct unwind_stacktrace *trace);
+
+#else /* !CONFIG_UNWIND_USER */
+
+static inline void unwind_task_init(struct task_struct *task) {}
+static inline void unwind_task_free(struct task_struct *task) {}
+
+static inline int unwind_deferred_trace(struct unwind_stacktrace *trace) { return -ENOSYS; }
+
+#endif /* !CONFIG_UNWIND_USER */
+
+#endif /* _LINUX_UNWIND_USER_DEFERRED_H */

diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
new file mode 100644
index 000000000000..aa32db574e43
--- /dev/null
+++ b/include/linux/unwind_deferred_types.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNWIND_USER_DEFERRED_TYPES_H
+#define _LINUX_UNWIND_USER_DEFERRED_TYPES_H
+
+struct unwind_task_info {
+        unsigned long *entries;
+};
+
+#endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */

diff --git a/kernel/fork.c b/kernel/fork.c
index c4b26cd8998b..8c79c7c2c553 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -105,6 +105,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -991,6 +992,7 @@ void __put_task_struct(struct task_struct *tsk)
        WARN_ON(refcount_read(&tsk->usage));
        WARN_ON(tsk == current);
 
+        unwind_task_free(tsk);
        sched_ext_free(tsk);
        io_uring_free(tsk);
        cgroup_free(tsk);
@@ -2395,6 +2397,8 @@ __latent_entropy struct task_struct *copy_process(
        p->bpf_ctx = NULL;
 #endif
 
+        unwind_task_init(p);
+
        /* Perform scheduler related setup. Assign this task to a CPU. */
        retval = sched_fork(clone_flags, p);
        if (retval)

diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
index 349ce3677526..6752ac96d7e2 100644
--- a/kernel/unwind/Makefile
+++ b/kernel/unwind/Makefile
@@ -1 +1 @@
-obj-$(CONFIG_UNWIND_USER) += user.o
+obj-$(CONFIG_UNWIND_USER) += user.o deferred.o

diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
new file mode 100644
index 000000000000..0bafb95e6336
--- /dev/null
+++ b/kernel/unwind/deferred.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Deferred user space unwinding
+ */
+#include
+#include
+#include
+#include
+
+#define UNWIND_MAX_ENTRIES 512
+
+/**
+ * unwind_deferred_trace - Produce a user stacktrace in faultable context
+ * @trace: The descriptor that will store the user stacktrace
+ *
+ * This must be called in a known faultable context (usually when entering
+ * or exiting user space). Depending on the available implementations,
+ * the @trace will be loaded with the addresses of the user space
+ * stacktrace if it can be found.
+ *
+ * Return: 0 on success and negative on error.
+ *         On success @trace will contain the user space stacktrace.
+ */
+int unwind_deferred_trace(struct unwind_stacktrace *trace)
+{
+        struct unwind_task_info *info = &current->unwind_info;
+
+        /* Should always be called from faultable context */
+        might_fault();
+
+        if (current->flags & PF_EXITING)
+                return -EINVAL;
+
+        if (!info->entries) {
+                info->entries = kmalloc_array(UNWIND_MAX_ENTRIES, sizeof(long),
+                                              GFP_KERNEL);
+                if (!info->entries)
+                        return -ENOMEM;
+        }
+
+        trace->nr = 0;
+        trace->entries = info->entries;
+        unwind_user(trace, UNWIND_MAX_ENTRIES);
+
+        return 0;
+}
+
+void unwind_task_init(struct task_struct *task)
+{
+        struct unwind_task_info *info = &task->unwind_info;
+
+        memset(info, 0, sizeof(*info));
+}
+
+void unwind_task_free(struct task_struct *task)
+{
+        struct unwind_task_info *info = &task->unwind_info;
+
+        kfree(info->entries);
+}
-- 
2.47.2

Subject: [PATCH v8 08/18] unwind_user/deferred: Add unwind cache
From: Steven Rostedt
Date: Fri, 09 May 2025 12:45:32 -0400
Message-ID: <20250509165154.958107663@goodmis.org>

From: Josh Poimboeuf

Cache the results of the unwind so that the unwind is only performed
once, even when called by multiple tracers.

The cache nr_entries gets cleared every time the task exits the kernel.
When a stacktrace is requested, nr_entries gets set to the number of
entries in the stacktrace. If another stacktrace is requested and
nr_entries is not zero, then the cache contains the same stacktrace
that would be retrieved, so it is not processed again and the cached
entries are given to the caller.

Co-developed-by: Steven Rostedt (Google)
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
Changes since v7: https://lore.kernel.org/20250502165008.734340489@goodmis.org

- Allocate unwind_cache as a structure and not just its entries
  (Ingo Molnar)
- Fixed white space issues (Ingo Molnar)

 include/linux/entry-common.h          |  2 ++
 include/linux/unwind_deferred.h       |  8 ++++++++
 include/linux/unwind_deferred_types.h |  7 ++++++-
 kernel/unwind/deferred.c              | 26 ++++++++++++++++++++------
 4 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index f94f3fdf15fc..6e850c9d3f0c 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -362,6 +363,7 @@ static __always_inline void exit_to_user_mode(void)
        lockdep_hardirqs_on_prepare();
        instrumentation_end();
 
+        unwind_exit_to_user_mode();
        user_enter_irqoff();
        arch_exit_to_user_mode();
        lockdep_hardirqs_on(CALLER_ADDR0);

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index 5064ebe38c4f..7d6cb2ffd084 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -12,6 +12,12 @@ void unwind_task_free(struct task_struct *task);
 
 int unwind_deferred_trace(struct unwind_stacktrace *trace);
 
+static __always_inline void unwind_exit_to_user_mode(void)
+{
+        if (unlikely(current->unwind_info.cache))
+                current->unwind_info.cache->nr_entries = 0;
+}
+
 #else /* !CONFIG_UNWIND_USER */
 
 static inline void unwind_task_init(struct task_struct *task) {}
@@ -19,6 +25,8 @@ static inline void unwind_task_free(struct task_struct *task) {}
 
 static inline int unwind_deferred_trace(struct unwind_stacktrace *trace) { return -ENOSYS; }
 
+static inline void unwind_exit_to_user_mode(void) {}
+
 #endif /* !CONFIG_UNWIND_USER */
 
 #endif /* _LINUX_UNWIND_USER_DEFERRED_H */

diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index aa32db574e43..db5b54b18828 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -2,8 +2,13 @@
 #ifndef _LINUX_UNWIND_USER_DEFERRED_TYPES_H
 #define _LINUX_UNWIND_USER_DEFERRED_TYPES_H
 
+struct unwind_cache {
+        unsigned int nr_entries;
+        unsigned long entries[];
+};
+
 struct unwind_task_info {
-        unsigned long *entries;
+        struct unwind_cache *cache;
 };
 
 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */

diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index 0bafb95e6336..e3913781c8c6 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -24,6 +24,7 @@
 int unwind_deferred_trace(struct unwind_stacktrace *trace)
 {
        struct unwind_task_info *info = &current->unwind_info;
+        struct unwind_cache *cache;
 
        /* Should always be called from faultable context */
        might_fault();
@@ -31,17 +32,30 @@ int unwind_deferred_trace(struct unwind_stacktrace *trace)
        if (current->flags & PF_EXITING)
                return -EINVAL;
 
-        if (!info->entries) {
-                info->entries = kmalloc_array(UNWIND_MAX_ENTRIES, sizeof(long),
-                                              GFP_KERNEL);
-                if (!info->entries)
+        if (!info->cache) {
+                info->cache = kzalloc(struct_size(cache, entries, UNWIND_MAX_ENTRIES),
+                                      GFP_KERNEL);
+                if (!info->cache)
                        return -ENOMEM;
        }
 
+        cache = info->cache;
+        trace->entries = cache->entries;
+
+        if (cache->nr_entries) {
+                /*
+                 * The user stack has already been previously unwound in this
+                 * entry context. Skip the unwind and use the cache.
+                 */
+                trace->nr = cache->nr_entries;
+                return 0;
+        }
+
        trace->nr = 0;
-        trace->entries = info->entries;
        unwind_user(trace, UNWIND_MAX_ENTRIES);
 
+        cache->nr_entries = trace->nr;
+
        return 0;
 }
 
@@ -56,5 +70,5 @@ void unwind_task_free(struct task_struct *task)
 {
        struct unwind_task_info *info = &task->unwind_info;
 
-        kfree(info->entries);
+        kfree(info->cache);
 }
-- 
2.47.2

Subject: [PATCH v8 09/18] unwind_user/deferred: Add deferred unwinding interface
From: Steven Rostedt
Date: Fri, 09 May 2025 12:45:33 -0400
Message-ID: <20250509165155.124809873@goodmis.org>

From: Josh Poimboeuf

Add an interface for scheduling task work to unwind the user space
stack before returning to user space. This solves several problems for
its callers:

- Ensure the unwind happens in task context even if the caller may be
  running in NMI or interrupt context.

- Avoid duplicate unwinds, whether called multiple times by the same
  caller or by different callers.

- Take a timestamp when the first request comes in since the task
  entered the kernel. This will be returned to the calling function
  along with the stack trace when the task leaves the kernel. This
  timestamp can be used to correlate kernel unwinds/traces with the
  user unwind.

The timestamp makes it possible to detect when the stacktrace is the
same. It is generated the first time a user space stacktrace is
requested after the task enters the kernel. The timestamp is passed to
the caller on request, and when the stacktrace is generated upon
returning to user space, the requester's callback is called with the
timestamp as well as the stacktrace.

Co-developed-by: Steven Rostedt (Google)
Signed-off-by: Josh Poimboeuf
Signed-off-by: Steven Rostedt (Google)
---
Changes since v7: https://lore.kernel.org/20250502165008.904786447@goodmis.org

- Use a timestamp instead of a "cookie"
- Updated comments to kerneldoc for unwind_deferred_request()

 include/linux/unwind_deferred.h       |  18 ++++++
 include/linux/unwind_deferred_types.h |   3 +
 kernel/unwind/deferred.c              | 131 +++++++++++++++++++++++++-
 3 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferred.h
index 7d6cb2ffd084..a384eef719a3 100644
--- a/include/linux/unwind_deferred.h
+++ b/include/linux/unwind_deferred.h
@@ -2,9 +2,19 @@
 #ifndef _LINUX_UNWIND_USER_DEFERRED_H
 #define _LINUX_UNWIND_USER_DEFERRED_H
 
+#include
 #include
 #include
 
+struct unwind_work;
+
+typedef void (*unwind_callback_t)(struct unwind_work *work, struct unwind_stacktrace *trace, u64 timestamp);
+
+struct unwind_work {
+        struct list_head list;
+        unwind_callback_t func;
+};
+
 #ifdef CONFIG_UNWIND_USER
 
 void unwind_task_init(struct task_struct *task);
@@ -12,10 +22,15 @@ void unwind_task_free(struct task_struct *task);
 
 int unwind_deferred_trace(struct unwind_stacktrace *trace);
 
+int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func);
+int unwind_deferred_request(struct unwind_work *work, u64 *timestamp);
+void unwind_deferred_cancel(struct unwind_work *work);
+
 static __always_inline void unwind_exit_to_user_mode(void)
 {
        if (unlikely(current->unwind_info.cache))
                current->unwind_info.cache->nr_entries = 0;
+        current->unwind_info.timestamp = 0;
 }
 
 #else /* !CONFIG_UNWIND_USER */
@@ -24,6 +39,9 @@ static inline void unwind_task_init(struct task_struct *task) {}
 static inline void unwind_task_free(struct task_struct *task) {}
 
 static inline int unwind_deferred_trace(struct unwind_stacktrace *trace) { return -ENOSYS; }
+static inline int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func) { return -ENOSYS; }
+static inline int unwind_deferred_request(struct unwind_work *work, u64 *timestamp) { return -ENOSYS; }
+static inline void unwind_deferred_cancel(struct unwind_work *work) {}
 
 static inline void unwind_exit_to_user_mode(void) {}
 

diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_deferred_types.h
index db5b54b18828..5df264cf81ad 100644
--- a/include/linux/unwind_deferred_types.h
+++ b/include/linux/unwind_deferred_types.h
@@ -9,6 +9,9 @@ struct unwind_cache {
 
 struct unwind_task_info {
        struct unwind_cache *cache;
+        struct callback_head work;
+        u64 timestamp;
+        int pending;
 };
 
 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */

diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c
index e3913781c8c6..b76c704ddc6d 100644
--- a/kernel/unwind/deferred.c
+++ b/kernel/unwind/deferred.c
@@ -2,13 +2,35 @@
 /*
  * Deferred user space unwinding
  */
+#include
+#include
+#include
+#include
 #include
 #include
 #include
-#include
+#include
 
 #define UNWIND_MAX_ENTRIES 512
 
+/* Guards adding to and reading the list of callbacks */
+static DEFINE_MUTEX(callback_mutex);
+static LIST_HEAD(callbacks);
+
+/*
+ * Read the task context timestamp; if this is the first caller, then
+ * it will set the timestamp.
+ */
+static u64 get_timestamp(struct unwind_task_info *info)
+{
+        lockdep_assert_irqs_disabled();
+
+        if (!info->timestamp)
+                info->timestamp = local_clock();
+
+        return info->timestamp;
+}
+
 /**
  * unwind_deferred_trace - Produce a user stacktrace in faultable context
  * @trace: The descriptor that will store the user stacktrace
@@ -59,11 +81,117 @@ int unwind_deferred_trace(struct unwind_stacktrace *trace)
        return 0;
 }
 
+static void unwind_deferred_task_work(struct callback_head *head)
+{
+        struct unwind_task_info *info = container_of(head, struct unwind_task_info, work);
+        struct unwind_stacktrace trace;
+        struct unwind_work *work;
+        u64 timestamp;
+
+        if (WARN_ON_ONCE(!info->pending))
+                return;
+
+        /* Allow work to come in again */
+        WRITE_ONCE(info->pending, 0);
+
+        /*
+         * From here on out, the callback must always be called, even if it's
+         * just an empty trace.
+         */
+        trace.nr = 0;
+        trace.entries = NULL;
+
+        unwind_deferred_trace(&trace);
+
+        timestamp = info->timestamp;
+
+        guard(mutex)(&callback_mutex);
+        list_for_each_entry(work, &callbacks, list) {
+                work->func(work, &trace, timestamp);
+        }
+}
+
+/**
+ * unwind_deferred_request - Request a user stacktrace on task exit
+ * @work: Unwind descriptor requesting the trace
+ * @timestamp: The time stamp of the first request made for this task
+ *
+ * Schedule a user space unwind to be done in task work before exiting the
+ * kernel.
+ *
+ * The returned @timestamp output is the timestamp of the very first request
+ * for a user space stacktrace for this task since it entered the kernel.
+ * It can be from a request by any caller of this infrastructure.
+ * Its value will also be passed to the callback function. It can be
+ * used to stitch kernel and user stack traces together in post-processing.
+ *
+ * It's valid to call this function multiple times for the same @work within
+ * the same task entry context. Each call will return the same timestamp
+ * while the task hasn't left the kernel. If the callback is not pending
+ * because it has already been previously called for the same entry context,
+ * it will be called again with the same stack trace and timestamp.
+ *
+ * Return: 1 if the callback was already queued.
+ *         0 if the callback was successfully queued.
+ *         Negative if there's an error.
+ * @timestamp holds the timestamp of the first request by any user + */ +int unwind_deferred_request(struct unwind_work *work, u64 *timestamp) +{ + struct unwind_task_info *info =3D &current->unwind_info; + int ret; + + *timestamp =3D 0; + + if (WARN_ON_ONCE(in_nmi())) + return -EINVAL; + + if ((current->flags & (PF_KTHREAD | PF_EXITING)) || + !user_mode(task_pt_regs(current))) + return -EINVAL; + + guard(irqsave)(); + + *timestamp =3D get_timestamp(info); + + /* callback already pending? */ + if (info->pending) + return 1; + + /* The work has been claimed, now schedule it. */ + ret =3D task_work_add(current, &info->work, TWA_RESUME); + if (WARN_ON_ONCE(ret)) + return ret; + + info->pending =3D 1; + return 0; +} + +void unwind_deferred_cancel(struct unwind_work *work) +{ + if (!work) + return; + + guard(mutex)(&callback_mutex); + list_del(&work->list); +} + +int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func) +{ + memset(work, 0, sizeof(*work)); + + guard(mutex)(&callback_mutex); + list_add(&work->list, &callbacks); + work->func =3D func; + return 0; +} + void unwind_task_init(struct task_struct *task) { struct unwind_task_info *info =3D &task->unwind_info; =20 memset(info, 0, sizeof(*info)); + init_task_work(&info->work, unwind_deferred_task_work); } =20 void unwind_task_free(struct task_struct *task) @@ -71,4 +199,5 @@ void unwind_task_free(struct task_struct *task) struct unwind_task_info *info =3D &task->unwind_info; =20 kfree(info->cache); + task_work_cancel(task, &info->work); } --=20 2.47.2
From nobody Tue Oct 7 18:40:23 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C9EA023BD0B; Fri, 9 May 2025 16:51:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809499; cv=none; b=PbkeRLP9fwQe1H/Uq9XTkAmgJTNlea10t/MHkB2vyLqaioUqxBTltclcv0Lu0B49AZt/ruLVtsoNrwrzeyZ+bNF8CrW3cQpPgjTsgQPy8ASYv+7Sp9qven5ULtTaO+L1NiuPhdGM7fWi3MSnNlbtvjaexlb8zLNkyMPvgh5nmh0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809499; c=relaxed/simple; bh=o+zpPJwuZrytbEsaS1hNJK0I6210t782rt6sY9BeE+M=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=FRj5OFMHSpklQhPTFTd2CTssF2KV6UYSpyNOsEyIlmo86XgGRTS0suhlTMCha+i50eWYr/QhyibxUvCWWWh6G4aPA4skAm6+yy0E+Uut6xV1oLZrzLkufFhaAbTWPTugxqg9Rnai+CR7d3LVfvXmQ+JNpUD3A2pIQc0urFpjFOg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9F031C4CEE9; Fri, 9 May 2025 16:51:39 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1uDQx1-00000002gIE-1ptS; Fri, 09 May 2025 12:51:55 -0400 Message-ID: <20250509165155.292241900@goodmis.org> User-Agent: quilt/0.68 Date: Fri, 09 May 2025 12:45:34 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org Cc: Masami Hiramatsu , Mathieu Desnoyers , Josh Poimboeuf , Peter Zijlstra , Ingo Molnar , Jiri Olsa , Namhyung Kim Subject: [PATCH v8 10/18] unwind_user/deferred: Make unwind deferral requests NMI-safe References: <20250509164524.448387100@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id:
List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Josh Poimboeuf Make unwind_deferred_request() NMI-safe so tracers in NMI context can call it and safely request a user space stacktrace when the task exits the kernel. An "nmi_timestamp" is added to the unwind_task_info that gets updated by NMIs so as not to race with setting the info->timestamp. Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Google) --- Changes since v7: https://lore.kernel.org/20250502165009.069806229@goodmis.= org - Updated to use timestamp instead of cookie include/linux/unwind_deferred_types.h | 1 + kernel/unwind/deferred.c | 91 ++++++++++++++++++++++++--- 2 files changed, 84 insertions(+), 8 deletions(-) diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_d= eferred_types.h index 5df264cf81ad..ae27a02234b8 100644 --- a/include/linux/unwind_deferred_types.h +++ b/include/linux/unwind_deferred_types.h @@ -11,6 +11,7 @@ struct unwind_task_info { struct unwind_cache *cache; struct callback_head work; u64 timestamp; + u64 nmi_timestamp; int pending; }; =20 diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c index b76c704ddc6d..238cd97079ec 100644 --- a/kernel/unwind/deferred.c +++ b/kernel/unwind/deferred.c @@ -25,8 +25,27 @@ static u64 get_timestamp(struct unwind_task_info *info) { lockdep_assert_irqs_disabled(); =20 - if (!info->timestamp) - info->timestamp =3D local_clock(); + /* + * Note, the timestamp is generated on the first request. + * If it exists here, then the timestamp is earlier than + * this request and it means that this request will be + * valid for the stacktrace. + */ + if (!info->timestamp) { + WRITE_ONCE(info->timestamp, local_clock()); + barrier(); + /* + * If an NMI came in and set a timestamp, it means that + * it happened before this timestamp was set (otherwise + * the NMI would have used this one). Use the NMI timestamp + * instead. + */ + if (unlikely(info->nmi_timestamp)) { + WRITE_ONCE(info->timestamp, info->nmi_timestamp); + barrier(); + WRITE_ONCE(info->nmi_timestamp, 0); + } + } =20 return info->timestamp; } @@ -103,6 +122,13 @@ static void unwind_deferred_task_work(struct callback_= head *head) =20 unwind_deferred_trace(&trace); =20 + /* Check if the timestamp was only set by NMI */ + if (info->nmi_timestamp) { + WRITE_ONCE(info->timestamp, info->nmi_timestamp); + barrier(); + WRITE_ONCE(info->nmi_timestamp, 0); + } + timestamp =3D info->timestamp; =20 guard(mutex)(&callback_mutex); @@ -111,6 +137,48 @@ static void unwind_deferred_task_work(struct callback_= head *head) } } =20 +static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *time= stamp) +{ + struct unwind_task_info *info =3D &current->unwind_info; + bool inited_timestamp =3D false; + int ret; + + /* Always use the nmi_timestamp first */ + *timestamp =3D info->nmi_timestamp ? : info->timestamp; + + if (!*timestamp) { + /* + * This is the first unwind request since the most recent entry + * from user space. Initialize the task timestamp. + * + * Don't write to info->timestamp directly, otherwise it may race + * with an interruption of get_timestamp(). + */ + info->nmi_timestamp =3D local_clock(); + *timestamp =3D info->nmi_timestamp; + inited_timestamp =3D true; + } + + if (info->pending) + return 1; + + ret =3D task_work_add(current, &info->work, TWA_NMI_CURRENT); + if (ret) { + /* + * If this request initialized nmi_timestamp but the work was + * not queued, there's no guarantee that it will ever be used.
+ * Set it back to zero. + */ + if (inited_timestamp) + info->nmi_timestamp =3D 0; + return ret; + } + + info->pending =3D 1; + + return 0; +} + /** * unwind_deferred_request - Request a user stacktrace on task exit * @work: Unwind descriptor requesting the trace @@ -139,31 +207,38 @@ static void unwind_deferred_task_work(struct callback= _head *head) int unwind_deferred_request(struct unwind_work *work, u64 *timestamp) { struct unwind_task_info *info =3D &current->unwind_info; + int pending; int ret; =20 *timestamp =3D 0; =20 - if (WARN_ON_ONCE(in_nmi())) - return -EINVAL; - if ((current->flags & (PF_KTHREAD | PF_EXITING)) || !user_mode(task_pt_regs(current))) return -EINVAL; =20 + if (in_nmi()) + return unwind_deferred_request_nmi(work, timestamp); + guard(irqsave)(); =20 *timestamp =3D get_timestamp(info); =20 /* callback already pending? */ - if (info->pending) + pending =3D READ_ONCE(info->pending); + if (pending) + return 1; + + /* Claim the work unless an NMI just now swooped in to do so. */ + if (!try_cmpxchg(&info->pending, &pending, 1)) return 1; =20 /* The work has been claimed, now schedule it. */ ret =3D task_work_add(current, &info->work, TWA_RESUME); - if (WARN_ON_ONCE(ret)) + if (WARN_ON_ONCE(ret)) { + WRITE_ONCE(info->pending, 0); return ret; + } =20 - info->pending =3D 1; return 0; } =20 --=20 2.47.2
From nobody Tue Oct 7 18:40:23 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2340223C8D5; Fri, 9 May 2025 16:51:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809500; cv=none; b=P+YP9kbsSOUwBNab4mOzsd1G81h4ct66Zs6lZMtj/p2wbgW38ArKYd2wHMYx7/vJLjW2zh/kbzxMVg9yLNupc/ci6kjHKKTk3YKeKTxJ3X606DJkSi5+oDUWyC3bT4z7PoK16aF4R4WkmERJTzPRYEzMn2Ru/dm+CqpSumcakjE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809500; c=relaxed/simple; bh=nFgF+ACnRcNWWOr1gv5eFabQhi240pk46GZke2cBQLU=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=I0/sAiFOlFCIiQ/1DBbly2/AmUjukt9eOj0FXae6mv7sYsRPQpZ9zOfZ+NiyRLhYymL+5bQWAQ7Z8uy3flimQUt8/6ZVbkYi/sC7+zLKq7IwBB6f1sttDOaXexkdzqChsuVUlPNipu+K+z90HzWhu6Pdey5upURIx6nBnOqZPCM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id DA8E1C4CEFA; Fri, 9 May 2025 16:51:39 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1uDQx1-00000002gIj-2YPz; Fri, 09 May 2025 12:51:55 -0400 Message-ID: <20250509165155.459900954@goodmis.org> User-Agent: quilt/0.68 Date: Fri, 09 May 2025 12:45:35 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org Cc: Masami Hiramatsu , Mathieu Desnoyers , Josh Poimboeuf , Peter Zijlstra , Ingo Molnar , Jiri Olsa , Namhyung Kim Subject: [PATCH v8 11/18] unwind deferred: Use bitmask to determine which callbacks to call References: <20250509164524.448387100@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt In order to know which registered callback
requested a stacktrace for when the task goes back to user space, add a bitmask for all registered tracers. The bitmask is the size of a long, which means that on a 32-bit machine, it can have at most 32 registered tracers, and on 64-bit, at most 64 registered tracers. This should not be an issue, as there should not be more than ten such tracers (unless BPF can abuse this?). When a tracer registers with unwind_deferred_init() it will get a bit number assigned to it. When a tracer requests a stacktrace, it will have its bit set within the task_struct. When the task returns to user space, it will call the callbacks for all the registered tracers whose bits are set in the task's mask. When a tracer is removed by unwind_deferred_cancel(), the associated bit is cleared in all current tasks, just in case another tracer gets registered immediately afterward and would otherwise get its callback called unexpectedly. Signed-off-by: Steven Rostedt (Google) --- include/linux/sched.h | 1 + include/linux/unwind_deferred.h | 1 + kernel/unwind/deferred.c | 46 ++++++++++++++++++++++++++++----- 3 files changed, 41 insertions(+), 7 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index a1e1c07cadfb..d3ee0c5405d6 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1649,6 +1649,7 @@ struct task_struct { =20 #ifdef CONFIG_UNWIND_USER struct unwind_task_info unwind_info; + unsigned long unwind_mask; #endif =20 /* CPU-specific state of this task: */ diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferre= d.h index a384eef719a3..1789c3624723 100644 --- a/include/linux/unwind_deferred.h +++ b/include/linux/unwind_deferred.h @@ -13,6 +13,7 @@ typedef void (*unwind_callback_t)(struct unwind_work *wor= k, struct unwind_stackt struct unwind_work { struct list_head list; unwind_callback_t func; + int bit; }; =20 #ifdef CONFIG_UNWIND_USER diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c index 238cd97079ec..7ae0bec5b36a 100644 --- a/kernel/unwind/deferred.c +++ b/kernel/unwind/deferred.c @@ -16,6 +16,7 @@ /* Guards adding to and reading the list of callbacks */ static DEFINE_MUTEX(callback_mutex); static LIST_HEAD(callbacks); +static unsigned long unwind_mask; =20 /* * Read the task context timestamp, if this is the first caller then @@ -106,6 +107,7 @@ static void unwind_deferred_task_work(struct callback_h= ead *head) struct unwind_stacktrace trace; struct unwind_work *work; u64 timestamp; + struct task_struct *task =3D current; =20 if (WARN_ON_ONCE(!info->pending)) return; @@ -133,7 +135,10 @@ static void unwind_deferred_task_work(struct callback_= head *head) =20 guard(mutex)(&callback_mutex); list_for_each_entry(work, &callbacks, list) { - work->func(work, &trace, timestamp); + if (task->unwind_mask & (1UL << work->bit)) { + work->func(work, &trace, timestamp); + clear_bit(work->bit, &current->unwind_mask); + } } } =20 @@ -159,9 +164,12 @@ static int unwind_deferred_request_nmi(struct unwind_w= ork *work, u64 *timestamp) inited_timestamp =3D true; } =20 - if (info->pending) + if (current->unwind_mask & (1UL << work->bit)) return 1; =20 + if (info->pending) + goto out; + ret =3D task_work_add(current, &info->work, TWA_NMI_CURRENT); if (ret) { /* @@ -175,8 +183,8 @@ static int unwind_deferred_request_nmi(struct unwind_w= ork *work, u64 *timestamp) } =20 info->pending =3D 1; - - return 0; +out: + return test_and_set_bit(work->bit, &current->unwind_mask); } =20 /** @@ -223,14 +231,18 @@ int unwind_deferred_request(struct unwind_work *work,= u64 *timestamp) =20
*timestamp =3D get_timestamp(info); =20 + /* This is already queued */ + if (current->unwind_mask & (1UL << work->bit)) + return 1; + /* callback already pending? */ pending =3D READ_ONCE(info->pending); if (pending) - return 1; + goto out; =20 /* Claim the work unless an NMI just now swooped in to do so. */ if (!try_cmpxchg(&info->pending, &pending, 1)) - return 1; + goto out; =20 /* The work has been claimed, now schedule it. */ ret =3D task_work_add(current, &info->work, TWA_RESUME); @@ -239,16 +251,27 @@ int unwind_deferred_request(struct unwind_work *work,= u64 *timestamp) return ret; } =20 - return 0; + out: + return test_and_set_bit(work->bit, &current->unwind_mask); } =20 void unwind_deferred_cancel(struct unwind_work *work) { + struct task_struct *g, *t; + if (!work) return; =20 guard(mutex)(&callback_mutex); list_del(&work->list); + + clear_bit(work->bit, &unwind_mask); + + guard(rcu)(); + /* Clear this bit from all threads */ + for_each_process_thread(g, t) { + clear_bit(work->bit, &t->unwind_mask); + } } =20 int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func) @@ -256,6 +279,14 @@ int unwind_deferred_init(struct unwind_work *work, unw= ind_callback_t func) memset(work, 0, sizeof(*work)); =20 guard(mutex)(&callback_mutex); + + /* See if there's a bit in the mask available */ + if (unwind_mask =3D=3D ~0UL) + return -EBUSY; + + work->bit =3D ffz(unwind_mask); + unwind_mask |=3D 1UL << work->bit; + list_add(&work->list, &callbacks); work->func =3D func; return 0; @@ -267,6 +298,7 @@ void unwind_task_init(struct task_struct *task) =20 memset(info, 0, sizeof(*info)); init_task_work(&info->work, unwind_deferred_task_work); + task->unwind_mask =3D 0; } =20 void unwind_task_free(struct task_struct *task) --=20 2.47.2
From nobody Tue Oct 7 18:40:23 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3BA2B23D291; Fri, 9 May 2025 16:51:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809500; cv=none; b=bDP8JsF225i+cwd2c45w1vhW0CaO7/fogNbBUE3LdqZPhpUL7zT8lVFs/4npWbDKqA1O4LitPHlp4CmpQlTmf3uJoP+4Cj8cJv455gDJWQxOJahDFqRi2w3eOME4P8JclRZubq0PFAuQGrQVVYfWkhrlegeE+m2c4tub623v5Oc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809500; c=relaxed/simple; bh=LFtCo/JCrm+gEk0JbCkVo9QFz4ej+uVZlmOJS47TCt8=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=nPW+YfcBmwgDTwfR+rcM4ZEUJz9bOYj+BciENcb2SiArr0FYwct8Q+L5VoRsG0sgKjMWlURNGqeL+mTd9c2dpmI8rNTkboHPIiOt+myqIAvKHN5q/f/HZXgqcoqtoEGGCrTgYBogsce2vg5cXCCjevetRqVSBvkfexsGUMor7bY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0062AC4CEF3; Fri, 9 May 2025 16:51:40 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1uDQx1-00000002gJE-3G3m; Fri, 09 May 2025 12:51:55 -0400 Message-ID: <20250509165155.628873521@goodmis.org> User-Agent: quilt/0.68 Date: Fri, 09 May 2025 12:45:36 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org Cc: Masami Hiramatsu , Mathieu Desnoyers , Josh Poimboeuf , Peter Zijlstra , Ingo Molnar , Jiri Olsa ,
Namhyung Kim Subject: [PATCH v8 12/18] unwind deferred: Use SRCU in unwind_deferred_task_work() References: <20250509164524.448387100@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Instead of using the callback_mutex to protect the linked list of callbacks in unwind_deferred_task_work(), use SRCU instead. This code runs every time a task that has to record a requested stack trace exits the kernel, which can happen for many tasks on several CPUs at the same time. A mutex is a bottleneck there and can cause contention that slows down performance. As the callbacks themselves are allowed to sleep, regular RCU cannot be used to protect the list. Instead use SRCU, as that still allows the callbacks to sleep and the list can be read without needing to hold the callback_mutex. Link: https://lore.kernel.org/all/ca9bd83a-6c80-4ee0-a83c-224b9d60b755@effi= cios.com/ Suggested-by: Mathieu Desnoyers Signed-off-by: Steven Rostedt (Google) --- kernel/unwind/deferred.c | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-) diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c index 7ae0bec5b36a..5d6976ee648f 100644 --- a/kernel/unwind/deferred.c +++ b/kernel/unwind/deferred.c @@ -13,10 +13,11 @@ =20 #define UNWIND_MAX_ENTRIES 512 =20 -/* Guards adding to and reading the list of callbacks */ +/* Guards adding to or removing from the list of callbacks */ static DEFINE_MUTEX(callback_mutex); static LIST_HEAD(callbacks); static unsigned long unwind_mask; +DEFINE_STATIC_SRCU(unwind_srcu); =20 /* * Read the task context timestamp, if this is the first caller then @@ -108,6 +109,7 @@ static void unwind_deferred_task_work(struct callback_h= ead *head) struct unwind_stacktrace trace; struct unwind_work *work; u64 timestamp; struct task_struct *task =3D current; + int idx; =20 if (WARN_ON_ONCE(!info->pending)) return; @@ -133,13 +135,15 @@ static void unwind_deferred_task_work(struct callback= _head *head) =20 timestamp =3D info->timestamp; =20 - guard(mutex)(&callback_mutex); - list_for_each_entry(work, &callbacks, list) { + idx =3D srcu_read_lock(&unwind_srcu); + list_for_each_entry_srcu(work, &callbacks, list, + srcu_read_lock_held(&unwind_srcu)) { if (task->unwind_mask & (1UL << work->bit)) { work->func(work, &trace, timestamp); clear_bit(work->bit, &current->unwind_mask); } } + srcu_read_unlock(&unwind_srcu, idx); } =20 static int unwind_deferred_request_nmi(struct unwind_work *work, u64 *time= stamp) @@ -216,6 +220,7 @@ int unwind_deferred_request(struct unwind_work *work, u= 64 *timestamp) { struct unwind_task_info *info =3D &current->unwind_info; int pending; + int bit; int ret; =20 *timestamp =3D 0; @@ -227,12 +232,17 @@ int unwind_deferred_request(struct unwind_work *work,= u64 *timestamp) if (in_nmi()) return unwind_deferred_request_nmi(work, timestamp); =20 + /* Do not allow cancelled works to request again */ + bit =3D READ_ONCE(work->bit); + if (WARN_ON_ONCE(bit < 0)) + return -EINVAL; + guard(irqsave)(); =20 *timestamp =3D get_timestamp(info); =20 /* This is already queued */ - if (current->unwind_mask & (1UL << work->bit)) + if (current->unwind_mask & (1UL << bit)) return 1; =20 /* callback already pending?
*/ @@ -258,19 +268,26 @@ int unwind_deferred_request(struct unwind_work *work,= u64 *timestamp) void unwind_deferred_cancel(struct unwind_work *work) { struct task_struct *g, *t; + int bit; =20 if (!work) return; =20 guard(mutex)(&callback_mutex); - list_del(&work->list); + list_del_rcu(&work->list); + bit =3D work->bit; + + /* Do not allow any more requests and prevent callbacks */ + work->bit =3D -1; + + clear_bit(bit, &unwind_mask); =20 - clear_bit(work->bit, &unwind_mask); + synchronize_srcu(&unwind_srcu); =20 guard(rcu)(); /* Clear this bit from all threads */ for_each_process_thread(g, t) { - clear_bit(work->bit, &t->unwind_mask); + clear_bit(bit, &t->unwind_mask); } } =20 @@ -287,7 +304,7 @@ int unwind_deferred_init(struct unwind_work *work, unwi= nd_callback_t func) work->bit =3D ffz(unwind_mask); unwind_mask |=3D 1UL << work->bit; =20 - list_add(&work->list, &callbacks); + list_add_rcu(&work->list, &callbacks); work->func =3D func; return 0; } --=20 2.47.2 From nobody Tue Oct 7 18:40:23 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CCE5724110F; Fri, 9 May 2025 16:51:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809500; cv=none; b=cLY/vp6iI1mdW8X/56jjLW9ST9FXnU4YuHgBhMF+ownRFku9wfseHVtBvjX4XYKZeY/cDR2Ox2hwpiNJhdxIzqlHoIPtqhpFJLY+/SF6pswcnvHrnyiEEdJCIGt4YTj3cUvuMq5DHN7Q/K9kl4174XyeIgDdT7ahRACg1q27GmY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809500; c=relaxed/simple; bh=LPJdAvV3vgiW3CunL+1Ufqd5avv0X4srOOM4scFLZac=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=EcPFv0bYdIqo6TtGRvEFYMZ1MXvGeMQDoTEQvYGjMR2+lQbO7qbyYqXz+uboFYtajuIaCwTL8gzFsPw9XBr2JmG5yBCoUJyXwIv2Q1iZiPXzicaH1USOgEaVIgfNaBUzDht/2cTXZomKjYrYsyJrHGHBFRWossyfn71isApq+PI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2B920C4CEE9; Fri, 9 May 2025 16:51:40 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1uDQx1-00000002gJj-3xod; Fri, 09 May 2025 12:51:55 -0400 Message-ID: <20250509165155.796115119@goodmis.org> User-Agent: quilt/0.68 Date: Fri, 09 May 2025 12:45:37 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org Cc: Masami Hiramatsu , Mathieu Desnoyers , Josh Poimboeuf , Peter Zijlstra , Ingo Molnar , Jiri Olsa , Namhyung Kim Subject: [PATCH v8 13/18] unwind: Clear unwind_mask on exit back to user space References: <20250509164524.448387100@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt When testing the deferred unwinder by attaching deferred user space stacktraces to events, a live lock happened. This was when the deferred unwinding was added to the irqs_disabled event, which happens after the task_work callbacks are called and before the task goes back to user space. 
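Schematically, the cycle looked like this (an editorial reconstruction, not a literal trace; the next paragraphs walk through it in prose):

    exit path disables irqs -> irqs_disabled event fires
      -> unwind_deferred_request() queues a task_work
    task_work runs -> callback called, work bit cleared
    exit path disables irqs again -> event fires again
      -> another task_work is queued
    ... repeat forever; the task never reaches user space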
The event callback would be registered when irqs were disabled, the task_work would trigger, call the callback for this work and clear the work's bit. Then before getting back to user space, irqs would be disabled again, the event triggered again, and a new task_work registered. This caused an infinite loop and the system hung. To prevent this, clear the bits at the very last moment before going back to user space and when instrumentation is disabled. That is in unwind_exit_to_user_mode(). Move the pending bit from a value on the task_struct to the most significant bit of the unwind_mask (saves space on the task_struct). This will allow modifying the pending bit along with the work bits atomically. Instead of clearing a work's bit after its callback is called, clearing is delayed until exit. If the work is requested again, the task_work is not queued again and the requester is notified that its callback has already been called (via the UNWIND_ALREADY_EXECUTED return value). The pending bit is cleared before calling the callback functions, but the current work bits remain. If one of the called works makes another request, it will not trigger a task_work while its bit is still present in the task's unwind_mask. If a work whose bit is not set makes a request, it will set both the pending bit and its own bit, but clear the other work bits so that their callbacks do not get called again. Signed-off-by: Steven Rostedt (Google) --- include/linux/unwind_deferred.h | 23 ++++++- include/linux/unwind_deferred_types.h | 1 - kernel/unwind/deferred.c | 96 +++++++++++++++++++-------- 3 files changed, 90 insertions(+), 30 deletions(-) diff --git a/include/linux/unwind_deferred.h b/include/linux/unwind_deferre= d.h index 1789c3624723..b3c8703fcc22 100644 --- a/include/linux/unwind_deferred.h +++ b/include/linux/unwind_deferred.h @@ -18,6 +18,14 @@ struct unwind_work { =20 #ifdef CONFIG_UNWIND_USER =20 +#define UNWIND_PENDING_BIT (BITS_PER_LONG - 1) +#define UNWIND_PENDING (1UL << UNWIND_PENDING_BIT) + +enum { + UNWIND_ALREADY_PENDING =3D 1, + UNWIND_ALREADY_EXECUTED =3D 2, +}; + void unwind_task_init(struct task_struct *task); void unwind_task_free(struct task_struct *task); =20 @@ -29,7 +37,20 @@ void unwind_deferred_cancel(struct unwind_work *work); =20 static __always_inline void unwind_exit_to_user_mode(void) { - if (unlikely(current->unwind_info.cache)) + unsigned long bits; + + /* Was there any unwinding?
*/ + if (likely(!current->unwind_mask)) + return; + + bits =3D current->unwind_mask; + do { + /* Is a task_work going to run again before going back */ + if (bits & UNWIND_PENDING) + return; + } while (!try_cmpxchg(&current->unwind_mask, &bits, 0UL)); + + if (likely(current->unwind_info.cache)) current->unwind_info.cache->nr_entries =3D 0; current->unwind_info.timestamp =3D 0; } diff --git a/include/linux/unwind_deferred_types.h b/include/linux/unwind_d= eferred_types.h index ae27a02234b8..28811a9d4262 100644 --- a/include/linux/unwind_deferred_types.h +++ b/include/linux/unwind_deferred_types.h @@ -12,7 +12,6 @@ struct unwind_task_info { struct callback_head work; u64 timestamp; u64 nmi_timestamp; - int pending; }; =20 #endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */ diff --git a/kernel/unwind/deferred.c b/kernel/unwind/deferred.c index 5d6976ee648f..b0289a695b92 100644 --- a/kernel/unwind/deferred.c +++ b/kernel/unwind/deferred.c @@ -19,6 +19,11 @@ static LIST_HEAD(callbacks); static unsigned long unwind_mask; DEFINE_STATIC_SRCU(unwind_srcu); =20 +static inline bool unwind_pending(struct task_struct *task) +{ + return test_bit(UNWIND_PENDING_BIT, &task->unwind_mask); +} + /* * Read the task context timestamp, if this is the first caller then * it will set the timestamp. @@ -107,15 +112,18 @@ static void unwind_deferred_task_work(struct callback= _head *head) struct unwind_task_info *info =3D container_of(head, struct unwind_task_i= nfo, work); struct unwind_stacktrace trace; struct unwind_work *work; + unsigned long bits; u64 timestamp; struct task_struct *task =3D current; int idx; =20 - if (WARN_ON_ONCE(!info->pending)) + if (WARN_ON_ONCE(!unwind_pending(task))) return; =20 - /* Allow work to come in again */ - WRITE_ONCE(info->pending, 0); + /* Clear pending bit but make sure to have the current bits */ + bits =3D READ_ONCE(task->unwind_mask); + while (!try_cmpxchg(&task->unwind_mask, &bits, bits & ~UNWIND_PENDING)) + ; =20 /* * From here on out, the callback must always be called, even if it's @@ -138,10 +146,8 @@ static void unwind_deferred_task_work(struct callback_= head *head) idx =3D srcu_read_lock(&unwind_srcu); list_for_each_entry_srcu(work, &callbacks, list, srcu_read_lock_held(&unwind_srcu)) { - if (task->unwind_mask & (1UL << work->bit)) { + if (bits & (1UL << work->bit)) work->func(work, &trace, timestamp); - clear_bit(work->bit, &current->unwind_mask); - } } srcu_read_unlock(&unwind_srcu, idx); } @@ -168,10 +174,13 @@ static int unwind_deferred_request_nmi(struct unwind_= work *work, u64 *timestamp) inited_timestamp =3D true; } =20 - if (current->unwind_mask & (1UL << work->bit)) - return 1; + /* Is this already queued */ + if (current->unwind_mask & (1UL << work->bit)) { + return unwind_pending(current) ? UNWIND_ALREADY_PENDING : + UNWIND_ALREADY_EXECUTED; + } =20 - if (info->pending) + if (unwind_pending(current)) goto out; =20 ret =3D task_work_add(current, &info->work, TWA_NMI_CURRENT); @@ -186,9 +195,17 @@ static int unwind_deferred_request_nmi(struct unwind_w= ork *work, u64 *timestamp) return ret; } =20 - info->pending =3D 1; + /* + * This is the first to set the PENDING_BIT, clear all others + * as any other bit has already had its callback called, and + * those callbacks should not be called again because of this + * new callback. If they request another callback, then they + * will get a new one. + */ + current->unwind_mask =3D UNWIND_PENDING; out: - return test_and_set_bit(work->bit, &current->unwind_mask); + return test_and_set_bit(work->bit, &current->unwind_mask) ?
+ UNWIND_ALREADY_PENDING : 0; } =20 /** @@ -211,15 +228,17 @@ static int unwind_deferred_request_nmi(struct unwind_= work *work, u64 *timestamp) * it has already been previously called for the same entry context, it wi= ll be * called again with the same stack trace and timestamp. * - * Return: 1 if the callback was already queued. - * 0 if the callback was successfully queued. + * Return: 0 if the callback was successfully queued. + * UNWIND_ALREADY_PENDING if the callback was already queued. + * UNWIND_ALREADY_EXECUTED if the callback was already called + * (and will not be called again) * Negative if there's an error. * @timestamp holds the timestamp of the first request by any user */ int unwind_deferred_request(struct unwind_work *work, u64 *timestamp) { struct unwind_task_info *info =3D &current->unwind_info; - int pending; + unsigned long old, bits; int bit; int ret; =20 @@ -241,28 +260,49 @@ int unwind_deferred_request(struct unwind_work *work,= u64 *timestamp) =20 *timestamp =3D get_timestamp(info); =20 - /* This is already queued */ - if (current->unwind_mask & (1UL << bit)) - return 1; + old =3D READ_ONCE(current->unwind_mask); + + /* Is this already queued */ + if (old & (1UL << bit)) { + /* + * If pending is not set, it means this work's callback + * was already called. + */ + return old & UNWIND_PENDING ? UNWIND_ALREADY_PENDING : + UNWIND_ALREADY_EXECUTED; + } =20 - /* callback already pending? */ - pending =3D READ_ONCE(info->pending); - if (pending) + if (unwind_pending(current)) goto out; =20 - /* Claim the work unless an NMI just now swooped in to do so. */ - if (!try_cmpxchg(&info->pending, &pending, 1)) + /* + * This is the first to enable another task_work for this task since + * the task entered the kernel, or had already called the callbacks. + * Set only the bit for this work and clear all others as they have + * already had their callbacks called, and do not need to call them + * again because of this work. + */ + bits =3D UNWIND_PENDING | (1UL << bit); + + /* + * If the cmpxchg() fails, it means that an NMI came in and set + * the pending bit as well as cleared the other bits. Just + * jump to setting the bit for this work. + */ + if (!try_cmpxchg(&current->unwind_mask, &old, bits)) goto out; =20 /* The work has been claimed, now schedule it. */ ret =3D task_work_add(current, &info->work, TWA_RESUME); - if (WARN_ON_ONCE(ret)) { - WRITE_ONCE(info->pending, 0); - return ret; - } + + if (WARN_ON_ONCE(ret)) + WRITE_ONCE(current->unwind_mask, 0); + + return ret; =20 out: - return test_and_set_bit(work->bit, &current->unwind_mask); + return test_and_set_bit(work->bit, &current->unwind_mask) ?
+ UNWIND_ALREADY_PENDING : 0; } =20 void unwind_deferred_cancel(struct unwind_work *work) @@ -298,7 +338,7 @@ int unwind_deferred_init(struct unwind_work *work, unwi= nd_callback_t func) guard(mutex)(&callback_mutex); =20 /* See if there's a bit in the mask available */ - if (unwind_mask =3D=3D ~0UL) + if (unwind_mask =3D=3D ~(UNWIND_PENDING)) return -EBUSY; =20 work->bit =3D ffz(unwind_mask); --=20 2.47.2 From nobody Tue Oct 7 18:40:23 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CCDEA241103; Fri, 9 May 2025 16:51:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809500; cv=none; b=I/73WHFpa6yO0s9hKR22zGH1ISqisJFk2lr+rQRUicOV6X+pKQAEPjcCLqdTzUDo/AdrfLCYbhxs0dhopt1TXTCEHDjXFL3iA4jdZITHp48h/XQpHZk2Hz7rh6O3mW+cCAORrs8nSEGmTyd07n8g3xrt4gFsfrFaWpqzOeGdEDY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809500; c=relaxed/simple; bh=nvl7Vl/7xOcHc3Bu1gQteWzmxNk+BT5GUaar6P8imWE=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=JvFVUESRwbNB3icmXdvnMd6pVnI+/U8HLZDNOQenldb4dY9985ptKjpdtQtqsr02AO4hhoQsmYy6NP21KJ9bUlnQVEQm9SHC5IfXkGNJ2axLbLbr2in7VZCwjVC++u5ey+4H00IxAuBwYm7wmlP9FRAn+DIh+baLTVVXFt3JZLE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5BFABC4CEF3; Fri, 9 May 2025 16:51:40 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1uDQx2-00000002gKD-0Tkj; Fri, 09 May 2025 12:51:56 -0400 Message-ID: <20250509165155.965084136@goodmis.org> User-Agent: quilt/0.68 Date: Fri, 09 May 2025 12:45:38 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org Cc: Masami Hiramatsu , Mathieu Desnoyers , Josh Poimboeuf , Peter Zijlstra , Ingo Molnar , Jiri Olsa , Namhyung Kim , Namhyung Kim Subject: [PATCH v8 14/18] perf: Remove get_perf_callchain() init_nr argument References: <20250509164524.448387100@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Josh Poimboeuf The 'init_nr' argument has double duty: it's used to initialize both the number of contexts and the number of stack entries. That's confusing and the callers always pass zero anyway. Hard code the zero. 
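For illustration, this is what the change looks like from a call site (a sketch with hypothetical locals; the function and flag names are from the patch):

	/* Before: the init_nr argument was always zero anyway. */
	trace = get_perf_callchain(regs, 0, kernel, user, max_stack,
				   crosstask, add_mark);

	/* After: every unwind starts from an empty entry. */
	trace = get_perf_callchain(regs, kernel, user, max_stack,
				   crosstask, add_mark);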
Acked-by: Namhyung Kim Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Google) Acked-by: Alexei Starovoitov --- include/linux/perf_event.h | 2 +- kernel/bpf/stackmap.c | 4 ++-- kernel/events/callchain.c | 12 ++++++------ kernel/events/core.c | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 947ad12dfdbe..3cc0b0ea0afa 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1651,7 +1651,7 @@ DECLARE_PER_CPU(struct perf_callchain_entry, perf_cal= lchain_entry); extern void perf_callchain_user(struct perf_callchain_entry_ctx *entry, st= ruct pt_regs *regs); extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, = struct pt_regs *regs); extern struct perf_callchain_entry * -get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool us= er, +get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, u32 max_stack, bool crosstask, bool add_mark); extern int get_callchain_buffers(int max_stack); extern void put_callchain_buffers(void); diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 3615c06b7dfa..ec3a57a5fba1 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -314,7 +314,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, str= uct bpf_map *, map, if (max_depth > sysctl_perf_event_max_stack) max_depth =3D sysctl_perf_event_max_stack; =20 - trace =3D get_perf_callchain(regs, 0, kernel, user, max_depth, + trace =3D get_perf_callchain(regs, kernel, user, max_depth, false, false); =20 if (unlikely(!trace)) @@ -451,7 +451,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struc= t task_struct *task, else if (kernel && task) trace =3D get_callchain_entry_for_task(task, max_depth); else - trace =3D get_perf_callchain(regs, 0, kernel, user, max_depth, + trace =3D get_perf_callchain(regs, kernel, user, max_depth, crosstask, false); =20 if (unlikely(!trace) || trace->nr < skip) { diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index 6c83ad674d01..b0f5bd228cd8 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -217,7 +217,7 @@ static void fixup_uretprobe_trampoline_entries(struct p= erf_callchain_entry *entr } =20 struct perf_callchain_entry * -get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool us= er, +get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, u32 max_stack, bool crosstask, bool add_mark) { struct perf_callchain_entry *entry; @@ -228,11 +228,11 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr,= bool kernel, bool user, if (!entry) return NULL; =20 - ctx.entry =3D entry; - ctx.max_stack =3D max_stack; - ctx.nr =3D entry->nr =3D init_nr; - ctx.contexts =3D 0; - ctx.contexts_maxed =3D false; + ctx.entry =3D entry; + ctx.max_stack =3D max_stack; + ctx.nr =3D entry->nr =3D 0; + ctx.contexts =3D 0; + ctx.contexts_maxed =3D false; =20 if (kernel && !user_mode(regs)) { if (add_mark) diff --git a/kernel/events/core.c b/kernel/events/core.c index 05136e835042..0bac8cae08a8 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -8110,7 +8110,7 @@ perf_callchain(struct perf_event *event, struct pt_re= gs *regs) if (!kernel && !user) return &__empty_callchain; =20 - callchain =3D get_perf_callchain(regs, 0, kernel, user, + callchain =3D get_perf_callchain(regs, kernel, user, max_stack, crosstask, true); return callchain ?: &__empty_callchain; } --=20 2.47.2 From nobody Tue Oct 7 18:40:23 2025 Received: from smtp.kernel.org 
(aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AAA3323F299; Fri, 9 May 2025 16:51:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809500; cv=none; b=pUogpEwpQ7P/xNqYWWUtdESOIT+aHiGazp5PXS1bT1FgnNI/nZdPSjksvoIY3VHo3SOUFJU8IiGEMSp59eHSW+8PwfQFhfBazaBBaMfIixS1bV4pO2i76zDud/eM+dlCi849octGxuyU/9CmPcmF/hkhKC+jYGH3MOL76Fps6EE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809500; c=relaxed/simple; bh=6DE7KZOXrlLO0p0yUtPCwZQ+NilkphXCklG8RJnYqCw=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=h9FPeI+Qf8ggLPeWmWfsBaqNgQfCoBFhittW6h2HD+PuB6/xqxO/o/OLHXdk0rHMCEKbpKuEEtNvi+3oSEdQ6Q2wXCMkHUAzPUkuej3AgTFjO+cAAzs5//UxdiCbbXzB1BLRHrejMpTy27O08W9pOdk744rHPFPpikPnj7Dj9hQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 827E9C19421; Fri, 9 May 2025 16:51:40 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1uDQx2-00000002gKj-1DBk; Fri, 09 May 2025 12:51:56 -0400 Message-ID: <20250509165156.135430576@goodmis.org> User-Agent: quilt/0.68 Date: Fri, 09 May 2025 12:45:39 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org Cc: Masami Hiramatsu , Mathieu Desnoyers , Josh Poimboeuf , Peter Zijlstra , Ingo Molnar , Jiri Olsa , Namhyung Kim Subject: [PATCH v8 15/18] perf: Have get_perf_callchain() return NULL if crosstask and user are set References: <20250509164524.448387100@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Josh Poimboeuf get_perf_callchain() doesn't support cross-task unwinding for user space stacks, have it return NULL if both the crosstask and user arguments are set. 
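A hedged sketch of the caller-side contract after this change (the error handling is illustrative, loosely modeled on the bpf stackmap caller; it is not part of this patch):

	struct perf_callchain_entry *trace;

	/* A cross-task request for a user stack now yields NULL. */
	trace = get_perf_callchain(regs, kernel, user, max_depth,
				   crosstask, false);
	if (unlikely(!trace))
		return -EFAULT;	/* illustrative: treat as no callchain */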
Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Google) --- kernel/events/callchain.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index b0f5bd228cd8..abf258913ab6 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -224,6 +224,10 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, = bool user, struct perf_callchain_entry_ctx ctx; int rctx, start_entry_idx; =20 + /* crosstask is not supported for user stacks */ + if (crosstask && user) + return NULL; + entry =3D get_callchain_entry(&rctx); if (!entry) return NULL; @@ -249,9 +253,6 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, b= ool user, } =20 if (regs) { - if (crosstask) - goto exit_put; - if (add_mark) perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); =20 @@ -261,7 +262,6 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, b= ool user, } } =20 -exit_put: put_callchain_entry(rctx); =20 return entry; --=20 2.47.2 From nobody Tue Oct 7 18:40:23 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18FC42417F0; Fri, 9 May 2025 16:51:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809501; cv=none; b=Tj4CaviKEcS+j016s6zHgScbBfNSqHMjVc5qOS3acHVVtPpeEylaEg4MvQG0iJJxtKP/opXwRLGCxxgmysNRHFM+9Ig55vogaMIsoxYX4Q9A8zlWpwLS/QYX+xlsYPpM/FJmX+ZB+4rKRjGp957TlBR51o9zPaQLHIj+9VI7rrU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809501; c=relaxed/simple; bh=HGrKuS39xLJq83enXt1f+CmV81NnDU+NNzgmZrHZ28Q=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=b1yLZrLXND8D1fJguabzokXmVrXq9ka/p2ND23lHCPqk49P3hZuW7L+4lGm4jhWuRqKOccMJAVhAnZ+0wNloWDbx7ykGZUUIwbjngnwJWuf4L3/CvejaStWpalXWFtEWE7tAP8p27weikDTVzBeWaO8FmRRFC0upFj3rPK4gvjk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id A5B3DC4CEF1; Fri, 9 May 2025 16:51:40 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1uDQx2-00000002gLE-1ttz; Fri, 09 May 2025 12:51:56 -0400 Message-ID: <20250509165156.309890035@goodmis.org> User-Agent: quilt/0.68 Date: Fri, 09 May 2025 12:45:40 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org Cc: Masami Hiramatsu , Mathieu Desnoyers , Josh Poimboeuf , Peter Zijlstra , Ingo Molnar , Jiri Olsa , Namhyung Kim Subject: [PATCH v8 16/18] perf: Use current->flags & PF_KTHREAD instead of current->mm == NULL References: <20250509164524.448387100@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt To determine if a task is a kernel thread or not, it is more reliable to use (current->flags & PF_KTHREAD) than to rely on current->mm being NULL. That is because some kernel tasks (io_uring helpers) may have a mm field. 
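The idiom this patch standardizes on, distilled into a standalone helper for illustration (the helper name is ours, not from the patch):

	#include <linux/sched.h>

	/*
	 * Kernel threads have PF_KTHREAD set. Checking current->mm is not
	 * equivalent: io_uring helper threads are kernel threads that
	 * nevertheless carry an mm.
	 */
	static inline bool task_has_user_context(struct task_struct *task)
	{
		return !(task->flags & PF_KTHREAD);
	}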
Link: https://lore.kernel.org/linux-trace-kernel/20250424163607.GE18306@noi= sy.programming.kicks-ass.net/ Signed-off-by: Steven Rostedt (Google) --- kernel/events/callchain.c | 6 +++--- kernel/events/core.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index abf258913ab6..cda145dc11bd 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -246,10 +246,10 @@ get_perf_callchain(struct pt_regs *regs, bool kernel,= bool user, =20 if (user) { if (!user_mode(regs)) { - if (current->mm) - regs =3D task_pt_regs(current); - else + if (current->flags & PF_KTHREAD) regs =3D NULL; + else + regs =3D task_pt_regs(current); } =20 if (regs) { diff --git a/kernel/events/core.c b/kernel/events/core.c index 0bac8cae08a8..7c7d5a27c568 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7989,7 +7989,7 @@ static u64 perf_virt_to_phys(u64 virt) * Try IRQ-safe get_user_page_fast_only first. * If failed, leave phys_addr as 0. */ - if (current->mm !=3D NULL) { + if (!(current->flags & PF_KTHREAD)) { struct page *p; =20 pagefault_disable(); --=20 2.47.2 From nobody Tue Oct 7 18:40:23 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 120352417D9; Fri, 9 May 2025 16:51:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809501; cv=none; b=jn3WIB3PeDGnPCJo6ZlsWM/QCDgj+ICd8wMW5JchxLwjpBd4fJJEkDqd9499pC1MclKfu/6xspmyV7jNq7uHbWwcsDTM3epuuxTZ8tneRUtcvhTg+Xd/0jdrXpTklCoYrBRRg/i/+KgVZ7oVVCHUseTyHWshQdGWjuTMd1Sv2TE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809501; c=relaxed/simple; bh=mggx5smXanCvg7n8gnR/FploZ9NPITBU1NWvDKMpMos=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=OzsR3b1Sk+cqxYUPdn3P1WrrC7OrEqNmF79prQ9AZHGAgWSEsTDaH4DsyX7TbpPv5wSppg1NKpzNMLNX+LBVpx92/VQp1rK6QZNYHXKtKwYVYNMWMCspkaUqxQFgUYH6yY17JgVrC0kJ63unMcj3MAThA//52tOykXTs0KITD4A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id DFC22C4CEFC; Fri, 9 May 2025 16:51:40 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1uDQx2-00000002gLj-2c7P; Fri, 09 May 2025 12:51:56 -0400 Message-ID: <20250509165156.474492361@goodmis.org> User-Agent: quilt/0.68 Date: Fri, 09 May 2025 12:45:41 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org Cc: Masami Hiramatsu , Mathieu Desnoyers , Josh Poimboeuf , Peter Zijlstra , Ingo Molnar , Jiri Olsa , Namhyung Kim Subject: [PATCH v8 17/18] perf: Simplify get_perf_callchain() user logic References: <20250509164524.448387100@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Josh Poimboeuf Simplify the get_perf_callchain() user logic a bit. task_pt_regs() should never be NULL. 
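The reason task_pt_regs() cannot fail is that it is pure address arithmetic; roughly (a sketch of the common x86 shape, details vary per architecture):

	/* pt_regs sit at a fixed offset from the top of the kernel stack */
	#define task_pt_regs(task)					\
		((struct pt_regs *)((unsigned long)task_stack_page(task) + \
			THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING) - 1)

so for any task with a user context it returns a valid pointer, never NULL.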
Acked-by: Namhyung Kim Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Google) --- kernel/events/callchain.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index cda145dc11bd..2798c0c9f782 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -247,21 +247,19 @@ get_perf_callchain(struct pt_regs *regs, bool kernel,= bool user, if (user) { if (!user_mode(regs)) { if (current->flags & PF_KTHREAD) - regs =3D NULL; - else - regs =3D task_pt_regs(current); + goto exit_put; + regs =3D task_pt_regs(current); } =20 - if (regs) { - if (add_mark) - perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); + if (add_mark) + perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); =20 - start_entry_idx =3D entry->nr; - perf_callchain_user(&ctx, regs); - fixup_uretprobe_trampoline_entries(entry, start_entry_idx); - } + start_entry_idx =3D entry->nr; + perf_callchain_user(&ctx, regs); + fixup_uretprobe_trampoline_entries(entry, start_entry_idx); } =20 +exit_put: put_callchain_entry(rctx); =20 return entry; --=20 2.47.2 From nobody Tue Oct 7 18:40:23 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 31C8D242D64; Fri, 9 May 2025 16:51:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809501; cv=none; b=PiruhN4iBZW2HL9ePULIuBTreZ5dhVo70Rf0VHhdSjankklfg6dXSfOkScerVSCPFK3eeRlBR42wTbt+Kmlf8dcwkBEjS+zdwJxFALTYuz/epvdT3uWXS8rl3x+JMg98oZxf93otWrg1PwcsJGkKNoB4vW/F8B2K7T85mXgAs8Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746809501; c=relaxed/simple; bh=kJ1lRlKcjudXfNQvt0hTUNRiwuzXdk/8XeV3RVZQ1LI=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=Z3NLcKGucNsQ297J8VkZ+K49F74c4nPtDOkddRZ4rEjj8qSgi8IxxIpzhEYr+lB1F5T2jNrv1WIFqqckJMZdjIW8R4d8/zsHbOsoxzrF1hOkPXf+ShCnzbRbp0SBSU1ivFtxrLCiytNnC3vxJHt9hrx2wEixyHCHMs2cd+qqzSQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 11472C4CEF5; Fri, 9 May 2025 16:51:41 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1uDQx2-00000002gMD-3K0A; Fri, 09 May 2025 12:51:56 -0400 Message-ID: <20250509165156.644117342@goodmis.org> User-Agent: quilt/0.68 Date: Fri, 09 May 2025 12:45:42 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org Cc: Masami Hiramatsu , Mathieu Desnoyers , Josh Poimboeuf , Peter Zijlstra , Ingo Molnar , Jiri Olsa , Namhyung Kim Subject: [PATCH v8 18/18] perf: Skip user unwind if the task is a kernel thread References: <20250509164524.448387100@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Josh Poimboeuf If the task is not a user thread, there's no user stack to unwind. 
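In effect, the decision moves to the very front of perf_callchain(); the hunk below restated with an explanatory comment added:

	/*
	 * Kernel threads have no user stack: force user == false up front
	 * so get_perf_callchain() never even enters its user branch.
	 */
	bool user = !event->attr.exclude_callchain_user &&
		    !(current->flags & PF_KTHREAD);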
Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Google) --- kernel/events/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 7c7d5a27c568..02e52df7a02e 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -8101,7 +8101,8 @@ struct perf_callchain_entry * perf_callchain(struct perf_event *event, struct pt_regs *regs) { bool kernel =3D !event->attr.exclude_callchain_kernel; - bool user =3D !event->attr.exclude_callchain_user; + bool user =3D !event->attr.exclude_callchain_user && + !(current->flags & PF_KTHREAD); /* Disallow cross-task user callchains. */ bool crosstask =3D event->ctx->task && event->ctx->task !=3D current; const u32 max_stack =3D event->attr.sample_max_stack; --=20 2.47.2
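For reference, a minimal consumer of the deferred unwind API as it stands after this series (the tracer, callback body, and handler below are hypothetical sketches; the return-value handling follows the unwind_deferred_request() kerneldoc above):

	#include <linux/unwind_deferred.h>

	static struct unwind_work my_unwind_work;

	/* Runs in task context right before the task returns to user space. */
	static void my_unwind_cb(struct unwind_work *work,
				 struct unwind_stacktrace *trace, u64 timestamp)
	{
		/*
		 * Emit trace->entries[0 .. trace->nr) tagged with timestamp,
		 * so it can be stitched to kernel-side samples later.
		 */
	}

	static int my_tracer_init(void)
	{
		return unwind_deferred_init(&my_unwind_work, my_unwind_cb);
	}

	/* Called from a tracing handler; NMI context is allowed (patch 10). */
	static void my_event_handler(void)
	{
		u64 timestamp;
		int ret = unwind_deferred_request(&my_unwind_work, &timestamp);

		if (ret < 0)
			return;	/* kthread, exiting task, or queueing failure */

		/*
		 * ret == 0: callback queued; positive: already pending or
		 * already executed for this kernel entry. Either way, record
		 * timestamp with the kernel-side sample now.
		 */
	}

	static void my_tracer_exit(void)
	{
		unwind_deferred_cancel(&my_unwind_work);
	}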