From: Josh Poimboeuf
To: x86@kernel.org
Cc: Peter Zijlstra, Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel@vger.kernel.org, Indu Bhagat, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, linux-perf-users@vger.kernel.org, Mark Brown, linux-toolchains@vger.kernel.org, Jordan Rome, Sam James
Subject: [PATCH v2 01/11] unwind: Introduce generic user space unwinding interface
Date: Sat, 14 Sep 2024 01:02:03 +0200

Introduce a user space unwinder interface which will provide a generic
way for architectures to unwind different user space stack frame types.
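
[Editor's note, not part of the patch: a minimal sketch of how a caller
might walk the current task's user stack with the interface proposed
below, assuming only the declarations added in
include/linux/user_unwind.h by this patch. The helper name and the
caller-provided buffer are made up for the example, and it assumes
normal process context where faulting in user memory is acceptable.]

	/*
	 * Illustrative only: collect up to 'max' user space return
	 * addresses for the current task via frame-pointer unwinding.
	 */
	static unsigned int collect_user_callers(unsigned long *ips,
						 unsigned int max)
	{
		struct user_unwind_state state;
		unsigned int n = 0;
		int ret;

		/*
		 * user_unwind_start() initializes the state from the task's
		 * pt_regs and already advances to the first caller frame.
		 */
		for (ret = user_unwind_start(&state, USER_UNWIND_TYPE_FP);
		     !ret && !state.done && n < max;
		     ret = user_unwind_next(&state))
			ips[n++] = state.ip;

		return n;
	}
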
Signed-off-by: Josh Poimboeuf --- arch/Kconfig | 3 ++ include/linux/user_unwind.h | 31 ++++++++++++++ kernel/Makefile | 1 + kernel/unwind/Makefile | 1 + kernel/unwind/user.c | 81 +++++++++++++++++++++++++++++++++++++ 5 files changed, 117 insertions(+) create mode 100644 include/linux/user_unwind.h create mode 100644 kernel/unwind/Makefile create mode 100644 kernel/unwind/user.c diff --git a/arch/Kconfig b/arch/Kconfig index 975dd22a2dbd..b1002b2da331 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -425,6 +425,9 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH It uses the same command line parameters, and sysctl interface, as the generic hardlockup detectors. =20 +config HAVE_USER_UNWIND + bool + config HAVE_PERF_REGS bool help diff --git a/include/linux/user_unwind.h b/include/linux/user_unwind.h new file mode 100644 index 000000000000..0a19ac6c92b2 --- /dev/null +++ b/include/linux/user_unwind.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_USER_UNWIND_H +#define _LINUX_USER_UNWIND_H + +#include + +enum user_unwind_type { + USER_UNWIND_TYPE_AUTO, + USER_UNWIND_TYPE_FP, +}; + +struct user_unwind_frame { + s32 cfa_off; + s32 ra_off; + s32 fp_off; + bool use_fp; +}; + +struct user_unwind_state { + unsigned long ip, sp, fp; + enum user_unwind_type type; + bool done; +}; + +extern int user_unwind_start(struct user_unwind_state *state, enum user_un= wind_type); +extern int user_unwind_next(struct user_unwind_state *state); + +#define for_each_user_frame(state, type) \ + for (user_unwind_start(state, type); !state.done; user_unwind_next(state)) + +#endif /* _LINUX_USER_UNWIND_H */ diff --git a/kernel/Makefile b/kernel/Makefile index 3c13240dfc9f..259581df82dd 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -50,6 +50,7 @@ obj-y +=3D rcu/ obj-y +=3D livepatch/ obj-y +=3D dma/ obj-y +=3D entry/ +obj-y +=3D unwind/ obj-$(CONFIG_MODULES) +=3D module/ =20 obj-$(CONFIG_KCMP) +=3D kcmp.o diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile new file mode 100644 index 000000000000..eb466d6a3295 --- /dev/null +++ b/kernel/unwind/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_HAVE_USER_UNWIND) +=3D user.o diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c new file mode 100644 index 000000000000..5d16f9604a61 --- /dev/null +++ b/kernel/unwind/user.c @@ -0,0 +1,81 @@ +// SPDX-License-Identifier: GPL-2.0 +/* +* Generic interface for unwinding user space +* +* Copyright (C) 2024 Josh Poimboeuf +*/ +#include +#include +#include +#include +#include +#include + +static struct user_unwind_frame fp_frame =3D { + ARCH_INIT_USER_FP_FRAME +}; + +int user_unwind_next(struct user_unwind_state *state) +{ + struct user_unwind_frame _frame; + struct user_unwind_frame *frame =3D &_frame; + unsigned long cfa, fp, ra; + int ret =3D -EINVAL; + + if (state->done) + return -EINVAL; + + switch (state->type) { + case USER_UNWIND_TYPE_FP: + frame =3D &fp_frame; + break; + default: + BUG(); + } + + cfa =3D (frame->use_fp ? 
state->fp : state->sp) + frame->cfa_off; + + if (frame->ra_off && get_user(ra, (unsigned long *)(cfa + frame->ra_off))) + goto the_end; + + if (frame->fp_off && get_user(fp, (unsigned long *)(cfa + frame->fp_off))) + goto the_end; + + state->sp =3D cfa; + state->ip =3D ra; + if (frame->fp_off) + state->fp =3D fp; + + return 0; + +the_end: + state->done =3D true; + return ret; +} + +int user_unwind_start(struct user_unwind_state *state, + enum user_unwind_type type) +{ + struct pt_regs *regs =3D task_pt_regs(current); + + memset(state, 0, sizeof(*state)); + + if (!current->mm) { + state->done =3D true; + return -EINVAL; + } + + switch (type) { + case USER_UNWIND_TYPE_AUTO: + case USER_UNWIND_TYPE_FP: + break; + default: + return -EINVAL; + } + + state->sp =3D user_stack_pointer(regs); + state->ip =3D instruction_pointer(regs); + state->fp =3D frame_pointer(regs); + + return user_unwind_next(state); +} --=20 2.46.0 From nobody Fri Nov 29 21:57:38 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8773618785F; Fri, 13 Sep 2024 23:03:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268583; cv=none; b=mf6rOsLnoq7/hom0hUepCI5BUCwo1toMxzQpvd9yWQfzH+iTzCoBk4P/CNsakMyAbPGZ81mwhBMbATC11+L5Iu/mhb+dWT1rxp377XRp5fHJWdHgl8NSKf69CO0hq1K2WxFNvV5qz74Bfq0Mw9n3eKSty26q9pBQMh/N3tIlxjw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268583; c=relaxed/simple; bh=Ab1OGV8cOmFkRUQsKiONmAb7LX06hR3QPA6EWPVv8pU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dfG6XO5T+zg1lcvc7mSJA2IWzPKjpgdOEG6GMVDpMXmDakXqMv5XcZJsrUrJmMPtyo3wm2iRroH4Wus7CSCjl+5PE5VK+V/nqBu44xx3jWz6uG+h6LUYCl4l0Wujp+Jj0NeKGKBlPPjXFjJjLTkSzIiyS6ck42iUc4k7dLtN34I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=exNg8iID; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="exNg8iID" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7209DC4CEC0; Fri, 13 Sep 2024 23:02:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726268583; bh=Ab1OGV8cOmFkRUQsKiONmAb7LX06hR3QPA6EWPVv8pU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=exNg8iIDO52bZLTNsYVzcfuj2qrWaBXxX/uwLXJ7QlY2PgBGLtRilC7LnQO+k0Zn8 FD7JdoUyaHzdYrFJcrMBuvbOMSDTDqPpuSevIr5MNDxPSpsNexKwrr5SpX42uXRxOV Ce2P8xEkrCyg0Nh721o++DBCtCBKH5SWzc3uWcgFTvg6SAOULhWitIvSLG6JontQt5 MKiI7DSQ9SeFzpHz/UiKvMhSicxnHN1jMRVnJAeAZJK15YIeC/1d35naR29ZM8ktiO YsW4XwNY3DLxiPp3xzYTlYCBcX9zlJ4d8kTRIyKZKHSQxZcYZXoi2UrGVM55ckga7S yd0eAy654HLNw== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James Subject: [PATCH v2 02/11] unwind/x86: Add HAVE_USER_UNWIND Date: Sat, 14 Sep 2024 01:02:04 +0200 Message-ID: <82ef19a767cb75e76a985ecc0d47a39400b4fdf5.1726268190.git.jpoimboe@kernel.org> 
X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Use ARCH_INIT_USER_FP_FRAME to describe how frame pointers are unwound on x86, and enable HAVE_USER_UNWIND accordinlgy so the user unwind interfaces can be used. Signed-off-by: Josh Poimboeuf --- arch/x86/Kconfig | 1 + arch/x86/include/asm/user_unwind.h | 11 +++++++++++ 2 files changed, 12 insertions(+) create mode 100644 arch/x86/include/asm/user_unwind.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 007bab9f2a0e..266edff59058 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -286,6 +286,7 @@ config X86 select HAVE_UACCESS_VALIDATION if HAVE_OBJTOOL select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_USER_RETURN_NOTIFIER + select HAVE_USER_UNWIND select HAVE_GENERIC_VDSO select VDSO_GETRANDOM if X86_64 select HOTPLUG_PARALLEL if SMP && X86_64 diff --git a/arch/x86/include/asm/user_unwind.h b/arch/x86/include/asm/user= _unwind.h new file mode 100644 index 000000000000..8c509c65cfb5 --- /dev/null +++ b/arch/x86/include/asm/user_unwind.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_USER_UNWIND_H +#define _ASM_X86_USER_UNWIND_H + +#define ARCH_INIT_USER_FP_FRAME \ + .ra_off =3D (s32)sizeof(long) * -1, \ + .cfa_off =3D (s32)sizeof(long) * 2, \ + .fp_off =3D (s32)sizeof(long) * -2, \ + .use_fp =3D true, + +#endif /* _ASM_X86_USER_UNWIND_H */ --=20 2.46.0 From nobody Fri Nov 29 21:57:38 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C502D14EC73; Fri, 13 Sep 2024 23:03:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268587; cv=none; b=CNAmF0LJEBfWa4+CC6ev6CY6+qNgPLUxBrVN2HN3NsMFTerKfiGC3LyDV/ZLFcq07jfUfO/ogJHZAl+utNDcMSK3xX8ARhEnaPrmuRiGz/FW6qnrjQkFrCQxMcPyabXZfprpVxDVh+47jASasamAXX//YhvLpfJmD2HKqPINQ8w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268587; c=relaxed/simple; bh=CJOqADHRSEkfW2AkoPLjYrEmazkSzhIezeAkCe7JAPY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dltopxfJAXNDfWclfQC/BokMHVNQDsB0gis1ZsCOyFp8nnH+GwnY21X9NHVTo73+0417+wzVMt7Be0Rf9PhqkYyhU2YBQUeeK9s/0lrEe6Mmmnh12iGVQMGqM5lVyys72TAY/CVS5+YKErCwWMO9K1f2jxA1Pj+4SRi156dz0uQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=R6aS34ld; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="R6aS34ld" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8CC8DC4CEC5; Fri, 13 Sep 2024 23:03:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726268587; bh=CJOqADHRSEkfW2AkoPLjYrEmazkSzhIezeAkCe7JAPY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=R6aS34ldfHxbNbQSLUGxp/JDrkwzkups61hz+Wil5EhAYz9lceZ4zKK+dR6JoI/Lh RIjxjZfDR1llaz6fH2JfZY+q1gEduAw2dERMktxC2yiglcMlaPn27EKQbPz9dzjqTG Zw65ngGP7wMBAM9N0WM4jepDTvUg22XYwZzpX2kXR83eOfUwvRpZdFFHQzm89S7SAa 
8PnG3yyz2lTOPz+ZYos/Ndy5pJpM1m0lCLSrIAZYcaLevUFyMzabfzOrJTnnvX49GO 9zSo5bMD7IyTu/zJVYO2vWqoxBz4THHEgyjx40Pz+NG0SOgYpWc7473ORliNrAOKr/ KJ9BgfjGucbCQ== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James Subject: [PATCH v2 03/11] unwind: Introduce SFrame user space unwinding Date: Sat, 14 Sep 2024 01:02:05 +0200 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Some distros have started compiling frame pointers into all their packages to enable the kernel to do system-wide profiling of user space. Unfortunately that creates a runtime performance penalty across the entire system. Using DWARF instead isn't feasible due to the complexity it would add to the kernel. For in-kernel unwinding we solved this problem with the creation of the ORC unwinder for x86_64. Similarly, for user space the GNU assembler has created the SFrame format starting with binutils 2.41 for SFrame v2. SFrame is a simpler version of .eh_frame which gets placed in the .sframe section. Add support for unwinding user space using SFrame. More information about SFrame can be found here: - https://lwn.net/Articles/932209/ - https://lwn.net/Articles/940686/ - https://sourceware.org/binutils/docs/sframe-spec.html Signed-off-by: Josh Poimboeuf --- arch/Kconfig | 3 + fs/binfmt_elf.c | 47 +++- include/linux/mm_types.h | 3 + include/linux/sframe.h | 46 ++++ include/linux/user_unwind.h | 1 + include/uapi/linux/elf.h | 1 + include/uapi/linux/prctl.h | 3 + kernel/fork.c | 10 + kernel/sys.c | 11 + kernel/unwind/Makefile | 1 + kernel/unwind/sframe.c | 420 ++++++++++++++++++++++++++++++++++++ kernel/unwind/sframe.h | 215 ++++++++++++++++++ kernel/unwind/user.c | 14 ++ mm/init-mm.c | 4 +- 14 files changed, 774 insertions(+), 5 deletions(-) create mode 100644 include/linux/sframe.h create mode 100644 kernel/unwind/sframe.c create mode 100644 kernel/unwind/sframe.h diff --git a/arch/Kconfig b/arch/Kconfig index b1002b2da331..ff5d5bc5f947 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -428,6 +428,9 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH config HAVE_USER_UNWIND bool =20 +config HAVE_USER_UNWIND_SFRAME + bool + config HAVE_PERF_REGS bool help diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 19fa49cd9907..923aed390f2e 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -47,6 +47,7 @@ #include #include #include +#include #include #include =20 @@ -633,11 +634,13 @@ static unsigned long load_elf_interp(struct elfhdr *i= nterp_elf_ex, unsigned long no_base, struct elf_phdr *interp_elf_phdata, struct arch_elf_state *arch_state) { - struct elf_phdr *eppnt; + struct elf_phdr *eppnt, *sframe_phdr =3D NULL; unsigned long load_addr =3D 0; int load_addr_set =3D 0; unsigned long error =3D ~0UL; unsigned long total_size; + unsigned long start_code =3D ~0UL; + unsigned long end_code =3D 0; int i; =20 /* First of all, some simple consistency checks */ @@ -659,7 +662,8 @@ static unsigned long load_elf_interp(struct elfhdr *int= erp_elf_ex, =20 eppnt =3D interp_elf_phdata; for (i =3D 0; i < interp_elf_ex->e_phnum; i++, eppnt++) { 
- if (eppnt->p_type =3D=3D PT_LOAD) { + switch (eppnt->p_type) { + case PT_LOAD: { int elf_type =3D MAP_PRIVATE; int elf_prot =3D make_prot(eppnt->p_flags, arch_state, true, true); @@ -688,7 +692,7 @@ static unsigned long load_elf_interp(struct elfhdr *int= erp_elf_ex, /* * Check to see if the section's size will overflow the * allowed task size. Note that p_filesz must always be - * <=3D p_memsize so it's only necessary to check p_memsz. + * <=3D p_memsz so it's only necessary to check p_memsz. */ k =3D load_addr + eppnt->p_vaddr; if (BAD_ADDR(k) || @@ -698,7 +702,28 @@ static unsigned long load_elf_interp(struct elfhdr *in= terp_elf_ex, error =3D -ENOMEM; goto out; } + + if ((eppnt->p_flags & PF_X) && k < start_code) + start_code =3D k; + + if ((eppnt->p_flags & PF_X) && k + eppnt->p_filesz > end_code) + end_code =3D k + eppnt->p_filesz; + break; } + case PT_GNU_SFRAME: + sframe_phdr =3D eppnt; + break; + } + } + + if (sframe_phdr) { + struct sframe_file sfile =3D { + .sframe_addr =3D load_addr + sframe_phdr->p_vaddr, + .text_start =3D start_code, + .text_end =3D end_code, + }; + + __sframe_add_section(&sfile); } =20 error =3D load_addr; @@ -823,7 +848,7 @@ static int load_elf_binary(struct linux_binprm *bprm) int first_pt_load =3D 1; unsigned long error; struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata =3D NULL; - struct elf_phdr *elf_property_phdata =3D NULL; + struct elf_phdr *elf_property_phdata =3D NULL, *sframe_phdr =3D NULL; unsigned long elf_brk; int retval, i; unsigned long elf_entry; @@ -931,6 +956,10 @@ static int load_elf_binary(struct linux_binprm *bprm) executable_stack =3D EXSTACK_DISABLE_X; break; =20 + case PT_GNU_SFRAME: + sframe_phdr =3D elf_ppnt; + break; + case PT_LOPROC ... PT_HIPROC: retval =3D arch_elf_pt_proc(elf_ex, elf_ppnt, bprm->file, false, @@ -1316,6 +1345,16 @@ static int load_elf_binary(struct linux_binprm *bprm) MAP_FIXED | MAP_PRIVATE, 0); } =20 + if (sframe_phdr) { + struct sframe_file sfile =3D { + .sframe_addr =3D load_bias + sframe_phdr->p_vaddr, + .text_start =3D start_code, + .text_end =3D end_code, + }; + + __sframe_add_section(&sfile); + } + regs =3D current_pt_regs(); #ifdef ELF_PLAT_INIT /* diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 485424979254..1aee78cbea33 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1019,6 +1019,9 @@ struct mm_struct { #endif } lru_gen; #endif /* CONFIG_LRU_GEN_WALKS_MMU */ +#ifdef CONFIG_HAVE_USER_UNWIND_SFRAME + struct maple_tree sframe_mt; +#endif } __randomize_layout; =20 /* diff --git a/include/linux/sframe.h b/include/linux/sframe.h new file mode 100644 index 000000000000..3a44f76929e2 --- /dev/null +++ b/include/linux/sframe.h @@ -0,0 +1,46 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_SFRAME_H +#define _LINUX_SFRAME_H + +#include + +struct sframe_file { + unsigned long sframe_addr, text_start, text_end; +}; + +struct user_unwind_frame; + +#ifdef CONFIG_HAVE_USER_UNWIND_SFRAME + +#define INIT_MM_SFRAME .sframe_mt =3D MTREE_INIT(sframe_mt, 0) + +extern void sframe_free_mm(struct mm_struct *mm); + +extern int __sframe_add_section(struct sframe_file *file); +extern int sframe_add_section(unsigned long sframe_addr, unsigned long tex= t_start, unsigned long text_end); +extern int sframe_remove_section(unsigned long sframe_addr); +extern int sframe_find(unsigned long ip, struct user_unwind_frame *frame); + +static inline bool current_has_sframe(void) +{ + struct mm_struct *mm =3D current->mm; + + return mm && !mtree_empty(&mm->sframe_mt); +} + 
+#else /* !CONFIG_HAVE_USER_UNWIND_SFRAME */ + +#define INIT_MM_SFRAME + +static inline void sframe_free_mm(struct mm_struct *mm) {} + +static inline int __sframe_add_section(struct sframe_file *file) { return = -EINVAL; } +static inline int sframe_add_section(unsigned long sframe_addr, unsigned l= ong text_start, unsigned long text_end) { return -EINVAL; } +static inline int sframe_remove_section(unsigned long sframe_addr) { retur= n -EINVAL; } +static inline int sframe_find(unsigned long ip, struct user_unwind_frame *= frame) { return -EINVAL; } + +static inline bool current_has_sframe(void) { return false; } + +#endif /* CONFIG_HAVE_USER_UNWIND_SFRAME */ + +#endif /* _LINUX_SFRAME_H */ diff --git a/include/linux/user_unwind.h b/include/linux/user_unwind.h index 0a19ac6c92b2..8003f9d35405 100644 --- a/include/linux/user_unwind.h +++ b/include/linux/user_unwind.h @@ -7,6 +7,7 @@ enum user_unwind_type { USER_UNWIND_TYPE_AUTO, USER_UNWIND_TYPE_FP, + USER_UNWIND_TYPE_SFRAME, }; =20 struct user_unwind_frame { diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index b54b313bcf07..b2aca31e1a49 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -39,6 +39,7 @@ typedef __s64 Elf64_Sxword; #define PT_GNU_STACK (PT_LOOS + 0x474e551) #define PT_GNU_RELRO (PT_LOOS + 0x474e552) #define PT_GNU_PROPERTY (PT_LOOS + 0x474e553) +#define PT_GNU_SFRAME (PT_LOOS + 0x474e554) =20 =20 /* ARM MTE memory tag segment type */ diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 35791791a879..69511077c910 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -328,4 +328,7 @@ struct prctl_mm_map { # define PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC 0x10 /* Clear the aspect on exec */ # define PR_PPC_DEXCR_CTRL_MASK 0x1f =20 +#define PR_ADD_SFRAME 74 +#define PR_REMOVE_SFRAME 75 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/fork.c b/kernel/fork.c index cc760491f201..a216f091edfb 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -104,6 +104,7 @@ #include #include #include +#include =20 #include #include @@ -923,6 +924,7 @@ void __mmdrop(struct mm_struct *mm) mm_pasid_drop(mm); mm_destroy_cid(mm); percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS); + sframe_free_mm(mm); =20 free_mm(mm); } @@ -1249,6 +1251,13 @@ static void mm_init_uprobes_state(struct mm_struct *= mm) #endif } =20 +static void mm_init_sframe(struct mm_struct *mm) +{ +#ifdef CONFIG_HAVE_USER_UNWIND_SFRAME + mt_init(&mm->sframe_mt); +#endif +} + static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct = *p, struct user_namespace *user_ns) { @@ -1280,6 +1289,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm= , struct task_struct *p, mm->pmd_huge_pte =3D NULL; #endif mm_init_uprobes_state(mm); + mm_init_sframe(mm); hugetlb_count_init(mm); =20 if (current->mm) { diff --git a/kernel/sys.c b/kernel/sys.c index 3a2df1bd9f64..e4d2b64f4ae4 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -64,6 +64,7 @@ #include #include #include +#include =20 #include =20 @@ -2782,6 +2783,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, a= rg2, unsigned long, arg3, case PR_RISCV_SET_ICACHE_FLUSH_CTX: error =3D RISCV_SET_ICACHE_FLUSH_CTX(arg2, arg3); break; + case PR_ADD_SFRAME: + if (arg5) + return -EINVAL; + error =3D sframe_add_section(arg2, arg3, arg4); + break; + case PR_REMOVE_SFRAME: + if (arg3 || arg4 || arg5) + return -EINVAL; + error =3D sframe_remove_section(arg2); + break; default: error =3D -EINVAL; break; diff --git a/kernel/unwind/Makefile 
b/kernel/unwind/Makefile index eb466d6a3295..6f202c5840cf 100644 --- a/kernel/unwind/Makefile +++ b/kernel/unwind/Makefile @@ -1 +1,2 @@ obj-$(CONFIG_HAVE_USER_UNWIND) +=3D user.o +obj-$(CONFIG_HAVE_USER_UNWIND_SFRAME) +=3D sframe.o diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c new file mode 100644 index 000000000000..3e4d29e737a1 --- /dev/null +++ b/kernel/unwind/sframe.c @@ -0,0 +1,420 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include +#include +#include +#include + +#include "sframe.h" + +#define SFRAME_FILENAME_LEN 32 + +struct sframe_section { + struct rcu_head rcu; + + unsigned long sframe_addr; + unsigned long text_addr; + + unsigned long fdes_addr; + unsigned long fres_addr; + unsigned int fdes_nr; + signed char ra_off, fp_off; +}; + +DEFINE_STATIC_SRCU(sframe_srcu); + +#define __SFRAME_GET_USER(out, user_ptr, type) \ +({ \ + type __tmp; \ + if (get_user(__tmp, (type *)user_ptr)) \ + return -EFAULT; \ + user_ptr +=3D sizeof(__tmp); \ + out =3D __tmp; \ +}) + +#define SFRAME_GET_USER_SIGNED(out, user_ptr, size) \ +({ \ + switch (size) { \ + case 1: \ + __SFRAME_GET_USER(out, user_ptr, s8); \ + break; \ + case 2: \ + __SFRAME_GET_USER(out, user_ptr, s16); \ + break; \ + case 4: \ + __SFRAME_GET_USER(out, user_ptr, s32); \ + break; \ + default: \ + return -EINVAL; \ + } \ +}) + +#define SFRAME_GET_USER_UNSIGNED(out, user_ptr, size) \ +({ \ + switch (size) { \ + case 1: \ + __SFRAME_GET_USER(out, user_ptr, u8); \ + break; \ + case 2: \ + __SFRAME_GET_USER(out, user_ptr, u16); \ + break; \ + case 4: \ + __SFRAME_GET_USER(out, user_ptr, u32); \ + break; \ + default: \ + return -EINVAL; \ + } \ +}) + +static unsigned char fre_type_to_size(unsigned char fre_type) +{ + if (fre_type > 2) + return 0; + return 1 << fre_type; +} + +static unsigned char offset_size_enum_to_size(unsigned char off_size) +{ + if (off_size > 2) + return 0; + return 1 << off_size; +} + +static int find_fde(struct sframe_section *sec, unsigned long ip, + struct sframe_fde *fde) +{ + s32 func_off, ip_off; + struct sframe_fde __user *first, *last, *mid, *found; + + ip_off =3D ip - sec->sframe_addr; + + first =3D (void *)sec->fdes_addr; + last =3D first + sec->fdes_nr; + while (first <=3D last) { + mid =3D first + ((last - first) / 2); + if (get_user(func_off, (s32 *)mid)) + return -EFAULT; + if (ip_off >=3D func_off) { + found =3D mid; + first =3D mid + 1; + } else + last =3D mid - 1; + } + + if (!found) + return -EINVAL; + + if (copy_from_user(fde, found, sizeof(*fde))) + return -EFAULT; + + return 0; +} + +static int find_fre(struct sframe_section *sec, struct sframe_fde *fde, + unsigned long ip, struct user_unwind_frame *frame) +{ + unsigned char fde_type =3D SFRAME_FUNC_FDE_TYPE(fde->info); + unsigned char fre_type =3D SFRAME_FUNC_FRE_TYPE(fde->info); + s32 fre_ip_off, cfa_off, ra_off, fp_off, ip_off; + unsigned char offset_count, offset_size; + unsigned char addr_size; + void __user *f, *last_f; + u8 fre_info; + int i; + + addr_size =3D fre_type_to_size(fre_type); + if (!addr_size) + return -EINVAL; + + ip_off =3D ip - sec->sframe_addr - fde->start_addr; + + f =3D (void *)sec->fres_addr + fde->fres_off; + + for (i =3D 0; i < fde->fres_num; i++) { + + SFRAME_GET_USER_UNSIGNED(fre_ip_off, f, addr_size); + + if (fde_type =3D=3D SFRAME_FDE_TYPE_PCINC) { + if (fre_ip_off > ip_off) + break; + } else { + /* SFRAME_FDE_TYPE_PCMASK */ + if (ip_off % fde->rep_size < fre_ip_off) + break; + } + + SFRAME_GET_USER_UNSIGNED(fre_info, f, 1); + + offset_count =3D 
SFRAME_FRE_OFFSET_COUNT(fre_info); + offset_size =3D offset_size_enum_to_size(SFRAME_FRE_OFFSET_SIZE(fre_inf= o)); + + if (!offset_count || !offset_size) + return -EINVAL; + + last_f =3D f; + f +=3D offset_count * offset_size; + } + + if (!last_f) + return -EINVAL; + + f =3D last_f; + + SFRAME_GET_USER_UNSIGNED(cfa_off, f, offset_size); + offset_count--; + + ra_off =3D sec->ra_off; + if (!ra_off) { + if (!offset_count--) + return -EINVAL; + SFRAME_GET_USER_SIGNED(ra_off, f, offset_size); + } + + fp_off =3D sec->fp_off; + if (!fp_off && offset_count) { + offset_count--; + SFRAME_GET_USER_SIGNED(fp_off, f, offset_size); + } + + if (offset_count) + return -EINVAL; + + frame->cfa_off =3D cfa_off; + frame->ra_off =3D ra_off; + frame->fp_off =3D fp_off; + frame->use_fp =3D SFRAME_FRE_CFA_BASE_REG_ID(fre_info) =3D=3D SFRAME_BASE= _REG_FP; + + return 0; +} + +int sframe_find(unsigned long ip, struct user_unwind_frame *frame) +{ + struct mm_struct *mm =3D current->mm; + struct sframe_section *sec; + struct sframe_fde fde; + int srcu_idx; + int ret =3D -EINVAL; + + srcu_idx =3D srcu_read_lock(&sframe_srcu); + + sec =3D mtree_load(&mm->sframe_mt, ip); + if (!sec) { + srcu_read_unlock(&sframe_srcu, srcu_idx); + return -EINVAL; + } + + + ret =3D find_fde(sec, ip, &fde); + if (ret) + goto err_unlock; + + ret =3D find_fre(sec, &fde, ip, frame); + if (ret) + goto err_unlock; + + srcu_read_unlock(&sframe_srcu, srcu_idx); + return 0; + +err_unlock: + srcu_read_unlock(&sframe_srcu, srcu_idx); + return ret; +} + +static int get_sframe_file(unsigned long sframe_addr, struct sframe_file *= file) +{ + struct mm_struct *mm =3D current->mm; + struct vm_area_struct *sframe_vma, *text_vma, *vma; + VMA_ITERATOR(vmi, mm, 0); + + mmap_read_lock(mm); + + sframe_vma =3D vma_lookup(mm, sframe_addr); + if (!sframe_vma || !sframe_vma->vm_file) + goto err_unlock; + + text_vma =3D NULL; + + for_each_vma(vmi, vma) { + if (vma->vm_file !=3D sframe_vma->vm_file) + continue; + if (vma->vm_flags & VM_EXEC) { + if (text_vma) { + /* + * Multiple EXEC segments in a single file + * aren't currently supported, is that a thing? 
+ */ + mmap_read_unlock(mm); + pr_warn_once("unsupported multiple EXEC segments in task %s[%d]\n", + current->comm, current->pid); + return -EINVAL; + } + text_vma =3D vma; + } + } + + file->sframe_addr =3D sframe_addr; + file->text_start =3D text_vma->vm_start; + file->text_end =3D text_vma->vm_end; + + mmap_read_unlock(mm); + return 0; + +err_unlock: + mmap_read_unlock(mm); + return -EINVAL; +} + +static int validate_sframe_addrs(struct sframe_file *file) +{ + struct mm_struct *mm =3D current->mm; + struct vm_area_struct *text_vma; + + mmap_read_lock(mm); + + if (!vma_lookup(mm, file->sframe_addr)) + goto err_unlock; + + text_vma =3D vma_lookup(mm, file->text_start); + if (!(text_vma->vm_flags & VM_EXEC)) + goto err_unlock; + + if (vma_lookup(mm, file->text_end-1) !=3D text_vma) + goto err_unlock; + + mmap_read_unlock(mm); + return 0; + +err_unlock: + mmap_read_unlock(mm); + return -EINVAL; +} + +int __sframe_add_section(struct sframe_file *file) +{ + struct maple_tree *sframe_mt =3D ¤t->mm->sframe_mt; + struct sframe_section *sec; + struct sframe_header shdr; + unsigned long header_end; + int ret; + + if (copy_from_user(&shdr, (void *)file->sframe_addr, sizeof(shdr))) + return -EFAULT; + + if (shdr.preamble.magic !=3D SFRAME_MAGIC || + shdr.preamble.version !=3D SFRAME_VERSION_2 || + !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) || + shdr.auxhdr_len || !shdr.num_fdes || !shdr.num_fres || + shdr.fdes_off > shdr.fres_off) { + /* + * Either binutils < 2.41, corrupt sframe header, or + * unsupported feature. + * */ + pr_warn_once("bad sframe header in task %s[%d]\n", + current->comm, current->pid); + return -EINVAL; + } + + header_end =3D file->sframe_addr + SFRAME_HDR_SIZE(shdr); + + sec =3D kmalloc(sizeof(*sec), GFP_KERNEL); + if (!sec) + return -ENOMEM; + + sec->sframe_addr =3D file->sframe_addr; + sec->text_addr =3D file->text_start; + sec->fdes_addr =3D header_end + shdr.fdes_off; + sec->fres_addr =3D header_end + shdr.fres_off; + sec->fdes_nr =3D shdr.num_fdes; + sec->ra_off =3D shdr.cfa_fixed_ra_offset; + sec->fp_off =3D shdr.cfa_fixed_fp_offset; + + ret =3D mtree_insert_range(sframe_mt, file->text_start, file->text_end, + sec, GFP_KERNEL); + if (ret) { + kfree(sec); + return ret; + } + + return 0; +} + +int sframe_add_section(unsigned long sframe_addr, unsigned long text_start= , unsigned long text_end) +{ + struct sframe_file file; + int ret; + + if (!text_start || !text_end) { + ret =3D get_sframe_file(sframe_addr, &file); + if (ret) + return ret; + } else { + /* + * This is mainly for generated code, for which the text isn't + * file-backed so the user has to give the text bounds. 
+ */ + file.sframe_addr =3D sframe_addr; + file.text_start =3D text_start; + file.text_end =3D text_end; + ret =3D validate_sframe_addrs(&file); + if (ret) + return ret; + } + + return __sframe_add_section(&file); +} + +static void sframe_free_rcu(struct rcu_head *rcu) +{ + struct sframe_section *sec =3D container_of(rcu, struct sframe_section, r= cu); + + kfree(sec); +} + +static int __sframe_remove_section(struct mm_struct *mm, + struct sframe_section *sec) +{ + struct sframe_section *s; + + s =3D mtree_erase(&mm->sframe_mt, sec->text_addr); + if (!s || WARN_ON_ONCE(s !=3D sec)) + return -EINVAL; + + call_srcu(&sframe_srcu, &sec->rcu, sframe_free_rcu); + + return 0; +} + +int sframe_remove_section(unsigned long sframe_addr) +{ + struct mm_struct *mm =3D current->mm; + struct sframe_section *sec; + unsigned long index =3D 0; + + sec =3D mtree_load(&mm->sframe_mt, sframe_addr); + if (!sec) + return -EINVAL; + + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) { + if (sec->sframe_addr =3D=3D sframe_addr) + return __sframe_remove_section(mm, sec); + } + + return -EINVAL; +} + +void sframe_free_mm(struct mm_struct *mm) +{ + struct sframe_section *sec; + unsigned long index =3D 0; + + if (!mm) + return; + + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) + kfree(sec); + + mtree_destroy(&mm->sframe_mt); +} diff --git a/kernel/unwind/sframe.h b/kernel/unwind/sframe.h new file mode 100644 index 000000000000..aa468d6f1f4a --- /dev/null +++ b/kernel/unwind/sframe.h @@ -0,0 +1,215 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2023, Oracle and/or its affiliates. + * + * This file contains definitions for the SFrame stack tracing format, whi= ch is + * documented at https://sourceware.org/binutils/docs + */ +#ifndef _SFRAME_H +#define _SFRAME_H + +#include + +#define SFRAME_VERSION_1 1 +#define SFRAME_VERSION_2 2 +#define SFRAME_MAGIC 0xdee2 + +/* Function Descriptor Entries are sorted on PC. */ +#define SFRAME_F_FDE_SORTED 0x1 +/* Frame-pointer based stack tracing. Defined, but not set. */ +#define SFRAME_F_FRAME_POINTER 0x2 + +#define SFRAME_CFA_FIXED_FP_INVALID 0 +#define SFRAME_CFA_FIXED_RA_INVALID 0 + +/* Supported ABIs/Arch. */ +#define SFRAME_ABI_AARCH64_ENDIAN_BIG 1 /* AARCH64 big endian. */ +#define SFRAME_ABI_AARCH64_ENDIAN_LITTLE 2 /* AARCH64 little endian. */ +#define SFRAME_ABI_AMD64_ENDIAN_LITTLE 3 /* AMD64 little endian. */ + +/* SFrame FRE types. */ +#define SFRAME_FRE_TYPE_ADDR1 0 +#define SFRAME_FRE_TYPE_ADDR2 1 +#define SFRAME_FRE_TYPE_ADDR4 2 + +/* + * SFrame Function Descriptor Entry types. + * + * The SFrame format has two possible representations for functions. The + * choice of which type to use is made according to the instruction patter= ns + * in the relevant program stub. + */ + +/* Unwinders perform a (PC >=3D FRE_START_ADDR) to look up a matching FRE.= */ +#define SFRAME_FDE_TYPE_PCINC 0 +/* + * Unwinders perform a (PC & FRE_START_ADDR_AS_MASK >=3D FRE_START_ADDR_AS= _MASK) + * to look up a matching FRE. Typical usecases are pltN entries, trampolin= es + * etc. + */ +#define SFRAME_FDE_TYPE_PCMASK 1 + +/** + * struct sframe_preamble - SFrame Preamble. + * @magic: Magic number (SFRAME_MAGIC). + * @version: Format version number (SFRAME_VERSION). + * @flags: Various flags. + */ +struct sframe_preamble { + u16 magic; + u8 version; + u8 flags; +} __packed; + +/** + * struct sframe_header - SFrame Header. + * @preamble: SFrame preamble. + * @abi_arch: Identify the arch (including endianness) and ABI. 
+ * @cfa_fixed_fp_offset: Offset for the Frame Pointer (FP) from CFA may be + * fixed for some ABIs ((e.g, in AMD64 when -fno-omit-frame-pointer is + * used). When fixed, this field specifies the fixed stack frame offset + * and the individual FREs do not need to track it. When not fixed, it + * is set to SFRAME_CFA_FIXED_FP_INVALID, and the individual FREs may + * provide the applicable stack frame offset, if any. + * @cfa_fixed_ra_offset: Offset for the Return Address from CFA is fixed f= or + * some ABIs. When fixed, this field specifies the fixed stack frame + * offset and the individual FREs do not need to track it. When not + * fixed, it is set to SFRAME_CFA_FIXED_FP_INVALID. + * @auxhdr_len: Number of bytes making up the auxiliary header, if any. + * Some ABI/arch, in the future, may use this space for extending the + * information in SFrame header. Auxiliary header is contained in bytes + * sequentially following the sframe_header. + * @num_fdes: Number of SFrame FDEs in this SFrame section. + * @num_fres: Number of SFrame Frame Row Entries. + * @fre_len: Number of bytes in the SFrame Frame Row Entry section. + * @fdes_off: Offset of SFrame Function Descriptor Entry section. + * @fres_off: Offset of SFrame Frame Row Entry section. + */ +struct sframe_header { + struct sframe_preamble preamble; + u8 abi_arch; + s8 cfa_fixed_fp_offset; + s8 cfa_fixed_ra_offset; + u8 auxhdr_len; + u32 num_fdes; + u32 num_fres; + u32 fre_len; + u32 fdes_off; + u32 fres_off; +} __packed; + +#define SFRAME_HDR_SIZE(sframe_hdr) \ + ((sizeof(struct sframe_header) + (sframe_hdr).auxhdr_len)) + +/* Two possible keys for executable (instruction) pointers signing. */ +#define SFRAME_AARCH64_PAUTH_KEY_A 0 /* Key A. */ +#define SFRAME_AARCH64_PAUTH_KEY_B 1 /* Key B. */ + +/** + * struct sframe_fde - SFrame Function Descriptor Entry. + * @start_addr: Function start address. Encoded as a signed offset, + * relative to the current FDE. + * @size: Size of the function in bytes. + * @fres_off: Offset of the first SFrame Frame Row Entry of the function, + * relative to the beginning of the SFrame Frame Row Entry sub-section. + * @fres_num: Number of frame row entries for the function. + * @info: Additional information for deciphering the stack trace + * information for the function. Contains information about SFrame FRE + * type, SFrame FDE type, PAC authorization A/B key, etc. + * @rep_size: Block size for SFRAME_FDE_TYPE_PCMASK + * @padding: Unused + */ +struct sframe_fde { + s32 start_addr; + u32 size; + u32 fres_off; + u32 fres_num; + u8 info; + u8 rep_size; + u16 padding; +} __packed; + +/* + * 'func_info' in SFrame FDE contains additional information for decipheri= ng + * the stack trace information for the function. In V1, the information is + * organized as follows: + * - 4-bits: Identify the FRE type used for the function. + * - 1-bit: Identify the FDE type of the function - mask or inc. + * - 1-bit: PAC authorization A/B key (aarch64). + * - 2-bits: Unused. + * --------------------------------------------------------------------- + * | Unused | PAC auth A/B key (aarch64) | FDE type | FRE type | + * | | Unused (amd64) | | | + * --------------------------------------------------------------------- + * 8 6 5 4 0 + */ + +/* Note: Set PAC auth key to SFRAME_AARCH64_PAUTH_KEY_A by default. 
*/ +#define SFRAME_FUNC_INFO(fde_type, fre_enc_type) \ + (((SFRAME_AARCH64_PAUTH_KEY_A & 0x1) << 5) | \ + (((fde_type) & 0x1) << 4) | ((fre_enc_type) & 0xf)) + +#define SFRAME_FUNC_FRE_TYPE(data) ((data) & 0xf) +#define SFRAME_FUNC_FDE_TYPE(data) (((data) >> 4) & 0x1) +#define SFRAME_FUNC_PAUTH_KEY(data) (((data) >> 5) & 0x1) + +/* + * Size of stack frame offsets in an SFrame Frame Row Entry. A single + * SFrame FRE has all offsets of the same size. Offset size may vary + * across frame row entries. + */ +#define SFRAME_FRE_OFFSET_1B 0 +#define SFRAME_FRE_OFFSET_2B 1 +#define SFRAME_FRE_OFFSET_4B 2 + +/* An SFrame Frame Row Entry can be SP or FP based. */ +#define SFRAME_BASE_REG_FP 0 +#define SFRAME_BASE_REG_SP 1 + +/* + * The index at which a specific offset is presented in the variable length + * bytes of an FRE. + */ +#define SFRAME_FRE_CFA_OFFSET_IDX 0 +/* + * The RA stack offset, if present, will always be at index 1 in the varia= ble + * length bytes of the FRE. + */ +#define SFRAME_FRE_RA_OFFSET_IDX 1 +/* + * The FP stack offset may appear at offset 1 or 2, depending on the ABI a= s RA + * may or may not be tracked. + */ +#define SFRAME_FRE_FP_OFFSET_IDX 2 + +/* + * 'fre_info' in SFrame FRE contains information about: + * - 1 bit: base reg for CFA + * - 4 bits: Number of offsets (N). A value of up to 3 is allowed to tra= ck + * all three of CFA, FP and RA (fixed implicit order). + * - 2 bits: information about size of the offsets (S) in bytes. + * Valid values are SFRAME_FRE_OFFSET_1B, SFRAME_FRE_OFFSET_2B, + * SFRAME_FRE_OFFSET_4B + * - 1 bit: Mangled RA state bit (aarch64 only). + * --------------------------------------------------------------- + * | Mangled-RA (aarch64) | Size of | Number of | base_reg | + * | Unused (amd64) | offsets | offsets | | + * --------------------------------------------------------------- + * 8 7 5 1 0 + */ + +/* Note: Set mangled_ra_p to zero by default. */ +#define SFRAME_FRE_INFO(base_reg_id, offset_num, offset_size) \ + (((0 & 0x1) << 7) | (((offset_size) & 0x3) << 5) | \ + (((offset_num) & 0xf) << 1) | ((base_reg_id) & 0x1)) + +/* Set the mangled_ra_p bit as indicated. */ +#define SFRAME_FRE_INFO_UPDATE_MANGLED_RA_P(mangled_ra_p, fre_info) \ + ((((mangled_ra_p) & 0x1) << 7) | ((fre_info) & 0x7f)) + +#define SFRAME_FRE_CFA_BASE_REG_ID(data) ((data) & 0x1) +#define SFRAME_FRE_OFFSET_COUNT(data) (((data) >> 1) & 0xf) +#define SFRAME_FRE_OFFSET_SIZE(data) (((data) >> 5) & 0x3) +#define SFRAME_FRE_MANGLED_RA_P(data) (((data) >> 7) & 0x1) + +#endif /* _SFRAME_H */ diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c index 5d16f9604a61..3a7b14cf522b 100644 --- a/kernel/unwind/user.c +++ b/kernel/unwind/user.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include =20 @@ -29,6 +30,11 @@ int user_unwind_next(struct user_unwind_state *state) case USER_UNWIND_TYPE_FP: frame =3D &fp_frame; break; + case USER_UNWIND_TYPE_SFRAME: + ret =3D sframe_find(state->ip, frame); + if (ret) + goto the_end; + break; default: BUG(); } @@ -57,6 +63,7 @@ int user_unwind_start(struct user_unwind_state *state, enum user_unwind_type type) { struct pt_regs *regs =3D task_pt_regs(current); + bool sframe_possible =3D current_has_sframe(); =20 memset(state, 0, sizeof(*state)); =20 @@ -67,6 +74,13 @@ int user_unwind_start(struct user_unwind_state *state, =20 switch (type) { case USER_UNWIND_TYPE_AUTO: + state->type =3D sframe_possible ? 
USER_UNWIND_TYPE_SFRAME : + USER_UNWIND_TYPE_FP; + break; + case USER_UNWIND_TYPE_SFRAME: + if (!sframe_possible) + return -EINVAL; + break; case USER_UNWIND_TYPE_FP: break; default: diff --git a/mm/init-mm.c b/mm/init-mm.c index 24c809379274..c4c6af046778 100644 --- a/mm/init-mm.c +++ b/mm/init-mm.c @@ -11,6 +11,7 @@ #include #include #include +#include #include =20 #ifndef INIT_MM_CONTEXT @@ -44,7 +45,8 @@ struct mm_struct init_mm =3D { #endif .user_ns =3D &init_user_ns, .cpu_bitmap =3D CPU_BITS_NONE, - INIT_MM_CONTEXT(init_mm) + INIT_MM_CONTEXT(init_mm), + INIT_MM_SFRAME, }; =20 void setup_initial_init_mm(void *start_code, void *end_code, --=20 2.46.0 From nobody Fri Nov 29 21:57:38 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D79F8189BB7; Fri, 13 Sep 2024 23:03:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268591; cv=none; b=Odfg9iCo2HXD/onAkHbA5CxJRQo2Rbg4rAzMjaiwjYWB9UwNXmuwObrA0wuZq3E2bLI+F7laPus424q0Xnj8CPeC2v0Nb+9Ms5OKW4oJQcad5ZESs6ly/zuwiQTlCwqnnOCLdFmUBEDnhLq/SUfSKK81NJV0nTvLUGCGpAgPqRw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268591; c=relaxed/simple; bh=BWrk0q88o8e+ReU0k54FjB59AgH0K/bGbd52KtpS2Hw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pOBOQx+m+sKdPGFP14BccAqmah1qX8Q9yRB/vQVdDsHIuL0IWP629TtGKzHy4qW18x3urABmvGenEKJ9GzFuBCJrpWGQ5wHLmLTqzPhEtBxZOP4XP6XI8xOUqMmoqGC0udvW7v23u8jpNQELcl6zaN659Yhb/YWuTG7+GZaYXsk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=FcLM4pKi; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="FcLM4pKi" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D4C28C4CECC; Fri, 13 Sep 2024 23:03:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726268591; bh=BWrk0q88o8e+ReU0k54FjB59AgH0K/bGbd52KtpS2Hw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=FcLM4pKiesCclYaybkr0GdMDphswuS0pTTycksFGH/vBgcUJOtVVg+T8RVAc/jhcQ dsnfF8g77iLLcAjMZeDPkFBrBKSn0yvUpaEf6dhnRwRDdlVVJxNaP0xtv76U3QQlvg sSdm2hgRYfLY1ebAlCiiy6luLLmCYIhBFmAL55Udm5kDcPAjludV352JWxp6okwmBw Frw+zlGitLLk/7RFxRFIcDMFKBCNM/QRBsA9ugJFnPFX48xRvOdOqpfWbvRwIRemHg QqkV6mAuFYL0EDHJcFJdxm4/zJz4nl257tuKW94QonTBlhosFvsiGgkaeuj7NvUZ6o EBC0YUKOUSR+A== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James Subject: [PATCH v2 04/11] unwind/x86/64: Add HAVE_USER_UNWIND_SFRAME Date: Sat, 14 Sep 2024 01:02:06 +0200 Message-ID: <37de872b0894b21408754ae2903b7701cc6dfab7.1726268190.git.jpoimboe@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; 
charset="utf-8" Binutils 2.41 supports generating sframe v2 for x86_64. It works well in testing so enable it. NOTE: An out-of-tree glibc patch is still needed to enable setting PR_ADD_SFRAME for shared libraries and dlopens. Signed-off-by: Josh Poimboeuf --- arch/x86/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 266edff59058..0b3d7c72b65b 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -287,6 +287,7 @@ config X86 select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_USER_RETURN_NOTIFIER select HAVE_USER_UNWIND + select HAVE_USER_UNWIND_SFRAME if X86_64 select HAVE_GENERIC_VDSO select VDSO_GETRANDOM if X86_64 select HOTPLUG_PARALLEL if SMP && X86_64 --=20 2.46.0 From nobody Fri Nov 29 21:57:38 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C0F8719006A; Fri, 13 Sep 2024 23:03:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268595; cv=none; b=a3wmI/RExK7lNYNzBXn2UUWrPbsg4S8NSGN37aCXjNWoNpkV63jMOePjGrG4wWCD60dXEduvbC6LZzmuGvdlkLqX/jU5VhuoeTjDNRrtfL5HnoITZf2DABjZsyS8Z3tH39ReyDH5b+wwFA5PvrSrP1osbt4eAbEbsc2nDAu7hZs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268595; c=relaxed/simple; bh=JS+sjdrUxMebxJ308crpqT2b6bsP1SFrOCMYC7oexSI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ffcXh5xDVN5qLQYil9RDhSQKAx13aoYEJR/7Ny4Db003WiyMF5C42c3H3pXAZUXsudnS6hw2lbT6gH4D0v3It2MyHGT0JusC62LUzQ+gxHTTXDgN6Gy7icraR15A248w7Z2reyyepLoz2+vbgUlkuEjP18vHnDGXVMvYEiHz3N4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=oXPqKfe9; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="oXPqKfe9" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 016A3C4CECD; Fri, 13 Sep 2024 23:03:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726268595; bh=JS+sjdrUxMebxJ308crpqT2b6bsP1SFrOCMYC7oexSI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=oXPqKfe9ROTxunWuKafuKnOM/0I4RB7y2Yg02dG8/Rnw/dDlH5FoUBqF2BU8RjH7t El3Z9u0Bj6VL6vXds0cx8ZBQDf1aUMNuJBnVQ3KpLIy6Ucy/VbKleBmNX56rRW2aDX lGX90UGqJ2fxNnsVrJ843BbhRe1uHn3njjBEEfWoIlhr7jRC588LPhIk53CaqbcmzF e+MpY1uYp8NsT0KenCP57q4u3gxtWQw5mW9SdJJh3nBTOxO7LB9Inq5rbRrhKCR5to MjTJp/JZTLsEdukXRKlTkbu3E6XSnkGgO6I9lyABL6bfoYH4cYkUbLZM2xkJKDjIN5 f64ajUXmWckXg== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James Subject: [PATCH v2 05/11] perf/x86: Use user_unwind interface Date: Sat, 14 Sep 2024 01:02:07 +0200 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Simplify 
__perf_callchain_user() and prepare to enable deferred sframe unwinding by switching to the generic user unwind interface. Signed-off-by: Josh Poimboeuf --- arch/x86/events/core.c | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index be01823b1bb4..e82aadf99d9b 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -29,6 +29,7 @@ #include #include #include +#include =20 #include #include @@ -2862,8 +2863,7 @@ perf_callchain_user32(struct pt_regs *regs, struct pe= rf_callchain_entry_ctx *ent void perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs= *regs) { - struct stack_frame frame; - const struct stack_frame __user *fp; + struct user_unwind_state state; =20 if (perf_guest_state()) { /* TODO: We don't support guest os callchain now */ @@ -2876,8 +2876,6 @@ perf_callchain_user(struct perf_callchain_entry_ctx *= entry, struct pt_regs *regs if (regs->flags & (X86_VM_MASK | PERF_EFLAGS_VM)) return; =20 - fp =3D (void __user *)regs->bp; - perf_callchain_store(entry, regs->ip); =20 if (!nmi_uaccess_okay()) @@ -2887,18 +2885,12 @@ perf_callchain_user(struct perf_callchain_entry_ctx= *entry, struct pt_regs *regs return; =20 pagefault_disable(); - while (entry->nr < entry->max_stack) { - if (!valid_user_frame(fp, sizeof(frame))) - break; =20 - if (__get_user(frame.next_frame, &fp->next_frame)) + for_each_user_frame(&state, USER_UNWIND_TYPE_FP) { + if (perf_callchain_store(entry, state.ip)) break; - if (__get_user(frame.return_address, &fp->return_address)) - break; - - perf_callchain_store(entry, frame.return_address); - fp =3D (void __user *)frame.next_frame; } + pagefault_enable(); } =20 --=20 2.46.0 From nobody Fri Nov 29 21:57:38 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 602181865F3; Fri, 13 Sep 2024 23:03:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268600; cv=none; b=Bj1VZuIU6NLxZJWvqr9zRZ/q/gd2+933QpkYot/imcJUScBs3cj+BURyIidATElua0i4mPS9yiM1Xs4av72UwFtHMLWEOUofr1f8A0v7ApyyfLvRaRChsDNe5oa8bKrCQHK0xBkDTAyoYYaKr+MdLzIRiBbxt0W0jaep62K3bg4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268600; c=relaxed/simple; bh=1BrnE9jNsOh9s3h17ejYdYppCxUeupDFyNyWCmzffhs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PFPpb7YHU62PtqPrnfQXmoScsX4Ee0KwVb00OZp5xnq9wv0l3Ls7MdBSyNzHtbDnnFUrxg5NYhtSuA63wNgB7noyoo0jqGDnATlguiGNnTCfJX+rq5B8w1NIxYnOa1DWcdWF+Q2EWM2IJbjmPAsBwAPFCJD0aV7oV159mP3tiVE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=C4LYYoVw; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="C4LYYoVw" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1CF27C4CED0; Fri, 13 Sep 2024 23:03:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726268600; bh=1BrnE9jNsOh9s3h17ejYdYppCxUeupDFyNyWCmzffhs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=C4LYYoVw2wmkdE2zixrEdLNkowutqxSSRFV9ecM2k4ipvIVMXZcyK5XL67AxMUS66 
yuNXxyTwtmMHE26Pl40T945kKQurkPF5nRVCVgVI6Bcf0s1EA9TEsvVxsTo8W71PGg l1JPoxXuDZiP+LAblzyLybrUvLcfUTzA6PizMM6NRzeV0MOpbpHTTuoYoNVaxyyi8O ZoeJHOHbhwLDB1yUEmeMJS+n6MGrLhM4v0V3GFGDGl34tbLrr4Vx8Wp0W58OpgHO21 xPaZdLekXvtVMFd//EM0nbL3G6xkuxsd5yRJHOxIOOehMMeQBepe62qHDmEX7So64P oqbsUw3tAhFHw== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , Namhyung Kim Subject: [PATCH v2 06/11] perf: Remove get_perf_callchain() 'init_nr' argument Date: Sat, 14 Sep 2024 01:02:08 +0200 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The 'init_nr' argument has double duty: it's used to initialize both the number of contexts and the number of stack entries. That's confusing and the callers always pass zero anyway. Hard code the zero. Acked-by: Namhyung Kim Signed-off-by: Josh Poimboeuf --- include/linux/perf_event.h | 2 +- kernel/bpf/stackmap.c | 4 ++-- kernel/events/callchain.c | 12 ++++++------ kernel/events/core.c | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 1a8942277dda..4365bb0684a5 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1540,7 +1540,7 @@ DECLARE_PER_CPU(struct perf_callchain_entry, perf_cal= lchain_entry); extern void perf_callchain_user(struct perf_callchain_entry_ctx *entry, st= ruct pt_regs *regs); extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, = struct pt_regs *regs); extern struct perf_callchain_entry * -get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool us= er, +get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, u32 max_stack, bool crosstask, bool add_mark); extern int get_callchain_buffers(int max_stack); extern void put_callchain_buffers(void); diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index c99f8e5234ac..0f922e43b524 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -297,7 +297,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, str= uct bpf_map *, map, if (max_depth > sysctl_perf_event_max_stack) max_depth =3D sysctl_perf_event_max_stack; =20 - trace =3D get_perf_callchain(regs, 0, kernel, user, max_depth, + trace =3D get_perf_callchain(regs, kernel, user, max_depth, false, false); =20 if (unlikely(!trace)) @@ -432,7 +432,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struc= t task_struct *task, else if (kernel && task) trace =3D get_callchain_entry_for_task(task, max_depth); else - trace =3D get_perf_callchain(regs, 0, kernel, user, max_depth, + trace =3D get_perf_callchain(regs, kernel, user, max_depth, crosstask, false); if (unlikely(!trace)) goto err_fault; diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index 8a47e52a454f..83834203e144 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -216,7 +216,7 @@ static void fixup_uretprobe_trampoline_entries(struct p= erf_callchain_entry *entr } =20 struct perf_callchain_entry * -get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool us= 
er, +get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, u32 max_stack, bool crosstask, bool add_mark) { struct perf_callchain_entry *entry; @@ -227,11 +227,11 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr,= bool kernel, bool user, if (!entry) return NULL; =20 - ctx.entry =3D entry; - ctx.max_stack =3D max_stack; - ctx.nr =3D entry->nr =3D init_nr; - ctx.contexts =3D 0; - ctx.contexts_maxed =3D false; + ctx.entry =3D entry; + ctx.max_stack =3D max_stack; + ctx.nr =3D entry->nr =3D 0; + ctx.contexts =3D 0; + ctx.contexts_maxed =3D false; =20 if (kernel && !user_mode(regs)) { if (add_mark) diff --git a/kernel/events/core.c b/kernel/events/core.c index 8a6c6bbcd658..8038cd4d981b 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7694,7 +7694,7 @@ perf_callchain(struct perf_event *event, struct pt_re= gs *regs) if (!kernel && !user) return &__empty_callchain; =20 - callchain =3D get_perf_callchain(regs, 0, kernel, user, + callchain =3D get_perf_callchain(regs, kernel, user, max_stack, crosstask, true); return callchain ?: &__empty_callchain; } --=20 2.46.0 From nobody Fri Nov 29 21:57:38 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D0DA194ACA; Fri, 13 Sep 2024 23:03:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268604; cv=none; b=M9I1GRj1quiQ4Va0IQQt/h+p7a4SisYNlI53Vo7xIdLUiu8iTub8XHhhIB71rEbB92ZYfQHSHEzLm70Ek7OB89aMiWu0DQjz5FXJNj67x0B6v81N1/7n/yysA2LSC0dtA+9wovf3VcvMMj3ledAmKuBnVLW10itAgZd8x4ebAhc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268604; c=relaxed/simple; bh=RpKqx3PGHDLN3yrCzPGsyZkgA6P2RW9jxlJQvW5Vw8U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WWjPjSTfISYEXlPhX4piT514JmvMuXhyk9uY24tQInfjgvtdDf2NqZ9Xlri+4qdBIj4is6XUsyxIoOuQ/5AlwxSHIEtt/YaDET9WlkcSWXo4ZUm2VKxDqcdi9xjRQYQbFAPVUR+6jYaAZW3SHdk3bas+jqnLwH4aWFrnrDXGOd4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=rCD4JK3i; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="rCD4JK3i" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6513BC4CEC0; Fri, 13 Sep 2024 23:03:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726268604; bh=RpKqx3PGHDLN3yrCzPGsyZkgA6P2RW9jxlJQvW5Vw8U=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rCD4JK3i/XQj16l/BTCJAWiIOhqkWxOmsEkRlyH8vEaoPgn6pdzlUqHkz1DpK3ld0 G2e9+9ucDxCeemU+Nrcl/phzhGjYbZIPb9eNiYA5UpXtc83vWgPmP5lDzIpi6gsN72 EWJjYwvw4vqxG6waStNdXLRq0z9fSDuRlXwtCoD3oZpFme6BwaqfBcejImrF5gM4wq FD8dyCWW1tUmPMHFw0mpBOnc0dhl7CFXc81N3UUPivEm+g3tpj1PGKhSSn7X8HFBX7 oGswThlwg1fFSetqALKNjilN7QPhe8GmiHXb//VSWX9mS5g0wMeXbteF5q+7Yi5/Tb cITetUYIzA5ww== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James Subject: 
[PATCH v2 07/11] perf: Remove get_perf_callchain() 'crosstask' argument Date: Sat, 14 Sep 2024 01:02:09 +0200 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" get_perf_callchain() doesn't support cross-task unwinding, so it doesn't make much sense to have 'crosstask' as an argument. Instead, have perf_callchain() adjust 'user' accordingly. Acked-by: Namhyung Kim Signed-off-by: Josh Poimboeuf --- include/linux/perf_event.h | 2 +- kernel/bpf/stackmap.c | 5 ++--- kernel/events/callchain.c | 6 +----- kernel/events/core.c | 8 ++++---- 4 files changed, 8 insertions(+), 13 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 4365bb0684a5..64f9efe19553 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1541,7 +1541,7 @@ extern void perf_callchain_user(struct perf_callchain= _entry_ctx *entry, struct p extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, = struct pt_regs *regs); extern struct perf_callchain_entry * get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, - u32 max_stack, bool crosstask, bool add_mark); + u32 max_stack, bool add_mark); extern int get_callchain_buffers(int max_stack); extern void put_callchain_buffers(void); extern struct perf_callchain_entry *get_callchain_entry(int *rctx); diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 0f922e43b524..ff6f0ef7ba5d 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -297,8 +297,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, str= uct bpf_map *, map, if (max_depth > sysctl_perf_event_max_stack) max_depth =3D sysctl_perf_event_max_stack; =20 - trace =3D get_perf_callchain(regs, kernel, user, max_depth, - false, false); + trace =3D get_perf_callchain(regs, kernel, user, max_depth, false); =20 if (unlikely(!trace)) /* couldn't fetch the stack trace */ @@ -433,7 +432,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struc= t task_struct *task, trace =3D get_callchain_entry_for_task(task, max_depth); else trace =3D get_perf_callchain(regs, kernel, user, max_depth, - crosstask, false); + false); if (unlikely(!trace)) goto err_fault; =20 diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index 83834203e144..655fb25a725b 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -217,7 +217,7 @@ static void fixup_uretprobe_trampoline_entries(struct p= erf_callchain_entry *entr =20 struct perf_callchain_entry * get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, - u32 max_stack, bool crosstask, bool add_mark) + u32 max_stack, bool add_mark) { struct perf_callchain_entry *entry; struct perf_callchain_entry_ctx ctx; @@ -248,9 +248,6 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, b= ool user, } =20 if (regs) { - if (crosstask) - goto exit_put; - if (add_mark) perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); =20 @@ -260,7 +257,6 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, b= ool user, } } =20 -exit_put: put_callchain_entry(rctx); =20 return entry; diff --git a/kernel/events/core.c b/kernel/events/core.c index 8038cd4d981b..19fd7bd38ecf 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7686,16 +7686,16 @@ perf_callchain(struct perf_event *event, struct pt_= regs *regs) { bool kernel =3D 
!event->attr.exclude_callchain_kernel; bool user =3D !event->attr.exclude_callchain_user; - /* Disallow cross-task user callchains. */ - bool crosstask =3D event->ctx->task && event->ctx->task !=3D current; const u32 max_stack =3D event->attr.sample_max_stack; struct perf_callchain_entry *callchain; =20 + /* Disallow cross-task user callchains. */ + user &=3D !event->ctx->task || event->ctx->task =3D=3D current; + if (!kernel && !user) return &__empty_callchain; =20 - callchain =3D get_perf_callchain(regs, kernel, user, - max_stack, crosstask, true); + callchain =3D get_perf_callchain(regs, kernel, user, max_stack, true); return callchain ?: &__empty_callchain; } =20 --=20 2.46.0 From nobody Fri Nov 29 21:57:38 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 68C78194ACA; Fri, 13 Sep 2024 23:03:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268608; cv=none; b=U6RiC34MPwm1qyVuuHAlz2cFlZCzPHmXBElt/OIZnqi62hkESzCxZzOGKcLNOlOO953poa8YAamcGgwsWGXC88ixvl0XTi8AKPpGCRXQkf+vEYJ1EZ8k6n8CgHoFQPMwoUN9KtfuLoX/HWolPocgymr92lgqOMKxkBRyV6YwE+A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268608; c=relaxed/simple; bh=0WuovsNM7TaO1oYzqlLrboI468hoPkzVYKsi9xko4u4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jgaTnPCxcUBDFeNwJFlSs2UnMR9873SQisBF/pIMRp+4WTmQdaZcyYFAuIFi7mzGsiq7R6PNWeH4ewZ1WrC+jhxHUmY2fUUB2e0ni+6QcTn2Zkadt+ClxMthscul1p0X85lar4b0Qrc4zISzIHjeBAFcxZzKuiOWtURNE4d+Nw4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=BJJGuHnZ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="BJJGuHnZ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8F356C4CEC5; Fri, 13 Sep 2024 23:03:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726268608; bh=0WuovsNM7TaO1oYzqlLrboI468hoPkzVYKsi9xko4u4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=BJJGuHnZDZGHlcrc0068NiNy4moT5d2cYpiizv067ynrq+IRchaaSh3SPSr0V8nTe Dzyxlb23a6MSocyIQkxvC+uUGrnqt2yV/LUK9hdYII1OIbtXz5EYtatAPjlxlDtS1B E0f1slcet58/b0XcmA5QtATfSHWalSCpiaudm6ZrSaNLEBH3SYBSwAJEAzv9RON7PS xNHzTFnX3groYtbQhBe7+tT+fOdGeLlYimIB1kDt5jAqpl3A+OWL6LcjZ2FZLQS62U YyMdL+BoyJbz1eCbruQ0dQXLf/lmHb0BsTWDDHrW6sD1+m0wv7ybrbKsECcIboVJyw FeO7PwuKZxwqg== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James Subject: [PATCH v2 08/11] perf: Simplify get_perf_callchain() user logic Date: Sat, 14 Sep 2024 01:02:10 +0200 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Simplify the get_perf_callchain() 
user logic a bit. task_pt_regs() should never be NULL. Acked-by: Namhyung Kim Signed-off-by: Josh Poimboeuf --- kernel/events/callchain.c | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index 655fb25a725b..2278402b7ac9 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -241,22 +241,20 @@ get_perf_callchain(struct pt_regs *regs, bool kernel,= bool user, =20 if (user) { if (!user_mode(regs)) { - if (current->mm) - regs =3D task_pt_regs(current); - else - regs =3D NULL; + if (!current->mm) + goto exit_put; + regs =3D task_pt_regs(current); } =20 - if (regs) { - if (add_mark) - perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); + if (add_mark) + perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); =20 - start_entry_idx =3D entry->nr; - perf_callchain_user(&ctx, regs); - fixup_uretprobe_trampoline_entries(entry, start_entry_idx); - } + start_entry_idx =3D entry->nr; + perf_callchain_user(&ctx, regs); + fixup_uretprobe_trampoline_entries(entry, start_entry_idx); } =20 +exit_put: put_callchain_entry(rctx); =20 return entry; --=20 2.46.0 From nobody Fri Nov 29 21:57:38 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D8096188581; Fri, 13 Sep 2024 23:03:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268612; cv=none; b=lNB2iSRkJ3mFTRb0JKKsz2HhqWgSxUSRiZmqzRdrFG31bNZKtCZopbTWfzPB6Cqw5j89sfhj0Ld2u6yMzzK1WDX6COOPuoqKczYeAwgBaW4p8raQLPOv3qYqdKbk2o/uBBjR4jwqBb2Uku6+s48BCVa6D1iKyQQ/2lNI9ZyfAxs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268612; c=relaxed/simple; bh=I10Ag4ofNMWbBk+oEbGUJAFz+TaMxmRZTVkWU5+eNcs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Jw3ZrrWymyBkukSKiI9yaPG+bZLtZPv2NVa+oRntbWnAdKSxtS+DkIQMaBq2vB8K0WDDKzZ5Pi2OAFfthDXVGB2PZ/At90+ndINEu52APz1ZbpAL9V1Bo5IAu5zp2L7cPc0A1lEmu4pSn1vrvxliz7sIfnkj96RxVdOhwiTNMk0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=GVo2b6hZ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="GVo2b6hZ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C95F0C4CED2; Fri, 13 Sep 2024 23:03:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726268612; bh=I10Ag4ofNMWbBk+oEbGUJAFz+TaMxmRZTVkWU5+eNcs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=GVo2b6hZuEnVwdRpnql4jQWs9gpgSH+tr65TtMT2ycYG+EJJstSUJGfhwLL+z4+QR sga4n9v4wLzyP0zR67wWTMXlSS0dDRvwi6cd7RE4CE8nZ7r6vBhFH0JHxp1c1DCcwt XaKcmtIgJp727S7+kKc5W7fw2ODRdKIxHfV/MTKQIZky10zUmCw+gBTmT1IKHdIb9w rvSbDwiVkAHp12Lq7ddB0WukUOTCjAIRIhjeZuYGpjgB09ZQiCj+uweVgvFkMBZIre UkoeUmNKLSPhwsaF5EKKhF/lcpWPxo0FdX2w416ujYfLHoWDPgzVIMYk0e00cP8vtV 9EDnzyhLXYjWA== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , 
linux-toolchains@vger.kernel.org, Jordan Rome , Sam James Subject: [PATCH v2 09/11] perf: Introduce deferred user callchains Date: Sat, 14 Sep 2024 01:02:11 +0200 Message-ID: <5bc834b68fe14daaa1782b925ab54fc414245334.1726268190.git.jpoimboe@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Instead of attempting to unwind user space from the NMI handler, defer it to run in task context by sending a self-IPI and then scheduling the unwind to run in the IRQ's exit task work before returning to user space. This allows the user stack page to be paged in if needed, avoids duplicate unwinds for kernel-bound workloads, and prepares for SFrame unwinding (so .sframe sections can be paged in on demand). Suggested-by: Steven Rostedt Suggested-by: Peter Zijlstra Signed-off-by: Josh Poimboeuf --- arch/Kconfig | 3 ++ include/linux/perf_event.h | 9 +++- include/uapi/linux/perf_event.h | 21 ++++++++- kernel/bpf/stackmap.c | 5 +-- kernel/events/callchain.c | 12 +++++- kernel/events/core.c | 76 ++++++++++++++++++++++++++++++++- 6 files changed, 119 insertions(+), 7 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index ff5d5bc5f947..0629c1aa2a5c 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -431,6 +431,9 @@ config HAVE_USER_UNWIND config HAVE_USER_UNWIND_SFRAME bool =20 +config HAVE_PERF_CALLCHAIN_DEFERRED + bool + config HAVE_PERF_REGS bool help diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 64f9efe19553..a617aad2851c 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -787,6 +787,7 @@ struct perf_event { struct callback_head pending_task; unsigned int pending_work; struct rcuwait pending_work_wait; + unsigned int pending_callchain; =20 atomic_t event_limit; =20 @@ -1541,12 +1542,18 @@ extern void perf_callchain_user(struct perf_callcha= in_entry_ctx *entry, struct p extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, = struct pt_regs *regs); extern struct perf_callchain_entry * get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, - u32 max_stack, bool add_mark); + u32 max_stack, bool add_mark, bool defer_user); extern int get_callchain_buffers(int max_stack); extern void put_callchain_buffers(void); extern struct perf_callchain_entry *get_callchain_entry(int *rctx); extern void put_callchain_entry(int rctx); =20 +#ifdef CONFIG_HAVE_PERF_CALLCHAIN_DEFERRED +extern void perf_callchain_user_deferred(struct perf_callchain_entry_ctx *= entry, struct pt_regs *regs); +#else +static inline void perf_callchain_user_deferred(struct perf_callchain_entr= y_ctx *entry, struct pt_regs *regs) {} +#endif + extern int sysctl_perf_event_max_stack; extern int sysctl_perf_event_max_contexts_per_stack; =20 diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_even= t.h index 4842c36fdf80..a7f875eb29dd 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -460,7 +460,8 @@ struct perf_event_attr { inherit_thread : 1, /* children only inherit if cloned with CLONE_THR= EAD */ remove_on_exec : 1, /* event is removed from task on exec */ sigtrap : 1, /* send synchronous SIGTRAP on event */ - __reserved_1 : 26; + defer_callchain: 1, /* generate PERF_RECORD_CALLCHAIN_DEFERRED record= s */ + __reserved_1 : 25; =20 union { __u32 wakeup_events; /* wakeup every n events */ @@ 
-1217,6 +1218,23 @@ enum perf_event_type { */ PERF_RECORD_AUX_OUTPUT_HW_ID =3D 21, =20 + /* + * This user callchain capture was deferred until shortly before + * returning to user space. Previous samples would have kernel + * callchains only and they need to be stitched with this to make full + * callchains. + * + * TODO: do PERF_SAMPLE_{REGS,STACK}_USER also need deferral? + * + * struct { + * struct perf_event_header header; + * u64 nr; + * u64 ips[nr]; + * struct sample_id sample_id; + * }; + */ + PERF_RECORD_CALLCHAIN_DEFERRED =3D 22, + PERF_RECORD_MAX, /* non-ABI */ }; =20 @@ -1247,6 +1265,7 @@ enum perf_callchain_context { PERF_CONTEXT_HV =3D (__u64)-32, PERF_CONTEXT_KERNEL =3D (__u64)-128, PERF_CONTEXT_USER =3D (__u64)-512, + PERF_CONTEXT_USER_DEFERRED =3D (__u64)-640, =20 PERF_CONTEXT_GUEST =3D (__u64)-2048, PERF_CONTEXT_GUEST_KERNEL =3D (__u64)-2176, diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index ff6f0ef7ba5d..9d58a84e553a 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -297,8 +297,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, str= uct bpf_map *, map, if (max_depth > sysctl_perf_event_max_stack) max_depth =3D sysctl_perf_event_max_stack; =20 - trace =3D get_perf_callchain(regs, kernel, user, max_depth, false); - + trace =3D get_perf_callchain(regs, kernel, user, max_depth, false, false); if (unlikely(!trace)) /* couldn't fetch the stack trace */ return -EFAULT; @@ -432,7 +431,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struc= t task_struct *task, trace =3D get_callchain_entry_for_task(task, max_depth); else trace =3D get_perf_callchain(regs, kernel, user, max_depth, - false); + false, false); if (unlikely(!trace)) goto err_fault; =20 diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index 2278402b7ac9..883f0ac9ef3a 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -217,7 +217,7 @@ static void fixup_uretprobe_trampoline_entries(struct p= erf_callchain_entry *entr =20 struct perf_callchain_entry * get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, - u32 max_stack, bool add_mark) + u32 max_stack, bool add_mark, bool defer_user) { struct perf_callchain_entry *entry; struct perf_callchain_entry_ctx ctx; @@ -246,6 +246,16 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, = bool user, regs =3D task_pt_regs(current); } =20 + if (defer_user) { + /* + * Foretell the coming of a + * PERF_RECORD_CALLCHAIN_DEFERRED sample which can be + * stitched to this one. + */ + perf_callchain_store_context(&ctx, PERF_CONTEXT_USER_DEFERRED); + goto exit_put; + } + if (add_mark) perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); =20 diff --git a/kernel/events/core.c b/kernel/events/core.c index 19fd7bd38ecf..5fc7c5156287 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -6835,6 +6835,12 @@ static void perf_pending_irq(struct irq_work *entry) struct perf_event *event =3D container_of(entry, struct perf_event, pendi= ng_irq); int rctx; =20 + if (!is_software_event(event)) { + if (event->pending_callchain) + task_work_add(current, &event->pending_task, TWA_RESUME); + return; + } + /* * If we 'fail' here, that's OK, it means recursion is already disabled * and we won't recurse 'further'. 
@@ -6854,11 +6860,70 @@ static void perf_pending_irq(struct irq_work *entry) perf_swevent_put_recursion_context(rctx); } =20 +struct perf_callchain_deferred_event { + struct perf_event_header header; + struct perf_callchain_entry callchain; +}; + +#define PERF_CALLCHAIN_DEFERRED_EVENT_SIZE \ + sizeof(struct perf_callchain_deferred_event) + \ + (sizeof(__u64) * 1) + /* PERF_CONTEXT_USER */ \ + (sizeof(__u64) * PERF_MAX_STACK_DEPTH) + +static void perf_event_callchain_deferred(struct perf_event *event) +{ + struct pt_regs *regs =3D task_pt_regs(current); + struct perf_callchain_entry *callchain; + struct perf_output_handle handle; + struct perf_sample_data data; + unsigned char buf[PERF_CALLCHAIN_DEFERRED_EVENT_SIZE]; + struct perf_callchain_entry_ctx ctx; + struct perf_callchain_deferred_event *deferred_event; + + deferred_event =3D (void *)&buf; + + callchain =3D &deferred_event->callchain; + callchain->nr =3D 0; + + ctx.entry =3D callchain; + ctx.max_stack =3D MIN(event->attr.sample_max_stack, + PERF_MAX_STACK_DEPTH); + ctx.nr =3D 0; + ctx.contexts =3D 0; + ctx.contexts_maxed =3D false; + + perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); + perf_callchain_user_deferred(&ctx, regs); + + deferred_event->header.type =3D PERF_RECORD_CALLCHAIN_DEFERRED; + deferred_event->header.misc =3D 0; + deferred_event->header.size =3D sizeof(*deferred_event) + + (callchain->nr * sizeof(u64)); + + perf_event_header__init_id(&deferred_event->header, &data, event); + + if (perf_output_begin(&handle, &data, event, + deferred_event->header.size)) + return; + + perf_output_copy(&handle, deferred_event, deferred_event->header.size); + perf_event__output_id_sample(event, &handle, &data); + perf_output_end(&handle); +} + static void perf_pending_task(struct callback_head *head) { struct perf_event *event =3D container_of(head, struct perf_event, pendin= g_task); int rctx; =20 + if (!is_software_event(event)) { + if (event->pending_callchain) { + perf_event_callchain_deferred(event); + event->pending_callchain =3D 0; + } + return; + } + /* * All accesses to the event must belong to the same implicit RCU read-si= de * critical section as the ->pending_work reset. See comment in @@ -7688,6 +7753,8 @@ perf_callchain(struct perf_event *event, struct pt_re= gs *regs) bool user =3D !event->attr.exclude_callchain_user; const u32 max_stack =3D event->attr.sample_max_stack; struct perf_callchain_entry *callchain; + bool defer_user =3D IS_ENABLED(CONFIG_HAVE_PERF_CALLCHAIN_DEFERRED) && + event->attr.defer_callchain; =20 /* Disallow cross-task user callchains. 
*/ user &=3D !event->ctx->task || event->ctx->task =3D=3D current; @@ -7695,7 +7762,14 @@ perf_callchain(struct perf_event *event, struct pt_r= egs *regs) if (!kernel && !user) return &__empty_callchain; =20 - callchain =3D get_perf_callchain(regs, kernel, user, max_stack, true); + callchain =3D get_perf_callchain(regs, kernel, user, max_stack, true, + defer_user); + + if (user && defer_user && !event->pending_callchain) { + event->pending_callchain =3D 1; + irq_work_queue(&event->pending_irq); + } + return callchain ?: &__empty_callchain; } =20 --=20 2.46.0 From nobody Fri Nov 29 21:57:38 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B17C3188581; Fri, 13 Sep 2024 23:03:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268617; cv=none; b=kI49Ermj1n+ZTIMM0A/le3RG8+JG48nlX8ZE0lieffEa4KWqR6t99e01l2LRcxzmJ+wjDcgt3un7tEt14mbmC7wDnnnCFzBKbEETDNZsuOTbx3eP0NFG/GarcUqm5H5hUUL2fo2Q9jW8RmrtDWqzBRy2CPqZMt9ipGqH1Gl1uLk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268617; c=relaxed/simple; bh=tsAykqBgjsMjkNGrLVTIbkzu74rfQg/Ws7aF1PZ48YY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Mz19PaLIlV+BY4/272JQuuLqFnLv3Y3+YvXEYdWvXVoEx8S+gh3aCD7Jgem2eBrbQ2vWJikFY/56Xg9h7ZeLal5MYZbrKtXxRMmrv3GAD0iWwmzTFc/pnP8OFyUk2DOwLu1KpjhE+YDQ2zkqnqNvvUxxVCK/kd0xzbKk/mtHkfw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=asmk2gIM; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="asmk2gIM" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 40026C4CEC0; Fri, 13 Sep 2024 23:03:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726268617; bh=tsAykqBgjsMjkNGrLVTIbkzu74rfQg/Ws7aF1PZ48YY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=asmk2gIM/anizpcSOaB2wk0Je+81T9mfAQTyFGJ23OYJ0lLGCSfmguO/28DyUZgsZ m2xnUDH7jnK1ItpQ435ZLBx213lYgynyCOSlauj/nA3aISG8bBb+RP0AZP642QEf16 teyUC1VBO4+JBpzS2sD/BK5YoAJttLuburWHMIvU8OI0KgydPHuZ0UhYlosFNg3QLP JvnoDjP9L4Ir3cfUR+WgBtOnKLv5MlqAt+JYwE7XtiyMWsMmFa0V+I6dSnq2yDFMvO WXD5pc9VGYG8C0MBXhtwZb8L1PSNf4zp6QqJ94yRaOWs+JmTPu6klGFuj+I5O558zd HgxCBuhf5S/SA== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James Subject: [PATCH v2 10/11] perf/x86: Add HAVE_PERF_CALLCHAIN_DEFERRED Date: Sat, 14 Sep 2024 01:02:12 +0200 Message-ID: <2b3be1da18a8a1895762c2f394aa353f227d6543.1726268190.git.jpoimboe@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Enable deferred user space unwinding on x86. 
Frame pointers are still the default for now. Signed-off-by: Josh Poimboeuf --- arch/x86/Kconfig | 1 + arch/x86/events/core.c | 52 +++++++++++++++++++++++++++--------------- 2 files changed, 34 insertions(+), 19 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 0b3d7c72b65b..24d9373cc5e6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -265,6 +265,7 @@ config X86 select HAVE_PERF_EVENTS_NMI select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_N= MI select HAVE_PCI + select HAVE_PERF_CALLCHAIN_DEFERRED select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select MMU_GATHER_RCU_TABLE_FREE if PARAVIRT diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index e82aadf99d9b..d6ea265d9aa8 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -2821,8 +2821,8 @@ static unsigned long get_segment_base(unsigned int se= gment) =20 #include =20 -static inline int -perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ct= x *entry) +static int __perf_callchain_user32(struct pt_regs *regs, + struct perf_callchain_entry_ctx *entry) { /* 32-bit process in 64-bit kernel. */ unsigned long ss_base, cs_base; @@ -2836,7 +2836,6 @@ perf_callchain_user32(struct pt_regs *regs, struct pe= rf_callchain_entry_ctx *ent ss_base =3D get_segment_base(regs->ss); =20 fp =3D compat_ptr(ss_base + regs->bp); - pagefault_disable(); while (entry->nr < entry->max_stack) { if (!valid_user_frame(fp, sizeof(frame))) break; @@ -2849,19 +2848,18 @@ perf_callchain_user32(struct pt_regs *regs, struct = perf_callchain_entry_ctx *ent perf_callchain_store(entry, cs_base + frame.return_address); fp =3D compat_ptr(ss_base + frame.next_frame); } - pagefault_enable(); return 1; } -#else -static inline int -perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ct= x *entry) +#else /* !CONFIG_IA32_EMULATION */ +static int __perf_callchain_user32(struct pt_regs *regs, + struct perf_callchain_entry_ctx *entry) { - return 0; + return 0; } -#endif +#endif /* CONFIG_IA32_EMULATION */ =20 -void -perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs= *regs) +static void __perf_callchain_user(struct perf_callchain_entry_ctx *entry, + struct pt_regs *regs, bool atomic) { struct user_unwind_state state; =20 @@ -2878,20 +2876,36 @@ perf_callchain_user(struct perf_callchain_entry_ctx= *entry, struct pt_regs *regs =20 perf_callchain_store(entry, regs->ip); =20 - if (!nmi_uaccess_okay()) - return; + if (atomic) { + if (!nmi_uaccess_okay()) + return; + pagefault_disable(); + } =20 - if (perf_callchain_user32(regs, entry)) - return; - - pagefault_disable(); + if (__perf_callchain_user32(regs, entry)) + goto done; =20 for_each_user_frame(&state, USER_UNWIND_TYPE_FP) { if (perf_callchain_store(entry, state.ip)) - break; + goto done; } =20 - pagefault_enable(); +done: + if (atomic) + pagefault_enable(); +} + + +void perf_callchain_user(struct perf_callchain_entry_ctx *entry, + struct pt_regs *regs) +{ + return __perf_callchain_user(entry, regs, true); +} + +void perf_callchain_user_deferred(struct perf_callchain_entry_ctx *entry, + struct pt_regs *regs) +{ + return __perf_callchain_user(entry, regs, false); } =20 /* --=20 2.46.0 From nobody Fri Nov 29 21:57:38 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C268C18951B; Fri, 13 Sep 2024 
23:03:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268621; cv=none; b=EtJ+xXL3hRL+T8eqdahbMnmmajMPrlrW75ZrepIdZse0+pqICXkNBD7zQSaH41mCP6GRIO2aUQ/rB+wQhUcs/Rl0oQyQWc62zNIFHl8shltGez0I79w7o/b14eSrtf7F7Rw3LqyqXk2c8QMxTszLX41u17Z+gKVDqU60mLsAVfc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726268621; c=relaxed/simple; bh=EtRtM2cTGDbwo7RsHNituM7SMCvqa4pTLi2Qzvb7C8E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mJSYz20ZnfFsEQgrVAV9Zf8x903/m0OVGeFatpl4q10D1fbHn6WjIiOTtP9wZ5E5R0I3rLaID9pN2Z68VvgGTwYsdvkyFLH5hVTaiB+kg3GUWcRTrEXLxXnXqIIaQyEkzQSptzLXVcdc6yz+fn+E0DDdqBrx522jphGp1XYRU70= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NVjcV7eQ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NVjcV7eQ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id ABC6EC4CEC5; Fri, 13 Sep 2024 23:03:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726268621; bh=EtRtM2cTGDbwo7RsHNituM7SMCvqa4pTLi2Qzvb7C8E=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NVjcV7eQBvm19Nkj2vxcVt4tfTmi9pawMtllHfr4vJ+wH1Q3ftuLhMDYeWkfrPVyM rNQzfYRyCaVkVECwNCV+8B1UMDpK6hSVwAd0DUaZmVBWvEVv0VprxTv0+33RKm4ZRB nXFROWn7vxi4XdPd1Poz9/ZMQHPGlJWtg9LuV4ehQY2eWtwtyh9uF4YHMXp+FwgBIH MdjYRD+o9SBj7vec2skizh4xXfn1u3333/E9xbDiEGhiSPSIWrRF87mSlsbBftUfJX 34/2K3mxzgc1gsIv9TK/kFwB6Lek0OxyyZLV/sqQ2/rDqqu6SgurARL09qCUHXjYNc I9kKme90SybBQ== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James Subject: [PATCH v2 11/11] perf/x86: Enable SFrame unwinding for deferred user callchains Date: Sat, 14 Sep 2024 01:02:13 +0200 Message-ID: <79e489ab275f6df5bf200747a0e9b878469301d4.1726268190.git.jpoimboe@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Use SFrame for deferred user callchains, if available. Non-deferred user callchains still need to use frame pointers, as SFrame is likely to fault when it pages in the .sframe section. 
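For illustration only, not part of this patch: a profiler could opt in to the deferred user callchains enabled here roughly as follows. The defer_callchain bit is the perf_event_attr flag added earlier in this series, so a kernel without the series rejects it as an unknown attr bit; everything else below is existing perf ABI. sample_id_all is set so non-sample records (including the deferred callchain records) carry the tid needed to stitch them back to their samples.

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <string.h>
#include <unistd.h>

/* Sketch: open a cycles sampling event that requests deferred user callchains. */
static int open_deferred_callchain_event(pid_t pid, int cpu)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size		= sizeof(attr);
	attr.type		= PERF_TYPE_HARDWARE;
	attr.config		= PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_period	= 100000;
	attr.sample_type	= PERF_SAMPLE_CALLCHAIN | PERF_SAMPLE_TID;
	attr.sample_max_stack	= 64;
	attr.sample_id_all	= 1;	/* tid on non-sample records too */
	attr.defer_callchain	= 1;	/* new attr bit from this series */

	return syscall(__NR_perf_event_open, &attr, pid, cpu, -1, 0);
}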
Signed-off-by: Josh Poimboeuf
---
 arch/x86/events/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d6ea265d9aa8..d618c50865d3 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2861,6 +2861,7 @@ static int __perf_callchain_user32(struct pt_regs *regs,
 static void __perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 				  struct pt_regs *regs, bool atomic)
 {
+	enum user_unwind_type unwind_type = USER_UNWIND_TYPE_AUTO;
 	struct user_unwind_state state;
 
 	if (perf_guest_state()) {
@@ -2879,13 +2880,14 @@ static void __perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 	if (atomic) {
 		if (!nmi_uaccess_okay())
 			return;
+		unwind_type = USER_UNWIND_TYPE_FP;
 		pagefault_disable();
 	}
 
 	if (__perf_callchain_user32(regs, entry))
 		goto done;
 
-	for_each_user_frame(&state, USER_UNWIND_TYPE_FP) {
+	for_each_user_frame(&state, unwind_type) {
 		if (perf_callchain_store(entry, state.ip))
 			goto done;
 	}
-- 
2.46.0
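To round out the picture of how the new record type is meant to be consumed (this is left to tooling and is not defined by the series): a ring-buffer reader can hold any sample whose callchain ends in PERF_CONTEXT_USER_DEFERRED and splice in the ips[] of the next PERF_RECORD_CALLCHAIN_DEFERRED that carries the same tid. The sketch below is an assumption about consumer behavior: the single pending slot, the tid keying and the emit_callchain() helper are hypothetical, and the new marker and record type only exist with this series applied.

#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>

/*
 * One pending kernel-only callchain per tid would be needed in practice;
 * a single slot keeps the sketch short.
 */
struct pending_sample {
	__u32 tid;
	__u64 nr;
	/* room for the kernel part, context markers and the user part */
	__u64 ips[2 * (PERF_MAX_STACK_DEPTH + PERF_MAX_CONTEXTS_PER_STACK)];
};

/* Hypothetical output path. */
static void emit_callchain(__u32 tid, const __u64 *ips, __u64 nr)
{
	printf("tid %u: %llu frames\n", tid, (unsigned long long)nr);
	(void)ips;
}

/* Called for each PERF_RECORD_SAMPLE that carries a callchain. */
static void on_sample(struct pending_sample *pend, __u32 tid,
		      const __u64 *ips, __u64 nr)
{
	if (nr && ips[nr - 1] == PERF_CONTEXT_USER_DEFERRED) {
		/* Kernel part only; hold it until the user part arrives. */
		pend->tid = tid;
		pend->nr  = nr - 1;	/* drop the DEFERRED marker */
		memcpy(pend->ips, ips, pend->nr * sizeof(*ips));
		return;
	}
	emit_callchain(tid, ips, nr);
}

/* Called for each PERF_RECORD_CALLCHAIN_DEFERRED. */
static void on_deferred(struct pending_sample *pend, __u32 tid,
			const __u64 *ips, __u64 nr)
{
	if (!pend->nr || pend->tid != tid)
		return;
	/*
	 * The deferred ips[] start with PERF_CONTEXT_USER, so appending
	 * them restores the usual kernel+user callchain layout.
	 */
	memcpy(pend->ips + pend->nr, ips, nr * sizeof(*ips));
	emit_callchain(tid, pend->ips, pend->nr + nr);
	pend->nr = 0;
}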