From nobody Mon Nov 25 07:24:59 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 81D402038C5; Mon, 28 Oct 2024 21:48:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152111; cv=none; b=BCrYmU+jUUn/InK9oEb50O6a0rtDsVPBSCKNjBw26G9UzG+rUi4zGlT4wKnqDWalFxb7W+lXOKuzFm7FdjWxL0j8urdeH/w4PYyyENutFPVM4VDJ3PdyzDX4C5KyYD+62xAIYKSj7V4aAvpbVSY0gRFtr28ysldFs0IyPJK7VbY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152111; c=relaxed/simple; bh=0p8Qb45vgp0/wPL3MuhSna7OVGmhZaEJXaATPR0uVvQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=R79i4yURAxOlJx4n4z8MB4d44/+XfywjjZAqiD64+LZmZ4oe552lPfAcRNGKVy7dhVH3y2Z9hq6eL5H5ESbQH/e+sIqH2JvYYs1fP3URQQOI91Z5zIV3uwarUDMYYIvkXslMBBMJj/iyj8rmrYmBZt4w03Rps6I83G/DniGUrK4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=igoXBPa8; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="igoXBPa8" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 83A20C4CEC3; Mon, 28 Oct 2024 21:48:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1730152111; bh=0p8Qb45vgp0/wPL3MuhSna7OVGmhZaEJXaATPR0uVvQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=igoXBPa8dHZf/M1f1+fJeVKDfXRNRWT4rfmK+s1ZC/l6ChiJRF0/C0yXo4htPcioi QCLk3okovib+olaN/wLqP9COwdc9bCsQu6TPuBt3QEA30AzlPj25NT9P9hSso6bJah bhggtK1fBaTyJJf0DyV9lwAer0cCN8hfMKx800ls7qTILMMpS6lalyk2XtV0tdKX0r 6da59FSWt7suLljhaqe8wLOJ+pCbtsArjSyDCJwUhRBqWqzeXm/hXqY6GgtR1o4sAQ 
	 pQPaBWiBWWfka1m9ZvGISds6LrDfkvAD32ULAEqmiiQJ2rFwz3DcJ2t5/R7+iFotCX
	 uGDqbTbXiNG4g==
From: Josh Poimboeuf
To: x86@kernel.org
Cc: Peter Zijlstra, Steven Rostedt, Ingo Molnar,
 Arnaldo Carvalho de Melo, linux-kernel@vger.kernel.org, Indu Bhagat,
 Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
 Adrian Hunter, linux-perf-users@vger.kernel.org, Mark Brown,
 linux-toolchains@vger.kernel.org, Jordan Rome, Sam James,
 linux-trace-kernel@vger.kernel.org, Andrii Nakryiko, Jens Remus,
 Mathieu Desnoyers, Florian Weimer, Andy Lutomirski
Subject: [PATCH v3 01/19] x86/vdso: Fix DWARF generation for getrandom()
Date: Mon, 28 Oct 2024 14:47:48 -0700
Message-ID: <8e0fdc4c1b30c6040502c6265f455c44d05ea041.1730150953.git.jpoimboe@kernel.org>
X-Mailer: git-send-email 2.47.0
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Message-ID: <20241028214748.J7iinBHJcR_9_cOMI7PDTtSO9ET_Lj3XJnIMw7LKWj0@z>
Content-Type: text/plain; charset="utf-8"

Enable DWARF generation for the VDSO implementation of getrandom().

Fixes: 33385150ac45 ("x86: vdso: Wire up getrandom() vDSO implementation")
Signed-off-by: Josh Poimboeuf
---
 arch/x86/entry/vdso/vgetrandom-chacha.S | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vgetrandom-chacha.S b/arch/x86/entry/vdso/vgetrandom-chacha.S
index bcba5639b8ee..cc82da9216fb 100644
--- a/arch/x86/entry/vdso/vgetrandom-chacha.S
+++ b/arch/x86/entry/vdso/vgetrandom-chacha.S
@@ -4,7 +4,7 @@
  */
 
 #include
-#include
+#include
 
 .section .rodata, "a"
 .align 16
@@ -22,7 +22,7 @@ CONSTANTS: .octa 0x6b20657479622d323320646e61707865
  * rcx: number of 64-byte blocks to write to output
  */
 SYM_FUNC_START(__arch_chacha20_blocks_nostack)
-
+	CFI_STARTPROC
 	.set output, %rdi
 	.set key, %rsi
 	.set counter, %rdx
@@ -175,4 +175,5 @@ SYM_FUNC_START(__arch_chacha20_blocks_nostack)
 	pxor temp,temp
 
 	ret
+	CFI_ENDPROC
 SYM_FUNC_END(__arch_chacha20_blocks_nostack)
-- 
2.47.0

From nobody Mon Nov 25 07:24:59 2024
From: Josh Poimboeuf
To: x86@kernel.org
Cc: Peter Zijlstra, Steven Rostedt, Ingo Molnar,
 Arnaldo Carvalho de Melo, linux-kernel@vger.kernel.org, Indu Bhagat,
 Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
 Adrian Hunter, linux-perf-users@vger.kernel.org, Mark Brown,
 linux-toolchains@vger.kernel.org, Jordan Rome, Sam James,
 linux-trace-kernel@vger.kernel.org, Andrii Nakryiko, Jens Remus,
 Mathieu Desnoyers, Florian Weimer, Andy Lutomirski
Subject: [PATCH v3 02/19] x86/asm: Avoid emitting DWARF CFI for non-VDSO
Date: Mon, 28 Oct 2024 14:47:49 -0700
Message-ID: <20241028214749.eqP4mlz5k0szQtXgaPqrapCBKlvG6aM_8ep_6GOzyUw@z>

VDSO is the only part of the "kernel" using DWARF CFI directives. For
the kernel proper, ensure the CFI_* macros don't do anything.

These macros aren't yet being used outside of VDSO, so there's no
functional change.

Signed-off-by: Josh Poimboeuf
---
 arch/x86/include/asm/dwarf2.h | 37 +++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/dwarf2.h b/arch/x86/include/asm/dwarf2.h
index 430fca13bb56..b1aa3fcd5bca 100644
--- a/arch/x86/include/asm/dwarf2.h
+++ b/arch/x86/include/asm/dwarf2.h
@@ -6,6 +6,15 @@
 #warning "asm/dwarf2.h should be only included in pure assembly files"
 #endif
 
+#ifdef BUILD_VDSO
+
+	/*
+	 * For the vDSO, emit both runtime unwind information and debug
+	 * symbols for the .dbg file.
+	 */
+
+	.cfi_sections .eh_frame, .debug_frame
+
 #define CFI_STARTPROC .cfi_startproc
 #define CFI_ENDPROC .cfi_endproc
 #define CFI_DEF_CFA .cfi_def_cfa
@@ -21,7 +30,8 @@
 #define CFI_UNDEFINED .cfi_undefined
 #define CFI_ESCAPE .cfi_escape
 
-#ifndef BUILD_VDSO
+#else /* !BUILD_VDSO */
+
 	/*
 	 * Emit CFI data in .debug_frame sections, not .eh_frame sections.
 	 * The latter we currently just discard since we don't do DWARF
@@ -29,13 +39,24 @@
 	 * useful to anyone. Note we should not use this directive if we
 	 * ever decide to enable DWARF unwinding at runtime.
 	 */
+
 	.cfi_sections .debug_frame
-#else
-	/*
-	 * For the vDSO, emit both runtime unwind information and debug
-	 * symbols for the .dbg file.
-	 */
-	.cfi_sections .eh_frame, .debug_frame
-#endif
+
+#define CFI_STARTPROC
+#define CFI_ENDPROC
+#define CFI_DEF_CFA
+#define CFI_DEF_CFA_REGISTER
+#define CFI_DEF_CFA_OFFSET
+#define CFI_ADJUST_CFA_OFFSET
+#define CFI_OFFSET
+#define CFI_REL_OFFSET
+#define CFI_REGISTER
+#define CFI_RESTORE
+#define CFI_REMEMBER_STATE
+#define CFI_RESTORE_STATE
+#define CFI_UNDEFINED
+#define CFI_ESCAPE
+
+#endif /* !BUILD_VDSO */
 
 #endif /* _ASM_X86_DWARF2_H */
-- 
2.47.0

From nobody Mon Nov 25 07:24:59 2024
From: Josh Poimboeuf
To: x86@kernel.org
Cc: Peter Zijlstra, Steven Rostedt, Ingo Molnar,
 Arnaldo Carvalho de Melo, linux-kernel@vger.kernel.org, Indu Bhagat,
 Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
 Adrian Hunter, linux-perf-users@vger.kernel.org, Mark Brown,
 linux-toolchains@vger.kernel.org, Jordan Rome, Sam James,
 linux-trace-kernel@vger.kernel.org, Andrii Nakryiko, Jens Remus,
 Mathieu Desnoyers, Florian Weimer, Andy Lutomirski
Subject: [PATCH v3 03/19] x86/asm: Fix VDSO DWARF generation with kernel IBT enabled
Date: Mon, 28 Oct 2024 14:47:50 -0700
Message-ID: <20241028214750.6pqq07EYYABleQ7mcj0x12QkbGUEwwlG5ZAjn25wc48@z>

The DWARF .cfi_startproc annotation needs to be at the very beginning of
a function. But with kernel IBT that doesn't happen as ENDBR is
sneakily embedded in SYM_FUNC_START. So the DWARF unwinding info is
wrong at the beginning of the VDSO functions.

Fix it by adding CFI_STARTPROC and CFI_ENDPROC to SYM_FUNC_START_* and
SYM_FUNC_END respectively. Note this only affects VDSO, as the CFI_*
macros are empty for the kernel proper.

Fixes: c4691712b546 ("x86/linkage: Add ENDBR to SYM_FUNC_START*()")
Signed-off-by: Josh Poimboeuf
---
 arch/x86/entry/vdso/vdso-layout.lds.S   |  2 +-
 arch/x86/entry/vdso/vgetrandom-chacha.S |  2 --
 arch/x86/entry/vdso/vsgx.S              |  4 ----
 arch/x86/include/asm/linkage.h          | 29 +++++++++++++++++++------
 arch/x86/include/asm/vdso.h             |  1 -
 5 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index bafa73f09e92..a42c7d4a33da 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#include
+#include
 
 /*
  * Linker script for vDSO. This is an ELF shared object prelinked to
diff --git a/arch/x86/entry/vdso/vgetrandom-chacha.S b/arch/x86/entry/vdso/vgetrandom-chacha.S
index cc82da9216fb..a33212594731 100644
--- a/arch/x86/entry/vdso/vgetrandom-chacha.S
+++ b/arch/x86/entry/vdso/vgetrandom-chacha.S
@@ -22,7 +22,6 @@ CONSTANTS: .octa 0x6b20657479622d323320646e61707865
  * rcx: number of 64-byte blocks to write to output
  */
 SYM_FUNC_START(__arch_chacha20_blocks_nostack)
-	CFI_STARTPROC
 	.set output, %rdi
 	.set key, %rsi
 	.set counter, %rdx
@@ -175,5 +174,4 @@ SYM_FUNC_START(__arch_chacha20_blocks_nostack)
 	pxor temp,temp
 
 	ret
-	CFI_ENDPROC
 SYM_FUNC_END(__arch_chacha20_blocks_nostack)
diff --git a/arch/x86/entry/vdso/vsgx.S b/arch/x86/entry/vdso/vsgx.S
index 37a3d4c02366..c0342238c976 100644
--- a/arch/x86/entry/vdso/vsgx.S
+++ b/arch/x86/entry/vdso/vsgx.S
@@ -24,8 +24,6 @@
 .section .text, "ax"
 
 SYM_FUNC_START(__vdso_sgx_enter_enclave)
-	/* Prolog */
-	.cfi_startproc
 	push %rbp
 	.cfi_adjust_cfa_offset 8
 	.cfi_rel_offset %rbp, 0
@@ -143,8 +141,6 @@ SYM_FUNC_START(__vdso_sgx_enter_enclave)
 	jle .Lout
 	jmp .Lenter_enclave
 
-	.cfi_endproc
-
 	_ASM_VDSO_EXTABLE_HANDLE(.Lenclu_eenter_eresume, .Lhandle_exception)
 
 SYM_FUNC_END(__vdso_sgx_enter_enclave)
diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h
index dc31b13b87a0..2866d57ef907 100644
--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -40,6 +40,10 @@
 
 #ifdef __ASSEMBLY__
 
+#ifndef LINKER_SCRIPT
+#include
+#endif
+
 #if defined(CONFIG_MITIGATION_RETHUNK) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO)
 #define RET jmp __x86_return_thunk
 #else /* CONFIG_MITIGATION_RETPOLINE */
@@ -112,40 +116,51 @@
 # define SYM_FUNC_ALIAS_MEMFUNC SYM_FUNC_ALIAS
 #endif
 
+#define __SYM_FUNC_START \
+	CFI_STARTPROC ASM_NL \
+	ENDBR
+
+#define __SYM_FUNC_END \
+	CFI_ENDPROC ASM_NL
+
 /* SYM_TYPED_FUNC_START -- use for indirectly called globals, w/ CFI type */
 #define SYM_TYPED_FUNC_START(name) \
 	SYM_TYPED_START(name, SYM_L_GLOBAL, SYM_F_ALIGN) \
-	ENDBR
+	__SYM_FUNC_START
 
 /* SYM_FUNC_START -- use for global functions */
 #define SYM_FUNC_START(name) \
 	SYM_START(name, SYM_L_GLOBAL, SYM_F_ALIGN) \
-	ENDBR
+	__SYM_FUNC_START
 
 /* SYM_FUNC_START_NOALIGN -- use for global functions, w/o alignment */
 #define SYM_FUNC_START_NOALIGN(name) \
 	SYM_START(name, SYM_L_GLOBAL, SYM_A_NONE) \
-	ENDBR
+	__SYM_FUNC_START
 
 /* SYM_FUNC_START_LOCAL -- use for local functions */
 #define SYM_FUNC_START_LOCAL(name) \
 	SYM_START(name, SYM_L_LOCAL, SYM_F_ALIGN) \
-	ENDBR
+	__SYM_FUNC_START
 
 /* SYM_FUNC_START_LOCAL_NOALIGN -- use for local functions, w/o alignment */
 #define SYM_FUNC_START_LOCAL_NOALIGN(name) \
 	SYM_START(name, SYM_L_LOCAL, SYM_A_NONE) \
-	ENDBR
+	__SYM_FUNC_START
 
 /* SYM_FUNC_START_WEAK -- use for weak functions */
 #define SYM_FUNC_START_WEAK(name) \
 	SYM_START(name, SYM_L_WEAK, SYM_F_ALIGN) \
-	ENDBR
+	__SYM_FUNC_START
 
 /* SYM_FUNC_START_WEAK_NOALIGN -- use for weak functions, w/o alignment */
 #define SYM_FUNC_START_WEAK_NOALIGN(name) \
 	SYM_START(name, SYM_L_WEAK, SYM_A_NONE) \
-	ENDBR
+	__SYM_FUNC_START
+
+#define SYM_FUNC_END(name) \
+	__SYM_FUNC_END \
+	SYM_END(name, SYM_T_FUNC)
 
 #endif /* _ASM_X86_LINKAGE_H */
 
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index d7f6592b74a9..0111c349bbc5 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -2,7 +2,6 @@
 #ifndef _ASM_X86_VDSO_H
 #define _ASM_X86_VDSO_H
 
-#include
 #include
 #include
 
-- 
2.47.0

From nobody Mon Nov 25 07:24:59 2024
From: Josh Poimboeuf
To: x86@kernel.org
Cc: Peter Zijlstra, Steven Rostedt, Ingo Molnar,
 Arnaldo Carvalho de Melo, linux-kernel@vger.kernel.org, Indu Bhagat,
 Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
 Adrian Hunter, linux-perf-users@vger.kernel.org, Mark Brown,
 linux-toolchains@vger.kernel.org, Jordan Rome, Sam James,
 linux-trace-kernel@vger.kernel.org, Andrii Nakryiko, Jens Remus,
 Mathieu Desnoyers, Florian Weimer, Andy Lutomirski
Subject: [PATCH v3 04/19] x86/vdso: Use SYM_FUNC_{START,END} in __kernel_vsyscall()
Date: Mon, 28 Oct 2024 14:47:51 -0700
Message-ID: <3263d60b868f244d7624ad5223644a6bd8c7bc81.1730150953.git.jpoimboe@kernel.org>
Message-ID: <20241028214751.q7BLXAsIOAfwvXG5sqSRBNR8nx7XjN7ODL2Y1d7rPOE@z>

Use SYM_FUNC_{START,END} instead of all the boilerplate. No functional
change.

Signed-off-by: Josh Poimboeuf
---
 arch/x86/entry/vdso/vdso32/system_call.S | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso32/system_call.S b/arch/x86/entry/vdso/vdso32/system_call.S
index d33c6513fd2c..bdc576548240 100644
--- a/arch/x86/entry/vdso/vdso32/system_call.S
+++ b/arch/x86/entry/vdso/vdso32/system_call.S
@@ -9,11 +9,7 @@
 #include
 
 	.text
-	.globl __kernel_vsyscall
-	.type __kernel_vsyscall,@function
-	ALIGN
-__kernel_vsyscall:
-	CFI_STARTPROC
+SYM_FUNC_START(__kernel_vsyscall)
 	/*
 	 * Reshuffle regs so that all of any of the entry instructions
 	 * will preserve enough state.
@@ -79,7 +75,5 @@ SYM_INNER_LABEL(int80_landing_pad, SYM_L_GLOBAL)
 	CFI_RESTORE ecx
 	CFI_ADJUST_CFA_OFFSET -4
 	RET
-	CFI_ENDPROC
-
-	.size __kernel_vsyscall,.-__kernel_vsyscall
+SYM_FUNC_END(__kernel_vsyscall)
 	.previous
-- 
2.47.0

From nobody Mon Nov 25 07:24:59 2024
From: Josh Poimboeuf
To: x86@kernel.org
Cc: Peter Zijlstra, Steven Rostedt, Ingo Molnar,
 Arnaldo Carvalho de Melo, linux-kernel@vger.kernel.org, Indu Bhagat,
 Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
 Adrian Hunter, linux-perf-users@vger.kernel.org, Mark Brown,
 linux-toolchains@vger.kernel.org, Jordan Rome, Sam James,
 linux-trace-kernel@vger.kernel.org, Andrii Nakryiko, Jens Remus,
 Mathieu Desnoyers, Florian Weimer, Andy Lutomirski
Subject: [PATCH v3 05/19] x86/vdso: Use CFI macros in __vdso_sgx_enter_enclave()
Date: Mon, 28 Oct 2024 14:47:52 -0700
Message-ID: <95cb05cfd657091c0d3126709a55535ebd09e2ba.1730150953.git.jpoimboe@kernel.org>
Message-ID: <20241028214752.EZdSeKXLz2s3lAAICFDTihxSz57ElnO8C23RMovhqxA@z>

Use the CFI macros instead of the raw .cfi_* directives to be consistent
with the rest of the VDSO asm. It's also easier on the eyes.

No functional changes.

Signed-off-by: Josh Poimboeuf
---
 arch/x86/entry/vdso/vsgx.S | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/vdso/vsgx.S b/arch/x86/entry/vdso/vsgx.S
index c0342238c976..8d7b8eb45c50 100644
--- a/arch/x86/entry/vdso/vsgx.S
+++ b/arch/x86/entry/vdso/vsgx.S
@@ -24,13 +24,14 @@
 .section .text, "ax"
 
 SYM_FUNC_START(__vdso_sgx_enter_enclave)
+	SYM_F_ALIGN
 	push %rbp
-	.cfi_adjust_cfa_offset 8
-	.cfi_rel_offset %rbp, 0
+	CFI_ADJUST_CFA_OFFSET 8
+	CFI_REL_OFFSET %rbp, 0
 	mov %rsp, %rbp
-	.cfi_def_cfa_register %rbp
+	CFI_DEF_CFA_REGISTER %rbp
 	push %rbx
-	.cfi_rel_offset %rbx, -8
+	CFI_REL_OFFSET %rbx, -8
 
 	mov %ecx, %eax
 .Lenter_enclave:
@@ -77,13 +78,11 @@ SYM_FUNC_START(__vdso_sgx_enter_enclave)
 .Lout:
 	pop %rbx
 	leave
-	.cfi_def_cfa %rsp, 8
+	CFI_DEF_CFA %rsp, 8
 	RET
 
-	/* The out-of-line code runs with the pre-leave stack frame. */
-	.cfi_def_cfa %rbp, 16
-
 .Linvalid_input:
+	CFI_DEF_CFA %rbp, 16
 	mov $(-EINVAL), %eax
 	jmp .Lout
 
-- 
2.47.0

From nobody Mon Nov 25 07:24:59 2024
From: Josh Poimboeuf
To: x86@kernel.org
Cc: Peter Zijlstra, Steven Rostedt, Ingo Molnar,
 Arnaldo Carvalho de Melo, linux-kernel@vger.kernel.org, Indu Bhagat,
 Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
 Adrian Hunter, linux-perf-users@vger.kernel.org, Mark Brown,
 linux-toolchains@vger.kernel.org, Jordan Rome, Sam James,
 linux-trace-kernel@vger.kernel.org, Andrii Nakryiko, Jens Remus,
 Mathieu Desnoyers, Florian Weimer, Andy Lutomirski
Subject: [PATCH v3 06/19] x86/vdso: Enable sframe generation in VDSO
Date: Mon, 28 Oct 2024 14:47:53 -0700
Message-ID: <20241028214753.-hfD_ytd9Xj1tlTcNFYf_mV9LQVYCyWVW1ynhto9ln4@z>

Enable sframe generation in the VDSO library so kernel and user space
can unwind through it.

Signed-off-by: Josh Poimboeuf
---
 arch/x86/entry/vdso/Makefile          | 6 ++++--
 arch/x86/entry/vdso/vdso-layout.lds.S | 3 +++
 arch/x86/include/asm/dwarf2.h         | 5 ++++-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index c9216ac4fb1e..75ae9e093a2d 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -47,13 +47,15 @@ quiet_cmd_vdso2c = VDSO2C $@
 $(obj)/vdso-image-%.c: $(obj)/vdso%.so.dbg $(obj)/vdso%.so $(obj)/vdso2c FORCE
 	$(call if_changed,vdso2c)
 
+SFRAME_CFLAGS := $(call as-option,-Wa$(comma)-gsframe,)
+
 #
 # Don't omit frame pointers for ease of userspace debugging, but do
 # optimize sibling calls.
 #
 CFL := $(PROFILING) -mcmodel=small -fPIC -O2 -fasynchronous-unwind-tables -m64 \
        $(filter -g%,$(KBUILD_CFLAGS)) -fno-stack-protector \
-       -fno-omit-frame-pointer -foptimize-sibling-calls \
+       -fno-omit-frame-pointer $(SFRAME_CFLAGS) -foptimize-sibling-calls \
        -DDISABLE_BRANCH_PROFILING -DBUILD_VDSO
 
 ifdef CONFIG_MITIGATION_RETPOLINE
@@ -63,7 +65,7 @@ endif
 endif
 
 $(vobjs): KBUILD_CFLAGS := $(filter-out $(PADDING_CFLAGS) $(CC_FLAGS_LTO) $(CC_FLAGS_CFI) $(RANDSTRUCT_CFLAGS) $(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
-$(vobjs): KBUILD_AFLAGS += -DBUILD_VDSO
+$(vobjs): KBUILD_AFLAGS += -DBUILD_VDSO $(SFRAME_CFLAGS)
 
 #
 # vDSO code runs in userspace and -pg doesn't help with profiling anyway.
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index a42c7d4a33da..f6cd6654bb9e 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -69,6 +69,7 @@ SECTIONS
 	.eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr
 	.eh_frame : { KEEP (*(.eh_frame)) } :text
 
+	.sframe : { *(.sframe) } :text :sframe
 
 	/*
 	 * Text is well-separated from actual data: there's plenty of
@@ -97,6 +98,7 @@ SECTIONS
  * Very old versions of ld do not recognize this name token; use the constant.
  */
 #define PT_GNU_EH_FRAME 0x6474e550
+#define PT_GNU_SFRAME 0x6474e554
 
 /*
  * We must supply the ELF program headers explicitly to get just one
@@ -108,4 +110,5 @@ PHDRS
 	dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
 	note PT_NOTE FLAGS(4); /* PF_R */
 	eh_frame_hdr PT_GNU_EH_FRAME;
+	sframe PT_GNU_SFRAME;
 }
diff --git a/arch/x86/include/asm/dwarf2.h b/arch/x86/include/asm/dwarf2.h
index b1aa3fcd5bca..1a49492817a1 100644
--- a/arch/x86/include/asm/dwarf2.h
+++ b/arch/x86/include/asm/dwarf2.h
@@ -12,8 +12,11 @@
 	 * For the vDSO, emit both runtime unwind information and debug
 	 * symbols for the .dbg file.
 	 */
-
+#ifdef __x86_64__
+	.cfi_sections .eh_frame, .debug_frame, .sframe
+#else
 	.cfi_sections .eh_frame, .debug_frame
+#endif
 
 #define CFI_STARTPROC .cfi_startproc
 #define CFI_ENDPROC .cfi_endproc
-- 
2.47.0

From nobody Mon Nov 25 07:24:59 2024
b=gq3ZE4Z+8P+K2q+unYPpjbzj13un1GsLe9T90zNsBooSyKFbyiMeM4ZiC+i3E9odL GS6cMHWlZPIe8bRaQocs1CeU6+t+QA3ps+CCSWfPTriO/HE9MDWTTVanl/Wd2hx1Dr wExWU20WYteq7Rpg6hnGGX6SvWnMS3pAod8YaBRp/vr9pBStDRddruqjU82HaqN+mh mLlX3ujTFOY76KNcXHsBvXxTUMXKCbjDLPCIybq29VNSlJsAp//M+Lf9j69uEHEJjG iOxZqX6pqNWcrefq9w+WZ6/maS3yINM1Zf2BiMW8uzY/TTO67AtKr2B4C/e3CowsaM bnd+gjg/8YPCA== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , linux-trace-kernel@vger.kerne.org, Andrii Nakryiko , Jens Remus , Mathieu Desnoyers , Florian Weimer , Andy Lutomirski Subject: [PATCH v3 07/19] unwind: Add user space unwinding API Date: Mon, 28 Oct 2024 14:47:54 -0700 Message-ID: X-Mailer: git-send-email 2.47.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Message-ID: <20241028214754.KtR7HbQ6yj58_RZaAjfZGud_w6Msm7zxPUlWthWHytU@z> Content-Type: text/plain; charset="utf-8" Introduce a user space unwinder API which provides a generic way to unwind user stacks. 
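
For illustration only (not part of this patch), the frame pointer walk
can be modelled in user space. The sketch below assumes an x86-64 style
layout: CFA = FP + 16, return address at CFA - 8, saved FP at CFA - 16.
The names sim_stack, read_word(), struct fp_layout, struct walk_state,
walk_next() and walk() are hypothetical stand-ins for user memory,
get_user() and the unwind_user_*() interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated user stack: word i lives at STACK_BASE + 8 * i. */
#define STACK_BASE	0x1000UL
#define STACK_WORDS	16

static uint64_t sim_stack[STACK_WORDS];

/* Stand-in for get_user(): 0 on success, -1 on a bad or unaligned address. */
static int read_word(uint64_t addr, uint64_t *out)
{
	if (addr < STACK_BASE || addr >= STACK_BASE + 8 * STACK_WORDS || (addr & 7))
		return -1;
	*out = sim_stack[(addr - STACK_BASE) / 8];
	return 0;
}

/* Offsets relative to the CFA, mirroring the idea of unwind_user_frame. */
struct fp_layout {
	int32_t cfa_off;
	int32_t ra_off;
	int32_t fp_off;
};

struct walk_state {
	uint64_t ip, sp, fp;
	bool done;
};

/* Assumed x86-64 frame pointer layout. */
static const struct fp_layout fp_layout = {
	.cfa_off = 16, .ra_off = -8, .fp_off = -16,
};

/* Step to the caller's frame; marks the walk done on any read failure. */
static int walk_next(struct walk_state *s)
{
	uint64_t cfa, ra = 0, fp = 0;

	if (s->done)
		return -1;

	cfa = s->fp + (uint64_t)(int64_t)fp_layout.cfa_off;

	if (read_word(cfa + (uint64_t)(int64_t)fp_layout.ra_off, &ra) ||
	    ra == s->ip ||
	    read_word(cfa + (uint64_t)(int64_t)fp_layout.fp_off, &fp)) {
		s->done = true;
		return -1;
	}

	s->sp = cfa;
	s->ip = ra;
	s->fp = fp;
	return 0;
}

/* Record up to max instruction pointers, starting with the current one. */
static unsigned int walk(struct walk_state *s, uint64_t *entries, unsigned int max)
{
	unsigned int nr = 0;

	assert(entries != NULL && max > 0);

	while (!s->done && nr < max) {
		entries[nr++] = s->ip;
		walk_next(s);
	}
	return nr;
}
```

A caller would seed struct walk_state with the interrupted ip/sp/fp and
call walk(); the walk stops as soon as a return address or saved frame
pointer cannot be read, which is also how the kernel loop terminates.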

Signed-off-by: Josh Poimboeuf
---
 arch/Kconfig                |  7 +++
 include/linux/unwind_user.h | 41 +++++++++++++++
 kernel/Makefile             |  1 +
 kernel/unwind/Makefile      |  1 +
 kernel/unwind/user.c        | 99 +++++++++++++++++++++++++++++++++++++
 5 files changed, 149 insertions(+)
 create mode 100644 include/linux/unwind_user.h
 create mode 100644 kernel/unwind/Makefile
 create mode 100644 kernel/unwind/user.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 7a95c1052cd5..ee8ec97ea0ef 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -435,6 +435,13 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH
 	  It uses the same command line parameters, and sysctl interface,
 	  as the generic hardlockup detectors.
 
+config UNWIND_USER
+	bool
+
+config HAVE_UNWIND_USER_FP
+	bool
+	select UNWIND_USER
+
 config HAVE_PERF_REGS
 	bool
 	help
diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
new file mode 100644
index 000000000000..9d28db06f33f
--- /dev/null
+++ b/include/linux/unwind_user.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNWIND_USER_H
+#define _LINUX_UNWIND_USER_H
+
+#include
+
+enum unwind_user_type {
+	UNWIND_USER_TYPE_FP,
+};
+
+struct unwind_stacktrace {
+	unsigned int	nr;
+	unsigned long	*entries;
+};
+
+struct unwind_user_frame {
+	s32 cfa_off;
+	s32 ra_off;
+	s32 fp_off;
+	bool use_fp;
+};
+
+struct unwind_user_state {
+	unsigned long ip;
+	unsigned long sp;
+	unsigned long fp;
+	enum unwind_user_type type;
+	bool done;
+};
+
+/* Synchronous interfaces: */
+
+int unwind_user_start(struct unwind_user_state *state);
+int unwind_user_next(struct unwind_user_state *state);
+
+int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries);
+
+#define for_each_user_frame(state) \
+	for (unwind_user_start((state)); !(state)->done; unwind_user_next((state)))
+
+#endif /* _LINUX_UNWIND_USER_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 87866b037fbe..6cb4b0e02a34 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -50,6 +50,7 @@ obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
 obj-y += entry/
+obj-y += unwind/
 obj-$(CONFIG_MODULES) += module/
 
 obj-$(CONFIG_KCMP) += kcmp.o
diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
new file mode 100644
index 000000000000..349ce3677526
--- /dev/null
+++ b/kernel/unwind/Makefile
@@ -0,0 +1 @@
+ obj-$(CONFIG_UNWIND_USER) += user.o
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
new file mode 100644
index 000000000000..54b989810a0e
--- /dev/null
+++ b/kernel/unwind/user.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+* Generic interfaces for unwinding user space
+*
+* Copyright (C) 2024 Josh Poimboeuf
+*/
+#include
+#include
+#include
+#include
+#include
+#include
+
+static struct unwind_user_frame fp_frame = {
+	ARCH_INIT_USER_FP_FRAME
+};
+
+int unwind_user_next(struct unwind_user_state *state)
+{
+	struct unwind_user_frame _frame;
+	struct unwind_user_frame *frame = &_frame;
+	unsigned long prev_ip, cfa, fp, ra = 0;
+
+	if (state->done)
+		return -EINVAL;
+
+	prev_ip = state->ip;
+
+	switch (state->type) {
+	case UNWIND_USER_TYPE_FP:
+		frame = &fp_frame;
+		break;
+	default:
+		BUG();
+	}
+
+	cfa = (frame->use_fp ? state->fp : state->sp) + frame->cfa_off;
+
+	if (frame->ra_off && get_user(ra, (unsigned long __user *)(cfa + frame->ra_off)))
+		goto the_end;
+
+	if (ra == prev_ip)
+		goto the_end;
+
+	if (frame->fp_off && get_user(fp, (unsigned long __user *)(cfa + frame->fp_off)))
+		goto the_end;
+
+	state->sp = cfa;
+	state->ip = ra;
+	if (frame->fp_off)
+		state->fp = fp;
+
+	return 0;
+
+the_end:
+	state->done = true;
+	return -EINVAL;
+}
+
+int unwind_user_start(struct unwind_user_state *state)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+
+	memset(state, 0, sizeof(*state));
+
+	if (!current->mm) {
+		state->done = true;
+		return -EINVAL;
+	}
+
+	state->type = UNWIND_USER_TYPE_FP;
+
+	state->sp = user_stack_pointer(regs);
+	state->ip = instruction_pointer(regs);
+	state->fp = frame_pointer(regs);
+
+	return 0;
+}
+
+int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries)
+{
+	struct unwind_user_state state;
+
+	trace->nr = 0;
+
+	if (!max_entries)
+		return -EINVAL;
+
+	if (!current->mm)
+		return 0;
+
+	for_each_user_frame(&state) {
+		trace->entries[trace->nr++] = state.ip;
+		if (trace->nr >= max_entries)
+			break;
+	}
+
+	return 0;
+}
-- 
2.47.0

From nobody Mon Nov 25 07:24:59 2024
From: Josh Poimboeuf
To: x86@kernel.org
Cc: Peter Zijlstra, Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo,
    linux-kernel@vger.kernel.org, Indu Bhagat, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
    linux-perf-users@vger.kernel.org, Mark Brown,
    linux-toolchains@vger.kernel.org, Jordan Rome, Sam James,
    linux-trace-kernel@vger.kernel.org, Andrii Nakryiko, Jens Remus,
    Mathieu Desnoyers, Florian Weimer, Andy Lutomirski
Subject: [PATCH v3 08/19] unwind/x86: Enable CONFIG_HAVE_UNWIND_USER_FP
Date: Mon, 28 Oct 2024 14:47:55 -0700
Message-ID: <2354d43022bd336c390e1e77f7cee68126d5f8c8.1730150953.git.jpoimboe@kernel.org>
Message-ID: <20241028214755.bborRoiWkn6qhWpu8yY5B5RnFfuazhdzaNGESW2QQyk@z>
X-Mailer: git-send-email 2.47.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Use ARCH_INIT_USER_FP_FRAME to describe how frame pointers are unwound
on x86, and enable CONFIG_HAVE_UNWIND_USER_FP accordingly so the
unwind_user interfaces can be used.

Signed-off-by: Josh Poimboeuf
---
 arch/x86/Kconfig                   |  1 +
 arch/x86/include/asm/unwind_user.h | 11 +++++++++++
 2 files changed, 12 insertions(+)
 create mode 100644 arch/x86/include/asm/unwind_user.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0bdb7a394f59..f91098d6f535 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -289,6 +289,7 @@ config X86
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UACCESS_VALIDATION		if HAVE_OBJTOOL
 	select HAVE_UNSTABLE_SCHED_CLOCK
+	select HAVE_UNWIND_USER_FP		if X86_64
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_GENERIC_VDSO
 	select VDSO_GETRANDOM			if X86_64
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
new file mode 100644
index 000000000000..19df26a65132
--- /dev/null
+++ b/arch/x86/include/asm/unwind_user.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_UNWIND_USER_H
+#define _ASM_X86_UNWIND_USER_H
+
+#define ARCH_INIT_USER_FP_FRAME							\
+	.ra_off		= (s32)sizeof(long) * -1,				\
+	.cfa_off	= (s32)sizeof(long) * 2,				\
+	.fp_off		= (s32)sizeof(long) * -2,				\
+	.use_fp		= true,
+
+#endif /* _ASM_X86_UNWIND_USER_H */
-- 
2.47.0

From nobody Mon Nov 25 07:24:59 2024
From: Josh Poimboeuf
To: x86@kernel.org
Cc: Peter Zijlstra, Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo,
    linux-kernel@vger.kernel.org, Indu Bhagat, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
    linux-perf-users@vger.kernel.org, Mark Brown,
    linux-toolchains@vger.kernel.org, Jordan Rome, Sam James,
    linux-trace-kernel@vger.kernel.org, Andrii Nakryiko, Jens Remus,
    Mathieu Desnoyers, Florian Weimer, Andy Lutomirski
Subject: [PATCH v3 09/19] unwind: Introduce sframe user space unwinding
Date: Mon, 28 Oct 2024 14:47:36 -0700
Message-ID: <42c0a99236af65c09c8182e260af7bcf5aa1e158.1730150953.git.jpoimboe@kernel.org>
X-Mailer: git-send-email 2.47.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Some distros have started compiling frame pointers into all their
packages to enable the kernel to do system-wide profiling of user space.
Unfortunately that creates a runtime performance penalty across the
entire system. Using DWARF instead isn't feasible due to the complexity
it would add to the kernel.

For in-kernel unwinding we solved this problem with the creation of the
ORC unwinder for x86_64. Similarly, for user space the GNU assembler can
emit the sframe format; sframe v2 is available starting with binutils
2.41. Sframe is a simpler version of .eh_frame. It gets placed in the
.sframe section.

Add support for unwinding user space using sframe.

More information about sframe can be found here:

  - https://lwn.net/Articles/932209/
  - https://lwn.net/Articles/940686/
  - https://sourceware.org/binutils/docs/sframe-spec.html

Glibc support is needed to implement the prctl() calls to tell the
kernel where the .sframe segments are.
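
As a rough illustration of the sanity checks applied when a .sframe
section is registered, the header validation can be sketched in user
space as follows. This is not the kernel code: sframe_check_header() and
its buffer-based interface are hypothetical, the struct is flattened for
brevity, and the buffer is assumed to hold the header in host byte
order. The constants mirror the ones defined in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SFRAME_MAGIC		0xdee2
#define SFRAME_VERSION_2	2
#define SFRAME_F_FDE_SORTED	0x1

/* Preamble and header merged into one packed struct for this sketch. */
struct sframe_header {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
	uint8_t  abi_arch;
	int8_t   cfa_fixed_fp_offset;
	int8_t   cfa_fixed_ra_offset;
	uint8_t  auxhdr_len;
	uint32_t num_fdes;
	uint32_t num_fres;
	uint32_t fre_len;
	uint32_t fdes_off;
	uint32_t fres_off;
} __attribute__((packed));

static_assert(sizeof(struct sframe_header) == 28, "unexpected header size");

/*
 * Hypothetical helper: returns 0 if the header looks usable, -EFAULT if
 * the buffer is too short, -EINVAL if a check fails. The conditions are
 * the ones the patch applies before accepting a section: correct magic,
 * version 2, sorted FDEs, no auxiliary header, at least one FDE and one
 * FRE, and the FDE table placed before the FRE table.
 */
static int sframe_check_header(const void *buf, size_t len)
{
	struct sframe_header h;

	if (len < sizeof(h))
		return -EFAULT;

	memcpy(&h, buf, sizeof(h));

	if (h.magic != SFRAME_MAGIC ||
	    h.version != SFRAME_VERSION_2 ||
	    !(h.flags & SFRAME_F_FDE_SORTED) ||
	    h.auxhdr_len || !h.num_fdes || !h.num_fres ||
	    h.fdes_off > h.fres_off)
		return -EINVAL;

	return 0;
}
```

Rejecting unsorted sections up front is what allows the lookup of a
function descriptor to be a plain binary search later on.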

Signed-off-by: Josh Poimboeuf
---
 arch/Kconfig                |   4 +
 arch/x86/include/asm/mmu.h  |   2 +-
 fs/binfmt_elf.c             |  35 +++-
 include/linux/mm_types.h    |   3 +
 include/linux/sframe.h      |  41 ++++
 include/linux/unwind_user.h |   2 +
 include/uapi/linux/elf.h    |   1 +
 include/uapi/linux/prctl.h  |   3 +
 kernel/fork.c               |  10 +
 kernel/sys.c                |  11 ++
 kernel/unwind/Makefile      |   3 +-
 kernel/unwind/sframe.c      | 380 ++++++++++++++++++++++++++++++++++++
 kernel/unwind/sframe.h      | 215 ++++++++++++++++++++
 kernel/unwind/user.c        |  24 ++-
 mm/init-mm.c                |   6 +
 15 files changed, 732 insertions(+), 8 deletions(-)
 create mode 100644 include/linux/sframe.h
 create mode 100644 kernel/unwind/sframe.c
 create mode 100644 kernel/unwind/sframe.h

diff --git a/arch/Kconfig b/arch/Kconfig
index ee8ec97ea0ef..e769c39dd221 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -442,6 +442,10 @@ config HAVE_UNWIND_USER_FP
 	bool
 	select UNWIND_USER
 
+config HAVE_UNWIND_USER_SFRAME
+	bool
+	select UNWIND_USER
+
 config HAVE_PERF_REGS
 	bool
 	help
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index ce4677b8b735..12ea831978cc 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -73,7 +73,7 @@ typedef struct {
 	.context = {							\
 		.ctx_id = 1,						\
 		.lock = __MUTEX_INITIALIZER(mm.context.lock),		\
-	}
+	},
 
 void leave_mm(void);
 #define leave_mm leave_mm
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 06dc4a57ba78..434c548f0837 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -47,6 +47,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -633,11 +634,13 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 		unsigned long no_base, struct elf_phdr *interp_elf_phdata,
 		struct arch_elf_state *arch_state)
 {
-	struct elf_phdr *eppnt;
+	struct elf_phdr *eppnt, *sframe_phdr = NULL;
 	unsigned long load_addr = 0;
 	int load_addr_set = 0;
 	unsigned long error = ~0UL;
 	unsigned long total_size;
+	unsigned long start_code = ~0UL;
+	unsigned long end_code = 0;
 	int i;
 
 	/* First of all, some simple consistency checks */
@@ -659,7 +662,8 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 
 	eppnt = interp_elf_phdata;
 	for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
-		if (eppnt->p_type == PT_LOAD) {
+		switch (eppnt->p_type) {
+		case PT_LOAD: {
 			int elf_type = MAP_PRIVATE;
 			int elf_prot = make_prot(eppnt->p_flags, arch_state,
 						 true, true);
@@ -688,7 +692,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 			/*
 			 * Check to see if the section's size will overflow the
 			 * allowed task size. Note that p_filesz must always be
-			 * <= p_memsize so it's only necessary to check p_memsz.
+			 * <= p_memsz so it's only necessary to check p_memsz.
 			 */
 			k = load_addr + eppnt->p_vaddr;
 			if (BAD_ADDR(k) ||
@@ -698,9 +702,24 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				error = -ENOMEM;
 				goto out;
 			}
+
+			if ((eppnt->p_flags & PF_X) && k < start_code)
+				start_code = k;
+
+			if ((eppnt->p_flags & PF_X) && k + eppnt->p_filesz > end_code)
+				end_code = k + eppnt->p_filesz;
+			break;
+		}
+		case PT_GNU_SFRAME:
+			sframe_phdr = eppnt;
+			break;
 		}
 	}
 
+	if (sframe_phdr)
+		sframe_add_section(load_addr + sframe_phdr->p_vaddr,
+				   start_code, end_code);
+
 	error = load_addr;
 out:
 	return error;
@@ -823,7 +842,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	int first_pt_load = 1;
 	unsigned long error;
 	struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL;
-	struct elf_phdr *elf_property_phdata = NULL;
+	struct elf_phdr *elf_property_phdata = NULL, *sframe_phdr = NULL;
 	unsigned long elf_brk;
 	int retval, i;
 	unsigned long elf_entry;
@@ -931,6 +950,10 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				executable_stack = EXSTACK_DISABLE_X;
 			break;
 
+		case PT_GNU_SFRAME:
+			sframe_phdr = elf_ppnt;
+			break;
+
 		case PT_LOPROC ... PT_HIPROC:
 			retval = arch_elf_pt_proc(elf_ex, elf_ppnt,
 						  bprm->file, false,
@@ -1321,6 +1344,10 @@ static int load_elf_binary(struct linux_binprm *bprm)
 			  task_pid_nr(current), retval);
 	}
 
+	if (sframe_phdr)
+		sframe_add_section(load_bias + sframe_phdr->p_vaddr,
+				   start_code, end_code);
+
 	regs = current_pt_regs();
 #ifdef ELF_PLAT_INIT
 	/*
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 381d22eba088..6e7561c1a5fc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1052,6 +1052,9 @@ struct mm_struct {
 #endif
 		} lru_gen;
 #endif /* CONFIG_LRU_GEN_WALKS_MMU */
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+		struct maple_tree sframe_mt;
+#endif
 	} __randomize_layout;
 
 	/*
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
new file mode 100644
index 000000000000..d167e01817c4
--- /dev/null
+++ b/include/linux/sframe.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_SFRAME_H
+#define _LINUX_SFRAME_H
+
+#include
+
+struct unwind_user_frame;
+
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+
+#define INIT_MM_SFRAME .sframe_mt = MTREE_INIT(sframe_mt, 0),
+
+extern void sframe_free_mm(struct mm_struct *mm);
+
+/* text_start and text_end are optional */
+extern int sframe_add_section(unsigned long sframe_addr, unsigned long text_start,
+			      unsigned long text_end);
+
+extern int sframe_remove_section(unsigned long sframe_addr);
+extern int sframe_find(unsigned long ip, struct unwind_user_frame *frame);
+
+static inline bool current_has_sframe(void)
+{
+	struct mm_struct *mm = current->mm;
+
+	return mm && !mtree_empty(&mm->sframe_mt);
+}
+
+#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
+
+static inline void sframe_free_mm(struct mm_struct *mm) {}
+
+static inline int sframe_add_section(unsigned long sframe_addr, unsigned long text_start, unsigned long text_end) { return -EINVAL; }
+static inline int sframe_remove_section(unsigned long sframe_addr) { return -EINVAL; }
+static inline int sframe_find(unsigned long ip, struct unwind_user_frame *frame) { return -EINVAL; }
+
+static inline bool current_has_sframe(void) { return false; }
+
+#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
+
+#endif /* _LINUX_SFRAME_H */
diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
index 9d28db06f33f..cde0fde4923e 100644
--- a/include/linux/unwind_user.h
+++ b/include/linux/unwind_user.h
@@ -5,7 +5,9 @@
 #include
 
 enum unwind_user_type {
+	UNWIND_USER_TYPE_NONE,
 	UNWIND_USER_TYPE_FP,
+	UNWIND_USER_TYPE_SFRAME,
 };
 
 struct unwind_stacktrace {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index b9935988da5c..4dc3f0ca5af5 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -39,6 +39,7 @@ typedef __s64	Elf64_Sxword;
 #define PT_GNU_STACK	(PT_LOOS + 0x474e551)
 #define PT_GNU_RELRO	(PT_LOOS + 0x474e552)
 #define PT_GNU_PROPERTY	(PT_LOOS + 0x474e553)
+#define PT_GNU_SFRAME	(PT_LOOS + 0x474e554)
 
 
 /* ARM MTE memory tag segment type */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 35791791a879..69511077c910 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -328,4 +328,7 @@ struct prctl_mm_map {
 # define PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC	0x10 /* Clear the aspect on exec */
 # define PR_PPC_DEXCR_CTRL_MASK		0x1f
 
+#define PR_ADD_SFRAME			74
+#define PR_REMOVE_SFRAME		75
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index c056ea95fe8c..60f14fbab956 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -105,6 +105,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -924,6 +925,7 @@ void __mmdrop(struct mm_struct *mm)
 	mm_pasid_drop(mm);
 	mm_destroy_cid(mm);
 	percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
+	sframe_free_mm(mm);
 
 	free_mm(mm);
 }
@@ -1251,6 +1253,13 @@ static void mm_init_uprobes_state(struct mm_struct *mm)
 #endif
 }
 
+static void mm_init_sframe(struct mm_struct *mm)
+{
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+	mt_init(&mm->sframe_mt);
+#endif
+}
+
 static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	struct user_namespace *user_ns)
 {
@@ -1282,6 +1291,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm->pmd_huge_pte = NULL;
 #endif
 	mm_init_uprobes_state(mm);
+	mm_init_sframe(mm);
 	hugetlb_count_init(mm);
 
 	if (current->mm) {
diff --git a/kernel/sys.c b/kernel/sys.c
index 4da31f28fda8..7d05a67727db 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -64,6 +64,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -2784,6 +2785,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_RISCV_SET_ICACHE_FLUSH_CTX:
 		error = RISCV_SET_ICACHE_FLUSH_CTX(arg2, arg3);
 		break;
+	case PR_ADD_SFRAME:
+		if (arg5)
+			return -EINVAL;
+		error = sframe_add_section(arg2, arg3, arg4);
+		break;
+	case PR_REMOVE_SFRAME:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = sframe_remove_section(arg2);
+		break;
 	default:
 		error = -EINVAL;
 		break;
diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
index 349ce3677526..f70380d7a6a6 100644
--- a/kernel/unwind/Makefile
+++ b/kernel/unwind/Makefile
@@ -1 +1,2 @@
- obj-$(CONFIG_UNWIND_USER) += user.o
+ obj-$(CONFIG_UNWIND_USER)		+= user.o
+ obj-$(CONFIG_HAVE_UNWIND_USER_SFRAME)	+= sframe.o
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
new file mode 100644
index 000000000000..933e47696e29
--- /dev/null
+++ b/kernel/unwind/sframe.c
@@ -0,0 +1,380 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define pr_fmt(fmt)	"sframe: " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "sframe.h"
+
+#define SFRAME_FILENAME_LEN 32
+
+struct sframe_section {
+	struct rcu_head rcu;
+
+	unsigned long sframe_addr;
+	unsigned long text_addr;
+
+	unsigned long fdes_addr;
+	unsigned long fres_addr;
+	unsigned int  fdes_nr;
+	signed char ra_off;
+	signed char fp_off;
+};
+
+DEFINE_STATIC_SRCU(sframe_srcu);
+
+#define __SFRAME_GET_USER(out, user_ptr, type)				\
+({									\
+	type __tmp;							\
+	if (get_user(__tmp, (type __user *)user_ptr))			\
+		return -EFAULT;						\
+	user_ptr += sizeof(__tmp);					\
+	out = __tmp;							\
+})
+
+#define SFRAME_GET_USER(out, user_ptr, size)				\
+({									\
+	switch (size) {							\
+	case 1:								\
+		__SFRAME_GET_USER(out, user_ptr, u8);			\
+		break;							\
+	case 2:								\
+		__SFRAME_GET_USER(out, user_ptr, u16);			\
+		break;							\
+	case 4:								\
+		__SFRAME_GET_USER(out, user_ptr, u32);			\
+		break;							\
+	default:							\
+		return -EINVAL;						\
+	}								\
+})
+
+static unsigned char fre_type_to_size(unsigned char fre_type)
+{
+	if (fre_type > 2)
+		return 0;
+	return 1 << fre_type;
+}
+
+static unsigned char offset_size_enum_to_size(unsigned char off_size)
+{
+	if (off_size > 2)
+		return 0;
+	return 1 << off_size;
+}
+
+static int find_fde(struct sframe_section *sec, unsigned long ip,
+		    struct sframe_fde *fde)
+{
+	struct sframe_fde __user *first, *last, *found = NULL;
+	u32 ip_off, func_off_low = 0, func_off_high = -1;
+
+	ip_off = ip - sec->sframe_addr;
+
+	first = (void __user *)sec->fdes_addr;
+	last = first + sec->fdes_nr - 1;
+	while (first <= last) {
+		struct sframe_fde __user *mid;
+		u32 func_off;
+
+		mid = first + ((last - first) / 2);
+
+		if (get_user(func_off, (s32 __user *)mid))
+			return -EFAULT;
+
+		if (ip_off >= func_off) {
+			/* validate sort order */
+			if (func_off < func_off_low)
+				return -EINVAL;
+
+			func_off_low = func_off;
+
+			found = mid;
+			first = mid + 1;
+		} else {
+			/* validate sort order */
+			if (func_off > func_off_high)
+				return -EINVAL;
+
+			func_off_high = func_off;
+
+			last = mid - 1;
+		}
+	}
+
+	if (!found)
+		return -EINVAL;
+
+	if (copy_from_user(fde, found, sizeof(*fde)))
+		return -EFAULT;
+
+	/* check for gaps */
+	if (ip_off < fde->start_addr || ip_off >= fde->start_addr + fde->size)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int find_fre(struct sframe_section *sec, struct sframe_fde *fde,
+		    unsigned long ip, struct unwind_user_frame *frame)
+{
+	unsigned char fde_type = SFRAME_FUNC_FDE_TYPE(fde->info);
+	unsigned char fre_type = SFRAME_FUNC_FRE_TYPE(fde->info);
+	unsigned char offset_count, offset_size;
+	s32 cfa_off, ra_off, fp_off, ip_off;
+	void __user *f, *last_f = NULL;
+	unsigned char addr_size;
+	u32 last_fre_ip_off = 0;
+	u8 fre_info = 0;
+	int i;
+
+	addr_size = fre_type_to_size(fre_type);
+	if (!addr_size)
+		return -EINVAL;
+
+	ip_off = ip - (sec->sframe_addr + fde->start_addr);
+
+	f = (void __user *)sec->fres_addr + fde->fres_off;
+
+	for (i = 0; i < fde->fres_num; i++) {
+		u32 fre_ip_off;
+
+		SFRAME_GET_USER(fre_ip_off, f, addr_size);
+
+		if (fre_ip_off < last_fre_ip_off)
+			return -EINVAL;
+
+		last_fre_ip_off = fre_ip_off;
+
+		if (fde_type == SFRAME_FDE_TYPE_PCINC) {
+			if (ip_off < fre_ip_off)
+				break;
+		} else {
+			/* SFRAME_FDE_TYPE_PCMASK */
+			if (ip_off % fde->rep_size < fre_ip_off)
+				break;
+		}
+
+		SFRAME_GET_USER(fre_info, f, 1);
+
+		offset_count = SFRAME_FRE_OFFSET_COUNT(fre_info);
+		offset_size  = offset_size_enum_to_size(SFRAME_FRE_OFFSET_SIZE(fre_info));
+
+		if (!offset_count || !offset_size)
+			return -EINVAL;
+
+		last_f = f;
+		f += offset_count * offset_size;
+	}
+
+	if (!last_f)
+		return -EINVAL;
+
+	f = last_f;
+
+	SFRAME_GET_USER(cfa_off, f, offset_size);
+	offset_count--;
+
+	ra_off = sec->ra_off;
+	if (!ra_off) {
+		if (!offset_count--)
+			return -EINVAL;
+
+		SFRAME_GET_USER(ra_off, f, offset_size);
+	}
+
+	fp_off = sec->fp_off;
+	if (!fp_off && offset_count) {
+		offset_count--;
+		SFRAME_GET_USER(fp_off, f, offset_size);
+	}
+
+	if (offset_count)
+		return -EINVAL;
+
+	frame->cfa_off = cfa_off;
+	frame->ra_off = ra_off;
+	frame->fp_off = fp_off;
+	frame->use_fp = SFRAME_FRE_CFA_BASE_REG_ID(fre_info) == SFRAME_BASE_REG_FP;
+
+	return 0;
+}
+
+int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
+{
+	struct mm_struct *mm = current->mm;
+	struct sframe_section *sec;
+	struct sframe_fde fde;
+	int ret = -EINVAL;
+
+	if (!mm)
+		return -EINVAL;
+
+	guard(srcu)(&sframe_srcu);
+
+	sec = mtree_load(&mm->sframe_mt, ip);
+	if (!sec)
+		return ret;
+
+	ret = find_fde(sec, ip, &fde);
+	if (ret)
+		return ret;
+
+	ret = find_fre(sec, &fde, ip, frame);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int __sframe_add_section(unsigned long sframe_addr,
+				unsigned long text_start,
+				unsigned long text_end)
+{
+	struct maple_tree *sframe_mt = &current->mm->sframe_mt;
+	struct sframe_section *sec;
+	struct sframe_header shdr;
+	unsigned long header_end;
+	int ret;
+
+	if (copy_from_user(&shdr, (void __user *)sframe_addr, sizeof(shdr)))
+		return -EFAULT;
+
+	if (shdr.preamble.magic != SFRAME_MAGIC ||
+	    shdr.preamble.version != SFRAME_VERSION_2 ||
+	    !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
+	    shdr.auxhdr_len || !shdr.num_fdes || !shdr.num_fres ||
+	    shdr.fdes_off > shdr.fres_off) {
+		return -EINVAL;
+	}
+
+	sec = kmalloc(sizeof(*sec), GFP_KERNEL);
+	if (!sec)
+		return -ENOMEM;
+
+	header_end = sframe_addr + SFRAME_HDR_SIZE(shdr);
+
+	sec->sframe_addr	= sframe_addr;
+	sec->text_addr		= text_start;
+	sec->fdes_addr		= header_end + shdr.fdes_off;
+	sec->fres_addr		= header_end + shdr.fres_off;
+	sec->fdes_nr		= shdr.num_fdes;
+	sec->ra_off		= shdr.cfa_fixed_ra_offset;
+	sec->fp_off		= shdr.cfa_fixed_fp_offset;
+
+	ret = mtree_insert_range(sframe_mt, text_start, text_end, sec, GFP_KERNEL);
+	if (ret) {
+		kfree(sec);
+		return ret;
+	}
+
+	return 0;
+}
+
+int sframe_add_section(unsigned long sframe_addr, unsigned long text_start,
+		       unsigned long text_end)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *sframe_vma;
+
+	mmap_read_lock(mm);
+
+	sframe_vma = vma_lookup(mm, sframe_addr);
+	if (!sframe_vma)
+		goto err_unlock;
+
+	if (text_start && text_end) {
+		struct vm_area_struct *text_vma;
+
+		text_vma = vma_lookup(mm, text_start);
+		if (!text_vma || !(text_vma->vm_flags & VM_EXEC))
+			goto err_unlock;
+
+		if (PAGE_ALIGN(text_end) != text_vma->vm_end)
+			goto err_unlock;
+	} else {
+		struct vm_area_struct *vma, *text_vma = NULL;
+		VMA_ITERATOR(vmi, mm, 0);
+
+		for_each_vma(vmi, vma) {
+			if (vma->vm_file != sframe_vma->vm_file ||
+			    !(vma->vm_flags & VM_EXEC))
+				continue;
+
+			if (text_vma) {
+				pr_warn_once("%s[%d]: multiple EXEC segments unsupported\n",
+					     current->comm, current->pid);
+				goto err_unlock;
+			}
+
+			text_vma = vma;
+		}
+
+		if (!text_vma)
+			goto err_unlock;
+
+		text_start = text_vma->vm_start;
+		text_end   = text_vma->vm_end;
+	}
+
+	mmap_read_unlock(mm);
+
+	return __sframe_add_section(sframe_addr, text_start, text_end);
+
+err_unlock:
+	mmap_read_unlock(mm);
+	return -EINVAL;
+}
+
+static void sframe_free_srcu(struct rcu_head *rcu)
+{
+	struct sframe_section *sec = container_of(rcu, struct sframe_section, rcu);
+
+	kfree(sec);
+}
+
+static int __sframe_remove_section(struct mm_struct *mm,
+				   struct sframe_section *sec)
+{
+	sec = mtree_erase(&mm->sframe_mt, sec->text_addr);
+	if (!sec)
+		return -EINVAL;
+
+	call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
+
+	return 0;
+}
+
+int sframe_remove_section(unsigned long sframe_addr)
+{
+	struct mm_struct *mm = current->mm;
+	struct sframe_section *sec;
+	unsigned long index = 0;
+
+	mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
+		if (sec->sframe_addr == sframe_addr)
+			return __sframe_remove_section(mm, sec);
+	}
+
+	return -EINVAL;
+}
+
+void sframe_free_mm(struct mm_struct *mm)
+{
+	struct sframe_section *sec;
+	unsigned long index = 0;
+
+	if (!mm)
+		return;
+
+	mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX)
+		kfree(sec);
+
+	mtree_destroy(&mm->sframe_mt);
+}
diff --git a/kernel/unwind/sframe.h b/kernel/unwind/sframe.h
new file mode 100644
index 000000000000..11b9be7ad82e
--- /dev/null
+++ b/kernel/unwind/sframe.h
@@ -0,0 +1,215 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2023, Oracle and/or its affiliates.
+ *
+ * This file contains definitions for the SFrame stack tracing format, which is
+ * documented at https://sourceware.org/binutils/docs
+ */
+#ifndef _SFRAME_H
+#define _SFRAME_H
+
+#include
+
+#define SFRAME_VERSION_1	1
+#define SFRAME_VERSION_2	2
+#define SFRAME_MAGIC		0xdee2
+
+/* Function Descriptor Entries are sorted on PC. */
+#define SFRAME_F_FDE_SORTED	0x1
+/* Frame-pointer based stack tracing. Defined, but not set. */
+#define SFRAME_F_FRAME_POINTER	0x2
+
+#define SFRAME_CFA_FIXED_FP_INVALID 0
+#define SFRAME_CFA_FIXED_RA_INVALID 0
+
+/* Supported ABIs/Arch. */
+#define SFRAME_ABI_AARCH64_ENDIAN_BIG	    1 /* AARCH64 big endian. */
+#define SFRAME_ABI_AARCH64_ENDIAN_LITTLE    2 /* AARCH64 little endian. */
+#define SFRAME_ABI_AMD64_ENDIAN_LITTLE	    3 /* AMD64 little endian. */
+
+/* SFrame FRE types. */
+#define SFRAME_FRE_TYPE_ADDR1	0
+#define SFRAME_FRE_TYPE_ADDR2	1
+#define SFRAME_FRE_TYPE_ADDR4	2
+
+/*
+ * SFrame Function Descriptor Entry types.
+ *
+ * The SFrame format has two possible representations for functions. The
+ * choice of which type to use is made according to the instruction patterns
+ * in the relevant program stub.
+ */
+
+/* Unwinders perform a (PC >= FRE_START_ADDR) to look up a matching FRE. */
+#define SFRAME_FDE_TYPE_PCINC	0
+/*
+ * Unwinders perform a (PC & FRE_START_ADDR_AS_MASK >= FRE_START_ADDR_AS_MASK)
+ * to look up a matching FRE. Typical usecases are pltN entries, trampolines
+ * etc.
+ */
+#define SFRAME_FDE_TYPE_PCMASK	1
+
+/**
+ * struct sframe_preamble - SFrame Preamble.
+ * @magic: Magic number (SFRAME_MAGIC).
+ * @version: Format version number (SFRAME_VERSION).
+ * @flags: Various flags.
+ */
+struct sframe_preamble {
+	u16 magic;
+	u8  version;
+	u8  flags;
+} __packed;
+
+/**
+ * struct sframe_header - SFrame Header.
+ * @preamble: SFrame preamble.
+ * @abi_arch: Identify the arch (including endianness) and ABI.
+ * @cfa_fixed_fp_offset: Offset for the Frame Pointer (FP) from CFA may be + * fixed for some ABIs (e.g., in AMD64 when -fno-omit-frame-pointer is + * used). When fixed, this field specifies the fixed stack frame offset + * and the individual FREs do not need to track it. When not fixed, it + * is set to SFRAME_CFA_FIXED_FP_INVALID, and the individual FREs may + * provide the applicable stack frame offset, if any. + * @cfa_fixed_ra_offset: Offset for the Return Address from CFA is fixed f= or + * some ABIs. When fixed, this field specifies the fixed stack frame + * offset and the individual FREs do not need to track it. When not + * fixed, it is set to SFRAME_CFA_FIXED_RA_INVALID. + * @auxhdr_len: Number of bytes making up the auxiliary header, if any. + * Some ABI/arch, in the future, may use this space for extending the + * information in SFrame header. Auxiliary header is contained in bytes + * sequentially following the sframe_header. + * @num_fdes: Number of SFrame FDEs in this SFrame section. + * @num_fres: Number of SFrame Frame Row Entries. + * @fre_len: Number of bytes in the SFrame Frame Row Entry section. + * @fdes_off: Offset of SFrame Function Descriptor Entry section. + * @fres_off: Offset of SFrame Frame Row Entry section. + */ +struct sframe_header { + struct sframe_preamble preamble; + u8 abi_arch; + s8 cfa_fixed_fp_offset; + s8 cfa_fixed_ra_offset; + u8 auxhdr_len; + u32 num_fdes; + u32 num_fres; + u32 fre_len; + u32 fdes_off; + u32 fres_off; +} __packed; + +#define SFRAME_HDR_SIZE(sframe_hdr) \ + ((sizeof(struct sframe_header) + (sframe_hdr).auxhdr_len)) + +/* Two possible keys for executable (instruction) pointers signing. */ +#define SFRAME_AARCH64_PAUTH_KEY_A 0 /* Key A. */ +#define SFRAME_AARCH64_PAUTH_KEY_B 1 /* Key B. */ + +/** + * struct sframe_fde - SFrame Function Descriptor Entry. + * @start_addr: Function start address. Encoded as a signed offset, + * relative to the current FDE. + * @size: Size of the function in bytes.
+ * @fres_off: Offset of the first SFrame Frame Row Entry of the function, + * relative to the beginning of the SFrame Frame Row Entry sub-section. + * @fres_num: Number of frame row entries for the function. + * @info: Additional information for deciphering the stack trace + * information for the function. Contains information about SFrame FRE + * type, SFrame FDE type, PAC authorization A/B key, etc. + * @rep_size: Block size for SFRAME_FDE_TYPE_PCMASK + * @padding: Unused + */ +struct sframe_fde { + s32 start_addr; + u32 size; + u32 fres_off; + u32 fres_num; + u8 info; + u8 rep_size; + u16 padding; +} __packed; + +/* + * 'func_info' in SFrame FDE contains additional information for decipheri= ng + * the stack trace information for the function. In V1, the information is + * organized as follows: + * - 4-bits: Identify the FRE type used for the function. + * - 1-bit: Identify the FDE type of the function - mask or inc. + * - 1-bit: PAC authorization A/B key (aarch64). + * - 2-bits: Unused. + * --------------------------------------------------------------------- + * | Unused | PAC auth A/B key (aarch64) | FDE type | FRE type | + * | | Unused (amd64) | | | + * --------------------------------------------------------------------- + * 8 6 5 4 0 + */ + +/* Note: Set PAC auth key to SFRAME_AARCH64_PAUTH_KEY_A by default. */ +#define SFRAME_FUNC_INFO(fde_type, fre_enc_type) \ + (((SFRAME_AARCH64_PAUTH_KEY_A & 0x1) << 5) | \ + (((fde_type) & 0x1) << 4) | ((fre_enc_type) & 0xf)) + +#define SFRAME_FUNC_FRE_TYPE(data) ((data) & 0xf) +#define SFRAME_FUNC_FDE_TYPE(data) (((data) >> 4) & 0x1) +#define SFRAME_FUNC_PAUTH_KEY(data) (((data) >> 5) & 0x1) + +/* + * Size of stack frame offsets in an SFrame Frame Row Entry. A single + * SFrame FRE has all offsets of the same size. Offset size may vary + * across frame row entries. 
+ */ +#define SFRAME_FRE_OFFSET_1B 0 +#define SFRAME_FRE_OFFSET_2B 1 +#define SFRAME_FRE_OFFSET_4B 2 + +/* An SFrame Frame Row Entry can be SP or FP based. */ +#define SFRAME_BASE_REG_FP 0 +#define SFRAME_BASE_REG_SP 1 + +/* + * The index at which a specific offset is presented in the variable length + * bytes of an FRE. + */ +#define SFRAME_FRE_CFA_OFFSET_IDX 0 +/* + * The RA stack offset, if present, will always be at index 1 in the varia= ble + * length bytes of the FRE. + */ +#define SFRAME_FRE_RA_OFFSET_IDX 1 +/* + * The FP stack offset may appear at offset 1 or 2, depending on the ABI a= s RA + * may or may not be tracked. + */ +#define SFRAME_FRE_FP_OFFSET_IDX 2 + +/* + * 'fre_info' in SFrame FRE contains information about: + * - 1 bit: base reg for CFA + * - 4 bits: Number of offsets (N). A value of up to 3 is allowed to tra= ck + * all three of CFA, FP and RA (fixed implicit order). + * - 2 bits: information about size of the offsets (S) in bytes. + * Valid values are SFRAME_FRE_OFFSET_1B, SFRAME_FRE_OFFSET_2B, + * SFRAME_FRE_OFFSET_4B + * - 1 bit: Mangled RA state bit (aarch64 only). + * --------------------------------------------------------------- + * | Mangled-RA (aarch64) | Size of | Number of | base_reg | + * | Unused (amd64) | offsets | offsets | | + * --------------------------------------------------------------- + * 8 7 5 1 0 + */ + +/* Note: Set mangled_ra_p to zero by default. */ +#define SFRAME_FRE_INFO(base_reg_id, offset_num, offset_size) \ + (((0 & 0x1) << 7) | (((offset_size) & 0x3) << 5) | \ + (((offset_num) & 0xf) << 1) | ((base_reg_id) & 0x1)) + +/* Set the mangled_ra_p bit as indicated. 
*/ +#define SFRAME_FRE_INFO_UPDATE_MANGLED_RA_P(mangled_ra_p, fre_info) \ + ((((mangled_ra_p) & 0x1) << 7) | ((fre_info) & 0x7f)) + +#define SFRAME_FRE_CFA_BASE_REG_ID(data) ((data) & 0x1) +#define SFRAME_FRE_OFFSET_COUNT(data) (((data) >> 1) & 0xf) +#define SFRAME_FRE_OFFSET_SIZE(data) (((data) >> 5) & 0x3) +#define SFRAME_FRE_MANGLED_RA_P(data) (((data) >> 7) & 0x1) + +#endif /* _SFRAME_H */ diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c index 54b989810a0e..8e47c80e3e54 100644 --- a/kernel/unwind/user.c +++ b/kernel/unwind/user.c @@ -8,12 +8,17 @@ #include #include #include +#include #include -#include =20 +#ifdef CONFIG_HAVE_UNWIND_USER_FP +#include static struct unwind_user_frame fp_frame =3D { ARCH_INIT_USER_FP_FRAME }; +#else +static struct unwind_user_frame fp_frame; +#endif =20 int unwind_user_next(struct unwind_user_state *state) { @@ -30,6 +35,16 @@ int unwind_user_next(struct unwind_user_state *state) case UNWIND_USER_TYPE_FP: frame =3D &fp_frame; break; + case UNWIND_USER_TYPE_SFRAME: + if (sframe_find(state->ip, frame)) { + if (!IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP)) + goto the_end; + + frame =3D &fp_frame; + } + break; + case UNWIND_USER_TYPE_NONE: + goto the_end; default: BUG(); } @@ -68,7 +83,12 @@ int unwind_user_start(struct unwind_user_state *state) return -EINVAL; } =20 - state->type =3D UNWIND_USER_TYPE_FP; + if (current_has_sframe()) + state->type =3D UNWIND_USER_TYPE_SFRAME; + else if (IS_ENABLED(CONFIG_UNWIND_USER_FP)) + state->type =3D UNWIND_USER_TYPE_FP; + else + state->type =3D UNWIND_USER_TYPE_NONE; =20 state->sp =3D user_stack_pointer(regs); state->ip =3D instruction_pointer(regs); diff --git a/mm/init-mm.c b/mm/init-mm.c index 24c809379274..8eb1b122b7bf 100644 --- a/mm/init-mm.c +++ b/mm/init-mm.c @@ -11,12 +11,17 @@ #include #include #include +#include #include =20 #ifndef INIT_MM_CONTEXT #define INIT_MM_CONTEXT(name) #endif =20 +#ifndef INIT_MM_SFRAME +#define INIT_MM_SFRAME +#endif + const struct vm_operations_struct 
vma_dummy_vm_ops; =20 /* @@ -45,6 +50,7 @@ struct mm_struct init_mm =3D { .user_ns =3D &init_user_ns, .cpu_bitmap =3D CPU_BITS_NONE, INIT_MM_CONTEXT(init_mm) + INIT_MM_SFRAME }; =20 void setup_initial_init_mm(void *start_code, void *end_code, --=20 2.47.0 From nobody Mon Nov 25 07:24:59 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AC47A205148; Mon, 28 Oct 2024 21:48:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152120; cv=none; b=dgP/Zm6b5ipoT6CVzZCoRVdDEG8b0d2I5rconHOEiFpa9w1kIrR/LD/Tu1RF0ga3F63aSi765yvM8kiKmcRmGqwIS9joEc5bHW/LMZDZOiuQMyOwFSarldFhf++Gj/d7EBEe1iE6fE1WChAkY3UL75KMRPHuS7QerEzkAUxLk4U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152120; c=relaxed/simple; bh=2VilLd9xLfLAN9IHrO9LwlDC2JxfM2xfGosce4CDqaI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sOfErV4+46X8U1/ZbQKAYeUh+ZLD/aokjkVM7MPiTZbUc0WPNJPSIQEVCpCIEuTDz6Ta3hiDzHEORGzg7BikPUvMvSFel+/tqymQMY5KdXdcDXA88Ya8tNgb+kzF3PBdthtsbj5o0EUUj0jMDCVbUI3rHdEJFhQZWmc4qtG32bM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=OywIrhNz; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="OywIrhNz" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A7E62C4CEEB; Mon, 28 Oct 2024 21:48:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1730152120; bh=2VilLd9xLfLAN9IHrO9LwlDC2JxfM2xfGosce4CDqaI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; 
b=OywIrhNzqr6LJuMu2ulMPC8abF1/oKuZac3tGOjkMWgXl5DG215Aub+7CBMBK/DXR M4rnj/DAEaHNv4EBoxMnqKBEqMx1kooVZkyE8onOPrKiLgZdBBGb9EBatJmV2ddphC AEc/mlx0pK+msUZDPL0Y0AcPksyalEKGZFblqZGFNUdSufWQGE0bS4YSnffh7hZurc Mcoyc4uZh4Gu/apsKthSy/viPf+DQMVbjeeJwWACBL8Hh3F7T4dpiXuDJHWGB3vGmc nHyCmw8OpVfIBu4z/iNu9GcWNpSIKbHtZKOnSSZpXod9+9FMoYcg54ZLkQ0X4bNppr uKd6wDt9M1t2A== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , linux-trace-kernel@vger.kerne.org, Andrii Nakryiko , Jens Remus , Mathieu Desnoyers , Florian Weimer , Andy Lutomirski Subject: [PATCH v3 10/19] unwind/x86: Enable CONFIG_HAVE_UNWIND_USER_SFRAME Date: Mon, 28 Oct 2024 14:47:57 -0700 Message-ID: X-Mailer: git-send-email 2.47.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Message-ID: <20241028214757.M2bD7cCAXu3xfISSwUAr0vdnlyNg7DsXHosy3wEA4yE@z> Content-Type: text/plain; charset="utf-8" Binutils 2.41 supports generating sframe v2 for x86_64. It works well in testing so enable it. 
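For readers who want to confirm that a toolchain actually emitted SFrame v2, the check can be sketched in user space as below. This is only an illustration, not part of the patch: it assumes the first four bytes of a .sframe section are already in a buffer, that the data is little endian (as on x86_64), and it reuses the SFRAME_MAGIC and SFRAME_VERSION_2 values from kernel/unwind/sframe.h earlier in this series. The helper name sframe_preamble_is_v2() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from kernel/unwind/sframe.h in this series. */
#define SFRAME_MAGIC		0xdee2
#define SFRAME_VERSION_2	2

/*
 * Hypothetical helper: return 1 if @buf starts with a little-endian
 * SFrame preamble (u16 magic, u8 version, u8 flags) whose version is 2,
 * and 0 otherwise.
 */
static int sframe_preamble_is_v2(const unsigned char *buf, size_t len)
{
	uint16_t magic;

	/* The preamble is 4 bytes: magic (2), version (1), flags (1). */
	if (len < 4)
		return 0;

	/* Decode the magic explicitly as little endian. */
	magic = (uint16_t)(buf[0] | (buf[1] << 8));

	return magic == SFRAME_MAGIC && buf[2] == SFRAME_VERSION_2;
}
```

A caller would read the section contents (for example with an ELF library of its choice) and pass the first bytes to the helper; anything that is too short, has the wrong magic, or carries a different version is rejected.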
Signed-off-by: Josh Poimboeuf --- arch/x86/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f91098d6f535..3e6f4c80c5b5 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -290,6 +290,7 @@ config X86 select HAVE_UACCESS_VALIDATION if HAVE_OBJTOOL select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_UNWIND_USER_FP if X86_64 + select HAVE_UNWIND_USER_SFRAME if X86_64 select HAVE_USER_RETURN_NOTIFIER select HAVE_GENERIC_VDSO select VDSO_GETRANDOM if X86_64 --=20 2.47.0 From nobody Mon Nov 25 07:24:59 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 48B951EE009; Mon, 28 Oct 2024 21:48:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152101; cv=none; b=DDXnnwbneMCbhQ7HmYPY8cueuL3zyci7xf65ZLc28TU4BnZwv2swg0cItVJpygSQn0ktY/K5ZzW+fCjnOQJztE+Vd181Pn+tuatoTyDMPY3bj0uwoMhgWuWyvWa6aqFv6gXJsLYLRAd9ZsLS12MZclyShGoq+8DklszGUTOUPok= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152101; c=relaxed/simple; bh=jD51Fu0vOpWdomX4mvhQMZ5VqCrli3vqWQMm2r9eaGU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=F/jULpdJXdJ0SjrvQBct/eMGr0In4b2ar+rjCokdmY69bP6XpB/yLvF+Dtvm/vdIPL/LuOpIzqifeMj0yM98x7RgFyNnKOqcFE5wIfcCKpERSyqSdF5x7YOUz976fVsdfnXpNGjIrhenefpwVeIeEItWHygua23Zgx1hx04oWK8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=uFxVy60C; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="uFxVy60C" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 
F1918C4CEE7; Mon, 28 Oct 2024 21:48:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1730152100; bh=jD51Fu0vOpWdomX4mvhQMZ5VqCrli3vqWQMm2r9eaGU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=uFxVy60CEcA/5zuwtfOIreofS/R0sLwen8cwAUMkVBPLlwJhrcip6tYMUKGGUV2f+ ZDzOQhQjMod7etQKBIkCm4pZ9wfCfe8rtdD8dyC8i2yeDtDMA36QOsEzhUbrSx9i2K Y5mvUrvIkCtL0SXXA5E4uWvHSLMWwFU6uGkYMSc9d4Ta71Oa8kb9dinOEXlrp52gaS UMIzyLCsMit4T2wg5VlYROQpvj99EtLmdxE8FoTtItDZa++8zVde6AP9fmLQiwOwGC 07XYy3D6u8wV+W2oLgNZqifNDMNDvVXgFOL+v5SuI5WEItRHsIsJCxhObf25oOmKKb eelS1JX5HWj0Q== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , linux-trace-kernel@vger.kerne.org, Andrii Nakryiko , Jens Remus , Mathieu Desnoyers , Florian Weimer , Andy Lutomirski Subject: [PATCH v3 11/19] unwind: Add deferred user space unwinding API Date: Mon, 28 Oct 2024 14:47:38 -0700 Message-ID: X-Mailer: git-send-email 2.47.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Add unwind_user_deferred() which allows callers to schedule task work to unwind the user space stack before returning to user space.

This solves several problems for its callers:

  - Ensure the unwind happens in task context even if the caller may be running in interrupt context.

  - Only do the unwind once, even if called multiple times either by the same caller or multiple callers.

  - Create a "context cookie" which allows trace post-processing to correlate kernel unwinds/traces with the user unwind.
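The "unwind once" and cookie behavior described above can be modeled in plain user-space C. The sketch below is a simplified illustration and not the kernel implementation: it follows the layout stated in the comment in kernel/unwind/user.c (CPU id in the highest 16 bits, per-CPU entry counter in the lower 48 bits), it ignores locking, migration and the last_cookies check, and the names make_cookie(), cookie_cpu(), cookie_ctr(), struct defer_model and defer_request() are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define COOKIE_CTR_BITS	48
#define COOKIE_CTR_MASK	((UINT64_C(1) << COOKIE_CTR_BITS) - 1)

/* Pack the CPU id into the top 16 bits and the counter into the low 48. */
static uint64_t make_cookie(uint64_t cpu, uint64_t ctr)
{
	return (cpu << COOKIE_CTR_BITS) | (ctr & COOKIE_CTR_MASK);
}

static uint64_t cookie_cpu(uint64_t cookie)
{
	return cookie >> COOKIE_CTR_BITS;
}

static uint64_t cookie_ctr(uint64_t cookie)
{
	return cookie & COOKIE_CTR_MASK;
}

/* Simplified stand-in for the per-task deferred unwind state. */
struct defer_model {
	uint64_t ctx_cookie;		/* 0: nothing requested since user entry */
	uint32_t pending;		/* one bit per registered callback */
	unsigned int work_scheduled;	/* how often "task work" was queued */
};

/*
 * Model of a deferred unwind request from callback slot @idx.  Only the
 * first request after an entry from user space creates the cookie and
 * queues the work; later requests just set their pending bit and get
 * the same cookie back.
 */
static uint64_t defer_request(struct defer_model *m, int idx,
			      uint64_t cpu, uint64_t ctr)
{
	if (!m->ctx_cookie) {
		m->ctx_cookie = make_cookie(cpu, ctr);
		m->work_scheduled++;
	}

	m->pending |= UINT32_C(1) << idx;
	return m->ctx_cookie;
}
```

In this model two callers asking for an unwind during the same kernel entry receive the same cookie and trigger a single unit of work, which is the property that lets post-processing stitch several kernel-side records to one user stack trace.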
Signed-off-by: Josh Poimboeuf --- include/linux/entry-common.h | 3 + include/linux/sched.h | 5 + include/linux/unwind_user.h | 56 ++++++++++ kernel/fork.c | 4 + kernel/unwind/user.c | 199 +++++++++++++++++++++++++++++++++++ 5 files changed, 267 insertions(+) diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index 1e50cdb83ae5..efbe8f964f31 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -12,6 +12,7 @@ #include #include #include +#include =20 #include =20 @@ -111,6 +112,8 @@ static __always_inline void enter_from_user_mode(struct= pt_regs *regs) CT_WARN_ON(__ct_state() !=3D CT_STATE_USER); user_exit_irqoff(); =20 + unwind_enter_from_user_mode(); + instrumentation_begin(); kmsan_unpoison_entry_regs(regs); trace_hardirqs_off_finish(); diff --git a/include/linux/sched.h b/include/linux/sched.h index 5007a8e2d640..31b6f1d763ef 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -47,6 +47,7 @@ #include #include #include +#include =20 /* task_struct member predeclarations (sorted alphabetically): */ struct audit_context; @@ -1592,6 +1593,10 @@ struct task_struct { struct user_event_mm *user_event_mm; #endif =20 +#ifdef CONFIG_UNWIND_USER + struct unwind_task_info unwind_task_info; +#endif + /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. 
diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h index cde0fde4923e..98e236c843b1 100644 --- a/include/linux/unwind_user.h +++ b/include/linux/unwind_user.h @@ -3,6 +3,9 @@ #define _LINUX_UNWIND_USER_H =20 #include +#include + +#define UNWIND_MAX_CALLBACKS 4 =20 enum unwind_user_type { UNWIND_USER_TYPE_NONE, @@ -30,6 +33,26 @@ struct unwind_user_state { bool done; }; =20 +struct unwind_task_info { + u64 ctx_cookie; + u32 pending_callbacks; + u64 last_cookies[UNWIND_MAX_CALLBACKS]; + void *privs[UNWIND_MAX_CALLBACKS]; + unsigned long *entries; + struct callback_head work; +}; + +typedef void (*unwind_callback_t)(struct unwind_stacktrace *trace, + u64 ctx_cookie, void *data); + +struct unwind_callback { + unwind_callback_t func; + int idx; +}; + + +#ifdef CONFIG_UNWIND_USER + /* Synchronous interfaces: */ =20 int unwind_user_start(struct unwind_user_state *state); @@ -40,4 +63,37 @@ int unwind_user(struct unwind_stacktrace *trace, unsigne= d int max_entries); #define for_each_user_frame(state) \ for (unwind_user_start((state)); !(state)->done; unwind_user_next((state)= )) =20 + +/* Asynchronous interfaces: */ + +void unwind_task_init(struct task_struct *task); +void unwind_task_free(struct task_struct *task); + +int unwind_user_register(struct unwind_callback *callback, unwind_callback= _t func); +int unwind_user_unregister(struct unwind_callback *callback); + +int unwind_user_deferred(struct unwind_callback *callback, u64 *ctx_cookie= , void *data); + +DECLARE_PER_CPU(u64, unwind_ctx_ctr); + +static __always_inline void unwind_enter_from_user_mode(void) +{ + __this_cpu_inc(unwind_ctx_ctr); +} + + +#else /* !CONFIG_UNWIND_USER */ + +static inline void unwind_task_init(struct task_struct *task) {} +static inline void unwind_task_free(struct task_struct *task) {} + +static inline int unwind_user_register(struct unwind_callback *callback, u= nwind_callback_t func) { return -ENOSYS; } +static inline int unwind_user_unregister(struct unwind_callback 
*callback)= { return -ENOSYS; } + +static inline int unwind_user_deferred(struct unwind_callback *callback, u= 64 *ctx_cookie, void *data) { return -ENOSYS; } + +static inline void unwind_enter_from_user_mode(void) {} + +#endif /* !CONFIG_UNWIND_USER */ + #endif /* _LINUX_UNWIND_USER_H */ diff --git a/kernel/fork.c b/kernel/fork.c index 60f14fbab956..d7580067853d 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -105,6 +105,7 @@ #include #include #include +#include #include =20 #include @@ -972,6 +973,7 @@ void __put_task_struct(struct task_struct *tsk) WARN_ON(refcount_read(&tsk->usage)); WARN_ON(tsk =3D=3D current); =20 + unwind_task_free(tsk); sched_ext_free(tsk); io_uring_free(tsk); cgroup_free(tsk); @@ -2348,6 +2350,8 @@ __latent_entropy struct task_struct *copy_process( p->bpf_ctx =3D NULL; #endif =20 + unwind_task_init(p); + /* Perform scheduler related setup. Assign this task to a CPU. */ retval =3D sched_fork(clone_flags, p); if (retval) diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c index 8e47c80e3e54..ed7759c56551 100644 --- a/kernel/unwind/user.c +++ b/kernel/unwind/user.c @@ -10,6 +10,11 @@ #include #include #include +#include +#include +#include + +#define UNWIND_MAX_ENTRIES 512 =20 #ifdef CONFIG_HAVE_UNWIND_USER_FP #include @@ -20,6 +25,12 @@ static struct unwind_user_frame fp_frame =3D { static struct unwind_user_frame fp_frame; #endif =20 +static struct unwind_callback *callbacks[UNWIND_MAX_CALLBACKS]; +static DECLARE_RWSEM(callbacks_rwsem); + +/* Counter for entries from user space */ +DEFINE_PER_CPU(u64, unwind_ctx_ctr); + int unwind_user_next(struct unwind_user_state *state) { struct unwind_user_frame _frame; @@ -117,3 +128,191 @@ int unwind_user(struct unwind_stacktrace *trace, unsi= gned int max_entries) =20 return 0; } + +/* + * The "context cookie" is a unique identifier which allows post-processin= g to + * correlate kernel trace(s) with user unwinds. 
It has the CPU id in the hig= hest + * 16 bits and a per-CPU entry counter in the lower 48 bits. + */ +static u64 ctx_to_cookie(u64 cpu, u64 ctx) +{ + BUILD_BUG_ON(NR_CPUS > 65535); + return (ctx & ((1ULL << 48) - 1)) | (cpu << 48); +} + +/* + * Schedule a user space unwind to be done in task work before exiting the + * kernel. + * + * The @callback must have previously been registered with + * unwind_user_register(). + * + * The @ctx_cookie output is a unique identifier which will also be passed to t= he + * callback function. It can be used to stitch kernel and user traces tog= ether + * in post-processing. + * + * If there are multiple calls to this function for a given @callback, the + * cookie will usually be the same and the callback will only be called on= ce. + * + * The only exception is when the task has migrated to another CPU, *and* = this + * is called while the task work is running (or has already run). Then a = new + * cookie will be generated and the callback will be called again for the = new + * cookie. + */ +int unwind_user_deferred(struct unwind_callback *callback, u64 *ctx_cookie= , void *data) +{ + struct unwind_task_info *info =3D &current->unwind_task_info; + u64 cookie =3D info->ctx_cookie; + int idx =3D callback->idx; + + if (WARN_ON_ONCE(in_nmi())) + return -EINVAL; + + if (WARN_ON_ONCE(!callback->func || idx < 0)) + return -EINVAL; + + if (!current->mm) + return -EINVAL; + + guard(irqsave)(); + + if (cookie && (info->pending_callbacks & (1 << idx))) + goto done; + + /* + * If this is the first call from *any* caller since the most recent + * entry from user space, initialize the task context cookie and + * schedule the task work.
+ */ + if (!cookie) { + u64 ctx_ctr =3D __this_cpu_read(unwind_ctx_ctr); + u64 cpu =3D raw_smp_processor_id(); + + cookie =3D ctx_to_cookie(cpu, ctx_ctr); + + /* + * If called after task work has sent an unwind to the callback + * function but before the exit to user space, skip it as the + * previous call to the callback function should suffice. + * + * The only exception is if this task has migrated to another + * CPU since the first call to unwind_user_deferred(). The + * per-CPU context counter will have changed which will result + * in a new cookie and another unwind (see comment above + * function). + */ + if (cookie =3D=3D info->last_cookies[idx]) + goto done; + + info->ctx_cookie =3D cookie; + task_work_add(current, &info->work, TWA_RESUME); + } + + info->pending_callbacks |=3D (1 << idx); + info->privs[idx] =3D data; + info->last_cookies[idx] =3D cookie; + +done: + if (ctx_cookie) + *ctx_cookie =3D cookie; + return 0; +} + +static void unwind_user_task_work(struct callback_head *head) +{ + struct unwind_task_info *info =3D container_of(head, struct unwind_task_i= nfo, work); + struct task_struct *task =3D container_of(info, struct task_struct, unwin= d_task_info); + void *privs[UNWIND_MAX_CALLBACKS]; + struct unwind_stacktrace trace; + unsigned long pending; + u64 cookie =3D 0; + int i; + + BUILD_BUG_ON(UNWIND_MAX_CALLBACKS > 32); + + if (WARN_ON_ONCE(task !=3D current)) + return; + + if (WARN_ON_ONCE(!info->ctx_cookie || !info->pending_callbacks)) + return; + + scoped_guard(irqsave) { + pending =3D info->pending_callbacks; + cookie =3D info->ctx_cookie; + + info->pending_callbacks =3D 0; + info->ctx_cookie =3D 0; + memcpy(privs, info->privs, sizeof(void *) * UNWIND_MAX_CALLBACKS); + } + + if (!info->entries) { + info->entries =3D kmalloc(UNWIND_MAX_ENTRIES * sizeof(long), + GFP_KERNEL); + if (!info->entries) + return; + } + + trace.entries =3D info->entries; + trace.nr =3D 0; + unwind_user(&trace, UNWIND_MAX_ENTRIES); + + 
guard(rwsem_read)(&callbacks_rwsem); + + for_each_set_bit(i, &pending, UNWIND_MAX_CALLBACKS) { + if (callbacks[i]) + callbacks[i]->func(&trace, cookie, privs[i]); + } +} + +int unwind_user_register(struct unwind_callback *callback, unwind_callback= _t func) +{ + scoped_guard(rwsem_write, &callbacks_rwsem) { + for (int i =3D 0; i < UNWIND_MAX_CALLBACKS; i++) { + if (!callbacks[i]) { + callback->func =3D func; + callback->idx =3D i; + callbacks[i] =3D callback; + return 0; + } + } + } + + callback->func =3D NULL; + callback->idx =3D -1; + return -ENOSPC; +} + +int unwind_user_unregister(struct unwind_callback *callback) +{ + if (callback->idx < 0) + return -EINVAL; + + scoped_guard(rwsem_write, &callbacks_rwsem) + callbacks[callback->idx] =3D NULL; + + callback->func =3D NULL; + callback->idx =3D -1; + + return 0; +} + +void unwind_task_init(struct task_struct *task) +{ + struct unwind_task_info *info =3D &task->unwind_task_info; + + info->entries =3D NULL; + info->pending_callbacks =3D 0; + info->ctx_cookie =3D 0; + + memset(info->last_cookies, 0, sizeof(u64) * UNWIND_MAX_CALLBACKS); + memset(info->privs, 0, sizeof(void *) * UNWIND_MAX_CALLBACKS); + + init_task_work(&info->work, unwind_user_task_work); +} + +void unwind_task_free(struct task_struct *task) +{ + struct unwind_task_info *info =3D &task->unwind_task_info; + + kfree(info->entries); +} --=20 2.47.0 From nobody Mon Nov 25 07:24:59 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1B26F201111; Mon, 28 Oct 2024 21:48:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152102; cv=none;
b=PEx7Fr4aG0CVRARaXhA9vjzZLzaqCn+bDwCz6NaqPhJN0HnAxSq7YhPLtAaMc7vLXl07VySQvCVLR9PQ2jm1rasNJlEDwAPqaJgLwx1chwsJBlSAc5M05YglF0eqhh0RfeMgYqjFMnXWyFK862M2h4bcVp93NfSqOkonFFioZnM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152102; c=relaxed/simple; bh=Es5oY0xOQfeuhKGtAHCUsq1KBkzukwi8QHbJItNGzbg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pgCZiHuNpZhCPaLR2wTFjX8G7mC8X1I0V2ucHNg7jMr6KZRbC88pth30VC0rOvdQmY11guJmNOtkvxgsLYAvfuDFOGOAz4GImQo7psNGeBtMWJX3uKksYKAMqIasqzFlweI8/Ofyk6EOMyNabwwiUQ3mKrA/eigXHEediytKbHo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Z20iHXcy; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Z20iHXcy" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0AFD5C4CECD; Mon, 28 Oct 2024 21:48:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1730152101; bh=Es5oY0xOQfeuhKGtAHCUsq1KBkzukwi8QHbJItNGzbg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Z20iHXcy1Mhi6ZQ7zG8JIJqbNhpJtOrZG3zdzDD/xU9PFlVdyeSxRynyrOl0gXM6t GlyrcEIZfEiHGKzs0iLLHarNgYIYOpg8Mls4ScJOCcAbLQn+tX954im9SlJHRPRwbN MufPPMaSTKwGZpUxvE0yzmwxxIg6+4YtyqiLsNLHXhLmhxf68W6bDf5NAkxIO6f0QS SpqrHATG+kJWpeV1YRAzb69MNTfVF0KLbajwlLGoYyI6//1pKkZtDm1Q6phx9T2buw ce073EIqPYQ08MQ0K8RIwvEJAjxD6OVzOIebfBSmRgAEpy1U8L1DC4+rpkMRYoI+rw B/GE9qa2w4b2Q== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , linux-trace-kernel@vger.kerne.org, Andrii Nakryiko , Jens Remus , Mathieu 
Desnoyers , Florian Weimer , Andy Lutomirski , Namhyung Kim Subject: [PATCH v3 12/19] perf: Remove get_perf_callchain() 'init_nr' argument Date: Mon, 28 Oct 2024 14:47:39 -0700 Message-ID: <886694ebe866d9e20a55516713d9856d38e4a3a4.1730150953.git.jpoimboe@kernel.org> X-Mailer: git-send-email 2.47.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The 'init_nr' argument has double duty: it's used to initialize both the number of contexts and the number of stack entries. That's confusing and the callers always pass zero anyway. Hard code the zero. Acked-by: Namhyung Kim Signed-off-by: Josh Poimboeuf --- include/linux/perf_event.h | 2 +- kernel/bpf/stackmap.c | 4 ++-- kernel/events/callchain.c | 12 ++++++------ kernel/events/core.c | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index fb908843f209..1e956ff9acd3 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1568,7 +1568,7 @@ DECLARE_PER_CPU(struct perf_callchain_entry, perf_cal= lchain_entry); extern void perf_callchain_user(struct perf_callchain_entry_ctx *entry, st= ruct pt_regs *regs); extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, = struct pt_regs *regs); extern struct perf_callchain_entry * -get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool us= er, +get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, u32 max_stack, bool crosstask, bool add_mark); extern int get_callchain_buffers(int max_stack); extern void put_callchain_buffers(void); diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 3615c06b7dfa..ec3a57a5fba1 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -314,7 +314,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, str= 
uct bpf_map *, map, if (max_depth > sysctl_perf_event_max_stack) max_depth =3D sysctl_perf_event_max_stack; =20 - trace =3D get_perf_callchain(regs, 0, kernel, user, max_depth, + trace =3D get_perf_callchain(regs, kernel, user, max_depth, false, false); =20 if (unlikely(!trace)) @@ -451,7 +451,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struc= t task_struct *task, else if (kernel && task) trace =3D get_callchain_entry_for_task(task, max_depth); else - trace =3D get_perf_callchain(regs, 0, kernel, user, max_depth, + trace =3D get_perf_callchain(regs, kernel, user, max_depth, crosstask, false); =20 if (unlikely(!trace) || trace->nr < skip) { diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index 8a47e52a454f..83834203e144 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -216,7 +216,7 @@ static void fixup_uretprobe_trampoline_entries(struct p= erf_callchain_entry *entr } =20 struct perf_callchain_entry * -get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool us= er, +get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, u32 max_stack, bool crosstask, bool add_mark) { struct perf_callchain_entry *entry; @@ -227,11 +227,11 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr,= bool kernel, bool user, if (!entry) return NULL; =20 - ctx.entry =3D entry; - ctx.max_stack =3D max_stack; - ctx.nr =3D entry->nr =3D init_nr; - ctx.contexts =3D 0; - ctx.contexts_maxed =3D false; + ctx.entry =3D entry; + ctx.max_stack =3D max_stack; + ctx.nr =3D entry->nr =3D 0; + ctx.contexts =3D 0; + ctx.contexts_maxed =3D false; =20 if (kernel && !user_mode(regs)) { if (add_mark) diff --git a/kernel/events/core.c b/kernel/events/core.c index df27d08a7232..1654d6e7c148 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7800,7 +7800,7 @@ perf_callchain(struct perf_event *event, struct pt_re= gs *regs) if (!kernel && !user) return &__empty_callchain; =20 - callchain =3D get_perf_callchain(regs, 0, 
kernel, user, + callchain =3D get_perf_callchain(regs, kernel, user, max_stack, crosstask, true); return callchain ?: &__empty_callchain; } --=20 2.47.0 From nobody Mon Nov 25 07:24:59 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 63C6B201269; Mon, 28 Oct 2024 21:48:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152103; cv=none; b=o+QlGH+qcAY5GI9cFT+IbhOixPN2SgM4OwJrP6NPR+v+gTHzMJOkRVdNgsl0zA1ytzb5fPl8iPt3e0cRooKKs271LxDDMXrYwzXc16gQBejbFra7OFNS9OFhhuZ5HPI7CKuGWDMcub6qVAe/HGMH1KGjYEcR+nKuz/hT72VWQ1E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152103; c=relaxed/simple; bh=Kt5MpGbZ70VVAEdbCmfBd2vF0ClYcpLyRPJ/jXeS8c4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ksCH0douKyjA5x2wceOyOA8ci3XSxGPLBYpuQC792lJnLFOtRltM5l/8s7ePGZnRVo/MfvuKL8pdS2nldq61l2zgs65asffBSvSseaHVc8NUmitGV57EgT3cguQTEEAphDt3uUjXULRdOJiEL/gaqZUO/w0VYb0sts6AP/qwgug= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RaTz2P6n; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RaTz2P6n" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1E857C4CEE9; Mon, 28 Oct 2024 21:48:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1730152102; bh=Kt5MpGbZ70VVAEdbCmfBd2vF0ClYcpLyRPJ/jXeS8c4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=RaTz2P6nJCej4XpXPNWOEklykI/2pXMRZHFDUcKvLTCeHIfsbBHgbNljW44Ya8rXA 
hJaoL/dz/s9xNNLm7SX1OUkhsEBDIHQdgp16soCZfES3Ro5pUuQySKqqK2OpYXBG4E dwom9jF6FKffFKVf7lch59EVfHaXEqKDA8/sQNYjYbUVCz9fOZxxAl40Ef6OeMfmdn D0ZS4N7HGgaqGO/EFzd6cvnWXwO7nRgiHP8eodZyB8QJfKFZJruQsuc8EIFs3OQcs1 1i/gwH+l3HU196/D51YBA4E5XG6Bp2ZTl98YmzfSBiZNCZFNyODaKJUmwriDZSjQqs +AquqsbqscKHQ== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , linux-trace-kernel@vger.kerne.org, Andrii Nakryiko , Jens Remus , Mathieu Desnoyers , Florian Weimer , Andy Lutomirski Subject: [PATCH v3 13/19] perf: Remove get_perf_callchain() 'crosstask' argument Date: Mon, 28 Oct 2024 14:47:40 -0700 Message-ID: X-Mailer: git-send-email 2.47.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" get_perf_callchain() doesn't support cross-task unwinding, so it doesn't make much sense to have 'crosstask' as an argument. 
Acked-by: Namhyung Kim Signed-off-by: Josh Poimboeuf --- include/linux/perf_event.h | 2 +- kernel/bpf/stackmap.c | 12 ++++-------- kernel/events/callchain.c | 6 +----- kernel/events/core.c | 9 +++++---- 4 files changed, 11 insertions(+), 18 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 1e956ff9acd3..788f6971d32d 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1569,7 +1569,7 @@ extern void perf_callchain_user(struct perf_callchain= _entry_ctx *entry, struct p extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, = struct pt_regs *regs); extern struct perf_callchain_entry * get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, - u32 max_stack, bool crosstask, bool add_mark); + u32 max_stack, bool add_mark); extern int get_callchain_buffers(int max_stack); extern void put_callchain_buffers(void); extern struct perf_callchain_entry *get_callchain_entry(int *rctx); diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index ec3a57a5fba1..ee9701337912 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -314,8 +314,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, str= uct bpf_map *, map, if (max_depth > sysctl_perf_event_max_stack) max_depth =3D sysctl_perf_event_max_stack; =20 - trace =3D get_perf_callchain(regs, kernel, user, max_depth, - false, false); + trace =3D get_perf_callchain(regs, kernel, user, max_depth, false); =20 if (unlikely(!trace)) /* couldn't fetch the stack trace */ @@ -430,10 +429,8 @@ static long __bpf_get_stack(struct pt_regs *regs, stru= ct task_struct *task, if (task && user && !user_mode(regs)) goto err_fault; =20 - /* get_perf_callchain does not support crosstask user stack walking - * but returns an empty stack instead of NULL. 
- */ - if (crosstask && user) { + /* get_perf_callchain() does not support crosstask stack walking */ + if (crosstask) { err =3D -EOPNOTSUPP; goto clear; } @@ -451,8 +448,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struc= t task_struct *task, else if (kernel && task) trace =3D get_callchain_entry_for_task(task, max_depth); else - trace =3D get_perf_callchain(regs, kernel, user, max_depth, - crosstask, false); + trace =3D get_perf_callchain(regs, kernel, user, max_depth,false); =20 if (unlikely(!trace) || trace->nr < skip) { if (may_fault) diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index 83834203e144..655fb25a725b 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -217,7 +217,7 @@ static void fixup_uretprobe_trampoline_entries(struct p= erf_callchain_entry *entr =20 struct perf_callchain_entry * get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, - u32 max_stack, bool crosstask, bool add_mark) + u32 max_stack, bool add_mark) { struct perf_callchain_entry *entry; struct perf_callchain_entry_ctx ctx; @@ -248,9 +248,6 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, b= ool user, } =20 if (regs) { - if (crosstask) - goto exit_put; - if (add_mark) perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); =20 @@ -260,7 +257,6 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, b= ool user, } } =20 -exit_put: put_callchain_entry(rctx); =20 return entry; diff --git a/kernel/events/core.c b/kernel/events/core.c index 1654d6e7c148..ebf143aa427b 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7792,16 +7792,17 @@ perf_callchain(struct perf_event *event, struct pt_= regs *regs) { bool kernel =3D !event->attr.exclude_callchain_kernel; bool user =3D !event->attr.exclude_callchain_user; - /* Disallow cross-task user callchains. 
*/ - bool crosstask =3D event->ctx->task && event->ctx->task !=3D current; const u32 max_stack =3D event->attr.sample_max_stack; struct perf_callchain_entry *callchain; =20 if (!kernel && !user) return &__empty_callchain; =20 - callchain =3D get_perf_callchain(regs, kernel, user, - max_stack, crosstask, true); + /* Disallow cross-task callchains. */ + if (event->ctx->task && event->ctx->task !=3D current) + return &__empty_callchain; + + callchain =3D get_perf_callchain(regs, kernel, user, max_stack, true); return callchain ?: &__empty_callchain; } =20 --=20 2.47.0 From nobody Mon Nov 25 07:24:59 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 74DEF2022C9; Mon, 28 Oct 2024 21:48:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152104; cv=none; b=tyuzBdpTHfkDcSWzfvAM5sck+s0nmY2VOXVKZKly3oD1DoYJ/ojHU/EWz7oNQ6dDKQW/hcO6Jw1ovcCcwGvdC9jADUGReD0MwWHKvxML+LfVotf15DyJCSeiBJeaOyj4DzkWRJZZISi38lkE8W1fEnvQ7eElNHMthJWL4FOYRwk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152104; c=relaxed/simple; bh=FAGQnaAp9X/oqvdaCFG+QdxKrKoTF5tQk3+U3LCGtAU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ndJGVdy+mDLFHXW/nfjBfmtNEPu1Wzx+7Ok8wufsGxXuUlHw5nxeA3/zcwKCGesSSnnSU158z10mwhGFifRdRBgk4madjg4nRAZWYujnDWT6P1yUigAQMv4GPGJz8U7R/KU3EYl4BhezhbufI/xz9SjXiuRZMDnt4mwgD9aN6OE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=a8YgP67n; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="a8YgP67n" 
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 21068C4CEE8; Mon, 28 Oct 2024 21:48:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1730152104; bh=FAGQnaAp9X/oqvdaCFG+QdxKrKoTF5tQk3+U3LCGtAU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=a8YgP67nbXTS84HJnqRj829E6UCIhjtMS5zu6d2iX9eVIT9kV11oDxbme9g7htqTm jYeSNt/PQ3RMRolswXPWsbk9uSnaUkgt9KIATrlcV7/Vak/S79UlGJJFYUrIA/EvAt Dpdw8z9cgwYppw+R2LXKXFTT6u9WzeUZLMpfW+Usgt3MVe0RxUiqgmRTYulzyxoKF5 +b0zqkspVtQ2ln1iFNnnZJObOgEnCX03AK+8knJmaCPHXQPyMp5D46N5nwkcfwnxzY ps72KpLVHYFKYvStyZfzfJCJi7tV4XGhe/4PC2V0+bsOcKt8h54clhZkTQx6UCEBGk kNTL+Ohh8GNJw== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , linux-trace-kernel@vger.kerne.org, Andrii Nakryiko , Jens Remus , Mathieu Desnoyers , Florian Weimer , Andy Lutomirski Subject: [PATCH v3 14/19] perf: Simplify get_perf_callchain() user logic Date: Mon, 28 Oct 2024 14:47:41 -0700 Message-ID: <4d141847262054cc73551fa33609bc3099f0d2f9.1730150953.git.jpoimboe@kernel.org> X-Mailer: git-send-email 2.47.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Simplify the get_perf_callchain() user logic a bit. task_pt_regs() should never be NULL. 
Acked-by: Namhyung Kim Signed-off-by: Josh Poimboeuf --- kernel/events/callchain.c | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index 655fb25a725b..2278402b7ac9 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -241,22 +241,20 @@ get_perf_callchain(struct pt_regs *regs, bool kernel,= bool user, =20 if (user) { if (!user_mode(regs)) { - if (current->mm) - regs =3D task_pt_regs(current); - else - regs =3D NULL; + if (!current->mm) + goto exit_put; + regs =3D task_pt_regs(current); } =20 - if (regs) { - if (add_mark) - perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); + if (add_mark) + perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); =20 - start_entry_idx =3D entry->nr; - perf_callchain_user(&ctx, regs); - fixup_uretprobe_trampoline_entries(entry, start_entry_idx); - } + start_entry_idx =3D entry->nr; + perf_callchain_user(&ctx, regs); + fixup_uretprobe_trampoline_entries(entry, start_entry_idx); } =20 +exit_put: put_callchain_entry(rctx); =20 return entry; --=20 2.47.0 From nobody Mon Nov 25 07:24:59 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7001D2022F0; Mon, 28 Oct 2024 21:48:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152105; cv=none; b=TbOZvp7yVZUbylOhhaKVFv2h5TOfjU469mPBrww2xLvWsVKHk+oQLsUGP/DJCKzDDNnouyl0iVfIdLlSQ4qOnLQA/kK7bVlpxS6oBc540jdGBqwzhA+cwz3tVctPnPZpxxlfb0cN8LDAfRMQXJ6GkB56Fiv8TYCn9hL+B5RiRPg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152105; c=relaxed/simple; bh=BicEo2O0Udl64XLuU3OuWtNR3+Ob8ZlZ0QEDE8lLAZE=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=hRt1ApzeibNmc1mUQMeo3iGeN5cUWFAvYD/Z2gc77NxSCogvzwH5bNL9QYyAPW+Ssnh37oMnfBJExogGehfwfs/vrdJUH8oR026qL4imN1veF4l/6zS5In1ak9wd+0ocbJUWMkN89IIAQ6mH4fDbod5OMWYPgI8z+jryoWpdSZk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=GpKgQcny; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="GpKgQcny" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 26EFAC4CEE9; Mon, 28 Oct 2024 21:48:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1730152105; bh=BicEo2O0Udl64XLuU3OuWtNR3+Ob8ZlZ0QEDE8lLAZE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=GpKgQcnyUPTPNvBhvfPV3yf0r/NW8TngUvZBraTuvGpEWeoPgOnJGm+OYXP3YLq83 Ui6KbmlyIi4jKHVWc+/Fnyr+yZQ5nVDlh5hHpQdTII61dp5KOHxaZNd6rvz1EbxBis ZebOnGZImA6T9uCLf+B5GumktZ8B2UXB3Fb1yetRlvpoff8fOKHw8WBivmG75HWzSY HvIKyYA/JACFdVhb6ThHn8gXYutF0UtKq4c751ZXsOTwIfWc4VIlH2gzTR01s7Nlvn 8rq9tfzkhnIclt2JWNj1bi6UC1IzmPAJUvKH+YLBaupmmoMgNT9xhUFnjrD4/oZmNE nlnBZ4gNHRfNw== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , linux-trace-kernel@vger.kerne.org, Andrii Nakryiko , Jens Remus , Mathieu Desnoyers , Florian Weimer , Andy Lutomirski Subject: [PATCH v3 15/19] perf: Add deferred user callchains Date: Mon, 28 Oct 2024 14:47:42 -0700 Message-ID: <1ce857387c781afa66efaa61eb88ff596b352500.1730150953.git.jpoimboe@kernel.org> X-Mailer: git-send-email 2.47.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: 
linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Instead of attempting to unwind user space from the NMI handler, defer it to run in task context by sending a self-IPI and then scheduling the unwind to run in the IRQ's exit task work before returning to user space. This allows the user stack page to be paged in if needed, avoids duplicate unwinds for kernel-bound workloads, and prepares for SFrame unwinding (so .sframe sections can be paged in on demand). Suggested-by: Steven Rostedt Suggested-by: Peter Zijlstra Signed-off-by: Josh Poimboeuf --- arch/Kconfig | 3 ++ include/linux/perf_event.h | 10 ++++- include/uapi/linux/perf_event.h | 22 +++++++++- kernel/bpf/stackmap.c | 6 +-- kernel/events/callchain.c | 11 ++++- kernel/events/core.c | 63 ++++++++++++++++++++++++++- tools/include/uapi/linux/perf_event.h | 22 +++++++++- 7 files changed, 129 insertions(+), 8 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index e769c39dd221..33449485eafd 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -446,6 +446,9 @@ config HAVE_UNWIND_USER_SFRAME bool select UNWIND_USER =20 +config HAVE_PERF_CALLCHAIN_DEFERRED + bool + config HAVE_PERF_REGS bool help diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 788f6971d32d..2193b3d16820 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -808,9 +808,11 @@ struct perf_event { unsigned long pending_addr; /* SIGTRAP */ struct irq_work pending_irq; struct irq_work pending_disable_irq; + struct irq_work pending_unwind_irq; struct callback_head pending_task; unsigned int pending_work; struct rcuwait pending_work_wait; + unsigned int pending_unwind; =20 atomic_t event_limit; =20 @@ -1569,12 +1571,18 @@ extern void perf_callchain_user(struct perf_callcha= in_entry_ctx *entry, struct p extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, = struct 
pt_regs *regs); extern struct perf_callchain_entry * get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, - u32 max_stack, bool add_mark); + u32 max_stack, bool add_mark, bool defer_user); extern int get_callchain_buffers(int max_stack); extern void put_callchain_buffers(void); extern struct perf_callchain_entry *get_callchain_entry(int *rctx); extern void put_callchain_entry(int rctx); =20 +#ifdef CONFIG_HAVE_PERF_CALLCHAIN_DEFERRED +extern void perf_callchain_user_deferred(struct perf_callchain_entry_ctx *= entry, struct pt_regs *regs); +#else +static inline void perf_callchain_user_deferred(struct perf_callchain_entr= y_ctx *entry, struct pt_regs *regs) {} +#endif + extern int sysctl_perf_event_max_stack; extern int sysctl_perf_event_max_contexts_per_stack; =20 diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_even= t.h index 4842c36fdf80..6d0524b7d082 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -460,7 +460,8 @@ struct perf_event_attr { inherit_thread : 1, /* children only inherit if cloned with CLONE_THR= EAD */ remove_on_exec : 1, /* event is removed from task on exec */ sigtrap : 1, /* send synchronous SIGTRAP on event */ - __reserved_1 : 26; + defer_callchain: 1, /* generate PERF_RECORD_CALLCHAIN_DEFERRED record= s */ + __reserved_1 : 25; =20 union { __u32 wakeup_events; /* wakeup every n events */ @@ -1217,6 +1218,24 @@ enum perf_event_type { */ PERF_RECORD_AUX_OUTPUT_HW_ID =3D 21, =20 + /* + * This user callchain capture was deferred until shortly before + * returning to user space. Previous samples would have kernel + * callchains only and they need to be stitched with this to make full + * callchains. + * + * TODO: do PERF_SAMPLE_{REGS,STACK}_USER also need deferral? 
+ * + * struct { + * struct perf_event_header header; + * u64 ctx_cookie; + * u64 nr; + * u64 ips[nr]; + * struct sample_id sample_id; + * }; + */ + PERF_RECORD_CALLCHAIN_DEFERRED =3D 22, + PERF_RECORD_MAX, /* non-ABI */ }; =20 @@ -1247,6 +1266,7 @@ enum perf_callchain_context { PERF_CONTEXT_HV =3D (__u64)-32, PERF_CONTEXT_KERNEL =3D (__u64)-128, PERF_CONTEXT_USER =3D (__u64)-512, + PERF_CONTEXT_USER_DEFERRED =3D (__u64)-640, =20 PERF_CONTEXT_GUEST =3D (__u64)-2048, PERF_CONTEXT_GUEST_KERNEL =3D (__u64)-2176, diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index ee9701337912..f073ebaf9c30 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -314,8 +314,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, str= uct bpf_map *, map, if (max_depth > sysctl_perf_event_max_stack) max_depth =3D sysctl_perf_event_max_stack; =20 - trace =3D get_perf_callchain(regs, kernel, user, max_depth, false); - + trace =3D get_perf_callchain(regs, kernel, user, max_depth, false, false); if (unlikely(!trace)) /* couldn't fetch the stack trace */ return -EFAULT; @@ -448,7 +447,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struc= t task_struct *task, else if (kernel && task) trace =3D get_callchain_entry_for_task(task, max_depth); else - trace =3D get_perf_callchain(regs, kernel, user, max_depth,false); + trace =3D get_perf_callchain(regs, kernel, user, max_depth, + false, false); =20 if (unlikely(!trace) || trace->nr < skip) { if (may_fault) diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index 2278402b7ac9..eeb15ba0137f 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -217,7 +217,7 @@ static void fixup_uretprobe_trampoline_entries(struct p= erf_callchain_entry *entr =20 struct perf_callchain_entry * get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, - u32 max_stack, bool add_mark) + u32 max_stack, bool add_mark, bool defer_user) { struct perf_callchain_entry *entry; struct 
perf_callchain_entry_ctx ctx; @@ -246,6 +246,15 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, = bool user, regs =3D task_pt_regs(current); } =20 + if (defer_user) { + /* + * Foretell the coming of PERF_RECORD_CALLCHAIN_DEFERRED + * which can be stitched to this one. + */ + perf_callchain_store_context(&ctx, PERF_CONTEXT_USER_DEFERRED); + goto exit_put; + } + if (add_mark) perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); =20 diff --git a/kernel/events/core.c b/kernel/events/core.c index ebf143aa427b..bf97b2fa8a9c 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -55,11 +55,14 @@ #include #include #include +#include =20 #include "internal.h" =20 #include =20 +static struct unwind_callback perf_unwind_callback_cb; + typedef int (*remote_function_f)(void *); =20 struct remote_function_call { @@ -6955,6 +6958,53 @@ static void perf_pending_irq(struct irq_work *entry) perf_swevent_put_recursion_context(rctx); } =20 +static void perf_pending_unwind_irq(struct irq_work *entry) +{ + struct perf_event *event =3D container_of(entry, struct perf_event, pendi= ng_unwind_irq); + + if (event->pending_unwind) { + unwind_user_deferred(&perf_unwind_callback_cb, NULL, event); + event->pending_unwind =3D 0; + } +} + +struct perf_callchain_deferred_event { + struct perf_event_header header; + u64 ctx_cookie; + u64 nr; + u64 ips[]; +}; + +static void perf_event_callchain_deferred(struct unwind_stacktrace *trace, + u64 ctx_cookie, void *_data) +{ + struct perf_callchain_deferred_event deferred_event; + u64 callchain_context =3D PERF_CONTEXT_USER; + struct perf_output_handle handle; + struct perf_event *event =3D _data; + struct perf_sample_data data; + u64 nr =3D trace->nr + 1 /* callchain_context */; + + deferred_event.header.type =3D PERF_RECORD_CALLCHAIN_DEFERRED; + deferred_event.header.misc =3D PERF_RECORD_MISC_USER; + deferred_event.header.size =3D sizeof(deferred_event) + (nr * sizeof(u64)= ); + + deferred_event.ctx_cookie =3D ctx_cookie; + 
deferred_event.nr =3D nr; + + perf_event_header__init_id(&deferred_event.header, &data, event); + + if (perf_output_begin(&handle, &data, event, deferred_event.header.size)) + return; + + perf_output_put(&handle, deferred_event); + perf_output_put(&handle, callchain_context); + perf_output_copy(&handle, trace->entries, trace->nr * sizeof(u64)); + perf_event__output_id_sample(event, &handle, &data); + + perf_output_end(&handle); +} + static void perf_pending_task(struct callback_head *head) { struct perf_event *event =3D container_of(head, struct perf_event, pendin= g_task); @@ -7794,6 +7844,8 @@ perf_callchain(struct perf_event *event, struct pt_re= gs *regs) bool user =3D !event->attr.exclude_callchain_user; const u32 max_stack =3D event->attr.sample_max_stack; struct perf_callchain_entry *callchain; + bool defer_user =3D IS_ENABLED(CONFIG_UNWIND_USER) && + event->attr.defer_callchain; =20 if (!kernel && !user) return &__empty_callchain; @@ -7802,7 +7854,14 @@ perf_callchain(struct perf_event *event, struct pt_r= egs *regs) if (event->ctx->task && event->ctx->task !=3D current) return &__empty_callchain; =20 - callchain =3D get_perf_callchain(regs, kernel, user, max_stack, true); + callchain =3D get_perf_callchain(regs, kernel, user, max_stack, true, + defer_user); + + if (user && defer_user && !event->pending_unwind) { + event->pending_unwind =3D 1; + irq_work_queue(&event->pending_unwind_irq); + } + return callchain ?: &__empty_callchain; } =20 @@ -12171,6 +12230,7 @@ perf_event_alloc(struct perf_event_attr *attr, int = cpu, =20 init_waitqueue_head(&event->waitq); init_irq_work(&event->pending_irq, perf_pending_irq); + event->pending_unwind_irq =3D IRQ_WORK_INIT_HARD(perf_pending_unwind_irq); event->pending_disable_irq =3D IRQ_WORK_INIT_HARD(perf_pending_disable); init_task_work(&event->pending_task, perf_pending_task); rcuwait_init(&event->pending_work_wait); @@ -14093,6 +14153,7 @@ void __init perf_event_init(void) perf_tp_register(); 
perf_event_init_cpu(smp_processor_id()); register_reboot_notifier(&perf_reboot_notifier); + unwind_user_register(&perf_unwind_callback_cb, perf_event_callchain_defer= red); =20 ret =3D init_hw_breakpoint(); WARN(ret, "hw_breakpoint initialization failed with: %d", ret); diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/lin= ux/perf_event.h index 4842c36fdf80..6d0524b7d082 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -460,7 +460,8 @@ struct perf_event_attr { inherit_thread : 1, /* children only inherit if cloned with CLONE_THR= EAD */ remove_on_exec : 1, /* event is removed from task on exec */ sigtrap : 1, /* send synchronous SIGTRAP on event */ - __reserved_1 : 26; + defer_callchain: 1, /* generate PERF_RECORD_CALLCHAIN_DEFERRED record= s */ + __reserved_1 : 25; =20 union { __u32 wakeup_events; /* wakeup every n events */ @@ -1217,6 +1218,24 @@ enum perf_event_type { */ PERF_RECORD_AUX_OUTPUT_HW_ID =3D 21, =20 + /* + * This user callchain capture was deferred until shortly before + * returning to user space. Previous samples would have kernel + * callchains only and they need to be stitched with this to make full + * callchains. + * + * TODO: do PERF_SAMPLE_{REGS,STACK}_USER also need deferral? 
+ * + * struct { + * struct perf_event_header header; + * u64 ctx_cookie; + * u64 nr; + * u64 ips[nr]; + * struct sample_id sample_id; + * }; + */ + PERF_RECORD_CALLCHAIN_DEFERRED =3D 22, + PERF_RECORD_MAX, /* non-ABI */ }; =20 @@ -1247,6 +1266,7 @@ enum perf_callchain_context { PERF_CONTEXT_HV =3D (__u64)-32, PERF_CONTEXT_KERNEL =3D (__u64)-128, PERF_CONTEXT_USER =3D (__u64)-512, + PERF_CONTEXT_USER_DEFERRED =3D (__u64)-640, =20 PERF_CONTEXT_GUEST =3D (__u64)-2048, PERF_CONTEXT_GUEST_KERNEL =3D (__u64)-2176, --=20 2.47.0 From nobody Mon Nov 25 07:24:59 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 27638202630; Mon, 28 Oct 2024 21:48:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152106; cv=none; b=h8EfVwSwvkqVrLfhBgsyLWL1iymH7CpgwETIHACnLarq5MtwPqAG2umv//U+4173hg1g2OWZMaZ3gZIgwEscvC1r/5KOhXBMX5C8hpyOYi7ZGuh4CJqABU6O5/BLfWYFDx9hNK3267FaBNjPewuP6b+lYeipx98N5Z/AsOYzdXk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152106; c=relaxed/simple; bh=4ySqHpb26PEIVwpjzY9d4GdAV8RZQRDkbWdvkKJsevA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DeMGFhqUz3l0WqsRhB6PwQLtb+Di/MTNq98dmNzp9npbOl5lAgUxxrGiEUdgcXVSblg27iqkaIuHxHTxvcNJVQr1N1PhVJaSPrP0JLIVTR10S0mbEqmQ1DpUlEiofUF1f93JLA7xpB+fFSIlMAj6Pea8sxVNPIWvPPUTrC40+q4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=TpOMz08N; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="TpOMz08N" Received: by smtp.kernel.org (Postfix) with 
ESMTPSA id 2BE59C4CEE8; Mon, 28 Oct 2024 21:48:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1730152106; bh=4ySqHpb26PEIVwpjzY9d4GdAV8RZQRDkbWdvkKJsevA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=TpOMz08N7ORrhs6lLuJyu87Lbb46PTz5FxS39m0XHFQLMGh4EbR6oHq4+cgyLYulu 46HWe867wq3bV4yX7ulYo63zOPHLhWfEnqUf/SjJEDIqsOYEPEjG2bQCQt09jQNAL6 hpsdcBgG9vDX1jtFqGI4ohHzpAowSDYRxv0lR6pktKnAuKNYeFQL34wHXgsca78jmE uCY6M8ENAV90i0G2p56AOlX7Wnp2EgMft8UNm/PTxugOiAJKZOf73m9k+zyyaE2l9z vKjyG9gPP5KQYD70uUy9cGPR2NDGtL+59J83axqthnW1Qq4qQq2sG7hB3cCFRcDulU CeBVhGL5GWfVg== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , linux-trace-kernel@vger.kerne.org, Andrii Nakryiko , Jens Remus , Mathieu Desnoyers , Florian Weimer , Andy Lutomirski Subject: [PATCH v3 16/19] perf tools: Minimal CALLCHAIN_DEFERRED support Date: Mon, 28 Oct 2024 14:47:43 -0700 Message-ID: X-Mailer: git-send-email 2.47.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Namhyung Kim Add a new event type for deferred callchains and a new callback for the struct perf_tool. For now it doesn't actually handle the deferred callchains; it just marks the sample if it has PERF_CONTEXT_USER_DEFERRED in the callchain array. At least perf report can dump the raw data with this change. Actually, this requires the next commit to enable attr.defer_callchain, but if you already have a data file, it'll show the following result. $ perf report -D ... 
0x5fe0@perf.data [0x40]: event: 22 . . ... raw event: size 64 bytes . 0000: 16 00 00 00 02 00 40 00 02 00 00 00 00 00 00 00 ......@.......= .. . 0010: 00 fe ff ff ff ff ff ff 4b d3 3f 25 45 7f 00 00 ........K.?%E.= .. . 0020: 21 03 00 00 21 03 00 00 43 02 12 ab 05 00 00 00 !...!...C.....= .. . 0030: 00 00 00 00 00 00 00 00 09 00 00 00 00 00 00 00 ..............= .. 0 24344920643 0x5fe0 [0x40]: PERF_RECORD_CALLCHAIN_DEFERRED(IP, 0x2): 801= /801: 0 ... FP chain: nr:2 ..... 0: fffffffffffffe00 ..... 1: 00007f45253fd34b : unhandled! Signed-off-by: Namhyung Kim Signed-off-by: Josh Poimboeuf --- tools/lib/perf/include/perf/event.h | 7 +++++++ tools/perf/util/event.c | 1 + tools/perf/util/evsel.c | 15 +++++++++++++++ tools/perf/util/machine.c | 1 + tools/perf/util/perf_event_attr_fprintf.c | 1 + tools/perf/util/sample.h | 3 ++- tools/perf/util/session.c | 17 +++++++++++++++++ tools/perf/util/tool.c | 1 + tools/perf/util/tool.h | 3 ++- 9 files changed, 47 insertions(+), 2 deletions(-) diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/p= erf/event.h index 37bb7771d914..f643a6a2b9fc 100644 --- a/tools/lib/perf/include/perf/event.h +++ b/tools/lib/perf/include/perf/event.h @@ -151,6 +151,12 @@ struct perf_record_switch { __u32 next_prev_tid; }; =20 +struct perf_record_callchain_deferred { + struct perf_event_header header; + __u64 nr; + __u64 ips[]; +}; + struct perf_record_header_attr { struct perf_event_header header; struct perf_event_attr attr; @@ -494,6 +500,7 @@ union perf_event { struct perf_record_read read; struct perf_record_throttle throttle; struct perf_record_sample sample; + struct perf_record_callchain_deferred callchain_deferred; struct perf_record_bpf_event bpf; struct perf_record_ksymbol ksymbol; struct perf_record_text_poke_event text_poke; diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c index aac96d5d1917..8cdec373db44 100644 --- a/tools/perf/util/event.c +++ b/tools/perf/util/event.c @@ -58,6 +58,7 @@ static const 
char *perf_event__names[] =3D { [PERF_RECORD_CGROUP] =3D "CGROUP", [PERF_RECORD_TEXT_POKE] =3D "TEXT_POKE", [PERF_RECORD_AUX_OUTPUT_HW_ID] =3D "AUX_OUTPUT_HW_ID", + [PERF_RECORD_CALLCHAIN_DEFERRED] =3D "CALLCHAIN_DEFERRED", [PERF_RECORD_HEADER_ATTR] =3D "ATTR", [PERF_RECORD_HEADER_EVENT_TYPE] =3D "EVENT_TYPE", [PERF_RECORD_HEADER_TRACING_DATA] =3D "TRACING_DATA", diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index dbf9c8cee3c5..701092d6b1b6 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -2676,6 +2676,18 @@ int evsel__parse_sample(struct evsel *evsel, union p= erf_event *event, data->data_src =3D PERF_MEM_DATA_SRC_NONE; data->vcpu =3D -1; =20 + if (event->header.type =3D=3D PERF_RECORD_CALLCHAIN_DEFERRED) { + const u64 max_callchain_nr =3D UINT64_MAX / sizeof(u64); + + data->callchain =3D (struct ip_callchain *)&event->callchain_deferred.nr; + if (data->callchain->nr > max_callchain_nr) + return -EFAULT; + + if (evsel->core.attr.sample_id_all) + perf_evsel__parse_id_sample(evsel, event, data); + return 0; + } + if (event->header.type !=3D PERF_RECORD_SAMPLE) { if (!evsel->core.attr.sample_id_all) return 0; @@ -2806,6 +2818,9 @@ int evsel__parse_sample(struct evsel *evsel, union pe= rf_event *event, if (data->callchain->nr > max_callchain_nr) return -EFAULT; sz =3D data->callchain->nr * sizeof(u64); + if (evsel->core.attr.defer_callchain && data->callchain->nr >=3D 1 && + data->callchain->ips[data->callchain->nr - 1] =3D=3D PERF_CONTEXT_US= ER_DEFERRED) + data->deferred_callchain =3D true; OVERFLOW_CHECK(array, sz, max_size); array =3D (void *)array + sz; } diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index fad227b625d1..f367577c91ff 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -2085,6 +2085,7 @@ static int add_callchain_ip(struct thread *thread, *cpumode =3D PERF_RECORD_MISC_KERNEL; break; case PERF_CONTEXT_USER: + case PERF_CONTEXT_USER_DEFERRED: *cpumode =3D 
PERF_RECORD_MISC_USER; break; default: diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/pe= rf_event_attr_fprintf.c index 59fbbba79697..113845b35110 100644 --- a/tools/perf/util/perf_event_attr_fprintf.c +++ b/tools/perf/util/perf_event_attr_fprintf.c @@ -321,6 +321,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_even= t_attr *attr, PRINT_ATTRf(inherit_thread, p_unsigned); PRINT_ATTRf(remove_on_exec, p_unsigned); PRINT_ATTRf(sigtrap, p_unsigned); + PRINT_ATTRf(defer_callchain, p_unsigned); =20 PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsig= ned, false); PRINT_ATTRf(bp_type, p_unsigned); diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h index 70b2c3135555..010659dc80f8 100644 --- a/tools/perf/util/sample.h +++ b/tools/perf/util/sample.h @@ -108,7 +108,8 @@ struct perf_sample { u16 p_stage_cyc; u16 retire_lat; }; - bool no_hw_idx; /* No hw_idx collected in branch_stack */ + bool no_hw_idx; /* No hw_idx collected in branch_stack */ + bool deferred_callchain; /* Has deferred user callchains */ char insn[MAX_INSN]; void *raw_data; struct ip_callchain *callchain; diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index dbaf07bf6c5f..1248a0317a2f 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -714,6 +714,7 @@ static perf_event__swap_op perf_event__swap_ops[] =3D { [PERF_RECORD_CGROUP] =3D perf_event__cgroup_swap, [PERF_RECORD_TEXT_POKE] =3D perf_event__text_poke_swap, [PERF_RECORD_AUX_OUTPUT_HW_ID] =3D perf_event__all64_swap, + [PERF_RECORD_CALLCHAIN_DEFERRED] =3D perf_event__all64_swap, [PERF_RECORD_HEADER_ATTR] =3D perf_event__hdr_attr_swap, [PERF_RECORD_HEADER_EVENT_TYPE] =3D perf_event__event_type_swap, [PERF_RECORD_HEADER_TRACING_DATA] =3D perf_event__tracing_data_swap, @@ -1107,6 +1108,19 @@ static void dump_sample(struct evsel *evsel, union p= erf_event *event, sample_read__printf(sample, evsel->core.attr.read_format); } =20 +static void 
dump_deferred_callchain(struct evsel *evsel, union perf_event = *event, + struct perf_sample *sample) +{ + if (!dump_trace) + return; + + printf("(IP, 0x%x): %d/%d: %#" PRIx64 "\n", + event->header.misc, sample->pid, sample->tid, sample->ip); + + if (evsel__has_callchain(evsel)) + callchain__printf(evsel, sample); +} + static void dump_read(struct evsel *evsel, union perf_event *event) { struct perf_record_read *read_event =3D &event->read; @@ -1327,6 +1341,9 @@ static int machines__deliver_event(struct machines *m= achines, return tool->text_poke(tool, event, sample, machine); case PERF_RECORD_AUX_OUTPUT_HW_ID: return tool->aux_output_hw_id(tool, event, sample, machine); + case PERF_RECORD_CALLCHAIN_DEFERRED: + dump_deferred_callchain(evsel, event, sample); + return tool->callchain_deferred(tool, event, sample, evsel, machine); default: ++evlist->stats.nr_unknown_events; return -1; diff --git a/tools/perf/util/tool.c b/tools/perf/util/tool.c index 3b7f390f26eb..e78f16de912e 100644 --- a/tools/perf/util/tool.c +++ b/tools/perf/util/tool.c @@ -259,6 +259,7 @@ void perf_tool__init(struct perf_tool *tool, bool order= ed_events) tool->read =3D process_event_sample_stub; tool->throttle =3D process_event_stub; tool->unthrottle =3D process_event_stub; + tool->callchain_deferred =3D process_event_sample_stub; tool->attr =3D process_event_synth_attr_stub; tool->event_update =3D process_event_synth_event_update_stub; tool->tracing_data =3D process_event_synth_tracing_data_stub; diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h index db1c7642b0d1..9987bbde6d5e 100644 --- a/tools/perf/util/tool.h +++ b/tools/perf/util/tool.h @@ -42,7 +42,8 @@ enum show_feature_header { =20 struct perf_tool { event_sample sample, - read; + read, + callchain_deferred; event_op mmap, mmap2, comm, --=20 2.47.0 From nobody Mon Nov 25 07:24:59 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2A78D20265A; Mon, 28 Oct 2024 21:48:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152107; cv=none; b=IP+q8HL88ByUISCOnwzU1Vxmpl/NTVwVMV1amklfBxtTHV1Ws+V1WR4ap/dt65pL5Ir/FQG7cvW334uM4XLQAnAgRdT3iAoZrwtfhUkjA5GkX9PI4WDsP5LuojdbuoU4rOjzaqKEdUCEnWja8Hjw9g4zRJimD4Bl/cX2ix3rp38= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152107; c=relaxed/simple; bh=VzfRU4G/Kb7tAIbwnjXLnk+gZjKfLSgD/FG+/ML7ies=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lGai9aGHRHkCMkrb+EXfFO0SzjuFm9tqxkFuy7Cqe+EDGhRAiJdTvOM8LkbmeSIzNrzrBZLgeDN7neuehOB1axPZFieZNCmzysMfZCOn/vyiqrecG7y14X+CH4OQLOiApTlxSXhO4zmHBltbIpA7on4iajR43mr3dYyWH03pfZE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ivH6GT58; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ivH6GT58" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 265BAC4CEEB; Mon, 28 Oct 2024 21:48:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1730152107; bh=VzfRU4G/Kb7tAIbwnjXLnk+gZjKfLSgD/FG+/ML7ies=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ivH6GT58gjZgyJMxuEayQGz8xgpRdBtk/oL8Djp2oLtCbm8C825dMP5wxwtz9WcFl 86UQVSXHpFuM4Y1yu3p0DS/NhKj8BFNWDQUnjOAl7Zub3/4qKQW4W0bdh81KzLbAKe 5ZhS78iGSFM7OdGrHMtb0wabd1v8/NGc6NHRBxVY4OhKQRez62bSPxbuld0tJPzvFR jLc31bhrYCe3rrng2x5a4ZSlNGfram7FC7r8dcmfaY66B7QE51dpU0+x6ykLgmNAew xUy0dimS5xDLlyxr1rcAy3fDoM/H1iA3BD3ZYUdj4vZ0msuyaW5e+z45x5eCJ7PdCb T4DLtuGQePXag== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , 
 Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , linux-trace-kernel@vger.kerne.org, Andrii Nakryiko , Jens Remus , Mathieu Desnoyers , Florian Weimer , Andy Lutomirski
Subject: [PATCH v3 17/19] perf record: Enable defer_callchain for user callchains
Date: Mon, 28 Oct 2024 14:47:44 -0700
Message-ID: <498473058e06c83fdd01ea9e83d499b6233ce080.1730150953.git.jpoimboe@kernel.org>
X-Mailer: git-send-email 2.47.0
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Namhyung Kim

Enable defer_callchain for user callchains when recording with frame
pointers, and add the missing feature detection logic to clear the flag
on old kernels. On a kernel without support, the open fails with -22 and
perf switches the feature off:

  $ perf record -g -vv true
  ...
------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CALLCHAIN|PERIOD read_format ID|LOST disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 sample_id_all 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 defer_callchain 1 ------------------------------------------------------------ sys_perf_event_open: pid 162755 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open failed, error -22 switching off deferred callchain support Signed-off-by: Namhyung Kim Signed-off-by: Josh Poimboeuf --- tools/perf/util/evsel.c | 17 ++++++++++++++++- tools/perf/util/evsel.h | 1 + 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 701092d6b1b6..ad89644b32f2 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -912,6 +912,14 @@ static void __evsel__config_callchain(struct evsel *ev= sel, struct record_opts *o } } =20 + if (param->record_mode =3D=3D CALLCHAIN_FP && !attr->exclude_callchain_us= er) { + /* + * Enable deferred callchains optimistically. It'll be switched + * off later if the kernel doesn't support it. 
+ */ + attr->defer_callchain =3D 1; + } + if (function) { pr_info("Disabling user space callchains for function trace event.\n"); attr->exclude_callchain_user =3D 1; @@ -2089,6 +2097,8 @@ static int __evsel__prepare_open(struct evsel *evsel,= struct perf_cpu_map *cpus, =20 static void evsel__disable_missing_features(struct evsel *evsel) { + if (perf_missing_features.defer_callchain) + evsel->core.attr.defer_callchain =3D 0; if (perf_missing_features.branch_counters) evsel->core.attr.branch_sample_type &=3D ~PERF_SAMPLE_BRANCH_COUNTERS; if (perf_missing_features.read_lost) @@ -2144,7 +2154,12 @@ bool evsel__detect_missing_features(struct evsel *ev= sel) * Must probe features in the order they were added to the * perf_event_attr interface. */ - if (!perf_missing_features.branch_counters && + if (!perf_missing_features.defer_callchain && + evsel->core.attr.defer_callchain) { + perf_missing_features.defer_callchain =3D true; + pr_debug2("switching off deferred callchain support\n"); + return true; + } else if (!perf_missing_features.branch_counters && (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS)) { perf_missing_features.branch_counters =3D true; pr_debug2("switching off branch counters support\n"); diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h index 15e745a9a798..f0a1e1d78942 100644 --- a/tools/perf/util/evsel.h +++ b/tools/perf/util/evsel.h @@ -221,6 +221,7 @@ struct perf_missing_features { bool weight_struct; bool read_lost; bool branch_counters; + bool defer_callchain; }; =20 extern struct perf_missing_features perf_missing_features; --=20 2.47.0 From nobody Mon Nov 25 07:24:59 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7E6E0202F89; Mon, 28 Oct 2024 21:48:28 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152108; cv=none; b=iTjrmvZYNIxm2wAxQI4RsCOBUd7gvL3PXaxov+dsSvpuIjve3+bF/JvsuFtdBhGx4kQ18X+6QnynvwZn5temLk8L1f7bbRrQWNDaQVnxk7WWK2ehDbzS6w8x06DI430VYGjC/V1cZbqG5wkjCCRkzEOFdvAbwAPo9ID1y2zX8pQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152108; c=relaxed/simple; bh=ypsDBvSscFTP4kYetsVF+YmFMPEp+IMuIsyncuhHNwU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UUrwTju3elfoTytGeCe6N2LolpHHjQqhrUnxdhly++AXdhbbw3AobaIOMfou+9ceSSqYtb7HecibSKxHZhkx+C5+ulY8gh53TLkhOgdFEck7cRJAZdOEA2iZ+q3U0zvNy6fAJBWRdkasctCEjC48kmhiav/JgSJVtBRjAM4XHFU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=iVrP9zNI; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="iVrP9zNI" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2DE43C4CEEA; Mon, 28 Oct 2024 21:48:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1730152108; bh=ypsDBvSscFTP4kYetsVF+YmFMPEp+IMuIsyncuhHNwU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=iVrP9zNIoOe31C4yYHG0CxtfJsV+1KIkhPvlmWvGsxp91zHoJ5Mm56U0O8lCoxAzX hUIr3kvXytSJOw5rSPas04XSB11hZS83aQm/AGzQzaH1KcYBmuwMYdixRYDvdIEKGO qGUZaZ5OsHXAl7S3Lwvrrrpec5TidsMtJC1TBm5c4AmvlaOiFZWwHGDrxBIm3O2B44 GVImEC54693YBLmUa+n7O9qz+cjDbKr+zpjtpe2W50v0n9MXLVd25pOyFBb1Tzcd/p jBQN1zruAirX5VjEJdfZE63YzVBfZnf6SG7C1VytXiQDBnoeI+KKuTqf83GRte+gr0 Eu+xchKDDyAdw== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , 
linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , linux-trace-kernel@vger.kerne.org, Andrii Nakryiko , Jens Remus , Mathieu Desnoyers , Florian Weimer , Andy Lutomirski Subject: [PATCH v3 18/19] perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED Date: Mon, 28 Oct 2024 14:47:45 -0700 Message-ID: X-Mailer: git-send-email 2.47.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Namhyung Kim Handle the deferred callchains in the script output. $ perf script perf 801 [000] 18.031793: 1 cycles:P: ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kall= syms]) ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms]) ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms]) ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms]) ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms]) ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms]) ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms]) ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms= ]) ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms]) ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms]) ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms]) ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms]) ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms]) perf 801 [000] 18.031814: DEFERRED CALLCHAIN 7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/lib= c.so.6) Signed-off-by: Namhyung Kim Signed-off-by: Josh Poimboeuf --- tools/perf/builtin-script.c | 89 +++++++++++++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index a644787fa9e1..311580e25f5b 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ 
-2540,6 +2540,93 @@ static int process_sample_event(const struct perf_to= ol *tool, return ret; } =20 +static int process_deferred_sample_event(const struct perf_tool *tool, + union perf_event *event, + struct perf_sample *sample, + struct evsel *evsel, + struct machine *machine) +{ + struct perf_script *scr =3D container_of(tool, struct perf_script, tool); + struct perf_event_attr *attr =3D &evsel->core.attr; + struct evsel_script *es =3D evsel->priv; + unsigned int type =3D output_type(attr->type); + struct addr_location al; + FILE *fp =3D es->fp; + int ret =3D 0; + + if (output[type].fields =3D=3D 0) + return 0; + + /* Set thread to NULL to indicate addr_al and al are not initialized */ + addr_location__init(&al); + + if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num, + sample->time)) { + goto out_put; + } + + if (debug_mode) { + if (sample->time < last_timestamp) { + pr_err("Samples misordered, previous: %" PRIu64 + " this: %" PRIu64 "\n", last_timestamp, + sample->time); + nr_unordered++; + } + last_timestamp =3D sample->time; + goto out_put; + } + + if (filter_cpu(sample)) + goto out_put; + + if (machine__resolve(machine, &al, sample) < 0) { + pr_err("problem processing %d event, skipping it.\n", + event->header.type); + ret =3D -1; + goto out_put; + } + + if (al.filtered) + goto out_put; + + if (!show_event(sample, evsel, al.thread, &al, NULL)) + goto out_put; + + if (evswitch__discard(&scr->evswitch, evsel)) + goto out_put; + + perf_sample__fprintf_start(scr, sample, al.thread, evsel, + PERF_RECORD_CALLCHAIN_DEFERRED, fp); + fprintf(fp, "DEFERRED CALLCHAIN"); + + if (PRINT_FIELD(IP)) { + struct callchain_cursor *cursor =3D NULL; + + if (symbol_conf.use_callchain && sample->callchain) { + cursor =3D get_tls_callchain_cursor(); + if (thread__resolve_callchain(al.thread, cursor, evsel, + sample, NULL, NULL, + scripting_max_stack)) { + pr_info("cannot resolve deferred callchains\n"); + cursor =3D NULL; + } + } + + fputc(cursor ? 
'\n' : ' ', fp); + sample__fprintf_sym(sample, &al, 0, output[type].print_ip_opts, + cursor, symbol_conf.bt_stop_list, fp); + } + + fprintf(fp, "\n"); + + if (verbose > 0) + fflush(fp); + +out_put: + addr_location__exit(&al); + return ret; +} + // Used when scr->per_event_dump is not set static struct evsel_script es_stdout; =20 @@ -4325,6 +4412,7 @@ int cmd_script(int argc, const char **argv) =20 perf_tool__init(&script.tool, !unsorted_dump); script.tool.sample =3D process_sample_event; + script.tool.callchain_deferred =3D process_deferred_sample_event; script.tool.mmap =3D perf_event__process_mmap; script.tool.mmap2 =3D perf_event__process_mmap2; script.tool.comm =3D perf_event__process_comm; @@ -4351,6 +4439,7 @@ int cmd_script(int argc, const char **argv) script.tool.throttle =3D process_throttle_event; script.tool.unthrottle =3D process_throttle_event; script.tool.ordering_requires_timestamps =3D true; + script.tool.merge_deferred_callchains =3D false; session =3D perf_session__new(&data, &script.tool); if (IS_ERR(session)) return PTR_ERR(session); --=20 2.47.0 From nobody Mon Nov 25 07:24:59 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 676B92036F4; Mon, 28 Oct 2024 21:48:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152109; cv=none; b=hnmq5/qm+mF7ja9Siz656VX82OUrZEgA8dHHkEVZncsx1YjCI4+SczdglNd1hoad0u3Nd2PstkgNqR6SYY+ZFwKgg+NKMOKDczrm2kczSkrjF1ZccvrqpFLmLSC+Bo1/1SKFs1kMlIKknNlGkrqvfKtocIFs/tEdzcwzvqrAJAA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730152109; c=relaxed/simple; bh=WyQ7upbPHHR2F6skLp0O6redGtIa3A+g4lv+tFWe+IU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: 
MIME-Version; b=nMWyufVwuKirQ7FhQ0oY39SxEZ19Akyr1eS724J+uXqWTbD5BAJLTgSswdXj3e+XG+LL7Lr3WMF4DZDXIdE3TOlNXx56tbYcstijDYI7HQT+rElTD2sqZk1KqcL3Lg14Aq5XBOpxePt2Z0cMMcpSjMnPqE3RSudwZbxtbQvbpRQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=h9YnRCvU; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="h9YnRCvU" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6490BC4CEEB; Mon, 28 Oct 2024 21:48:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1730152109; bh=WyQ7upbPHHR2F6skLp0O6redGtIa3A+g4lv+tFWe+IU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=h9YnRCvUQsg4iit5spBlCLNATRQdFhWrTSwXXO22Q4JQygwrJFvDIlX2RwyGdVJmJ YzIdWJKQR/mrPeqPf8c/EON8zxNdKLcMbbUquvrmJyUrkQ7ja1nn7zoFwGrBK+FKwF nwzPH6NJZlsDlj5dj2ht/BntRcaj7UEPYW1yu27YCMVw4wf4wd+JaytulEDWCePYOP L1YtdrLaVJgStgXiR9H8W1KKivTvw7rWgymrceWN53QnwKoSCvSSorD/CxO9FMiAKF VR91rhL0UHoQurXnQAUf/E1BiRhc52KOfZTV+4fysik2UG05NBU1TiWaovoX3RJ2td ++Rtc4G9qJFvw== From: Josh Poimboeuf To: x86@kernel.org Cc: Peter Zijlstra , Steven Rostedt , Ingo Molnar , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, Indu Bhagat , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org, Mark Brown , linux-toolchains@vger.kernel.org, Jordan Rome , Sam James , linux-trace-kernel@vger.kerne.org, Andrii Nakryiko , Jens Remus , Mathieu Desnoyers , Florian Weimer , Andy Lutomirski Subject: [PATCH v3 19/19] perf tools: Merge deferred user callchains Date: Mon, 28 Oct 2024 14:47:46 -0700 Message-ID: X-Mailer: git-send-email 2.47.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable 
Content-Type: text/plain; charset="utf-8"

From: Namhyung Kim

Save samples with deferred callchains in a separate list and deliver
them after merging the user callchains. If users don't want to merge,
they can set tool->merge_deferred_callchains to false to disable this
behavior.

With the previous result, perf script now shows the merged callchains.

  $ perf script
  perf     801 [000]    18.031793:          1 cycles:P:
          ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
          ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
          ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
          ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
          ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
          ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
          ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
          ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
          ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
          ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
          ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
          ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
          ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
              7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
  ...

The old output can be obtained with the --no-merge-callchains option.
perf report also shows the user callchain entries at the end.

$ perf report --no-children --percent-limit=3D0 --stdio -q -S __intel_pmu= _enable_all.isra.0 # symbol: __intel_pmu_enable_all.isra.0 0.00% perf [kernel.kallsyms] | ---__intel_pmu_enable_all.isra.0 perf_ctx_enable event_function remote_function generic_exec_single smp_call_function_single event_function_call perf_event_for_each_child _perf_ioctl perf_ioctl __x64_sys_ioctl do_syscall_64 entry_SYSCALL_64 __GI___ioctl Signed-off-by: Namhyung Kim Signed-off-by: Josh Poimboeuf --- tools/perf/Documentation/perf-script.txt | 5 ++ tools/perf/builtin-script.c | 5 +- tools/perf/util/callchain.c | 24 +++++++++ tools/perf/util/callchain.h | 3 ++ tools/perf/util/evlist.c | 1 + tools/perf/util/evlist.h | 1 + tools/perf/util/session.c | 63 +++++++++++++++++++++++- tools/perf/util/tool.c | 1 + tools/perf/util/tool.h | 1 + 9 files changed, 102 insertions(+), 2 deletions(-) diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Document= ation/perf-script.txt index b72866ef270b..69f018b3d199 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -518,6 +518,11 @@ include::itrace.txt[] The known limitations include exception handing such as setjmp/longjmp will have calls/returns not match. =20 +--merge-callchains:: + Enable merging deferred user callchains if available. This is the + default behavior. If you want to see separate CALLCHAIN_DEFERRED + records for some reason, use --no-merge-callchains explicitly. 
+ :GMEXAMPLECMD: script :GMEXAMPLESUBCMD: include::guest-files.txt[] diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 311580e25f5b..e3acf4979c36 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -4031,6 +4031,7 @@ int cmd_script(int argc, const char **argv) bool header_only =3D false; bool script_started =3D false; bool unsorted_dump =3D false; + bool merge_deferred_callchains =3D true; char *rec_script_path =3D NULL; char *rep_script_path =3D NULL; struct perf_session *session; @@ -4184,6 +4185,8 @@ int cmd_script(int argc, const char **argv) "Guest code can be found in hypervisor process"), OPT_BOOLEAN('\0', "stitch-lbr", &script.stitch_lbr, "Enable LBR callgraph stitching approach"), + OPT_BOOLEAN('\0', "merge-callchains", &merge_deferred_callchains, + "Enable merge deferred user callchains"), OPTS_EVSWITCH(&script.evswitch), OPT_END() }; @@ -4439,7 +4442,7 @@ int cmd_script(int argc, const char **argv) script.tool.throttle =3D process_throttle_event; script.tool.unthrottle =3D process_throttle_event; script.tool.ordering_requires_timestamps =3D true; - script.tool.merge_deferred_callchains =3D false; + script.tool.merge_deferred_callchains =3D merge_deferred_callchains; session =3D perf_session__new(&data, &script.tool); if (IS_ERR(session)) return PTR_ERR(session); diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c index 0c7564747a14..d1114491c3da 100644 --- a/tools/perf/util/callchain.c +++ b/tools/perf/util/callchain.c @@ -1832,3 +1832,27 @@ int sample__for_each_callchain_node(struct thread *t= hread, struct evsel *evsel, } return 0; } + +int sample__merge_deferred_callchain(struct perf_sample *sample_orig, + struct perf_sample *sample_callchain) +{ + u64 nr_orig =3D sample_orig->callchain->nr - 1; + u64 nr_deferred =3D sample_callchain->callchain->nr; + struct ip_callchain *callchain; + + callchain =3D calloc(1 + nr_orig + nr_deferred, sizeof(u64)); + if (callchain =3D=3D NULL) { + 
sample_orig->deferred_callchain =3D false; + return -ENOMEM; + } + + callchain->nr =3D nr_orig + nr_deferred; + /* copy except for the last PERF_CONTEXT_USER_DEFERRED */ + memcpy(callchain->ips, sample_orig->callchain->ips, nr_orig * sizeof(u64)= ); + /* copy deferred use callchains */ + memcpy(&callchain->ips[nr_orig], sample_callchain->callchain->ips, + nr_deferred * sizeof(u64)); + + sample_orig->callchain =3D callchain; + return 0; +} diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h index 86ed9e4d04f9..89785125ed25 100644 --- a/tools/perf/util/callchain.h +++ b/tools/perf/util/callchain.h @@ -317,4 +317,7 @@ int sample__for_each_callchain_node(struct thread *thre= ad, struct evsel *evsel, struct perf_sample *sample, int max_stack, bool symbols, callchain_iter_fn cb, void *data); =20 +int sample__merge_deferred_callchain(struct perf_sample *sample_orig, + struct perf_sample *sample_callchain); + #endif /* __PERF_CALLCHAIN_H */ diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index f14b7e6ff1dc..f27d8c4a22aa 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -81,6 +81,7 @@ void evlist__init(struct evlist *evlist, struct perf_cpu_= map *cpus, evlist->ctl_fd.ack =3D -1; evlist->ctl_fd.pos =3D -1; evlist->nr_br_cntr =3D -1; + INIT_LIST_HEAD(&evlist->deferred_samples); } =20 struct evlist *evlist__new(void) diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index bcc1c6984bb5..c26379366554 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -84,6 +84,7 @@ struct evlist { int pos; /* index at evlist core object to check signals */ } ctl_fd; struct event_enable_timer *eet; + struct list_head deferred_samples; }; =20 struct evsel_str_handler { diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 1248a0317a2f..e0a21b896b57 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -1256,6 +1256,56 @@ static int evlist__deliver_sample(struct 
evlist *evl= ist, const struct perf_tool &sample->read.one, machine); } =20 +struct deferred_event { + struct list_head list; + union perf_event *event; +}; + +static int evlist__deliver_deferred_samples(struct evlist *evlist, + const struct perf_tool *tool, + union perf_event *event, + struct perf_sample *sample, + struct machine *machine) +{ + struct deferred_event *de, *tmp; + struct evsel *evsel; + int ret =3D 0; + + if (!tool->merge_deferred_callchains) { + evsel =3D evlist__id2evsel(evlist, sample->id); + return tool->callchain_deferred(tool, event, sample, + evsel, machine); + } + + list_for_each_entry_safe(de, tmp, &evlist->deferred_samples, list) { + struct perf_sample orig_sample; + + ret =3D evlist__parse_sample(evlist, de->event, &orig_sample); + if (ret < 0) { + pr_err("failed to parse original sample\n"); + break; + } + + if (sample->tid !=3D orig_sample.tid) + continue; + + evsel =3D evlist__id2evsel(evlist, orig_sample.id); + sample__merge_deferred_callchain(&orig_sample, sample); + ret =3D evlist__deliver_sample(evlist, tool, de->event, + &orig_sample, evsel, machine); + + if (orig_sample.deferred_callchain) + free(orig_sample.callchain); + + list_del(&de->list); + free(de); + + if (ret) + break; + } + return ret; +} + static int machines__deliver_event(struct machines *machines, struct evlist *evlist, union perf_event *event, @@ -1284,6 +1334,16 @@ static int machines__deliver_event(struct machines *= machines, return 0; } dump_sample(evsel, event, sample, perf_env__arch(machine->env)); + if (sample->deferred_callchain && tool->merge_deferred_callchains) { + struct deferred_event *de =3D malloc(sizeof(*de)); + + if (de =3D=3D NULL) + return -ENOMEM; + + de->event =3D event; + list_add_tail(&de->list, &evlist->deferred_samples); + return 0; + } return evlist__deliver_sample(evlist, tool, event, sample, evsel, machin= e); case PERF_RECORD_MMAP: return tool->mmap(tool, event, sample, machine); @@ -1343,7 +1403,8 @@ static int 
machines__deliver_event(struct machines *m= achines, return tool->aux_output_hw_id(tool, event, sample, machine); case PERF_RECORD_CALLCHAIN_DEFERRED: dump_deferred_callchain(evsel, event, sample); - return tool->callchain_deferred(tool, event, sample, evsel, machine); + return evlist__deliver_deferred_samples(evlist, tool, event, + sample, machine); default: ++evlist->stats.nr_unknown_events; return -1; diff --git a/tools/perf/util/tool.c b/tools/perf/util/tool.c index e78f16de912e..385043e06627 100644 --- a/tools/perf/util/tool.c +++ b/tools/perf/util/tool.c @@ -238,6 +238,7 @@ void perf_tool__init(struct perf_tool *tool, bool order= ed_events) tool->cgroup_events =3D false; tool->no_warn =3D false; tool->show_feat_hdr =3D SHOW_FEAT_NO_HEADER; + tool->merge_deferred_callchains =3D true; =20 tool->sample =3D process_event_sample_stub; tool->mmap =3D process_event_stub; diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h index 9987bbde6d5e..d06580478ab1 100644 --- a/tools/perf/util/tool.h +++ b/tools/perf/util/tool.h @@ -87,6 +87,7 @@ struct perf_tool { bool cgroup_events; bool no_warn; bool dont_split_sample_group; + bool merge_deferred_callchains; enum show_feature_header show_feat_hdr; }; =20 --=20 2.47.0
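
For readers following the merge step in patch 19, here is a minimal standalone sketch of the idea behind sample__merge_deferred_callchain(): drop the trailing deferred marker from the original chain and append the deferred user frames. It is not the perf code. The demo_* names and the DEMO_CONTEXT_USER_DEFERRED value are placeholders invented for this illustration, and unlike the patch it checks for the marker itself and returns a new chain rather than updating the sample in place.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Placeholder marker for this sketch only. The real
 * PERF_CONTEXT_USER_DEFERRED constant comes from the kernel uapi header.
 */
#define DEMO_CONTEXT_USER_DEFERRED ((uint64_t)-640)

/* Hypothetical stand-in for struct ip_callchain: a count plus the entries. */
struct demo_callchain {
	uint64_t nr;
	uint64_t ips[];
};

/* Allocate a chain holding a copy of the given entries. */
static struct demo_callchain *demo_callchain_new(const uint64_t *ips, uint64_t nr)
{
	struct demo_callchain *chain = calloc(1 + nr, sizeof(uint64_t));

	if (chain == NULL)
		return NULL;

	chain->nr = nr;
	if (nr > 0)
		memcpy(chain->ips, ips, nr * sizeof(uint64_t));
	return chain;
}

/*
 * Build a new chain from the original entries minus the trailing deferred
 * marker, followed by the deferred user entries. Returns NULL if the
 * original chain does not end with the marker or if allocation fails.
 */
static struct demo_callchain *demo_merge(const struct demo_callchain *orig,
					 const struct demo_callchain *deferred)
{
	struct demo_callchain *merged;
	uint64_t nr_orig;

	if (orig->nr == 0 ||
	    orig->ips[orig->nr - 1] != DEMO_CONTEXT_USER_DEFERRED)
		return NULL;

	nr_orig = orig->nr - 1;	/* skip the trailing marker */

	merged = calloc(1 + nr_orig + deferred->nr, sizeof(uint64_t));
	if (merged == NULL)
		return NULL;

	merged->nr = nr_orig + deferred->nr;
	memcpy(merged->ips, orig->ips, nr_orig * sizeof(uint64_t));
	memcpy(&merged->ips[nr_orig], deferred->ips,
	       deferred->nr * sizeof(uint64_t));
	return merged;
}
```

The caller owns the returned chain and must free() it. That mirrors the reason evlist__deliver_deferred_samples() frees orig_sample.callchain after delivering a merged sample.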