From nobody Sat Feb 7 14:00:20 2026
From: Josh Poimboeuf
To: Peter Zijlstra, Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, Indu Bhagat, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, linux-perf-users@vger.kernel.org, Mark Brown, linux-toolchains@vger.kernel.org
Subject: [PATCH RFC 01/10] perf: Remove get_perf_callchain() 'init_nr' argument
Date: Wed, 8 Nov 2023 16:41:06 -0800
The 'init_nr' argument has double duty: it's used to initialize both the
number of contexts and the number of stack entries.  That's confusing
and the callers always pass zero anyway.  Hard code the zero.

Signed-off-by: Josh Poimboeuf
Acked-by: Namhyung Kim
---
 include/linux/perf_event.h |  2 +-
 kernel/bpf/stackmap.c      |  4 ++--
 kernel/events/callchain.c  | 12 ++++++------
 kernel/events/core.c       |  2 +-
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index afb028c54f33..f4b05954076c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1533,7 +1533,7 @@ DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
 extern void perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs);
 extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs);
 extern struct perf_callchain_entry *
-get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
+get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
 		   u32 max_stack, bool crosstask, bool add_mark);
 extern int get_callchain_buffers(int max_stack);
 extern void put_callchain_buffers(void);
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index d6b277482085..b0b0fbff7c18 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -294,7 +294,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 	if (max_depth > sysctl_perf_event_max_stack)
 		max_depth = sysctl_perf_event_max_stack;
 
-	trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
+	trace = get_perf_callchain(regs, kernel, user, max_depth,
 				   false, false);
 
 	if (unlikely(!trace))
@@ -420,7 +420,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
	else if (kernel && task)
 		trace = get_callchain_entry_for_task(task, max_depth);
 	else
-		trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
+		trace = get_perf_callchain(regs, kernel, user, max_depth,
 					   false, false);
 	if (unlikely(!trace))
 		goto err_fault;
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 1273be84392c..1e135195250c 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -177,7 +177,7 @@ put_callchain_entry(int rctx)
 }
 
 struct perf_callchain_entry *
-get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
+get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
 		   u32 max_stack, bool crosstask, bool add_mark)
 {
 	struct perf_callchain_entry *entry;
@@ -188,11 +188,11 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
 	if (!entry)
 		return NULL;
 
-	ctx.entry		= entry;
-	ctx.max_stack		= max_stack;
-	ctx.nr			= entry->nr = init_nr;
-	ctx.contexts		= 0;
-	ctx.contexts_maxed	= false;
+	ctx.entry		= entry;
+	ctx.max_stack		= max_stack;
+	ctx.nr			= entry->nr = 0;
+	ctx.contexts		= 0;
+	ctx.contexts_maxed	= false;
 
 	if (kernel && !user_mode(regs)) {
 		if (add_mark)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 683dc086ef10..b0d62df7df4e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7600,7 +7600,7 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
 	if (!kernel && !user)
 		return &__empty_callchain;
 
-	callchain = get_perf_callchain(regs, 0, kernel, user,
+	callchain = get_perf_callchain(regs, kernel, user,
 				       max_stack, crosstask, true);
 	return callchain ?: &__empty_callchain;
 }
-- 
2.41.0

From nobody Sat Feb 7 14:00:20 2026
From: Josh Poimboeuf
Subject: [PATCH RFC 02/10] perf: Remove get_perf_callchain() 'crosstask' argument
Date: Wed, 8 Nov 2023 16:41:07 -0800

get_perf_callchain() doesn't support cross-task unwinding, so it doesn't
make much sense to have 'crosstask' as an argument.  Instead, have
perf_callchain() adjust 'user' accordingly.

Signed-off-by: Josh Poimboeuf
Acked-by: Namhyung Kim
---
 include/linux/perf_event.h | 2 +-
 kernel/bpf/stackmap.c      | 5 ++---
 kernel/events/callchain.c  | 6 +-----
 kernel/events/core.c       | 8 ++++----
 4 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f4b05954076c..2d8fa253b9df 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1534,7 +1534,7 @@ extern void perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct p
 extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs);
 extern struct perf_callchain_entry *
 get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
-		   u32 max_stack, bool crosstask, bool add_mark);
+		   u32 max_stack, bool add_mark);
 extern int get_callchain_buffers(int max_stack);
 extern void put_callchain_buffers(void);
 extern struct perf_callchain_entry *get_callchain_entry(int *rctx);
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index b0b0fbff7c18..e4827ca5378d 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -294,8 +294,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 	if (max_depth > sysctl_perf_event_max_stack)
 		max_depth = sysctl_perf_event_max_stack;
 
-	trace = get_perf_callchain(regs, kernel, user, max_depth,
-				   false, false);
+	trace = get_perf_callchain(regs, kernel, user, max_depth, false);
 
 	if (unlikely(!trace))
 		/* couldn't fetch the stack trace */
@@ -421,7 +420,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 		trace = get_callchain_entry_for_task(task, max_depth);
 	else
 		trace = get_perf_callchain(regs, kernel, user, max_depth,
-					   false, false);
+					   false);
 	if (unlikely(!trace))
 		goto err_fault;
 
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 1e135195250c..aa5f9d11c28d 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -178,7 +178,7 @@ put_callchain_entry(int rctx)
 
 struct perf_callchain_entry *
 get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
-		   u32 max_stack, bool crosstask, bool add_mark)
+		   u32 max_stack, bool add_mark)
 {
 	struct perf_callchain_entry *entry;
 	struct perf_callchain_entry_ctx ctx;
@@ -209,9 +209,6 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
 	}
 
 	if (regs) {
-		if (crosstask)
-			goto exit_put;
-
 		if (add_mark)
 			perf_callchain_store_context(&ctx, PERF_CONTEXT_USER);
 
@@ -219,7 +216,6 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
 		}
 	}
 
-exit_put:
 	put_callchain_entry(rctx);
 
 	return entry;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b0d62df7df4e..5e41a3b70bcd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7592,16 +7592,16 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
 {
 	bool kernel = !event->attr.exclude_callchain_kernel;
 	bool user   = !event->attr.exclude_callchain_user;
-	/* Disallow cross-task user callchains. */
-	bool crosstask = event->ctx->task && event->ctx->task != current;
 	const u32 max_stack = event->attr.sample_max_stack;
 	struct perf_callchain_entry *callchain;
 
+	/* Disallow cross-task user callchains. */
+	user &= !event->ctx->task || event->ctx->task == current;
+
 	if (!kernel && !user)
 		return &__empty_callchain;
 
-	callchain = get_perf_callchain(regs, kernel, user,
-				       max_stack, crosstask, true);
+	callchain = get_perf_callchain(regs, kernel, user, max_stack, true);
 	return callchain ?: &__empty_callchain;
 }
 
-- 
2.41.0

From nobody Sat Feb 7 14:00:20 2026
From: Josh Poimboeuf
Subject: [PATCH RFC 03/10] perf: Simplify get_perf_callchain() user logic
Date: Wed, 8 Nov 2023 16:41:08 -0800
Message-ID: <6456d4d523841fb97b639433731540b8783529a1.1699487758.git.jpoimboe@kernel.org>

Simplify the get_perf_callchain() user logic a bit.  task_pt_regs()
should never be NULL.

Signed-off-by: Josh Poimboeuf
Acked-by: Namhyung Kim
---
 kernel/events/callchain.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index aa5f9d11c28d..2bee8b6fda0e 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -202,20 +202,18 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
 
 	if (user) {
 		if (!user_mode(regs)) {
-			if (current->mm)
-				regs = task_pt_regs(current);
-			else
-				regs = NULL;
+			if (!current->mm)
+				goto exit_put;
+			regs = task_pt_regs(current);
 		}
 
-		if (regs) {
-			if (add_mark)
-				perf_callchain_store_context(&ctx, PERF_CONTEXT_USER);
+		if (add_mark)
+			perf_callchain_store_context(&ctx, PERF_CONTEXT_USER);
 
-			perf_callchain_user(&ctx, regs);
-		}
+		perf_callchain_user(&ctx, regs);
 	}
 
+exit_put:
 	put_callchain_entry(rctx);
 
 	return entry;
-- 
2.41.0

From nobody Sat Feb 7 14:00:20 2026
From: Josh Poimboeuf
Subject: [PATCH RFC 04/10] perf: Introduce deferred user callchains
Date: Wed, 8 Nov 2023 16:41:09 -0800

Instead of attempting to unwind user space from the NMI handler, defer
it to run in task context by sending a self-IPI and then scheduling the
unwind to run in the IRQ's exit task work before returning to user space.
This allows the user stack page to be paged in if needed, avoids
duplicate unwinds for kernel-bound workloads, and prepares for SFrame
unwinding (so .sframe sections can be paged in on demand).

Suggested-by: Steven Rostedt
Suggested-by: Peter Zijlstra
Signed-off-by: Josh Poimboeuf
---
 arch/Kconfig                    |  3 ++
 include/linux/perf_event.h      | 22 ++++++--
 include/uapi/linux/perf_event.h |  1 +
 kernel/bpf/stackmap.c           |  5 +-
 kernel/events/callchain.c       |  7 ++-
 kernel/events/core.c            | 90 ++++++++++++++++++++++++++++++---
 6 files changed, 115 insertions(+), 13 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index f4b210ab0612..690c82212224 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -425,6 +425,9 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH
 	  It uses the same command line parameters, and sysctl interface,
 	  as the generic hardlockup detectors.
 
+config HAVE_PERF_CALLCHAIN_DEFERRED
+	bool
+
 config HAVE_PERF_REGS
 	bool
 	help
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2d8fa253b9df..2f232111dff2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -786,6 +786,7 @@ struct perf_event {
 	struct irq_work			pending_irq;
 	struct callback_head		pending_task;
 	unsigned int			pending_work;
+	unsigned int			pending_unwind;
 
 	atomic_t			event_limit;
 
@@ -1113,7 +1114,10 @@ int perf_event_read_local(struct perf_event *event, u64 *value,
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
 
-extern struct perf_callchain_entry *perf_callchain(struct perf_event *event, struct pt_regs *regs);
+extern void perf_callchain(struct perf_sample_data *data,
+			   struct perf_event *event, struct pt_regs *regs);
+extern void perf_callchain_deferred(struct perf_sample_data *data,
+				    struct perf_event *event, struct pt_regs *regs);
 
 static inline bool branch_sample_no_flags(const struct perf_event *event)
 {
@@ -1189,6 +1193,7 @@ struct perf_sample_data {
 	u64				data_page_size;
 	u64				code_page_size;
 	u64				aux_size;
+	bool				deferred;
 } ____cacheline_aligned;
 
 /* default value for data source */
@@ -1206,6 +1211,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
 	data->sample_flags = PERF_SAMPLE_PERIOD;
 	data->period = period;
 	data->dyn_size = 0;
+	data->deferred = false;
 
 	if (addr) {
 		data->addr = addr;
@@ -1219,7 +1225,11 @@ static inline void perf_sample_save_callchain(struct perf_sample_data *data,
 {
 	int size = 1;
 
-	data->callchain = perf_callchain(event, regs);
+	if (data->deferred)
+		perf_callchain_deferred(data, event, regs);
+	else
+		perf_callchain(data, event, regs);
+
 	size += data->callchain->nr;
 
 	data->dyn_size += size * sizeof(u64);
@@ -1534,12 +1544,18 @@ extern void perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct p
 extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs);
 extern struct perf_callchain_entry *
 get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
-		   u32 max_stack, bool add_mark);
+		   u32 max_stack, bool add_mark, bool defer_user);
 extern int get_callchain_buffers(int max_stack);
 extern void put_callchain_buffers(void);
 extern struct perf_callchain_entry *get_callchain_entry(int *rctx);
 extern void put_callchain_entry(int rctx);
 
+#ifdef CONFIG_HAVE_PERF_CALLCHAIN_DEFERRED
+extern void perf_callchain_user_deferred(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs);
+#else
+static inline void perf_callchain_user_deferred(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs) {}
+#endif
+
 extern int sysctl_perf_event_max_stack;
 extern int sysctl_perf_event_max_contexts_per_stack;
 
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 39c6a250dd1b..9a1127af4cda 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1237,6 +1237,7 @@ enum perf_callchain_context {
 	PERF_CONTEXT_HV			= (__u64)-32,
 	PERF_CONTEXT_KERNEL		= (__u64)-128,
 	PERF_CONTEXT_USER		= (__u64)-512,
+	PERF_CONTEXT_USER_DEFERRED	= (__u64)-640,
 
 	PERF_CONTEXT_GUEST		= (__u64)-2048,
 	PERF_CONTEXT_GUEST_KERNEL	= (__u64)-2176,
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index e4827ca5378d..fcdd26715b12 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -294,8 +294,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 	if (max_depth > sysctl_perf_event_max_stack)
 		max_depth = sysctl_perf_event_max_stack;
 
-	trace = get_perf_callchain(regs, kernel, user, max_depth, false);
-
+	trace = get_perf_callchain(regs, kernel, user, max_depth, false, false);
 	if (unlikely(!trace))
 		/* couldn't fetch the stack trace */
 		return -EFAULT;
@@ -420,7 +419,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 		trace = get_callchain_entry_for_task(task, max_depth);
 	else
 		trace = get_perf_callchain(regs, kernel, user, max_depth,
-					   false);
+					   false, false);
 	if (unlikely(!trace))
 		goto err_fault;
 
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 2bee8b6fda0e..16571c8d6771 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -178,7 +178,7 @@ put_callchain_entry(int rctx)
 
 struct perf_callchain_entry *
 get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
-		   u32 max_stack, bool add_mark)
+		   u32 max_stack, bool add_mark, bool defer_user)
 {
 	struct perf_callchain_entry *entry;
 	struct perf_callchain_entry_ctx ctx;
@@ -207,6 +207,11 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
 			regs = task_pt_regs(current);
 		}
 
+		if (defer_user) {
+			perf_callchain_store_context(&ctx, PERF_CONTEXT_USER_DEFERRED);
+			goto exit_put;
+		}
+
 		if (add_mark)
 			perf_callchain_store_context(&ctx, PERF_CONTEXT_USER);
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5e41a3b70bcd..290e06b0071c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6751,6 +6751,12 @@ static void perf_pending_irq(struct irq_work *entry)
 	struct perf_event *event = container_of(entry, struct perf_event, pending_irq);
 	int rctx;
 
+	if (!is_software_event(event)) {
+		if (event->pending_unwind)
+			task_work_add(current, &event->pending_task, TWA_RESUME);
+		return;
+	}
+
 	/*
 	 * If we 'fail' here, that's OK, it means recursion is already disabled
 	 * and we won't recurse 'further'.
@@ -6772,11 +6778,57 @@ static void perf_pending_irq(struct irq_work *entry)
 	perf_swevent_put_recursion_context(rctx);
 }
 
+static void perf_pending_task_unwind(struct perf_event *event)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+	struct perf_output_handle handle;
+	struct perf_event_header header;
+	struct perf_sample_data data;
+	struct perf_callchain_entry *callchain;
+
+	callchain = kmalloc(sizeof(struct perf_callchain_entry) +
+			    (sizeof(__u64) * event->attr.sample_max_stack) +
+			    (sizeof(__u64) * 1) /* one context */,
+			    GFP_KERNEL);
+	if (!callchain)
+		return;
+
+	callchain->nr = 0;
+	data.callchain = callchain;
+
+	perf_sample_data_init(&data, 0, event->hw.last_period);
+
+	data.deferred = true;
+
+	perf_prepare_sample(&data, event, regs);
+
+	perf_prepare_header(&header, &data, event, regs);
+
+	if (perf_output_begin(&handle, &data, event, header.size))
+		goto done;
+
+	perf_output_sample(&handle, &header, &data, event);
+
+	perf_output_end(&handle);
+
+done:
+	kfree(callchain);
+}
+
 static void perf_pending_task(struct callback_head *head)
 {
 	struct perf_event *event = container_of(head, struct perf_event, pending_task);
 	int rctx;
 
+	if (!is_software_event(event)) {
+		if (event->pending_unwind) {
+			perf_pending_task_unwind(event);
+			event->pending_unwind = 0;
+		}
+		return;
+	}
+
 	/*
 	 * If we 'fail' here, that's OK, it means recursion is already disabled
 	 * and we won't recurse 'further'.
@@ -7587,22 +7639,48 @@ static u64 perf_get_page_size(unsigned long addr)
 
 static struct perf_callchain_entry __empty_callchain = { .nr = 0, };
 
-struct perf_callchain_entry *
-perf_callchain(struct perf_event *event, struct pt_regs *regs)
+void perf_callchain(struct perf_sample_data *data, struct perf_event *event,
+		    struct pt_regs *regs)
 {
 	bool kernel = !event->attr.exclude_callchain_kernel;
 	bool user   = !event->attr.exclude_callchain_user;
 	const u32 max_stack = event->attr.sample_max_stack;
-	struct perf_callchain_entry *callchain;
+	bool defer_user = IS_ENABLED(CONFIG_HAVE_PERF_CALLCHAIN_DEFERRED);
 
 	/* Disallow cross-task user callchains. */
 	user &= !event->ctx->task || event->ctx->task == current;
 
 	if (!kernel && !user)
-		return &__empty_callchain;
+		goto empty;
 
-	callchain = get_perf_callchain(regs, kernel, user, max_stack, true);
-	return callchain ?: &__empty_callchain;
+	data->callchain = get_perf_callchain(regs, kernel, user, max_stack, true, defer_user);
+	if (!data->callchain)
+		goto empty;
+
+	if (user && defer_user && !event->pending_unwind) {
+		event->pending_unwind = 1;
+		irq_work_queue(&event->pending_irq);
+	}
+
+	return;
+
+empty:
+	data->callchain = &__empty_callchain;
+}
+
+void perf_callchain_deferred(struct perf_sample_data *data,
+			     struct perf_event *event, struct pt_regs *regs)
+{
+	struct perf_callchain_entry_ctx ctx;
+
+	ctx.entry		= data->callchain;
+	ctx.max_stack		= event->attr.sample_max_stack;
+	ctx.nr			= 0;
+	ctx.contexts		= 0;
+	ctx.contexts_maxed	= false;
+
+	perf_callchain_store_context(&ctx, PERF_CONTEXT_USER);
+	perf_callchain_user_deferred(&ctx, regs);
 }
 
 static __always_inline u64 __cond_set(u64 flags, u64 s, u64 d)
-- 
2.41.0

From nobody Sat Feb 7 14:00:20 2026
From: Josh Poimboeuf
Subject: [PATCH RFC 05/10] perf/x86: Add HAVE_PERF_CALLCHAIN_DEFERRED
Date: Wed, 8 Nov 2023 16:41:10 -0800

Enable deferred user space unwinding on x86.
Signed-off-by: Josh Poimboeuf --- arch/x86/Kconfig | 1 + arch/x86/events/core.c | 47 ++++++++++++++++++++++++++++-------------- 2 files changed, 32 insertions(+), 16 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 3762f41bb092..cacf11ac4b10 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -256,6 +256,7 @@ config X86 select HAVE_PERF_EVENTS_NMI select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_N= MI select HAVE_PCI + select HAVE_PERF_CALLCHAIN_DEFERRED select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select MMU_GATHER_RCU_TABLE_FREE if PARAVIRT diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 40ad1425ffa2..ae264437f794 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -2816,8 +2816,8 @@ static unsigned long get_segment_base(unsigned int se= gment) =20 #include =20 -static inline int -perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ct= x *entry) +static inline int __perf_callchain_user32(struct pt_regs *regs, + struct perf_callchain_entry_ctx *entry) { /* 32-bit process in 64-bit kernel. 
*/ unsigned long ss_base, cs_base; @@ -2831,7 +2831,6 @@ perf_callchain_user32(struct pt_regs *regs, struct pe= rf_callchain_entry_ctx *ent ss_base =3D get_segment_base(regs->ss); =20 fp =3D compat_ptr(ss_base + regs->bp); - pagefault_disable(); while (entry->nr < entry->max_stack) { if (!valid_user_frame(fp, sizeof(frame))) break; @@ -2844,19 +2843,18 @@ perf_callchain_user32(struct pt_regs *regs, struct = perf_callchain_entry_ctx *ent perf_callchain_store(entry, cs_base + frame.return_address); fp =3D compat_ptr(ss_base + frame.next_frame); } - pagefault_enable(); return 1; } -#else -static inline int -perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ct= x *entry) +#else /* !CONFIG_IA32_EMULATION */ +static inline int __perf_callchain_user32(struct pt_regs *regs, + struct perf_callchain_entry_ctx *entry) { - return 0; + return 0; } -#endif +#endif /* CONFIG_IA32_EMULATION */ =20 -void -perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs= *regs) +void __perf_callchain_user(struct perf_callchain_entry_ctx *entry, + struct pt_regs *regs, bool atomic) { struct stack_frame frame; const struct stack_frame __user *fp; @@ -2876,13 +2874,15 @@ perf_callchain_user(struct perf_callchain_entry_ctx= *entry, struct pt_regs *regs =20 perf_callchain_store(entry, regs->ip); =20 - if (!nmi_uaccess_okay()) + if (atomic && !nmi_uaccess_okay()) return; =20 - if (perf_callchain_user32(regs, entry)) - return; + if (atomic) + pagefault_disable(); + + if (__perf_callchain_user32(regs, entry)) + goto done; =20 - pagefault_disable(); while (entry->nr < entry->max_stack) { if (!valid_user_frame(fp, sizeof(frame))) break; @@ -2895,7 +2895,22 @@ perf_callchain_user(struct perf_callchain_entry_ctx = *entry, struct pt_regs *regs perf_callchain_store(entry, frame.return_address); fp =3D (void __user *)frame.next_frame; } - pagefault_enable(); +done: + if (atomic) + pagefault_enable(); +} + + +void perf_callchain_user(struct perf_callchain_entry_ctx 
*entry,
+			 struct pt_regs *regs)
+{
+	return __perf_callchain_user(entry, regs, true);
+}
+
+void perf_callchain_user_deferred(struct perf_callchain_entry_ctx *entry,
+				  struct pt_regs *regs)
+{
+	return __perf_callchain_user(entry, regs, false);
 }
 
 /*
-- 
2.41.0

From: Josh Poimboeuf
To: Peter Zijlstra, Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, Indu Bhagat, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, linux-perf-users@vger.kernel.org, Mark Brown, linux-toolchains@vger.kernel.org
Subject: [PATCH RFC 06/10] unwind: Introduce generic user space unwinding interfaces
Date: Wed, 8 Nov 2023 16:41:11 -0800

Introduce generic user space unwinder interfaces which will provide a
unified way for architectures to unwind different user space stack
frame types.

Signed-off-by: Josh Poimboeuf
---
 arch/Kconfig                |  3 ++
 include/linux/user_unwind.h | 32 +++++++++++++++
 kernel/Makefile             |  1 +
 kernel/unwind/Makefile      |  1 +
 kernel/unwind/user.c        | 77 +++++++++++++++++++++++++++++++++++++
 5 files changed, 114 insertions(+)
 create mode 100644 include/linux/user_unwind.h
 create mode 100644 kernel/unwind/Makefile
 create mode 100644 kernel/unwind/user.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 690c82212224..c4a08485835e 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -428,6 +428,9 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH
 config HAVE_PERF_CALLCHAIN_DEFERRED
 	bool
 
+config HAVE_USER_UNWIND
+	bool
+
 config HAVE_PERF_REGS
 	bool
 	help
diff --git a/include/linux/user_unwind.h b/include/linux/user_unwind.h
new file mode 100644
index 000000000000..2812b88c95fd
--- /dev/null
+++ b/include/linux/user_unwind.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_USER_UNWIND_H
+#define _LINUX_USER_UNWIND_H
+
+#include
+#include
+
+enum user_unwind_type {
+	USER_UNWIND_TYPE_AUTO,
+	USER_UNWIND_TYPE_FP,
+};
+
+struct user_unwind_frame {
+	s32 cfa_off;
+	s32 ra_off;
+	s32 fp_off;
+	bool use_fp;
+};
+
+struct user_unwind_state {
+	unsigned long ip, sp, fp;
+	enum user_unwind_type type;
+	bool done;
+};
+
+extern int user_unwind_start(struct user_unwind_state *state, enum user_unwind_type);
+extern int user_unwind_next(struct
user_unwind_state *state);
+
+#define for_each_user_frame(state, type) \
+	for (user_unwind_start(&state, type); !state.done; user_unwind_next(&state))
+
+#endif /* _LINUX_USER_UNWIND_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 3947122d618b..bddf58b3b496 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -50,6 +50,7 @@ obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
 obj-y += entry/
+obj-y += unwind/
 obj-$(CONFIG_MODULES) += module/
 
 obj-$(CONFIG_KCMP) += kcmp.o
diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
new file mode 100644
index 000000000000..eb466d6a3295
--- /dev/null
+++ b/kernel/unwind/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_HAVE_USER_UNWIND) += user.o
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
new file mode 100644
index 000000000000..8f9432306482
--- /dev/null
+++ b/kernel/unwind/user.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static struct user_unwind_frame fp_frame = {
+	ARCH_INIT_USER_FP_FRAME
+};
+
+int user_unwind_next(struct user_unwind_state *state)
+{
+	struct user_unwind_frame _frame;
+	struct user_unwind_frame *frame = &_frame;
+	unsigned long cfa, fp, ra;
+	int ret = -EINVAL;
+
+	if (state->done)
+		return -EINVAL;
+
+	switch (state->type) {
+	case USER_UNWIND_TYPE_FP:
+		frame = &fp_frame;
+		break;
+	default:
+		BUG();
+	}
+
+	cfa = (frame->use_fp ?
state->fp : state->sp) + frame->cfa_off;
+
+	if (frame->ra_off && get_user(ra, (unsigned long *)(cfa + frame->ra_off)))
+		goto the_end;
+
+	if (frame->fp_off && get_user(fp, (unsigned long *)(cfa + frame->fp_off)))
+		goto the_end;
+
+	state->sp = cfa;
+	state->ip = ra;
+	if (frame->fp_off)
+		state->fp = fp;
+
+	return 0;
+
+the_end:
+	state->done = true;
+	return ret;
+}
+
+int user_unwind_start(struct user_unwind_state *state,
+		      enum user_unwind_type type)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+
+	might_sleep();
+
+	memset(state, 0, sizeof(*state));
+
+	if (!current->mm) {
+		state->done = true;
+		return -EINVAL;
+	}
+
+	if (type == USER_UNWIND_TYPE_AUTO)
+		state->type = USER_UNWIND_TYPE_FP;
+	else
+		state->type = type;
+
+	state->sp = user_stack_pointer(regs);
+	state->ip = instruction_pointer(regs);
+	state->fp = frame_pointer(regs);
+
+	return user_unwind_next(state);
+}
-- 
2.41.0

From: Josh Poimboeuf
To: Peter Zijlstra, Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, Indu Bhagat, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, linux-perf-users@vger.kernel.org, Mark Brown, linux-toolchains@vger.kernel.org
Subject: [PATCH RFC 07/10] unwind/x86: Add HAVE_USER_UNWIND
Date: Wed, 8 Nov 2023 16:41:12 -0800
Message-ID: <28efef61435d2fbb42bf26277adae0c630e05cf4.1699487758.git.jpoimboe@kernel.org>

Use ARCH_INIT_USER_FP_FRAME to describe how frame pointers are unwound
on x86, and enable HAVE_USER_UNWIND accordingly so the user unwind
interfaces can be used.
Signed-off-by: Josh Poimboeuf
---
 arch/x86/Kconfig                   |  1 +
 arch/x86/include/asm/user_unwind.h | 11 +++++++++++
 2 files changed, 12 insertions(+)
 create mode 100644 arch/x86/include/asm/user_unwind.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cacf11ac4b10..95939cd54dfe 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -278,6 +278,7 @@ config X86
 	select HAVE_UACCESS_VALIDATION		if HAVE_OBJTOOL
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select HAVE_USER_RETURN_NOTIFIER
+	select HAVE_USER_UNWIND
 	select HAVE_GENERIC_VDSO
 	select HOTPLUG_PARALLEL			if SMP && X86_64
 	select HOTPLUG_SMT			if SMP
diff --git a/arch/x86/include/asm/user_unwind.h b/arch/x86/include/asm/user_unwind.h
new file mode 100644
index 000000000000..caa6266abbb4
--- /dev/null
+++ b/arch/x86/include/asm/user_unwind.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_USER_UNWIND_H
+#define _ASM_X86_USER_UNWIND_H
+
+#define ARCH_INIT_USER_FP_FRAME \
+	.ra_off  = sizeof(long) * -1, \
+	.cfa_off = sizeof(long) *  2, \
+	.fp_off  = sizeof(long) * -2, \
+	.use_fp  = true,
+
+#endif /* _ASM_X86_USER_UNWIND_H */
-- 
2.41.0

From: Josh Poimboeuf
To: Peter Zijlstra, Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, Indu Bhagat, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, linux-perf-users@vger.kernel.org, Mark Brown, linux-toolchains@vger.kernel.org
Subject: [PATCH RFC 08/10] perf/x86: Use user_unwind interface
Date: Wed, 8 Nov 2023 16:41:13 -0800

Simplify __perf_callchain_user() and prepare for sframe user space
unwinding by switching to the generic user unwind interface.
Signed-off-by: Josh Poimboeuf
---
 arch/x86/events/core.c | 20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ae264437f794..5c41a11f058f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -2856,8 +2857,7 @@ static inline int __perf_callchain_user32(struct pt_regs *regs,
 void __perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs, bool atomic)
 {
-	struct stack_frame frame;
-	const struct stack_frame __user *fp;
+	struct user_unwind_state state;
 
 	if (perf_guest_state()) {
 		/* TODO: We don't support guest os callchain now */
@@ -2870,8 +2870,6 @@ void __perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 	if (regs->flags & (X86_VM_MASK | PERF_EFLAGS_VM))
 		return;
 
-	fp = (void __user *)regs->bp;
-
 	perf_callchain_store(entry, regs->ip);
 
 	if (atomic && !nmi_uaccess_okay())
@@ -2883,17 +2881,9 @@ void __perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 	if (__perf_callchain_user32(regs, entry))
 		goto done;
 
-	while (entry->nr < entry->max_stack) {
-		if (!valid_user_frame(fp, sizeof(frame)))
-			break;
-
-		if (__get_user(frame.next_frame, &fp->next_frame))
-			break;
-		if (__get_user(frame.return_address, &fp->return_address))
-			break;
-
-		perf_callchain_store(entry, frame.return_address);
-		fp = (void __user *)frame.next_frame;
+	for_each_user_frame(state, USER_UNWIND_TYPE_AUTO) {
+		if (perf_callchain_store(entry, state.ip))
+			goto done;
 	}
 done:
 	if (atomic)
-- 
2.41.0

From: Josh Poimboeuf
To: Peter Zijlstra, Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, Indu Bhagat, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, linux-perf-users@vger.kernel.org, Mark Brown, linux-toolchains@vger.kernel.org
Subject: [PATCH RFC 09/10] unwind: Introduce SFrame user space unwinding
Date: Wed, 8 Nov 2023 16:41:14 -0800
Message-ID: <09460e60dd1c2f8ea1abb8d9188195db699ce76f.1699487758.git.jpoimboe@kernel.org>

Some distros have started compiling frame pointers into all their
packages to enable the kernel to do
system-wide profiling of user space.  Unfortunately that creates a
runtime performance penalty across the entire system.  Using DWARF
instead isn't feasible due to complexity and slowness.

For in-kernel unwinding we solved this problem with the creation of the
ORC unwinder for x86_64.  Similarly, for user space the GNU assembler
has created the SFrame format starting with binutils 2.40.  SFrame is a
simpler version of .eh_frame which gets placed in the .sframe section.

Add support for unwinding user space using SFrame.

More information about SFrame can be found here:

  - https://lwn.net/Articles/932209/
  - https://lwn.net/Articles/940686/
  - https://sourceware.org/binutils/docs/sframe-spec.html

Signed-off-by: Josh Poimboeuf
---
 arch/Kconfig                |   3 +
 arch/x86/include/asm/mmu.h  |   2 +-
 fs/binfmt_elf.c             |  46 +++-
 include/linux/mm_types.h    |   3 +
 include/linux/sframe.h      |  46 ++++
 include/linux/user_unwind.h |   1 +
 include/uapi/linux/elf.h    |   1 +
 include/uapi/linux/prctl.h  |   3 +
 kernel/fork.c               |  10 +
 kernel/sys.c                |  11 +
 kernel/unwind/Makefile      |   1 +
 kernel/unwind/sframe.c      | 414 ++++++++++++++++++++++++++++++++++++
 kernel/unwind/sframe.h      | 217 +++++++++++++++++++
 kernel/unwind/user.c        |  15 +-
 mm/init-mm.c                |   2 +
 15 files changed, 768 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/sframe.h
 create mode 100644 kernel/unwind/sframe.c
 create mode 100644 kernel/unwind/sframe.h

diff --git a/arch/Kconfig b/arch/Kconfig
index c4a08485835e..b133b03102c7 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -431,6 +431,9 @@ config HAVE_PERF_CALLCHAIN_DEFERRED
 config HAVE_USER_UNWIND
 	bool
 
+config HAVE_USER_UNWIND_SFRAME
+	bool
+
 config HAVE_PERF_REGS
 	bool
 	help
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 0da5c227f490..9cf9cae8345f 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -73,7 +73,7 @@ typedef struct {
 	.context = {				\
 		.ctx_id = 1,			\
 		.lock = __MUTEX_INITIALIZER(mm.context.lock),	\
-	}
+	},
 
 void leave_mm(int cpu);
#define leave_mm leave_mm diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 5397b552fbeb..bca207844a70 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -47,6 +47,7 @@ #include #include #include +#include #include #include =20 @@ -633,11 +634,13 @@ static unsigned long load_elf_interp(struct elfhdr *i= nterp_elf_ex, unsigned long no_base, struct elf_phdr *interp_elf_phdata, struct arch_elf_state *arch_state) { - struct elf_phdr *eppnt; + struct elf_phdr *eppnt, *sframe_phdr =3D NULL; unsigned long load_addr =3D 0; int load_addr_set =3D 0; unsigned long error =3D ~0UL; unsigned long total_size; + unsigned long start_code =3D ~0UL; + unsigned long end_code =3D 0; int i; =20 /* First of all, some simple consistency checks */ @@ -659,7 +662,8 @@ static unsigned long load_elf_interp(struct elfhdr *int= erp_elf_ex, =20 eppnt =3D interp_elf_phdata; for (i =3D 0; i < interp_elf_ex->e_phnum; i++, eppnt++) { - if (eppnt->p_type =3D=3D PT_LOAD) { + switch (eppnt->p_type) { + case PT_LOAD: { int elf_type =3D MAP_PRIVATE; int elf_prot =3D make_prot(eppnt->p_flags, arch_state, true, true); @@ -698,7 +702,29 @@ static unsigned long load_elf_interp(struct elfhdr *in= terp_elf_ex, error =3D -ENOMEM; goto out; } + + if ((eppnt->p_flags & PF_X) && k < start_code) + start_code =3D k; + + k =3D load_addr + eppnt->p_vaddr + eppnt->p_filesz; + if ((eppnt->p_flags & PF_X) && k > end_code) + end_code =3D k; + break; } + case PT_GNU_SFRAME: + sframe_phdr =3D eppnt; + break; + } + } + + if (sframe_phdr) { + struct sframe_file sfile =3D { + .sframe_addr =3D load_addr + sframe_phdr->p_vaddr, + .text_start =3D start_code, + .text_end =3D end_code, + }; + + __sframe_add_section(&sfile); } =20 error =3D load_addr; @@ -823,7 +849,7 @@ static int load_elf_binary(struct linux_binprm *bprm) int first_pt_load =3D 1; unsigned long error; struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata =3D NULL; - struct elf_phdr *elf_property_phdata =3D NULL; + struct elf_phdr *elf_property_phdata =3D 
NULL, *sframe_phdr =3D NULL; unsigned long elf_brk; int retval, i; unsigned long elf_entry; @@ -931,6 +957,10 @@ static int load_elf_binary(struct linux_binprm *bprm) executable_stack =3D EXSTACK_DISABLE_X; break; =20 + case PT_GNU_SFRAME: + sframe_phdr =3D elf_ppnt; + break; + case PT_LOPROC ... PT_HIPROC: retval =3D arch_elf_pt_proc(elf_ex, elf_ppnt, bprm->file, false, @@ -1279,6 +1309,16 @@ static int load_elf_binary(struct linux_binprm *bprm) MAP_FIXED | MAP_PRIVATE, 0); } =20 + if (sframe_phdr) { + struct sframe_file sfile =3D { + .sframe_addr =3D load_bias + sframe_phdr->p_vaddr, + .text_start =3D start_code, + .text_end =3D end_code, + }; + + __sframe_add_section(&sfile); + } + regs =3D current_pt_regs(); #ifdef ELF_PLAT_INIT /* diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 957ce38768b2..7c361a9ccf75 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -974,6 +974,9 @@ struct mm_struct { #endif } lru_gen; #endif /* CONFIG_LRU_GEN */ +#ifdef CONFIG_HAVE_USER_UNWIND_SFRAME + struct maple_tree sframe_mt; +#endif } __randomize_layout; =20 /* diff --git a/include/linux/sframe.h b/include/linux/sframe.h new file mode 100644 index 000000000000..72a2e8625026 --- /dev/null +++ b/include/linux/sframe.h @@ -0,0 +1,46 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_SFRAME_H +#define _LINUX_SFRAME_H + +#include + +struct sframe_file { + unsigned long sframe_addr, text_start, text_end; +}; + +struct user_unwind_frame; + +#ifdef CONFIG_HAVE_USER_UNWIND_SFRAME + +#define INIT_MM_SFRAME .sframe_mt =3D MTREE_INIT(sframe_mt, 0), + +extern void sframe_free_mm(struct mm_struct *mm); + +extern int __sframe_add_section(struct sframe_file *file); +extern int sframe_add_section(unsigned long sframe_addr, unsigned long tex= t_start, unsigned long text_end); +extern int sframe_remove_section(unsigned long sframe_addr); +extern int sframe_find(unsigned long ip, struct user_unwind_frame *frame); + +static inline bool 
sframe_enabled_current(void) +{ + struct mm_struct *mm =3D current->mm; + + return mm && !mtree_empty(&mm->sframe_mt); +} + +#else /* !CONFIG_HAVE_USER_UNWIND_SFRAME */ + +#define INIT_MM_SFRAME + +static inline void sframe_free_mm(struct mm_struct *mm) {} + +static inline int __sframe_add_section(struct sframe_file *file) { return = -EINVAL; } +static inline int sframe_add_section(unsigned long sframe_addr, unsigned l= ong text_start, unsigned long text_end) { return -EINVAL; } +static inline int sframe_remove_section(unsigned long sframe_addr) { retur= n -EINVAL; } +static inline int sframe_find(unsigned long ip, struct user_unwind_frame *= frame) { return -EINVAL; } + +static inline bool sframe_enabled_current(void) { return false; } + +#endif /* CONFIG_HAVE_USER_UNWIND_SFRAME */ + +#endif /* _LINUX_SFRAME_H */ diff --git a/include/linux/user_unwind.h b/include/linux/user_unwind.h index 2812b88c95fd..9a5e6e557530 100644 --- a/include/linux/user_unwind.h +++ b/include/linux/user_unwind.h @@ -8,6 +8,7 @@ enum user_unwind_type { USER_UNWIND_TYPE_AUTO, USER_UNWIND_TYPE_FP, + USER_UNWIND_TYPE_SFRAME, }; =20 struct user_unwind_frame { diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index 9417309b7230..e3a08ee03fe4 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -39,6 +39,7 @@ typedef __s64 Elf64_Sxword; #define PT_GNU_STACK (PT_LOOS + 0x474e551) #define PT_GNU_RELRO (PT_LOOS + 0x474e552) #define PT_GNU_PROPERTY (PT_LOOS + 0x474e553) +#define PT_GNU_SFRAME (PT_LOOS + 0x474e554) =20 =20 /* ARM MTE memory tag segment type */ diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 370ed14b1ae0..336277ea9782 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -306,4 +306,7 @@ struct prctl_mm_map { # define PR_RISCV_V_VSTATE_CTRL_NEXT_MASK 0xc # define PR_RISCV_V_VSTATE_CTRL_MASK 0x1f =20 +#define PR_ADD_SFRAME 71 +#define PR_REMOVE_SFRAME 72 + #endif /* _LINUX_PRCTL_H */ diff --git 
a/kernel/fork.c b/kernel/fork.c index 10917c3e1f03..0ec13004d86c 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -99,6 +99,7 @@ #include #include #include +#include =20 #include #include @@ -924,6 +925,7 @@ void __mmdrop(struct mm_struct *mm) mm_pasid_drop(mm); mm_destroy_cid(mm); percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS); + sframe_free_mm(mm); =20 free_mm(mm); } @@ -1254,6 +1256,13 @@ static void mm_init_uprobes_state(struct mm_struct *= mm) #endif } =20 +static void mm_init_sframe(struct mm_struct *mm) +{ +#ifdef CONFIG_HAVE_USER_UNWIND_SFRAME + mt_init(&mm->sframe_mt); +#endif +} + static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct = *p, struct user_namespace *user_ns) { @@ -1285,6 +1294,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm= , struct task_struct *p, mm->pmd_huge_pte =3D NULL; #endif mm_init_uprobes_state(mm); + mm_init_sframe(mm); hugetlb_count_init(mm); =20 if (current->mm) { diff --git a/kernel/sys.c b/kernel/sys.c index 420d9cb9cc8e..4f2d6f91814d 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -64,6 +64,7 @@ #include #include #include +#include =20 #include =20 @@ -2739,6 +2740,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, a= rg2, unsigned long, arg3, case PR_RISCV_V_GET_CONTROL: error =3D RISCV_V_GET_CONTROL(); break; + case PR_ADD_SFRAME: + if (arg5) + return -EINVAL; + error =3D sframe_add_section(arg2, arg3, arg4); + break; + case PR_REMOVE_SFRAME: + if (arg3 || arg4 || arg5) + return -EINVAL; + error =3D sframe_remove_section(arg2); + break; default: error =3D -EINVAL; break; diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile index eb466d6a3295..6f202c5840cf 100644 --- a/kernel/unwind/Makefile +++ b/kernel/unwind/Makefile @@ -1 +1,2 @@ obj-$(CONFIG_HAVE_USER_UNWIND) +=3D user.o +obj-$(CONFIG_HAVE_USER_UNWIND_SFRAME) +=3D sframe.o diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c new file mode 100644 index 000000000000..b167c19497e5 --- /dev/null +++ 
b/kernel/unwind/sframe.c
@@ -0,0 +1,414 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "sframe.h"
+
+#define SFRAME_FILENAME_LEN 32
+
+struct sframe_section {
+	struct rcu_head rcu;
+
+	unsigned long sframe_addr;
+	unsigned long text_addr;
+
+	unsigned long fdes_addr;
+	unsigned long fres_addr;
+	unsigned int  fdes_num;
+	signed char   ra_off, fp_off;
+};
+
+DEFINE_STATIC_SRCU(sframe_srcu);
+
+#define __SFRAME_GET_USER(out, user_ptr, type)			\
+({								\
+	type __tmp;						\
+	if (get_user(__tmp, (type *)user_ptr))			\
+		return -EFAULT;					\
+	user_ptr += sizeof(__tmp);				\
+	out = __tmp;						\
+})
+
+#define SFRAME_GET_USER_SIGNED(out, user_ptr, size)		\
+({								\
+	switch (size) {						\
+	case 1:							\
+		__SFRAME_GET_USER(out, user_ptr, s8);		\
+		break;						\
+	case 2:							\
+		__SFRAME_GET_USER(out, user_ptr, s16);		\
+		break;						\
+	case 4:							\
+		__SFRAME_GET_USER(out, user_ptr, s32);		\
+		break;						\
+	default:						\
+		return -EINVAL;					\
+	}							\
+})
+
+#define SFRAME_GET_USER_UNSIGNED(out, user_ptr, size)		\
+({								\
+	switch (size) {						\
+	case 1:							\
+		__SFRAME_GET_USER(out, user_ptr, u8);		\
+		break;						\
+	case 2:							\
+		__SFRAME_GET_USER(out, user_ptr, u16);		\
+		break;						\
+	case 4:							\
+		__SFRAME_GET_USER(out, user_ptr, u32);		\
+		break;						\
+	default:						\
+		return -EINVAL;					\
+	}							\
+})
+
+static unsigned char fre_type_to_size(unsigned char fre_type)
+{
+	if (fre_type > 2)
+		return 0;
+	return 1 << fre_type;
+}
+
+static unsigned char offset_size_enum_to_size(unsigned char off_size)
+{
+	if (off_size > 2)
+		return 0;
+	return 1 << off_size;
+}
+
+static int find_fde(struct sframe_section *sec, unsigned long ip,
+		    struct sframe_fde *fde)
+{
+	s32 func_off, ip_off;
+	struct sframe_fde __user *first, *last, *mid, *found = NULL;
+
+	ip_off = ip - sec->sframe_addr;
+
+	first = (void *)sec->fdes_addr;
+	last = first + sec->fdes_num - 1;
+	while (first <= last) {
+		mid = first + ((last - first) / 2);
+		if (get_user(func_off, (s32 *)mid))
+			return -EFAULT;
+		if (ip_off
>= func_off) {
+			found = mid;
+			first = mid + 1;
+		} else
+			last = mid - 1;
+	}
+
+	if (!found)
+		return -EINVAL;
+
+	if (copy_from_user(fde, found, sizeof(*fde)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int find_fre(struct sframe_section *sec, struct sframe_fde *fde,
+		    unsigned long ip, struct user_unwind_frame *frame)
+{
+	unsigned char fde_type = SFRAME_FUNC_FDE_TYPE(fde->info);
+	unsigned char fre_type = SFRAME_FUNC_FRE_TYPE(fde->info);
+	s32 fre_ip_off, cfa_off, ra_off, fp_off, ip_off;
+	unsigned char offset_count, offset_size;
+	unsigned char addr_size;
+	void __user *f, *last_f = NULL;	/* NULL until a matching FRE is seen */
+	u8 fre_info;
+	int i;
+
+	addr_size = fre_type_to_size(fre_type);
+	if (!addr_size)
+		return -EINVAL;
+
+	ip_off = ip - sec->sframe_addr - fde->start_addr;
+
+	f = (void *)sec->fres_addr + fde->fres_off;
+
+	for (i = 0; i < fde->fres_num; i++) {
+
+		SFRAME_GET_USER_UNSIGNED(fre_ip_off, f, addr_size);
+
+		if (fde_type == SFRAME_FDE_TYPE_PCINC) {
+			if (fre_ip_off > ip_off)
+				break;
+		} else {
+			/* SFRAME_FDE_TYPE_PCMASK */
+#if 0 /* sframe v2 */
+			if (ip_off % fde->rep_size < fre_ip_off)
+				break;
+#endif
+		}
+
+		SFRAME_GET_USER_UNSIGNED(fre_info, f, 1);
+
+		offset_count = SFRAME_FRE_OFFSET_COUNT(fre_info);
+		offset_size  = offset_size_enum_to_size(SFRAME_FRE_OFFSET_SIZE(fre_info));
+
+		if (!offset_count || !offset_size)
+			return -EINVAL;
+
+		last_f = f;
+		f += offset_count * offset_size;
+	}
+
+	if (!last_f)
+		return -EINVAL;
+
+	f = last_f;
+
+	SFRAME_GET_USER_UNSIGNED(cfa_off, f, offset_size);
+	offset_count--;
+
+	ra_off = sec->ra_off;
+	if (!ra_off) {
+		if (!offset_count--)
+			return -EINVAL;
+		SFRAME_GET_USER_SIGNED(ra_off, f, offset_size);
+	}
+
+	fp_off = sec->fp_off;
+	if (!fp_off && offset_count) {
+		offset_count--;
+		SFRAME_GET_USER_SIGNED(fp_off, f, offset_size);
+	}
+
+	if (offset_count)
+		return -EINVAL;
+
+	frame->cfa_off = cfa_off;
+	frame->ra_off = ra_off;
+	frame->fp_off = fp_off;
+	frame->use_fp =
SFRAME_FRE_CFA_BASE_REG_ID(fre_info) == SFRAME_BASE_REG_FP;
+
+	return 0;
+}
+
+int sframe_find(unsigned long ip, struct user_unwind_frame *frame)
+{
+	struct mm_struct *mm = current->mm;
+	struct sframe_section *sec;
+	struct sframe_fde fde;
+	int srcu_idx;
+	int ret = -EINVAL;
+
+	srcu_idx = srcu_read_lock(&sframe_srcu);
+
+	sec = mtree_load(&mm->sframe_mt, ip);
+	if (!sec) {
+		srcu_read_unlock(&sframe_srcu, srcu_idx);
+		return -EINVAL;
+	}
+
+	ret = find_fde(sec, ip, &fde);
+	if (ret)
+		goto err_unlock;
+
+	ret = find_fre(sec, &fde, ip, frame);
+	if (ret)
+		goto err_unlock;
+
+	srcu_read_unlock(&sframe_srcu, srcu_idx);
+	return 0;
+
+err_unlock:
+	srcu_read_unlock(&sframe_srcu, srcu_idx);
+	return ret;
+}
+
+static int get_sframe_file(unsigned long sframe_addr, struct sframe_file *file)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *sframe_vma, *text_vma, *vma;
+	VMA_ITERATOR(vmi, mm, 0);
+
+	mmap_read_lock(mm);
+
+	sframe_vma = vma_lookup(mm, sframe_addr);
+	if (!sframe_vma || !sframe_vma->vm_file)
+		goto err_unlock;
+
+	text_vma = NULL;
+
+	for_each_vma(vmi, vma) {
+		if (vma->vm_file != sframe_vma->vm_file)
+			continue;
+		if (vma->vm_flags & VM_EXEC) {
+			if (text_vma) {
+				/*
+				 * Multiple EXEC segments in a single file
+				 * aren't currently supported, is that a thing?
+				 */
+				WARN_ON_ONCE(1);
+				goto err_unlock;
+			}
+			text_vma = vma;
+		}
+	}
+
+	if (!text_vma)
+		goto err_unlock;
+
+	file->sframe_addr = sframe_addr;
+	file->text_start  = text_vma->vm_start;
+	file->text_end    = text_vma->vm_end;
+
+	mmap_read_unlock(mm);
+	return 0;
+
+err_unlock:
+	mmap_read_unlock(mm);
+	return -EINVAL;
+}
+
+static int validate_sframe_addrs(struct sframe_file *file)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *text_vma;
+
+	mmap_read_lock(mm);
+
+	if (!vma_lookup(mm, file->sframe_addr))
+		goto err_unlock;
+
+	text_vma = vma_lookup(mm, file->text_start);
+	if (!text_vma || !(text_vma->vm_flags & VM_EXEC))
+		goto err_unlock;
+
+	if (vma_lookup(mm, file->text_end - 1) != text_vma)
+		goto err_unlock;
+
+	mmap_read_unlock(mm);
+	return 0;
+
+err_unlock:
+	mmap_read_unlock(mm);
+	return -EINVAL;
+}
+
+int __sframe_add_section(struct sframe_file *file)
+{
+	struct maple_tree *sframe_mt = &current->mm->sframe_mt;
+	struct sframe_section *sec;
+	struct sframe_header shdr;
+	unsigned long header_end;
+	int ret;
+
+	if (copy_from_user(&shdr, (void *)file->sframe_addr, sizeof(shdr)))
+		return -EFAULT;
+
+	if (shdr.preamble.magic != SFRAME_MAGIC ||
+	    shdr.preamble.version != SFRAME_VERSION_1 ||
+	    !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
+	    shdr.auxhdr_len || !shdr.num_fdes || !shdr.num_fres ||
+	    shdr.fdes_off > shdr.fres_off) {
+		return -EINVAL;
+	}
+
+	header_end = file->sframe_addr + SFRAME_HDR_SIZE(shdr);
+
+	sec = kmalloc(sizeof(*sec), GFP_KERNEL);
+	if (!sec)
+		return -ENOMEM;
+
+	sec->sframe_addr = file->sframe_addr;
+	sec->text_addr   = file->text_start;
+	sec->fdes_addr   = header_end + shdr.fdes_off;
+	sec->fres_addr   = header_end + shdr.fres_off;
+	sec->fdes_num    = shdr.num_fdes;
+	sec->ra_off      = shdr.cfa_fixed_ra_offset;
+	sec->fp_off      = shdr.cfa_fixed_fp_offset;
+
+	ret = mtree_insert_range(sframe_mt, file->text_start, file->text_end,
+				 sec, GFP_KERNEL);
+	if (ret) {
+		kfree(sec);
+		return ret;
+	}
+
+	return 0;
+}
+
+int sframe_add_section(unsigned
long sframe_addr, unsigned long text_start= , unsigned long text_end) +{ + struct sframe_file file; + int ret; + + if (!text_start || !text_end) { + ret =3D get_sframe_file(sframe_addr, &file); + if (ret) + return ret; + } else { + /* + * This is mainly for generated code, for which the text isn't + * file-backed so the user has to give the text bounds. + */ + file.sframe_addr =3D sframe_addr; + file.text_start =3D text_start; + file.text_end =3D text_end; + ret =3D validate_sframe_addrs(&file); + if (ret) + return ret; + } + + return __sframe_add_section(&file); +} + +static void sframe_free_rcu(struct rcu_head *rcu) +{ + struct sframe_section *sec =3D container_of(rcu, struct sframe_section, r= cu); + + kfree(sec); +} + +static int __sframe_remove_section(struct mm_struct *mm, + struct sframe_section *sec) +{ + struct sframe_section *s; + + s =3D mtree_erase(&mm->sframe_mt, sec->text_addr); + if (!s || WARN_ON_ONCE(s !=3D sec)) + return -EINVAL; + + call_srcu(&sframe_srcu, &sec->rcu, sframe_free_rcu); + + return 0; +} + +int sframe_remove_section(unsigned long sframe_addr) +{ + struct mm_struct *mm =3D current->mm; + struct sframe_section *sec; + unsigned long index =3D 0; + + sec =3D mtree_load(&mm->sframe_mt, sframe_addr); + if (!sec) + return -EINVAL; + + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) { + if (sec->sframe_addr =3D=3D sframe_addr) + return __sframe_remove_section(mm, sec); + } + + return -EINVAL; +} + +void sframe_free_mm(struct mm_struct *mm) +{ + struct sframe_section *sec; + unsigned long index =3D 0; + + if (!mm) + return; + + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) + kfree(sec); + + mtree_destroy(&mm->sframe_mt); +} diff --git a/kernel/unwind/sframe.h b/kernel/unwind/sframe.h new file mode 100644 index 000000000000..1f91b696daf5 --- /dev/null +++ b/kernel/unwind/sframe.h @@ -0,0 +1,217 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _SFRAME_H +#define _SFRAME_H +/* + * Copyright (C) 2023, Oracle and/or its 
affiliates. + * + * This file contains definitions for the SFrame stack tracing format, whi= ch is + * documented at https://sourceware.org/binutils/docs + */ + +#include + +#define SFRAME_VERSION_1 1 +#define SFRAME_VERSION_2 2 +#define SFRAME_MAGIC 0xdee2 + +/* Function Descriptor Entries are sorted on PC. */ +#define SFRAME_F_FDE_SORTED 0x1 +/* Frame-pointer based stack tracing. Defined, but not set. */ +#define SFRAME_F_FRAME_POINTER 0x2 + +#define SFRAME_CFA_FIXED_FP_INVALID 0 +#define SFRAME_CFA_FIXED_RA_INVALID 0 + +/* Supported ABIs/Arch. */ +#define SFRAME_ABI_AARCH64_ENDIAN_BIG 1 /* AARCH64 big endian. */ +#define SFRAME_ABI_AARCH64_ENDIAN_LITTLE 2 /* AARCH64 little endian. */ +#define SFRAME_ABI_AMD64_ENDIAN_LITTLE 3 /* AMD64 little endian. */ + +/* SFrame FRE types. */ +#define SFRAME_FRE_TYPE_ADDR1 0 +#define SFRAME_FRE_TYPE_ADDR2 1 +#define SFRAME_FRE_TYPE_ADDR4 2 + +/* + * SFrame Function Descriptor Entry types. + * + * The SFrame format has two possible representations for functions. The + * choice of which type to use is made according to the instruction patter= ns + * in the relevant program stub. + */ + +/* Unwinders perform a (PC >=3D FRE_START_ADDR) to look up a matching FRE.= */ +#define SFRAME_FDE_TYPE_PCINC 0 +/* + * Unwinders perform a (PC & FRE_START_ADDR_AS_MASK >=3D FRE_START_ADDR_AS= _MASK) + * to look up a matching FRE. Typical usecases are pltN entries, trampolin= es + * etc. + */ +#define SFRAME_FDE_TYPE_PCMASK 1 + +/** + * struct sframe_preamble - SFrame Preamble. + * @magic: Magic number (SFRAME_MAGIC). + * @version: Format version number (SFRAME_VERSION). + * @flags: Various flags. + */ +struct sframe_preamble { + u16 magic; + u8 version; + u8 flags; +} __packed; + +/** + * struct sframe_header - SFrame Header. + * @preamble: SFrame preamble. + * @abi_arch: Identify the arch (including endianness) and ABI. 
+ * @cfa_fixed_fp_offset: Offset for the Frame Pointer (FP) from CFA may be
+ *	  fixed for some ABIs (e.g., in AMD64 when -fno-omit-frame-pointer is
+ *	  used).  When fixed, this field specifies the fixed stack frame offset
+ *	  and the individual FREs do not need to track it.  When not fixed, it
+ *	  is set to SFRAME_CFA_FIXED_FP_INVALID, and the individual FREs may
+ *	  provide the applicable stack frame offset, if any.
+ * @cfa_fixed_ra_offset: Offset for the Return Address from CFA is fixed for
+ *	  some ABIs.  When fixed, this field specifies the fixed stack frame
+ *	  offset and the individual FREs do not need to track it.  When not
+ *	  fixed, it is set to SFRAME_CFA_FIXED_RA_INVALID.
+ * @auxhdr_len: Number of bytes making up the auxiliary header, if any.
+ *	  Some ABI/arch, in the future, may use this space for extending the
+ *	  information in SFrame header.  Auxiliary header is contained in bytes
+ *	  sequentially following the sframe_header.
+ * @num_fdes: Number of SFrame FDEs in this SFrame section.
+ * @num_fres: Number of SFrame Frame Row Entries.
+ * @fre_len:  Number of bytes in the SFrame Frame Row Entry section.
+ * @fdes_off: Offset of SFrame Function Descriptor Entry section.
+ * @fres_off: Offset of SFrame Frame Row Entry section.
+ */
+struct sframe_header {
+	struct sframe_preamble preamble;
+	u8  abi_arch;
+	s8  cfa_fixed_fp_offset;
+	s8  cfa_fixed_ra_offset;
+	u8  auxhdr_len;
+	u32 num_fdes;
+	u32 num_fres;
+	u32 fre_len;
+	u32 fdes_off;
+	u32 fres_off;
+} __packed;
+
+#define SFRAME_HDR_SIZE(sframe_hdr) \
+	((sizeof(struct sframe_header) + (sframe_hdr).auxhdr_len))
+
+/* Two possible keys for executable (instruction) pointers signing. */
+#define SFRAME_AARCH64_PAUTH_KEY_A	0 /* Key A. */
+#define SFRAME_AARCH64_PAUTH_KEY_B	1 /* Key B. */
+
+/**
+ * struct sframe_fde - SFrame Function Descriptor Entry.
+ * @start_addr: Function start address.  Encoded as a signed offset,
+ *	  relative to the current FDE.
+ * @size: Size of the function in bytes.
+ * @fres_off: Offset of the first SFrame Frame Row Entry of the function,
+ *	  relative to the beginning of the SFrame Frame Row Entry sub-section.
+ * @fres_num: Number of frame row entries for the function.
+ * @info: Additional information for deciphering the stack trace
+ *	  information for the function.  Contains information about SFrame FRE
+ *	  type, SFrame FDE type, PAC authorization A/B key, etc.
+ * @rep_size: Block size for SFRAME_FDE_TYPE_PCMASK.
+ * @padding:  Unused.
+ */
+struct sframe_fde {
+	s32 start_addr;
+	u32 size;
+	u32 fres_off;
+	u32 fres_num;
+	u8  info;
+#if 0 /* TODO sframe v2 */
+	u8  rep_size;
+	u16 padding;
+#endif
+} __packed;
+
+/*
+ * 'func_info' in SFrame FDE contains additional information for deciphering
+ * the stack trace information for the function.  In V1, the information is
+ * organized as follows:
+ *   - 4-bits: Identify the FRE type used for the function.
+ *   - 1-bit:  Identify the FDE type of the function - mask or inc.
+ *   - 1-bit:  PAC authorization A/B key (aarch64).
+ *   - 2-bits: Unused.
+ * ---------------------------------------------------------------------
+ * |  Unused  |  PAC auth A/B key (aarch64)  |  FDE type  |  FRE type  |
+ * |          |        Unused (amd64)        |            |            |
+ * ---------------------------------------------------------------------
+ * 8          6                              5            4            0
+ */
+
+/* Note: Set PAC auth key to SFRAME_AARCH64_PAUTH_KEY_A by default. */
+#define SFRAME_FUNC_INFO(fde_type, fre_enc_type) \
+	(((SFRAME_AARCH64_PAUTH_KEY_A & 0x1) << 5) | \
+	 (((fde_type) & 0x1) << 4) | ((fre_enc_type) & 0xf))
+
+#define SFRAME_FUNC_FRE_TYPE(data)	((data) & 0xf)
+#define SFRAME_FUNC_FDE_TYPE(data)	(((data) >> 4) & 0x1)
+#define SFRAME_FUNC_PAUTH_KEY(data)	(((data) >> 5) & 0x1)
+
+/*
+ * Size of stack frame offsets in an SFrame Frame Row Entry.  A single
+ * SFrame FRE has all offsets of the same size.  Offset size may vary
+ * across frame row entries.
+ */
+#define SFRAME_FRE_OFFSET_1B	0
+#define SFRAME_FRE_OFFSET_2B	1
+#define SFRAME_FRE_OFFSET_4B	2
+
+/* An SFrame Frame Row Entry can be SP or FP based. */
+#define SFRAME_BASE_REG_FP	0
+#define SFRAME_BASE_REG_SP	1
+
+/*
+ * The index at which a specific offset is presented in the variable length
+ * bytes of an FRE.
+ */
+#define SFRAME_FRE_CFA_OFFSET_IDX	0
+/*
+ * The RA stack offset, if present, will always be at index 1 in the variable
+ * length bytes of the FRE.
+ */
+#define SFRAME_FRE_RA_OFFSET_IDX	1
+/*
+ * The FP stack offset may appear at offset 1 or 2, depending on the ABI as RA
+ * may or may not be tracked.
+ */
+#define SFRAME_FRE_FP_OFFSET_IDX	2
+
+/*
+ * 'fre_info' in SFrame FRE contains information about:
+ *   - 1 bit:  base reg for CFA
+ *   - 4 bits: Number of offsets (N).  A value of up to 3 is allowed to track
+ *	       all three of CFA, FP and RA (fixed implicit order).
+ *   - 2 bits: information about size of the offsets (S) in bytes.
+ *	       Valid values are SFRAME_FRE_OFFSET_1B, SFRAME_FRE_OFFSET_2B,
+ *	       SFRAME_FRE_OFFSET_4B.
+ *   - 1 bit:  Mangled RA state bit (aarch64 only).
+ * ---------------------------------------------------------------
+ * | Mangled-RA (aarch64) |  Size of  | Number of  | base_reg |
+ * |   Unused (amd64)     |  offsets  |  offsets   |          |
+ * ---------------------------------------------------------------
+ * 8                      7           5            1          0
+ */
+
+/* Note: Set mangled_ra_p to zero by default. */
+#define SFRAME_FRE_INFO(base_reg_id, offset_num, offset_size) \
+	(((0 & 0x1) << 7) | (((offset_size) & 0x3) << 5) | \
+	 (((offset_num) & 0xf) << 1) | ((base_reg_id) & 0x1))
+
+/* Set the mangled_ra_p bit as indicated. */
+#define SFRAME_FRE_INFO_UPDATE_MANGLED_RA_P(mangled_ra_p, fre_info) \
+	((((mangled_ra_p) & 0x1) << 7) | ((fre_info) & 0x7f))
+
+#define SFRAME_FRE_CFA_BASE_REG_ID(data)	((data) & 0x1)
+#define SFRAME_FRE_OFFSET_COUNT(data)		(((data) >> 1) & 0xf)
+#define SFRAME_FRE_OFFSET_SIZE(data)		(((data) >> 5) & 0x3)
+#define SFRAME_FRE_MANGLED_RA_P(data)		(((data) >> 7) & 0x1)
+
+#endif /* _SFRAME_H */
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 8f9432306482..4194180df154 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -26,6 +26,11 @@ int user_unwind_next(struct user_unwind_state *state)
 	case USER_UNWIND_TYPE_FP:
 		frame = &fp_frame;
 		break;
+	case USER_UNWIND_TYPE_SFRAME:
+		ret = sframe_find(state->ip, frame);
+		if (ret)
+			goto the_end;
+		break;
 	default:
 		BUG();
 	}
@@ -64,10 +69,14 @@ int user_unwind_start(struct user_unwind_state *state,
 		return -EINVAL;
 	}
 
-	if (type == USER_UNWIND_TYPE_AUTO)
-		state->type = USER_UNWIND_TYPE_FP;
-	else
+	if (type == USER_UNWIND_TYPE_AUTO) {
+		state->type = sframe_enabled_current() ?
USER_UNWIND_TYPE_SFRAME
+					      : USER_UNWIND_TYPE_FP;
+	} else {
+		if (type == USER_UNWIND_TYPE_SFRAME && !sframe_enabled_current())
+			return -EINVAL;
 		state->type = type;
+	}
 
 	state->sp = user_stack_pointer(regs);
 	state->ip = instruction_pointer(regs);
diff --git a/mm/init-mm.c b/mm/init-mm.c
index cfd367822cdd..288885a39e12 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #ifndef INIT_MM_CONTEXT
@@ -48,6 +49,7 @@ struct mm_struct init_mm = {
 	.pasid		= IOMMU_PASID_INVALID,
 #endif
 	INIT_MM_CONTEXT(init_mm)
+	INIT_MM_SFRAME
 };
 
 void setup_initial_init_mm(void *start_code, void *end_code,
-- 
2.41.0

From nobody Sat Feb 7 14:00:20 2026
From: Josh Poimboeuf
To: Peter Zijlstra, Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, Indu Bhagat, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
    linux-perf-users@vger.kernel.org, Mark Brown, linux-toolchains@vger.kernel.org
Subject: [PATCH RFC 10/10] unwind/x86/64: Add HAVE_USER_UNWIND_SFRAME
Date: Wed, 8 Nov 2023 16:41:15 -0800

Binutils 2.40 supports generating sframe for x86_64.  It works well in
testing so enable it.

NOTE: An out-of-tree glibc patch is still needed to enable setting
PR_ADD_SFRAME for shared libraries and dlopens.

Signed-off-by: Josh Poimboeuf
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 95939cd54dfe..770d0528e4c9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -279,6 +279,7 @@ config X86
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_USER_UNWIND
+	select HAVE_USER_UNWIND_SFRAME if X86_64
 	select HAVE_GENERIC_VDSO
 	select HOTPLUG_PARALLEL if SMP && X86_64
 	select HOTPLUG_SMT if SMP
-- 
2.41.0