From nobody Thu Nov 28 11:04:01 2024
Message-ID: <20241001214306.944809736@goodmis.org>
Date: Tue, 01 Oct 2024 17:42:42 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, stable@vger.kernel.org
Subject: [for-next][PATCH 1/5] tracing: Fix function timing profiler to initialize hashtable
References: <20241001214241.688116616@goodmis.org>

From: "Masami Hiramatsu (Google)"

Since the new fgraph code requires fgraph_ops.ops.func_hash to be
initialized before register_ftrace_graph() is called, initialize it with
the default parameter (trace all functions).

Cc: stable@vger.kernel.org
Fixes: 5fccc7552ccb ("ftrace: Add subops logic to allow one ops to manage many")
Signed-off-by: Masami Hiramatsu (Google)
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/ftrace.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4c28dd177ca6..d2dd71d04b8a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -883,6 +883,10 @@ static void profile_graph_return(struct ftrace_graph_ret *trace,
 }
 
 static struct fgraph_ops fprofiler_ops = {
+	.ops = {
+		.flags = FTRACE_OPS_FL_INITIALIZED,
+		INIT_OPS_HASH(fprofiler_ops.ops)
+	},
 	.entryfunc = &profile_graph_entry,
 	.retfunc = &profile_graph_return,
 };
-- 
2.45.2

From nobody Thu Nov 28 11:04:01 2024
Message-ID: <20241001214307.105844677@goodmis.org>
Date: Tue, 01 Oct 2024 17:42:43 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton
Subject: [for-next][PATCH 2/5] tracing: Add a comment about ftrace_regs definition
References: <20241001214241.688116616@goodmis.org>

From: "Masami Hiramatsu (Google)"

To clarify what is expected of ftrace_regs, add a comment to the
architecture-independent definition of ftrace_regs.

Signed-off-by: Masami Hiramatsu (Google)
Acked-by: Mark Rutland
Signed-off-by: Steven Rostedt (Google)
---
 include/linux/ftrace.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index fd5e84d0ec47..42106b3de396 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -117,6 +117,32 @@ extern int ftrace_enabled;
 
 #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
 
+/**
+ * ftrace_regs - ftrace partial/optimal register set
+ *
+ * ftrace_regs represents a group of registers which is used at the
+ * function entry and exit. There are three types of registers.
+ *
+ * - Registers for passing the parameters to callee, including the stack
+ *   pointer. (e.g. rcx, rdx, rdi, rsi, r8, r9 and rsp on x86_64)
+ * - Registers for passing the return values to caller.
+ *   (e.g. rax and rdx on x86_64)
+ * - Registers for hooking the function call and return including the
+ *   frame pointer (the frame pointer is architecture/config dependent)
+ *   (e.g. rip, rbp and rsp for x86_64)
+ *
+ * Also, architecture dependent fields can be used for internal process.
+ * (e.g. orig_ax on x86_64)
+ *
+ * On the function entry, those registers will be restored except for
+ * the stack pointer, so that user can change the function parameters
+ * and instruction pointer (e.g. live patching.)
+ * On the function exit, only registers which is used for return values
+ * are restored.
+ *
+ * NOTE: user *must not* access regs directly, only do it via APIs, because
+ * the member can be changed according to the architecture.
+ */
 struct ftrace_regs {
 	struct pt_regs regs;
 };
-- 
2.45.2

From nobody Thu Nov 28 11:04:01 2024
Message-ID: <20241001214307.262550910@goodmis.org>
Date: Tue, 01 Oct 2024 17:42:44 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Jiri Olsa
Subject: [for-next][PATCH 3/5] fgraph: Use fgraph data to store subtime for profiler
References: <20241001214241.688116616@goodmis.org>

From: Steven Rostedt

Instead of having the "subtime" for the function profiler in the
infrastructure ftrace_ret_stack structure, have it use the fgraph data
reserve and retrieve functions.

This will keep the limited shadow stack from wasting 8 bytes for
something that is seldom used.
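To illustrate per-frame data that a callback reserves on function entry and may later look up from a child frame, here is a minimal userspace sketch. It is a hypothetical model, not the kernel's shadow-stack encoding: the struct, macros and helper names are invented for illustration. Only the lookup pattern follows fgraph_retrieve_data() and fgraph_retrieve_parent_data() as they appear in this patch: walk a frame's data records newest first, match on the owner index, and treat depth 0 as the current function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; sizes, layout and names are illustrative only. */
#define STACK_WORDS 256
#define MAX_FRAMES  32

/* A data record is its payload words followed by one header word. */
#define MAKE_HDR(idx, n) (((unsigned long)(n) << 8) | (unsigned long)(idx))
#define HDR_IDX(v)       ((int)((v) & 0xffUL))
#define HDR_SIZE(v)      ((int)((v) >> 8))

struct shadow_stack {
	unsigned long words[STACK_WORDS];
	int frame_start[MAX_FRAMES];	/* first word of each active frame */
	int nframes;
	int top;			/* next free word */
};

/* Function entry: open a new, empty frame. */
static int push_frame(struct shadow_stack *s)
{
	if (s->nframes >= MAX_FRAMES)
		return -1;
	s->frame_start[s->nframes++] = s->top;
	return 0;
}

/* Function return: drop the frame and every record reserved in it. */
static void pop_frame(struct shadow_stack *s)
{
	assert(s->nframes > 0);
	s->top = s->frame_start[--s->nframes];
}

/* Reserve @nwords of payload for user @idx in the current frame. */
static unsigned long *reserve_data(struct shadow_stack *s, int idx, int nwords)
{
	unsigned long *data;

	if (s->nframes == 0 || s->top + nwords + 1 > STACK_WORDS)
		return NULL;
	data = &s->words[s->top];
	s->top += nwords;
	s->words[s->top++] = MAKE_HDR(idx, nwords);
	return data;
}

/*
 * Find user @idx's payload in the frame @depth levels up (0 = current
 * function, 1 = its caller), walking that frame's records newest first.
 */
static unsigned long *retrieve_parent_data(struct shadow_stack *s, int idx,
					   int depth, int *nwords)
{
	int f = s->nframes - 1 - depth;
	int start, off;

	if (depth < 0 || f < 0)
		return NULL;
	start = s->frame_start[f];
	off = (f + 1 < s->nframes ? s->frame_start[f + 1] : s->top) - 1;
	while (off >= start) {
		unsigned long v = s->words[off];

		if (HDR_IDX(v) == idx) {
			if (nwords)
				*nwords = HDR_SIZE(v);
			return &s->words[off - HDR_SIZE(v)];
		}
		off -= HDR_SIZE(v) + 1;
	}
	return NULL;
}
```

In this model a profiler-style user would reserve one word for its subtime at entry and, on return, add its own duration to the word returned by retrieve_parent_data(s, idx, 1, NULL) when that lookup succeeds. Users that never reserve add nothing to a frame.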
Cc: Mark Rutland
Cc: Mathieu Desnoyers
Cc: Andrew Morton
Cc: Jiri Olsa
Link: https://lore.kernel.org/20240914214826.780323141@goodmis.org
Acked-by: Masami Hiramatsu (Google)
Signed-off-by: Steven Rostedt (Google)
---
 include/linux/ftrace.h |  4 +--
 kernel/trace/fgraph.c  | 64 ++++++++++++++++++++++++++++++++----------
 kernel/trace/ftrace.c  | 23 +++++++--------
 3 files changed, 62 insertions(+), 29 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 42106b3de396..aabd348cad4a 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1081,6 +1081,7 @@ struct fgraph_ops {
 
 void *fgraph_reserve_data(int idx, int size_bytes);
 void *fgraph_retrieve_data(int idx, int *size_bytes);
+void *fgraph_retrieve_parent_data(int idx, int *size_bytes, int depth);
 
 /*
  * Stack of return addresses for functions
@@ -1091,9 +1092,6 @@ struct ftrace_ret_stack {
 	unsigned long ret;
 	unsigned long func;
 	unsigned long long calltime;
-#ifdef CONFIG_FUNCTION_PROFILER
-	unsigned long long subtime;
-#endif
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	unsigned long fp;
 #endif
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index d7d4fb403f6f..095ceb752b28 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -390,21 +390,7 @@ void *fgraph_reserve_data(int idx, int size_bytes)
  */
 void *fgraph_retrieve_data(int idx, int *size_bytes)
 {
-	int offset = current->curr_ret_stack - 1;
-	unsigned long val;
-
-	val = get_fgraph_entry(current, offset);
-	while (__get_type(val) == FGRAPH_TYPE_DATA) {
-		if (__get_data_index(val) == idx)
-			goto found;
-		offset -= __get_data_size(val) + 1;
-		val = get_fgraph_entry(current, offset);
-	}
-	return NULL;
-found:
-	if (size_bytes)
-		*size_bytes = __get_data_size(val) * sizeof(long);
-	return get_data_type_data(current, offset);
+	return fgraph_retrieve_parent_data(idx, size_bytes, 0);
 }
 
 /**
@@ -460,6 +446,54 @@ get_ret_stack(struct task_struct *t, int offset, int *frame_offset)
 	return RET_STACK(t, offset);
 }
 
+/**
+ * fgraph_retrieve_parent_data - get data from a parent function
+ * @idx: The index into the fgraph_array (fgraph_ops::idx)
+ * @size_bytes: A pointer to retrieved data size
+ * @depth: The depth to find the parent (0 is the current function)
+ *
+ * This is similar to fgraph_retrieve_data() but can be used to retrieve
+ * data from a parent caller function.
+ *
+ * Return: a pointer to the specified parent data or NULL if not found
+ */
+void *fgraph_retrieve_parent_data(int idx, int *size_bytes, int depth)
+{
+	struct ftrace_ret_stack *ret_stack = NULL;
+	int offset = current->curr_ret_stack;
+	unsigned long val;
+
+	if (offset <= 0)
+		return NULL;
+
+	for (;;) {
+		int next_offset;
+
+		ret_stack = get_ret_stack(current, offset, &next_offset);
+		if (!ret_stack || --depth < 0)
+			break;
+		offset = next_offset;
+	}
+
+	if (!ret_stack)
+		return NULL;
+
+	offset--;
+
+	val = get_fgraph_entry(current, offset);
+	while (__get_type(val) == FGRAPH_TYPE_DATA) {
+		if (__get_data_index(val) == idx)
+			goto found;
+		offset -= __get_data_size(val) + 1;
+		val = get_fgraph_entry(current, offset);
+	}
+	return NULL;
+found:
+	if (size_bytes)
+		*size_bytes = __get_data_size(val) * sizeof(long);
+	return get_data_type_data(current, offset);
+}
+
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index d2dd71d04b8a..bac1f2ee1983 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -823,7 +823,7 @@ void ftrace_graph_graph_time_control(bool enable)
 static int profile_graph_entry(struct ftrace_graph_ent *trace,
 			       struct fgraph_ops *gops)
 {
-	struct ftrace_ret_stack *ret_stack;
+	unsigned long long *subtime;
 
 	function_profile_call(trace->func, 0, NULL, NULL);
 
@@ -831,9 +831,9 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace,
 	if (!current->ret_stack)
 		return 0;
 
-	ret_stack = ftrace_graph_get_ret_stack(current, 0);
-	if (ret_stack)
-		ret_stack->subtime = 0;
+	subtime = fgraph_reserve_data(gops->idx, sizeof(*subtime));
+	if (subtime)
+		*subtime = 0;
 
 	return 1;
 }
@@ -841,11 +841,12 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace,
 static void profile_graph_return(struct ftrace_graph_ret *trace,
 				 struct fgraph_ops *gops)
 {
-	struct ftrace_ret_stack *ret_stack;
 	struct ftrace_profile_stat *stat;
 	unsigned long long calltime;
+	unsigned long long *subtime;
 	struct ftrace_profile *rec;
 	unsigned long flags;
+	int size;
 
 	local_irq_save(flags);
 	stat = this_cpu_ptr(&ftrace_profile_stats);
@@ -861,13 +862,13 @@ static void profile_graph_return(struct ftrace_graph_ret *trace,
 	if (!fgraph_graph_time) {
 
 		/* Append this call time to the parent time to subtract */
-		ret_stack = ftrace_graph_get_ret_stack(current, 1);
-		if (ret_stack)
-			ret_stack->subtime += calltime;
+		subtime = fgraph_retrieve_parent_data(gops->idx, &size, 1);
+		if (subtime)
+			*subtime += calltime;
 
-		ret_stack = ftrace_graph_get_ret_stack(current, 0);
-		if (ret_stack && ret_stack->subtime < calltime)
-			calltime -= ret_stack->subtime;
+		subtime = fgraph_retrieve_data(gops->idx, &size);
+		if (subtime && *subtime && *subtime < calltime)
+			calltime -= *subtime;
 		else
 			calltime = 0;
 	}
-- 
2.45.2

From nobody Thu Nov 28 11:04:01 2024
Message-ID: <20241001214307.423864695@goodmis.org>
Date: Tue, 01 Oct 2024 17:42:45 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Jiri Olsa
Subject: [for-next][PATCH 4/5] ftrace: Use a running sleeptime instead of saving on shadow stack
References: <20241001214241.688116616@goodmis.org>

From: Steven Rostedt

The fgraph "sleep-time" option tells the function graph tracer and the
profiler whether to include the time a function "sleeps" (is scheduled
off the CPU) in the function's duration. By default it is true, which
means the duration of a function is calculated from the timestamp of
when the function was entered to the timestamp of when it exits.
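As a hedged illustration of the two ways a duration can be measured, the sketch below models a task that keeps a running total of its off-CPU time. struct task, struct call_rec and the helpers are hypothetical userspace stand-ins; only the ftrace_timestamp and ftrace_sleeptime names mirror the task_struct fields used by this patch, and the clock is simply a number passed in by the caller.

```c
#include <assert.h>

/*
 * Hypothetical userspace model. Only the ftrace_timestamp and
 * ftrace_sleeptime names mirror the task_struct fields in this patch.
 */
struct task {
	unsigned long long ftrace_timestamp;	/* when last switched out */
	unsigned long long ftrace_sleeptime;	/* running total of off-CPU time */
};

/* What a (hypothetical) entry hook saves for one call. */
struct call_rec {
	unsigned long long calltime;
	unsigned long long sleeptime;		/* ftrace_sleeptime at entry */
};

static void sched_out(struct task *t, unsigned long long now)
{
	t->ftrace_timestamp = now;
}

/* One addition per wakeup, however many frames are on the stack. */
static void sched_in(struct task *t, unsigned long long now)
{
	if (t->ftrace_timestamp)
		t->ftrace_sleeptime += now - t->ftrace_timestamp;
}

static struct call_rec func_entry(const struct task *t, unsigned long long now)
{
	struct call_rec r = { now, t->ftrace_sleeptime };

	return r;
}

/*
 * sleep_time != 0: duration from entry to exit (the default).
 * sleep_time == 0: the off-CPU time accumulated since entry is removed.
 */
static unsigned long long func_exit(const struct task *t,
				    const struct call_rec *r,
				    unsigned long long now, int sleep_time)
{
	unsigned long long duration = now - r->calltime;

	if (!sleep_time) {
		unsigned long long slept = t->ftrace_sleeptime - r->sleeptime;

		assert(slept <= duration);
		duration -= slept;
	}
	return duration;
}
```

With a running total, the wakeup path costs one addition regardless of call depth, whereas moving every saved calltime forward costs one update per active frame.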
If the "sleep-time" option is disabled, the time that the task was not
running on the CPU during the function needs to be removed. Currently
this is done in a sched_switch tracepoint probe, which moves the
"calltime" (the time the function was entered) forward by the calculated
sleep time. It updates every calltime on the shadow stack, which is time
consuming for users of the function graph tracer that do not care about
the sleep time.

Instead, add an "ftrace_sleeptime" field to the task_struct that
accumulates the sleep time each time the task wakes up. Then have the
function entry save the current "ftrace_sleeptime", and on function exit,
move the calltime forward by the difference between the current
"ftrace_sleeptime" and the saved value.

This removes one of the reasons "calltime" needs to be on the shadow
stack. It also simplifies the code that removes the sleep time from
functions.

TODO: Only enable the sched_switch tracepoint when this is needed.

Cc: Mark Rutland
Cc: Mathieu Desnoyers
Cc: Andrew Morton
Cc: Jiri Olsa
Link: https://lore.kernel.org/20240914214826.938908568@goodmis.org
Acked-by: Masami Hiramatsu (Google)
Signed-off-by: Steven Rostedt (Google)
---
 include/linux/sched.h                |  1 +
 kernel/trace/fgraph.c                | 16 ++----------
 kernel/trace/ftrace.c                | 39 ++++++++++++++++++++--------
 kernel/trace/trace.h                 |  1 +
 kernel/trace/trace_functions_graph.c | 28 ++++++++++++++++++++
 5 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e6ee4258169a..c08f3bdb11a5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1441,6 +1441,7 @@ struct task_struct {
 
 	/* Timestamp for last schedule: */
 	unsigned long long		ftrace_timestamp;
+	unsigned long long		ftrace_sleeptime;
 
 	/*
 	 * Number of functions that haven't been traced
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 095ceb752b28..b2e95bf82211 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -495,7 +495,7 @@ void *fgraph_retrieve_parent_data(int idx, int *size_bytes, int depth)
 }
 
 /* Both enabled by default (can be cleared by function_graph tracer flags */
-static bool fgraph_sleep_time = true;
+bool fgraph_sleep_time = true;
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 /*
@@ -1046,9 +1046,7 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 				struct task_struct *next,
 				unsigned int prev_state)
 {
-	struct ftrace_ret_stack *ret_stack;
 	unsigned long long timestamp;
-	int offset;
 
 	/*
 	 * Does the user want to count the time a function was asleep.
@@ -1065,17 +1063,7 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 	if (!next->ftrace_timestamp)
 		return;
 
-	/*
-	 * Update all the counters in next to make up for the
-	 * time next was sleeping.
-	 */
-	timestamp -= next->ftrace_timestamp;
-
-	for (offset = next->curr_ret_stack; offset > 0; ) {
-		ret_stack = get_ret_stack(next, offset, &offset);
-		if (ret_stack)
-			ret_stack->calltime += timestamp;
-	}
+	next->ftrace_sleeptime += timestamp - next->ftrace_timestamp;
 }
 
 static DEFINE_PER_CPU(unsigned long *, idle_ret_stack);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index bac1f2ee1983..90b3975d5315 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -820,10 +820,15 @@ void ftrace_graph_graph_time_control(bool enable)
 	fgraph_graph_time = enable;
 }
 
+struct profile_fgraph_data {
+	unsigned long long subtime;
+	unsigned long long sleeptime;
+};
+
 static int profile_graph_entry(struct ftrace_graph_ent *trace,
 			       struct fgraph_ops *gops)
 {
-	unsigned long long *subtime;
+	struct profile_fgraph_data *profile_data;
 
 	function_profile_call(trace->func, 0, NULL, NULL);
 
@@ -831,9 +836,12 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace,
 	if (!current->ret_stack)
 		return 0;
 
-	subtime = fgraph_reserve_data(gops->idx, sizeof(*subtime));
-	if (subtime)
-		*subtime = 0;
+	profile_data = fgraph_reserve_data(gops->idx, sizeof(*profile_data));
+	if (!profile_data)
+		return 0;
+
+	profile_data->subtime = 0;
+	profile_data->sleeptime = current->ftrace_sleeptime;
 
 	return 1;
 }
@@ -841,9 +849,10 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace,
 static void profile_graph_return(struct ftrace_graph_ret *trace,
 				 struct fgraph_ops *gops)
 {
+	struct profile_fgraph_data *profile_data;
+	struct profile_fgraph_data *parent_data;
 	struct ftrace_profile_stat *stat;
 	unsigned long long calltime;
-	unsigned long long *subtime;
 	struct ftrace_profile *rec;
 	unsigned long flags;
 	int size;
@@ -859,16 +868,24 @@ static void profile_graph_return(struct ftrace_graph_ret *trace,
 
 	calltime = trace->rettime - trace->calltime;
 
+	if (!fgraph_sleep_time) {
+		profile_data = fgraph_retrieve_data(gops->idx, &size);
+		if (profile_data && current->ftrace_sleeptime)
+			calltime -= current->ftrace_sleeptime - profile_data->sleeptime;
+	}
+
 	if (!fgraph_graph_time) {
 
 		/* Append this call time to the parent time to subtract */
-		subtime = fgraph_retrieve_parent_data(gops->idx, &size, 1);
-		if (subtime)
-			*subtime += calltime;
+		parent_data = fgraph_retrieve_parent_data(gops->idx, &size, 1);
+		if (parent_data)
+			parent_data->subtime += calltime;
+
+		if (!profile_data)
+			profile_data = fgraph_retrieve_data(gops->idx, &size);
 
-		subtime = fgraph_retrieve_data(gops->idx, &size);
-		if (subtime && *subtime && *subtime < calltime)
-			calltime -= *subtime;
+		if (profile_data && profile_data->subtime && profile_data->subtime < calltime)
+			calltime -= profile_data->subtime;
 		else
 			calltime = 0;
 	}
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c866991b9c78..2f8017f8d34d 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1048,6 +1048,7 @@ static inline void ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftra
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 extern unsigned int fgraph_max_depth;
+extern bool fgraph_sleep_time;
 
 static inline bool
 ftrace_graph_ignore_func(struct fgraph_ops *gops, struct ftrace_graph_ent *trace)
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index a569daaac4c4..bbd898f5a73c 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -133,6 +133,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
 	unsigned long *task_var = fgraph_get_task_var(gops);
 	struct trace_array *tr = gops->private;
 	struct trace_array_cpu *data;
+	unsigned long *sleeptime;
 	unsigned long flags;
 	unsigned int trace_ctx;
 	long disabled;
@@ -167,6 +168,13 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
 	if (ftrace_graph_ignore_irqs())
 		return 0;
 
+	/* save the current sleep time if we are to ignore it */
+	if (!fgraph_sleep_time) {
+		sleeptime = fgraph_reserve_data(gops->idx, sizeof(*sleeptime));
+		if (sleeptime)
+			*sleeptime = current->ftrace_sleeptime;
+	}
+
 	/*
 	 * Stop here if tracing_threshold is set. We only write function return
 	 * events to the ring buffer.
@@ -238,6 +246,22 @@ void __trace_graph_return(struct trace_array *tr,
 	trace_buffer_unlock_commit_nostack(buffer, event);
 }
 
+static void handle_nosleeptime(struct ftrace_graph_ret *trace,
+			       struct fgraph_ops *gops)
+{
+	unsigned long long *sleeptime;
+	int size;
+
+	if (fgraph_sleep_time)
+		return;
+
+	sleeptime = fgraph_retrieve_data(gops->idx, &size);
+	if (!sleeptime)
+		return;
+
+	trace->calltime += current->ftrace_sleeptime - *sleeptime;
+}
+
 void trace_graph_return(struct ftrace_graph_ret *trace,
 			struct fgraph_ops *gops)
 {
@@ -256,6 +280,8 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 		return;
 	}
 
+	handle_nosleeptime(trace, gops);
+
 	local_irq_save(flags);
 	cpu = raw_smp_processor_id();
 	data = per_cpu_ptr(tr->array_buffer.data, cpu);
@@ -278,6 +304,8 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
 		return;
 	}
 
+	handle_nosleeptime(trace, gops);
+
 	if (tracing_thresh &&
 	    (trace->rettime - trace->calltime < tracing_thresh))
 		return;
-- 
2.45.2

From nobody Thu Nov 28 11:04:01 2024
Message-ID: <20241001214307.590205295@goodmis.org>
Date: Tue, 01 Oct 2024 17:42:46 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Jiri Olsa
Subject: [for-next][PATCH 5/5] ftrace: Have calltime be saved in the fgraph storage
References: <20241001214241.688116616@goodmis.org>

From: Steven Rostedt

The calltime field in the shadow stack frame is only used by the function
graph tracer and profiler. But now that there are other users of the
function graph infrastructure, this adds overhead and wastes space on the
shadow stack. Move the calltime to the fgraph data storage, where the
function graph tracer and profiler entry functions save it in their own
graph storage and retrieve it in their exit functions.
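A minimal sketch of the variable-size record idea used in the trace_functions_graph.c part of this patch: only calltime is reserved when sleep time is counted, and the exit path checks the reserved size before touching sleeptime, as handle_nosleeptime() does. struct slot, reserve() and the two hooks are hypothetical userspace stand-ins for fgraph_reserve_data() and fgraph_retrieve_data(). The size is rounded up to whole longs because fgraph_retrieve_data() reports sizes as a multiple of sizeof(long).

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as struct fgraph_times in this patch. */
struct fgraph_times {
	unsigned long long calltime;
	unsigned long long sleeptime;	/* only reserved when sleep time is excluded */
};

/*
 * Hypothetical stand-in for one frame's data slot: storage plus the
 * number of bytes reserved, rounded up to whole longs.
 */
struct slot {
	struct fgraph_times buf;
	int size;
};

static struct fgraph_times *reserve(struct slot *s, size_t bytes)
{
	size_t words = (bytes + sizeof(long) - 1) / sizeof(long);

	if (words * sizeof(long) > sizeof(s->buf))
		return NULL;
	s->size = (int)(words * sizeof(long));
	return &s->buf;
}

/* Entry: reserve only calltime unless sleep time must be removed later. */
static struct fgraph_times *graph_entry(struct slot *s, int sleep_time,
					unsigned long long now,
					unsigned long long task_sleeptime)
{
	struct fgraph_times *ft;

	if (sleep_time) {
		ft = reserve(s, sizeof(ft->calltime));
	} else {
		ft = reserve(s, sizeof(*ft));
		if (ft)
			ft->sleeptime = task_sleeptime;
	}
	if (!ft)
		return NULL;
	ft->calltime = now;
	return ft;
}

/* Exit: like handle_nosleeptime(), skip the adjustment for short records. */
static unsigned long long graph_return(struct fgraph_times *ft, int size,
				       int sleep_time, unsigned long long now,
				       unsigned long long task_sleeptime)
{
	assert(ft != NULL);
	if (!sleep_time && size >= (int)sizeof(*ft))
		ft->calltime += task_sleeptime - ft->sleeptime;
	return now - ft->calltime;
}
```

The size check is what makes the short record safe: when only calltime was reserved, the exit path never reads the missing sleeptime field.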
Cc: Mark Rutland
Cc: Mathieu Desnoyers
Cc: Andrew Morton
Cc: Jiri Olsa
Link: https://lore.kernel.org/20240914214827.096968730@goodmis.org
Acked-by: Masami Hiramatsu (Google)
Signed-off-by: Steven Rostedt (Google)
---
 include/linux/ftrace.h               |  1 -
 kernel/trace/fgraph.c                |  5 ---
 kernel/trace/ftrace.c                | 19 ++++-----
 kernel/trace/trace_functions_graph.c | 60 +++++++++++++++++++---------
 4 files changed, 51 insertions(+), 34 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index aabd348cad4a..e684addf6508 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1091,7 +1091,6 @@ void *fgraph_retrieve_parent_data(int idx, int *size_bytes, int depth);
 struct ftrace_ret_stack {
 	unsigned long ret;
 	unsigned long func;
-	unsigned long long calltime;
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	unsigned long fp;
 #endif
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index b2e95bf82211..58a28ec35dab 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -558,7 +558,6 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
 			 int fgraph_idx)
 {
 	struct ftrace_ret_stack *ret_stack;
-	unsigned long long calltime;
 	unsigned long val;
 	int offset;
 
@@ -588,8 +587,6 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
 		return -EBUSY;
 	}
 
-	calltime = trace_clock_local();
-
 	offset = READ_ONCE(current->curr_ret_stack);
 	ret_stack = RET_STACK(current, offset);
 	offset += FGRAPH_FRAME_OFFSET;
@@ -623,7 +620,6 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
 
 	ret_stack->ret = ret;
 	ret_stack->func = func;
-	ret_stack->calltime = calltime;
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	ret_stack->fp = frame_pointer;
 #endif
@@ -757,7 +753,6 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 	*offset += FGRAPH_FRAME_OFFSET;
 	*ret = ret_stack->ret;
 	trace->func = ret_stack->func;
-	trace->calltime = ret_stack->calltime;
 	trace->overrun = atomic_read(&current->trace_overrun);
 	trace->depth = current->curr_ret_depth;
 	/*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 90b3975d5315..cae388122ca8 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -821,6 +821,7 @@ void ftrace_graph_graph_time_control(bool enable)
 }
 
 struct profile_fgraph_data {
+	unsigned long long calltime;
 	unsigned long long subtime;
 	unsigned long long sleeptime;
 };
@@ -842,6 +843,7 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace,
 
 	profile_data->subtime = 0;
 	profile_data->sleeptime = current->ftrace_sleeptime;
+	profile_data->calltime = trace_clock_local();
 
 	return 1;
 }
@@ -850,9 +852,9 @@ static void profile_graph_return(struct ftrace_graph_ret *trace,
 				 struct fgraph_ops *gops)
 {
 	struct profile_fgraph_data *profile_data;
-	struct profile_fgraph_data *parent_data;
 	struct ftrace_profile_stat *stat;
 	unsigned long long calltime;
+	unsigned long long rettime = trace_clock_local();
 	struct ftrace_profile *rec;
 	unsigned long flags;
 	int size;
@@ -862,29 +864,28 @@ static void profile_graph_return(struct ftrace_graph_ret *trace,
 	if (!stat->hash || !ftrace_profile_enabled)
 		goto out;
 
+	profile_data = fgraph_retrieve_data(gops->idx, &size);
+
 	/* If the calltime was zero'd ignore it */
-	if (!trace->calltime)
+	if (!profile_data || !profile_data->calltime)
 		goto out;
 
-	calltime = trace->rettime - trace->calltime;
+	calltime = rettime - profile_data->calltime;
 
 	if (!fgraph_sleep_time) {
-		profile_data = fgraph_retrieve_data(gops->idx, &size);
-		if (profile_data && current->ftrace_sleeptime)
+		if (current->ftrace_sleeptime)
 			calltime -= current->ftrace_sleeptime - profile_data->sleeptime;
 	}
 
 	if (!fgraph_graph_time) {
+		struct profile_fgraph_data *parent_data;
 
 		/* Append this call time to the parent time to subtract */
 		parent_data = fgraph_retrieve_parent_data(gops->idx, &size, 1);
 		if (parent_data)
 			parent_data->subtime += calltime;
 
-		if (!profile_data)
-			profile_data = fgraph_retrieve_data(gops->idx, &size);
-
-		if (profile_data && profile_data->subtime && profile_data->subtime < calltime)
+		if (profile_data->subtime && profile_data->subtime < calltime)
 			calltime -= profile_data->subtime;
 		else
 			calltime = 0;
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index bbd898f5a73c..5c1b150fbba3 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -127,13 +127,18 @@ static inline int ftrace_graph_ignore_irqs(void)
 	return in_hardirq();
 }
 
+struct fgraph_times {
+	unsigned long long calltime;
+	unsigned long long sleeptime; /* may be optional! */
+};
+
 int trace_graph_entry(struct ftrace_graph_ent *trace,
 		      struct fgraph_ops *gops)
 {
 	unsigned long *task_var = fgraph_get_task_var(gops);
 	struct trace_array *tr = gops->private;
 	struct trace_array_cpu *data;
-	unsigned long *sleeptime;
+	struct fgraph_times *ftimes;
 	unsigned long flags;
 	unsigned int trace_ctx;
 	long disabled;
@@ -168,12 +173,18 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
 	if (ftrace_graph_ignore_irqs())
 		return 0;
 
-	/* save the current sleep time if we are to ignore it */
-	if (!fgraph_sleep_time) {
-		sleeptime = fgraph_reserve_data(gops->idx, sizeof(*sleeptime));
-		if (sleeptime)
-			*sleeptime = current->ftrace_sleeptime;
+	if (fgraph_sleep_time) {
+		/* Only need to record the calltime */
+		ftimes = fgraph_reserve_data(gops->idx, sizeof(ftimes->calltime));
+	} else {
+		ftimes = fgraph_reserve_data(gops->idx, sizeof(*ftimes));
+		if (ftimes)
+			ftimes->sleeptime = current->ftrace_sleeptime;
 	}
+	if (!ftimes)
+		return 0;
+
+	ftimes->calltime = trace_clock_local();
 
 	/*
 	 * Stop here if tracing_threshold is set. We only write function return
@@ -247,19 +258,13 @@ void __trace_graph_return(struct trace_array *tr,
 }
 
 static void handle_nosleeptime(struct ftrace_graph_ret *trace,
-			       struct fgraph_ops *gops)
+			       struct fgraph_times *ftimes,
+			       int size)
 {
-	unsigned long long *sleeptime;
-	int size;
-
-	if (fgraph_sleep_time)
-		return;
-
-	sleeptime = fgraph_retrieve_data(gops->idx, &size);
-	if (!sleeptime)
+	if (fgraph_sleep_time || size < sizeof(*ftimes))
 		return;
 
-	trace->calltime += current->ftrace_sleeptime - *sleeptime;
+	ftimes->calltime += current->ftrace_sleeptime - ftimes->sleeptime;
 }
 
 void trace_graph_return(struct ftrace_graph_ret *trace,
@@ -268,9 +273,11 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 	unsigned long *task_var = fgraph_get_task_var(gops);
 	struct trace_array *tr = gops->private;
 	struct trace_array_cpu *data;
+	struct fgraph_times *ftimes;
 	unsigned long flags;
 	unsigned int trace_ctx;
 	long disabled;
+	int size;
 	int cpu;
 
 	ftrace_graph_addr_finish(gops, trace);
@@ -280,7 +287,13 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 		return;
 	}
 
-	handle_nosleeptime(trace, gops);
+	ftimes = fgraph_retrieve_data(gops->idx, &size);
+	if (!ftimes)
+		return;
+
+	handle_nosleeptime(trace, ftimes, size);
+
+	trace->calltime = ftimes->calltime;
 
 	local_irq_save(flags);
 	cpu = raw_smp_processor_id();
@@ -297,6 +310,9 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
 				      struct fgraph_ops *gops)
 {
+	struct fgraph_times *ftimes;
+	int size;
+
 	ftrace_graph_addr_finish(gops, trace);
 
 	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
@@ -304,10 +320,16 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
 		return;
 	}
 
-	handle_nosleeptime(trace, gops);
+	ftimes = fgraph_retrieve_data(gops->idx, &size);
+	if (!ftimes)
+		return;
+
+	handle_nosleeptime(trace, ftimes, size);
+
+	trace->calltime = ftimes->calltime;
 
 	if (tracing_thresh &&
-	    (trace->rettime - trace->calltime < tracing_thresh))
+	    (trace->rettime - ftimes->calltime < tracing_thresh))
 		return;
 	else
 		trace_graph_return(trace, gops);
-- 
2.45.2