[RESEND PATCH v3] exit: move trace_sched_process_exit earlier in do_exit()

wenyang.linux@foxmail.com posted 1 patch 1 year, 10 months ago
kernel/exit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Posted by wenyang.linux@foxmail.com 1 year, 10 months ago
From: Wen Yang <wenyang.linux@foxmail.com>

In a safety-critical system, when a process exits abnormally, the
monitor should be notified as soon as possible.

Commit 2d4bcf886e42 ("exit: Remove profile_task_exit & profile_munmap")
simplified the code, but also removed profile_task_exit(), which may
prevent third-party kernel modules from detecting process exits in a
timely manner.

Rather than adding an extra tracepoint, it is better to move the
existing trace_sched_process_exit() earlier in do_exit(), since any tracer
interested in the point where a task is really reclaimed can use
trace_sched_process_free(), called from delayed_put_task_struct(). [1]

Andrew raised a concern:
: If userspace is awaiting this notification to say "it's now OK to read
: the dump file" then it could break things?
The nearby proc_exit_connector() can be used for that purpose, and we
could not find any code that depends on the location of
trace_sched_process_exit().

Oleg originally suggested this change, Steven provided further detailed
suggestions, and Mathieu checked the historical code and said:
: I've checked with Matthew Khouzam (maintainer of Trace Compass)
: which care about this tracepoint, and we have not identified any
: significant impact of moving it on its model of the scheduler, other
: than slightly changing its timing.
: I've also checked quickly in lttng-analyses and have not found
: any code that care about its specific placement.
: So I would say go ahead and move it earlier in do_exit(), it's
: fine by me. [2]

[1]: https://lore.kernel.org/all/c411eda5-5378-4511-bea3-d1566174c8c7@efficios.com/
[2]: https://lore.kernel.org/all/c9427e40-10b1-49eb-9baa-dde1364e8fe5@efficios.com/

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
---
 kernel/exit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 493647fd7c07..2cff6533cb39 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -826,6 +826,7 @@ void __noreturn do_exit(long code)
 
 	WARN_ON(tsk->plug);
 
+	trace_sched_process_exit(tsk);
 	kcov_task_exit(tsk);
 	kmsan_task_exit(tsk);
 
@@ -866,7 +867,6 @@ void __noreturn do_exit(long code)
 
 	if (group_dead)
 		acct_process();
-	trace_sched_process_exit(tsk);
 
 	exit_sem(tsk);
 	exit_shm(tsk);
-- 
2.25.1
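
The reordering in the diff above can be pictured with a small userspace
toy model. This is only a sketch, not kernel code: the step lists contain
just the calls visible in the diff context, "..." stands for the elided
code between the two hunks, and the helper names step_index() and
runs_before() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model of the relevant part of do_exit(). Only calls visible in
 * the diff context are listed; "..." is a placeholder for the code
 * between the two hunks. acct_process() is conditional on group_dead
 * in the real function; the model ignores that condition.
 */
static const char *const steps_before[] = {
    "kcov_task_exit", "kmsan_task_exit", "...", "acct_process",
    "trace_sched_process_exit", "exit_sem", "exit_shm",
};

static const char *const steps_after[] = {
    "trace_sched_process_exit", "kcov_task_exit", "kmsan_task_exit",
    "...", "acct_process", "exit_sem", "exit_shm",
};

#define N_STEPS (sizeof(steps_before) / sizeof(steps_before[0]))

/* Return the position of `name` in `steps`, or -1 if it is absent. */
static int step_index(const char *const *steps, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(steps[i], name) == 0)
            return (int)i;
    return -1;
}

/* Return 1 if step `a` runs before step `b` in the given ordering. */
static int runs_before(const char *const *steps, size_t n,
                       const char *a, const char *b)
{
    int ia = step_index(steps, n, a);
    int ib = step_index(steps, n, b);

    assert(ia >= 0 && ib >= 0); /* both names must be in the model */
    return ia < ib;
}
```

In this model, runs_before(steps_after, N_STEPS,
"trace_sched_process_exit", "kcov_task_exit") is true, while the same
query against steps_before is false: the tracepoint moves from after the
elided middle section to before it.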
Re: [RESEND PATCH v3] exit: move trace_sched_process_exit earlier in do_exit()
Posted by Ingo Molnar 1 year, 10 months ago
* wenyang.linux@foxmail.com <wenyang.linux@foxmail.com> wrote:

> From: Wen Yang <wenyang.linux@foxmail.com>
> 
> In a safety-critical system, when a process exits abnormally, the 
> monitor should be notified as soon as possible.

If this event is so critical to catch, a probe can be put on do_exit(). 
This will be superior to your patch, because it will notify about the 
event even sooner.
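
One way to place such a probe without writing a kernel module is the
tracefs kprobe_events interface. The following is a minimal userspace
sketch under several assumptions: tracefs is mounted at
/sys/kernel/tracing, the kernel has CONFIG_KPROBE_EVENTS enabled, the
caller is root, and do_exit() is probeable on the running kernel. The
group and event names ("exitmon", "do_exit_entry") and the helpers
format_kprobe() and install_kprobe() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed tracefs location; older setups may use
 * /sys/kernel/debug/tracing instead. */
#define KPROBE_EVENTS "/sys/kernel/tracing/kprobe_events"

/*
 * Format a kprobe definition of the form "p:GROUP/EVENT SYMBOL".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_kprobe(char *buf, size_t len, const char *group,
                         const char *event, const char *symbol)
{
    int n;

    assert(buf && group && event && symbol);
    n = snprintf(buf, len, "p:%s/%s %s", group, event, symbol);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Append the definition to the kprobe_events file at `path`.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int install_kprobe(const char *path, const char *definition)
{
    FILE *f = fopen(path, "a");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%s\n", definition) > 0;
    if (fclose(f) != 0) /* buffered write errors surface here */
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would format the definition for "do_exit" and pass it to
install_kprobe(KPROBE_EVENTS, ...). The new event still has to be
enabled before it fires, typically through its "enable" file under
events/exitmon/do_exit_entry/ in tracefs.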

> Commit 2d4bcf886e42 ("exit: Remove profile_task_exit & 
> profile_munmap") simplified the code, but also removed 
> profile_task_exit(), which may prevent third-party kernel modules 
> from detecting process exits in a timely manner.

Could you point out an example of such third-party kernel modules, and 
why we should care about them?

> Rather than adding an extra tracepoint, it is better to move the 
> existing trace_sched_process_exit() earlier in do_exit(), since any 
> tracer interested in the point where a task is really reclaimed can 
> use trace_sched_process_free(), called from 
> delayed_put_task_struct(). [1]

I disagree, I think this scheduler tracepoint should be moved even 
*later* in the exit sequence, and be combined with 
sched_autogroup_exit_task(), so that the scheduler only has a single 
exit-notification callback in essence.

Until this is all done cleanly no tree should pick up this change:

    NAKed-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo