[PATCH resend] hung_task: add task->flags, blocked by coredump to log

Posted by Oxana Kharitonova 11 months, 1 week ago
Resending this patch as I haven't received feedback on my initial 
submission https://lore.kernel.org/all/20241204182953.10854-1-oxana@cloudflare.com/

For processes that terminate abnormally, the kernel can produce a 
coredump if enabled. While the coredump is being written, the process 
and all its threads are put into the D state 
(TASK_UNINTERRUPTIBLE | TASK_FREEZABLE).

Meanwhile, the kernel thread khungtaskd monitors processes in the D 
state. If a task stays in the D state for longer than 
kernel.hung_task_timeout_secs, a hung_task alert appears in the kernel 
log.

The higher the memory usage of a process, the longer it takes to write 
the coredump, and the longer its tasks stay in the D state. We see 
hung_task alerts for processes with memory usage above 10 GB. Note, 
however, that our kernel.hung_task_timeout_secs is 10 sec, whereas the 
default is 120 sec.

Adding a note to the log that the task is blocked by a coredump will 
help with monitoring. Another approach might be to filter out alerts 
for such tasks completely, but in that case we would lose transparency 
about what is putting pressure on system resources, e.g. we saw an 
increase in I/O when a coredump occurs because it is written to disk.

Additionally, it would be helpful to have task_struct->flags in the log 
from the function sched_show_task(). Currently it prints 
task_struct->thread_info->flags, which seems misleading as the line 
starts with "task:xxxx".

Signed-off-by: Oxana Kharitonova <oxana@cloudflare.com>
---
 kernel/hung_task.c  | 2 ++
 kernel/sched/core.c | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index c18717189f32..953169893a95 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -147,6 +147,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 			print_tainted(), init_utsname()->release,
 			(int)strcspn(init_utsname()->version, " "),
 			init_utsname()->version);
+		if (t->flags & PF_POSTCOREDUMP)
+			pr_err("      Blocked by coredump.\n");
 		pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
 			" disables this message.\n");
 		sched_show_task(t);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3e5a6bf587f9..77b6af12e146 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7701,9 +7701,9 @@ void sched_show_task(struct task_struct *p)
 	if (pid_alive(p))
 		ppid = task_pid_nr(rcu_dereference(p->real_parent));
 	rcu_read_unlock();
-	pr_cont(" stack:%-5lu pid:%-5d tgid:%-5d ppid:%-6d flags:0x%08lx\n",
+	pr_cont(" stack:%-5lu pid:%-5d tgid:%-5d ppid:%-6d task_flags:0x%08lx flags:0x%08lx\n",
 		free, task_pid_nr(p), task_tgid_nr(p),
-		ppid, read_task_thread_flags(p));
+		ppid, p->flags, read_task_thread_flags(p));
 
 	print_worker_info(KERN_INFO, p);
 	print_stop_info(KERN_INFO, p);
-- 
2.39.5
Re: [PATCH resend] hung_task: add task->flags, blocked by coredump to log
Posted by Andrew Morton 11 months, 1 week ago
On Fri, 10 Jan 2025 16:03:28 +0000 Oxana Kharitonova <oxana@cloudflare.com> wrote:

> Resending this patch as I haven't received feedback on my initial 
> submission https://lore.kernel.org/all/20241204182953.10854-1-oxana@cloudflare.com/
> 
> For processes that terminate abnormally, the kernel can produce a 
> coredump if enabled. While the coredump is being written, the process 
> and all its threads are put into the D state 
> (TASK_UNINTERRUPTIBLE | TASK_FREEZABLE).
> 
> Meanwhile, the kernel thread khungtaskd monitors processes in the D 
> state. If a task stays in the D state for longer than 
> kernel.hung_task_timeout_secs, a hung_task alert appears in the kernel 
> log.
> 
> The higher the memory usage of a process, the longer it takes to write 
> the coredump, and the longer its tasks stay in the D state. We see 
> hung_task alerts for processes with memory usage above 10 GB. Note, 
> however, that our kernel.hung_task_timeout_secs is 10 sec, whereas the 
> default is 120 sec.
> 
> Adding a note to the log that the task is blocked by a coredump will 
> help with monitoring. Another approach might be to filter out alerts 
> for such tasks completely, but in that case we would lose transparency 
> about what is putting pressure on system resources, e.g. we saw an 
> increase in I/O when a coredump occurs because it is written to disk.
> 
> Additionally, it would be helpful to have task_struct->flags in the log 
> from the function sched_show_task(). Currently it prints 
> task_struct->thread_info->flags, which seems misleading as the line 
> starts with "task:xxxx".
> 

I added the below fix.


From: Andrew Morton <akpm@linux-foundation.org>
Subject: hung_task-add-task-flags-blocked-by-coredump-to-log-fix
Date: Fri Jan 10 07:51:53 PM PST 2025

fix printk control string

In file included from ./include/asm-generic/bug.h:22,
                 from ./arch/x86/include/asm/bug.h:99,
                 from ./include/linux/bug.h:5,
                 from ./arch/x86/include/asm/paravirt.h:19,
                 from ./arch/x86/include/asm/irqflags.h:80,
                 from ./include/linux/irqflags.h:18,
                 from ./include/linux/spinlock.h:59,
                 from ./include/linux/wait.h:9,
                 from ./include/linux/wait_bit.h:8,
                 from ./include/linux/fs.h:6,
                 from ./include/linux/highmem.h:5,
                 from kernel/sched/core.c:10:
kernel/sched/core.c: In function 'sched_show_task':
./include/linux/kern_levels.h:5:25: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'unsigned int' [-Werror=format=]
    5 | #define KERN_SOH        "\001"          /* ASCII Start Of Header */
      |                         ^~~~~~
./include/linux/printk.h:473:25: note: in definition of macro 'printk_index_wrap'
  473 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
      |                         ^~~~
./include/linux/printk.h:586:9: note: in expansion of macro 'printk'
  586 |         printk(KERN_CONT fmt, ##__VA_ARGS__)
      |         ^~~~~~
./include/linux/kern_levels.h:24:25: note: in expansion of macro 'KERN_SOH'
   24 | #define KERN_CONT       KERN_SOH "c"
      |                         ^~~~~~~~
./include/linux/printk.h:586:16: note: in expansion of macro 'KERN_CONT'
  586 |         printk(KERN_CONT fmt, ##__VA_ARGS__)
      |                ^~~~~~~~~
kernel/sched/core.c:7704:9: note: in expansion of macro 'pr_cont'
 7704 |         pr_cont(" stack:%-5lu pid:%-5d tgid:%-5d ppid:%-6d task_flags:0x%08lx flags:0x%08lx\n",
      |         ^~~~~~~
cc1: all warnings being treated as errors

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ben Segall <bsegall@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oxana Kharitonova <oxana@cloudflare.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/sched/core.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sched/core.c~hung_task-add-task-flags-blocked-by-coredump-to-log-fix
+++ a/kernel/sched/core.c
@@ -7701,7 +7701,7 @@ void sched_show_task(struct task_struct
 	if (pid_alive(p))
 		ppid = task_pid_nr(rcu_dereference(p->real_parent));
 	rcu_read_unlock();
-	pr_cont(" stack:%-5lu pid:%-5d tgid:%-5d ppid:%-6d task_flags:0x%08lx flags:0x%08lx\n",
+	pr_cont(" stack:%-5lu pid:%-5d tgid:%-5d ppid:%-6d task_flags:0x%04x flags:0x%08lx\n",
 		free, task_pid_nr(p), task_tgid_nr(p),
 		ppid, p->flags, read_task_thread_flags(p));
 
_
Re: [PATCH resend] hung_task: add task->flags, blocked by coredump to log
Posted by Oxana Kharitonova 11 months, 1 week ago
On Sat, Jan 11, 2025 at 3:54 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> I added the below fix.
>

Thank you for taking care of the patch and adding the fix!
