From: Oxana Kharitonova
To: akpm@linux-foundation.org, brauner@kernel.org, bsegall@google.com,
    dietmar.eggemann@arm.com, jack@suse.cz, juri.lelli@redhat.com,
    mgorman@suse.de, mingo@redhat.com, peterz@infradead.org,
    rostedt@goodmis.org, vincent.guittot@linaro.org,
    viro@zeniv.linux.org.uk, vschneid@redhat.com
Cc: kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, oxana@cloudflare.com
Subject: [PATCH resend] hung_task: add task->flags, blocked by coredump to log
Date: Fri, 10 Jan 2025 16:03:28 +0000
Message-Id: <20250110160328.64947-1-oxana@cloudflare.com>

Resending this patch as I haven't received feedback on my initial
submission:
https://lore.kernel.org/all/20241204182953.10854-1-oxana@cloudflare.com/

For processes that terminate abnormally, the kernel can produce a
coredump if enabled. While the coredump is being written, the process
and all its threads are put into the D state
(TASK_UNINTERRUPTIBLE | TASK_FREEZABLE).

On the other hand, the kernel thread khungtaskd monitors processes in
the D state. If a task is stuck in the D state for longer than
kernel.hung_task_timeout_secs, a hung_task alert appears in the kernel
log. The higher a process's memory usage, the longer the coredump takes
and the longer its tasks stay in the D state. We see hung_task alerts
for processes with memory usage above 10Gb.
Note, though, that our kernel.hung_task_timeout_secs is 10 sec, whereas
the default is 120 sec.

Logging that the task is blocked by a coredump will help with
monitoring. Another approach might be to filter out alerts for such
tasks completely, but in that case we would lose transparency about
what is putting pressure on system resources, e.g. we saw an increase
in I/O when a coredump occurs because it is written to disk.

Additionally, it would be helpful to have task_struct->flags in the log
from the function sched_show_task(). Currently it prints
task_struct->thread_info->flags, which seems misleading as the line
starts with "task:xxxx".

Signed-off-by: Oxana Kharitonova
---
 kernel/hung_task.c  | 2 ++
 kernel/sched/core.c | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index c18717189f32..953169893a95 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -147,6 +147,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 			print_tainted(), init_utsname()->release,
 			(int)strcspn(init_utsname()->version, " "),
 			init_utsname()->version);
+		if (t->flags & PF_POSTCOREDUMP)
+			pr_err(" Blocked by coredump.\n");
 		pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
 			" disables this message.\n");
 		sched_show_task(t);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3e5a6bf587f9..77b6af12e146 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7701,9 +7701,9 @@ void sched_show_task(struct task_struct *p)
 	if (pid_alive(p))
 		ppid = task_pid_nr(rcu_dereference(p->real_parent));
 	rcu_read_unlock();
-	pr_cont(" stack:%-5lu pid:%-5d tgid:%-5d ppid:%-6d flags:0x%08lx\n",
+	pr_cont(" stack:%-5lu pid:%-5d tgid:%-5d ppid:%-6d task_flags:0x%08x flags:0x%08lx\n",
 		free, task_pid_nr(p), task_tgid_nr(p),
-		ppid, read_task_thread_flags(p));
+		ppid, p->flags, read_task_thread_flags(p));
 
 	print_worker_info(KERN_INFO, p);
 	print_stop_info(KERN_INFO, p);
-- 
2.39.5
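
For anyone who wants to consume the new field from userspace, here is a
minimal sketch of how a log monitor might detect coredump-blocked tasks
from the task_flags value that sched_show_task() prints with this
patch. It is an illustration only and not part of the patch. The helper
names parse_task_flags() and blocked_by_coredump() are hypothetical,
and the PF_POSTCOREDUMP value is copied from include/linux/sched.h as I
understand it, so it should be checked against the kernel actually in
use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Assumed value of PF_POSTCOREDUMP from include/linux/sched.h;
 * verify it against the source of the running kernel.
 */
#define PF_POSTCOREDUMP 0x00000008u

/*
 * Hypothetical helper: locate "task_flags:0x..." in one log line and
 * store the parsed value in *flags. Returns false if the field is
 * missing or cannot be parsed as hexadecimal.
 */
static bool parse_task_flags(const char *line, unsigned int *flags)
{
	const char *field;

	assert(line != NULL);

	field = strstr(line, "task_flags:");
	if (!field)
		return false;

	/* %x accepts an optional "0x" prefix, as strtoul() does in base 16. */
	return sscanf(field, "task_flags:%x", flags) == 1;
}

/*
 * Hypothetical helper: true if the line carries a task_flags field
 * with the PF_POSTCOREDUMP bit set.
 */
static bool blocked_by_coredump(const char *line)
{
	unsigned int flags;

	return parse_task_flags(line, &flags) &&
	       (flags & PF_POSTCOREDUMP) != 0;
}
```

A monitor could call blocked_by_coredump() on each "task:..." line that
follows a hung_task report and label the alert accordingly. Matching
the " Blocked by coredump." line added in check_hung_task() would be a
simpler alternative that avoids hardcoding the flag value.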