From nobody Sat Feb 7 17:13:22 2026
From: Lance Yang
To: akpm@linux-foundation.org
Cc: dj456119@gmail.com, cunhuang@tencent.com, leonylgao@tencent.com,
    j.granados@samsung.com, jsiddle@redhat.com, kent.overstreet@linux.dev,
    21cnbao@gmail.com, ryan.roberts@arm.com, david@redhat.com,
    ziy@nvidia.com, libang.li@antgroup.com, baolin.wang@linux.alibaba.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    joel.granados@kernel.org, linux@weissschuh.net, Lance Yang,
    Mingzhe Yang
Subject: [PATCH v2 1/2] hung_task: add detect count for hung tasks
Date: Sun, 27 Oct 2024 20:07:46 +0800
Message-ID: <20241027120747.42833-2-ioworker0@gmail.com>
In-Reply-To: <20241027120747.42833-1-ioworker0@gmail.com>
References: <20241027120747.42833-1-ioworker0@gmail.com>

This commit adds a counter, hung_task_detect_count, to track the number of
times hung tasks are detected.

IMHO, hung tasks are a critical metric. Currently, we detect them by
periodically parsing dmesg. However, this method isn't as user-friendly as
using a counter.

Sometimes, a short-lived issue with a NIC or hard drive can quickly drive
hung_task_warnings down to zero. Without warnings, we must access the node
directly to confirm that there are no more hung tasks and that the system
has recovered. After all, load average alone cannot provide a clear
picture.

Once this counter is in place, in a high-density deployment pattern, we
plan to set hung_task_timeout_secs to a lower number to improve stability,
even though this might result in false positives. We can then set a
time-based threshold: if hung tasks last beyond this duration, we will
automatically migrate containers to other nodes. Based on past experience,
this approach could help avoid many production disruptions.

Moreover, just like other important events such as OOM that already have
counters, having a dedicated counter for hung tasks makes sense.

Signed-off-by: Mingzhe Yang
Signed-off-by: Lance Yang
---
 kernel/hung_task.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 959d99583d1c..229ff3d4e501 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -30,6 +30,11 @@
  */
 static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
 
+/*
+ * Total number of tasks detected as hung since boot:
+ */
+static unsigned long __read_mostly sysctl_hung_task_detect_count;
+
 /*
  * Limit number of tasks checked in a batch.
  *
@@ -115,6 +120,12 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 	if (time_is_after_jiffies(t->last_switch_time + timeout * HZ))
 		return;
 
+	/*
+	 * This counter tracks the total number of tasks detected as hung
+	 * since boot.
+	 */
+	sysctl_hung_task_detect_count++;
+
 	trace_sched_process_hang(t);
 
 	if (sysctl_hung_task_panic) {
@@ -314,6 +325,13 @@ static struct ctl_table hung_task_sysctls[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_NEG_ONE,
 	},
+	{
+		.procname	= "hung_task_detect_count",
+		.data		= &sysctl_hung_task_detect_count,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0444,
+		.proc_handler	= proc_doulongvec_minmax,
+	},
 };
 
 static void __init hung_task_sysctl_init(void)
-- 
2.45.2

From nobody Sat Feb 7 17:13:22 2026
From: Lance Yang
To: akpm@linux-foundation.org
Cc: dj456119@gmail.com, cunhuang@tencent.com, leonylgao@tencent.com,
    j.granados@samsung.com, jsiddle@redhat.com, kent.overstreet@linux.dev,
    21cnbao@gmail.com, ryan.roberts@arm.com, david@redhat.com,
    ziy@nvidia.com, libang.li@antgroup.com, baolin.wang@linux.alibaba.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    joel.granados@kernel.org, linux@weissschuh.net, Lance Yang,
    Mingzhe Yang
Subject: [PATCH v2 2/2] hung_task: add docs for hung_task_detect_count
Date: Sun, 27 Oct 2024 20:07:47 +0800
Message-ID: <20241027120747.42833-3-ioworker0@gmail.com>
In-Reply-To: <20241027120747.42833-1-ioworker0@gmail.com>
References: <20241027120747.42833-1-ioworker0@gmail.com>

This commit introduces documentation for hung_task_detect_count in
kernel.rst.

Signed-off-by: Mingzhe Yang
Signed-off-by: Lance Yang
---
 Documentation/admin-guide/sysctl/kernel.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index f8bc1630eba0..b2b36d0c3094 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -401,6 +401,15 @@ The upper bound on the number of tasks that are checked.
 This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
 
+hung_task_detect_count
+======================
+
+Indicates the total number of tasks that have been detected as hung since
+system boot.
+
+This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
+
+
 hung_task_timeout_secs
 ======================
 
-- 
2.45.2
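
The commit message of patch 1/2 describes polling the counter instead of parsing dmesg. As a rough userspace illustration of that flow, and not part of this series, a consumer might look like the sketch below. The helper names (parse_detect_count, read_detect_count, hung_tasks_since) are hypothetical. The path is an assumption derived from the "hung_task_detect_count" procname in the sysctl table, and it only exists when CONFIG_DETECT_HUNG_TASK is enabled.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed path, derived from the procname added in patch 1/2. */
#define HUNG_COUNT_PATH "/proc/sys/kernel/hung_task_detect_count"

/*
 * Parse the text exposed by the sysctl file: a decimal number,
 * optionally followed by a newline. Returns 0 on success, -1 on
 * malformed input. Hypothetical helper, not part of the series.
 */
int parse_detect_count(const char *text, unsigned long *out)
{
	char *end;
	unsigned long val;

	/* Reject empty input, signs and leading whitespace up front. */
	if (text[0] < '0' || text[0] > '9')
		return -1;

	errno = 0;
	val = strtoul(text, &end, 10);
	if (errno)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;

	*out = val;
	return 0;
}

/* Read the current counter value; returns 0 on success, -1 on failure. */
int read_detect_count(const char *path, unsigned long *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_detect_count(buf, out);
	fclose(f);
	return ret;
}

/*
 * Number of detections between two samples. Unsigned subtraction
 * stays correct even if the counter wraps once between samples.
 */
unsigned long hung_tasks_since(unsigned long prev, unsigned long cur)
{
	return cur - prev;
}
```

A monitor could call read_detect_count(HUNG_COUNT_PATH, &cur) at a fixed interval and raise an alert, or start migrating containers, once hung_tasks_since(prev, cur) has stayed non-zero for longer than a chosen threshold.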