From: "Li,Rongqing"
To: Masami Hiramatsu
CC: "corbet@lwn.net", "akpm@linux-foundation.org", "lance.yang@linux.dev",
 "paulmck@kernel.org", "pawan.kumar.gupta@linux.intel.com",
 "mingo@kernel.org", "dave.hansen@linux.intel.com", "rostedt@goodmis.org",
 "kees@kernel.org", "arnd@arndb.de", "feng.tang@linux.alibaba.com",
 "pauld@redhat.com", "joel.granados@kernel.org",
 "linux-doc@vger.kernel.org", "linux-kernel@vger.kernel.org"
Subject: RE:
 Re: [PATCH][v2] hung_task: Panic after fixed number of hung tasks
Date: Sat, 11 Oct 2025 12:03:23 +0000
Message-ID: <2f26d112e8834d378dd00f20cc384f39@baidu.com>
References: <20250928053137.3412-1-lirongqing@baidu.com>
 <20250929094739.e2d49113f52a315a900a2cd7@kernel.org>
In-Reply-To: <20250929094739.e2d49113f52a315a900a2cd7@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Masami Hiramatsu
> Sent: September 29, 2025 8:48
> To: Li,Rongqing
> Cc: corbet@lwn.net; akpm@linux-foundation.org; lance.yang@linux.dev;
> paulmck@kernel.org; pawan.kumar.gupta@linux.intel.com; mingo@kernel.org;
> dave.hansen@linux.intel.com; rostedt@goodmis.org; kees@kernel.org;
> arnd@arndb.de; feng.tang@linux.alibaba.com; pauld@redhat.com;
> joel.granados@kernel.org; linux-doc@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH][v2] hung_task: Panic after fixed number of hung
> tasks
>
> On Sun, 28 Sep 2025 13:31:37 +0800
> lirongqing wrote:
>
> > From: Li RongQing
> >
> > Currently, when hung_task_panic is enabled, the kernel will panic
> > immediately upon detecting the first hung task. However, some hung
> > tasks are transient and the system can recover fully, while others are
> > unrecoverable and trigger consecutive hung task reports, and a panic is
> > expected.
> >
> > This commit adds a new sysctl parameter hung_task_count_to_panic to
> > allow specifying the number of consecutive hung tasks that must be
> > detected before triggering a kernel panic. This provides finer control
> > for environments where transient hangs may happen but persistent
> > hangs should still be fatal.
>
> IIUC, perhaps there are multiple groups that require different timeouts for
> hang checks, and you want to set the hung task timeout to match the shorter
> one, but ignore the longer ones at that point.
>
> If so, this is essentially a problem with a long process that is performed under
> TASK_UNINTERRUPTIBLE. Ideally, the progress of such a process should be
> checked periodically and the hang check should be reset unless it is really
> blocked.
> But this is not currently implemented. (For example, depending on the media,
> it may not be possible to check whether long IO is being
> performed.)
>
> The hung_tasks will even simulate these types of hangs as task hang-ups. But if
> you set a long detection time accordingly, you will also have to wait until that
> detection time for hangs that occur in a short period of time.
>
> The hung tasks on one major lock can spread in a domino effect.
> So setting a reasonably short detection time, but not panicking until there are
> enough of them, seems like a reasonable strategy.
> But in this case, I think we also need a "hard timeout limit"
> of hung tasks, which will detect longer ones. And also you should use peak
> value not accumulation value.
>
> If it is really transient (thus, it is not hung), accumulation of such normal but
> just slow operation will still kick hung_tasks.
>

Would it be reasonable to trigger a panic only after a hung task has been
detected in a certain number of consecutive checks?
Like:

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d17cd3f..045bef5 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -304,6 +304,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
+	unsigned long pre_detect_count = sysctl_hung_task_detect_count;
+	static unsigned long contiguous_detect_count;
 
 	/*
 	 * If the system crashed already then all bets are off,
@@ -326,6 +328,15 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 
 		check_hung_task(t, timeout);
 	}
+
+	if (sysctl_hung_task_detect_count != pre_detect_count) {
+		contiguous_detect_count++;
+		if (sysctl_max_hung_task_to_panic &&
+				contiguous_detect_count > sysctl_max_hung_task_to_panic)
+			hung_task_call_panic = 1;
+	}
+	else
+		contiguous_detect_count = 0;
  unlock:
 	rcu_read_unlock();
 	if (hung_task_show_lock)

-Li

> Thank you,
>
> >
> > Acked-by: Lance Yang
> > Signed-off-by: Li RongQing
> > ---
> > Diff with v1: change documentation as Lance suggested
> >
> >  Documentation/admin-guide/sysctl/kernel.rst |  8 ++++++++
> >  kernel/hung_task.c                          | 14 +++++++++++++-
> >  2 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index 8b49eab..98b47a7 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -405,6 +405,14 @@ This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
> >  1 Panic immediately.
> >  = =================================================
> >
> > +hung_task_count_to_panic
> > +=====================
> > +
> > +When set to a non-zero value, a kernel panic will be triggered if the
> > +number of detected hung tasks reaches this value.
> > +
> > +Note that setting hung_task_panic=1 will still cause an immediate
> > +panic on the first hung task.
>
> What happens if it is 0?
>
> >
> >  hung_task_check_count
> >  =====================
> > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > index 8708a12..87a6421 100644
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -83,6 +83,8 @@ static unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
> >  static unsigned int __read_mostly sysctl_hung_task_panic =
> >  	IS_ENABLED(CONFIG_BOOTPARAM_HUNG_TASK_PANIC);
> >
> > +static unsigned int __read_mostly sysctl_hung_task_count_to_panic;
> > +
> >  static int
> >  hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
> >  {
> > @@ -219,7 +221,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> >
> >  	trace_sched_process_hang(t);
> >
> > -	if (sysctl_hung_task_panic) {
> > +	if (sysctl_hung_task_panic ||
> > +	    (sysctl_hung_task_count_to_panic &&
> > +	     (sysctl_hung_task_detect_count >= sysctl_hung_task_count_to_panic))) {
> >  		console_verbose();
> >  		hung_task_show_lock = true;
> >  		hung_task_call_panic = true;
> > @@ -388,6 +392,14 @@ static const struct ctl_table hung_task_sysctls[] = {
> >  		.extra2 = SYSCTL_ONE,
> >  	},
> >  	{
> > +		.procname = "hung_task_count_to_panic",
> > +		.data = &sysctl_hung_task_count_to_panic,
> > +		.maxlen = sizeof(int),
> > +		.mode = 0644,
> > +		.proc_handler = proc_dointvec_minmax,
> > +		.extra1 = SYSCTL_ZERO,
> > +	},
> > +	{
> >  		.procname = "hung_task_check_count",
> >  		.data = &sysctl_hung_task_check_count,
> >  		.maxlen = sizeof(int),
> > --
> > 2.9.4
> >
>
>
> --
> Masami Hiramatsu (Google)
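
For anyone who wants to try the consecutive-detection idea outside the
kernel, it can be modelled in plain userspace C. This is only a minimal
sketch under stated assumptions: struct hung_detector and
hung_detector_scan() are hypothetical names that do not exist in
kernel/hung_task.c, and each call stands in for one pass of
check_hung_uninterruptible_tasks(). It mirrors the comparison in the reply's
diff: the streak grows when a pass raises the cumulative detect count, resets
when a pass finds nothing, and a panic is requested once the streak exceeds a
non-zero threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
struct hung_detector {
	unsigned long total_detected;  /* stands in for sysctl_hung_task_detect_count */
	unsigned long consecutive;     /* stands in for contiguous_detect_count */
	unsigned long panic_threshold; /* stands in for sysctl_max_hung_task_to_panic; 0 disables */
};

/*
 * Model one scan pass. hung_this_scan is how many hung tasks this pass
 * found. Returns true when the model would request a panic.
 */
bool hung_detector_scan(struct hung_detector *d, unsigned long hung_this_scan)
{
	unsigned long before;

	assert(d);
	before = d->total_detected;
	d->total_detected += hung_this_scan;

	if (d->total_detected == before) {
		/* A clean pass breaks the streak. */
		d->consecutive = 0;
		return false;
	}

	d->consecutive++;
	/* Same strict '>' comparison as in the diff above. */
	return d->panic_threshold && d->consecutive > d->panic_threshold;
}
```

With panic_threshold set to 2, the model requests a panic on the third
consecutive pass that finds at least one hung task, and any clean pass in
between restarts the count. The number of hung tasks found in a single pass
does not matter, only whether the pass found any.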