hung_task: Differentiate between I/O and Lock/Resource waits

[PATCH] hung_task: Differentiate between I/O and Lock/Resource waits

Posted by Aaron Tomlin 1 week, 5 days ago

Currently, the hung task reporting mechanism does not differentiate
between the underlying causes of a D state, labelling all such tasks
merely as "blocked". Consequently, administrators must perform manual
stack trace inspection to ascertain if the delay stems from an I/O wait
(indicative of hardware or filesystem issues) or a lock wait (indicative
of software contention).

This change utilises the in_iowait field from struct task_struct to
distinguish between two distinct failure modes in the log output:

        1. D state "Disk I/O": The task is waiting in io_schedule().
           This typically implies a storage device, filesystem, or
           network filesystem (e.g., NFS) is unresponsive.

        2. D state "Lock/Resource": The task is waiting on a kernel
           primitive (e.g., mutex). This typically implies a software
           bug, deadlock, or resource starvation.

It is safe to read in_iowait in this manner because
check_hung_uninterruptible_tasks() holds the RCU read lock, preserving
the task structure. Moreover, the task is effectively quiescent (in a
persistent TASK_UNINTERRUPTIBLE state) and thus cannot update its own
in_iowait status, guaranteeing a stable, race-free value.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 kernel/hung_task.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 350093de0535..608731c7ccba 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -250,8 +250,9 @@ static void hung_task_info(struct task_struct *t, unsigned long timeout,
 	if (sysctl_hung_task_warnings || hung_task_call_panic) {
 		if (sysctl_hung_task_warnings > 0)
 			sysctl_hung_task_warnings--;
-		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
-		       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
+		pr_err("INFO: task %s:%d blocked in %s state for more than %ld seconds.\n",
+		       t->comm, t->pid, t->in_iowait ? "D (Disk I/O)" : "D (Lock/Resource)",
+		       (jiffies - t->last_switch_time) / HZ);
 		pr_err("      %s %s %.*s\n",
 			print_tainted(), init_utsname()->release,
 			(int)strcspn(init_utsname()->version, " "),
-- 
2.51.0
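
The mechanism the patch relies on can be modelled outside the kernel. The sketch below is a rough, hypothetical userspace model, not kernel code. It assumes that io_schedule() saves the caller's in_iowait, sets it to 1 for the duration of the sleep and restores it afterwards, roughly as io_schedule_prepare()/io_schedule_finish() in kernel/sched/core.c do. It also assumes HZ=250. The names fake_task, fake_io_schedule_prepare(), fake_io_schedule_finish() and format_report() are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FAKE_HZ 250UL /* assumption: a kernel built with HZ=250 */

/* Hypothetical, heavily simplified stand-in for struct task_struct. */
struct fake_task {
	const char *comm;
	int pid;
	unsigned int in_iowait : 1;     /* 1 while sleeping via io_schedule() */
	unsigned long last_switch_time; /* jiffies at the last context switch */
};

/* Models io_schedule_prepare(): save the old flag, then mark I/O wait. */
static int fake_io_schedule_prepare(struct fake_task *t)
{
	int old_iowait = t->in_iowait;

	t->in_iowait = 1;
	return old_iowait;
}

/* Models io_schedule_finish(): restore the saved flag after waking up. */
static void fake_io_schedule_finish(struct fake_task *t, int token)
{
	t->in_iowait = token;
}

/* Builds the first report line using the format proposed in the patch. */
static int format_report(char *buf, size_t len, const struct fake_task *t,
			 unsigned long jiffies)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"INFO: task %s:%d blocked in %s state for more than %ld seconds.\n",
			t->comm, t->pid,
			t->in_iowait ? "D (Disk I/O)" : "D (Lock/Resource)",
			(long)((jiffies - t->last_switch_time) / FAKE_HZ));
}
```

In this model, a task sleeping between fake_io_schedule_prepare() and fake_io_schedule_finish() is reported as "D (Disk I/O)". Any other uninterruptible sleep, such as a mutex wait, falls through to "D (Lock/Resource)".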

Re: [PATCH] hung_task: Differentiate between I/O and Lock/Resource waits

Posted by Masami Hiramatsu (Google) 1 week, 3 days ago

On Sun, 25 Jan 2026 15:39:05 -0500
Aaron Tomlin <atomlin@atomlin.com> wrote:

> Currently, the hung task reporting mechanism does not differentiate
> between the underlying causes of a D state, labelling all such tasks
> merely as "blocked". Consequently, administrators must perform manual
> stack trace inspection to ascertain if the delay stems from an I/O wait
> (indicative of hardware or filesystem issues) or a lock wait (indicative
> of software contention).
> 
> This change utilises the in_iowait field from struct task_struct to
> distinguish between two distinct failure modes in the log output:
> 
>         1. D state "Disk I/O": The task is waiting in io_schedule().
>            This typically implies a storage device, filesystem, or
>            network filesystem (e.g., NFS) is unresponsive.
> 
>         2. D state "Lock/Resource": The task is waiting on a kernel
>            primitive (e.g., mutex). This typically implies a software
>            bug, deadlock, or resource starvation.
> 
> It is safe to read in_iowait in this manner because
> check_hung_uninterruptible_tasks() holds the RCU read lock, preserving
> the task structure. Moreover, the task is effectively quiescent (in a
> persistent TASK_UNINTERRUPTIBLE state) and thus cannot update its own
> in_iowait status, guaranteeing a stable, race-free value.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
>  kernel/hung_task.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 350093de0535..608731c7ccba 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -250,8 +250,9 @@ static void hung_task_info(struct task_struct *t, unsigned long timeout,
>  	if (sysctl_hung_task_warnings || hung_task_call_panic) {
>  		if (sysctl_hung_task_warnings > 0)
>  			sysctl_hung_task_warnings--;
> -		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> -		       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
> +		pr_err("INFO: task %s:%d blocked in %s state for more than %ld seconds.\n",
> +		       t->comm, t->pid, t->in_iowait ? "D (Disk I/O)" : "D (Lock/Resource)",

If this is only for human readability, I rather like just adding
"in iowait" at the end. "D" state seems redundant, and "Lock/Resource"
can mislead. What about something like below?

	pr_err("INFO: task %s:%d blocked for more than %ld seconds%s.\n",
		..., t->in_iowait ? " in I/O wait" : "");

Thank you,

> +		       (jiffies - t->last_switch_time) / HZ);
>  		pr_err("      %s %s %.*s\n",
>  			print_tainted(), init_utsname()->release,
>  			(int)strcspn(init_utsname()->version, " "),
> -- 
> 2.51.0
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
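
Masami's alternative can be sketched in the same way. This is a hypothetical userspace model. The elided arguments ("...") are assumed to be the comm, pid and elapsed-seconds values from the original patch, HZ is assumed to be 250, and sketch_task and format_blocked() are made-up names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_HZ 250UL /* assumption: a kernel built with HZ=250 */

/* Hypothetical minimal task model: only the fields the message needs. */
struct sketch_task {
	const char *comm;
	int pid;
	unsigned int in_iowait : 1;
	unsigned long last_switch_time; /* jiffies at the last context switch */
};

/*
 * Suggested format: keep the existing message and append " in I/O wait"
 * only when in_iowait is set. The comm/pid/elapsed arguments are an
 * assumption standing in for the "..." in the suggestion.
 */
static int format_blocked(char *buf, size_t len, const struct sketch_task *t,
			  unsigned long jiffies)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"INFO: task %s:%d blocked for more than %ld seconds%s.\n",
			t->comm, t->pid,
			(long)((jiffies - t->last_switch_time) / SKETCH_HZ),
			t->in_iowait ? " in I/O wait" : "");
}
```

One property of this form is that a task not in I/O wait produces exactly the same line as before the patch ("... blocked for more than N seconds."), so log scrapers that already match that text keep working. Only I/O waits gain the suffix.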

Re: [PATCH] hung_task: Differentiate between I/O and Lock/Resource waits

Posted by Aaron Tomlin 1 week, 2 days ago

On Wed, Jan 28, 2026 at 05:17:49PM +0900, Masami Hiramatsu wrote:
> If this is only for human readability, I rather like just adding
> "in iowait" at the end. "D" state seems redundant, and "Lock/Resource"
> can mislead. What about something like below?
> 
> 	pr_err("INFO: task %s:%d blocked for more than %ld seconds%s.\n",
> 		..., t->in_iowait ? " in I/O wait" : "");
> 

Hi Masami,

Thank you for your feedback.

I concur. I'll incorporate the suggestion.


Kind regards,
-- 
Aaron Tomlin

Re: [PATCH] hung_task: Differentiate between I/O and Lock/Resource waits

Posted by Lance Yang 1 week, 3 days ago


On 2026/1/28 16:17, Masami Hiramatsu (Google) wrote:
> On Sun, 25 Jan 2026 15:39:05 -0500
> Aaron Tomlin <atomlin@atomlin.com> wrote:
> 
>> Currently, the hung task reporting mechanism does not differentiate
>> between the underlying causes of a D state, labelling all such tasks
>> merely as "blocked". Consequently, administrators must perform manual
>> stack trace inspection to ascertain if the delay stems from an I/O wait
>> (indicative of hardware or filesystem issues) or a lock wait (indicative
>> of software contention).
>>
>> This change utilises the in_iowait field from struct task_struct to
>> distinguish between two distinct failure modes in the log output:
>>
>>          1. D state "Disk I/O": The task is waiting in io_schedule().
>>             This typically implies a storage device, filesystem, or
>>             network filesystem (e.g., NFS) is unresponsive.
>>
>>          2. D state "Lock/Resource": The task is waiting on a kernel
>>             primitive (e.g., mutex). This typically implies a software
>>             bug, deadlock, or resource starvation.
>>
>> It is safe to read in_iowait in this manner because
>> check_hung_uninterruptible_tasks() holds the RCU read lock, preserving
>> the task structure. Moreover, the task is effectively quiescent (in a
>> persistent TASK_UNINTERRUPTIBLE state) and thus cannot update its own
>> in_iowait status, guaranteeing a stable, race-free value.
>>
>> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
>> ---
>>   kernel/hung_task.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>> index 350093de0535..608731c7ccba 100644
>> --- a/kernel/hung_task.c
>> +++ b/kernel/hung_task.c
>> @@ -250,8 +250,9 @@ static void hung_task_info(struct task_struct *t, unsigned long timeout,
>>   	if (sysctl_hung_task_warnings || hung_task_call_panic) {
>>   		if (sysctl_hung_task_warnings > 0)
>>   			sysctl_hung_task_warnings--;
>> -		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
>> -		       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
>> +		pr_err("INFO: task %s:%d blocked in %s state for more than %ld seconds.\n",
>> +		       t->comm, t->pid, t->in_iowait ? "D (Disk I/O)" : "D (Lock/Resource)",
> 
> If this is only for human readability, I rather like just adding
> "in iowait" at the end. "D" state seems redundant, and "Lock/Resource"
> can mislead. What about something like below?
> 
> 	pr_err("INFO: task %s:%d blocked for more than %ld seconds%s.\n",
> 		..., t->in_iowait ? " in I/O wait" : "");

That would be better, looks good to me ;)

Re: [PATCH] hung_task: Differentiate between I/O and Lock/Resource waits

Posted by Lance Yang 1 week, 5 days ago


On 2026/1/26 04:39, Aaron Tomlin wrote:
> Currently, the hung task reporting mechanism does not differentiate
> between the underlying causes of a D state, labelling all such tasks
> merely as "blocked". Consequently, administrators must perform manual
> stack trace inspection to ascertain if the delay stems from an I/O wait
> (indicative of hardware or filesystem issues) or a lock wait (indicative
> of software contention).
> 
> This change utilises the in_iowait field from struct task_struct to
> distinguish between two distinct failure modes in the log output:
> 
>          1. D state "Disk I/O": The task is waiting in io_schedule().
>             This typically implies a storage device, filesystem, or
>             network filesystem (e.g., NFS) is unresponsive.
> 
>          2. D state "Lock/Resource": The task is waiting on a kernel
>             primitive (e.g., mutex). This typically implies a software
>             bug, deadlock, or resource starvation.
> 
> It is safe to read in_iowait in this manner because
> check_hung_uninterruptible_tasks() holds the RCU read lock, preserving
> the task structure. Moreover, the task is effectively quiescent (in a
> persistent TASK_UNINTERRUPTIBLE state) and thus cannot update its own
> in_iowait status, guaranteeing a stable, race-free value.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
>   kernel/hung_task.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 350093de0535..608731c7ccba 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -250,8 +250,9 @@ static void hung_task_info(struct task_struct *t, unsigned long timeout,
>   	if (sysctl_hung_task_warnings || hung_task_call_panic) {
>   		if (sysctl_hung_task_warnings > 0)
>   			sysctl_hung_task_warnings--;
> -		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> -		       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
> +		pr_err("INFO: task %s:%d blocked in %s state for more than %ld seconds.\n",
> +		       t->comm, t->pid, t->in_iowait ? "D (Disk I/O)" : "D (Lock/Resource)",
> +		       (jiffies - t->last_switch_time) / HZ);
>   		pr_err("      %s %s %.*s\n",
>   			print_tainted(), init_utsname()->release,
>   			(int)strcspn(init_utsname()->version, " "),

Why do we need this?

It's rather obvious that the stack trace already shows whether it
is in "D (Disk I/O)" or "D (Lock/Resource)" or "D (...)".
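
Lance's point can be illustrated with a rough heuristic over the stack trace. This is a hypothetical sketch, not an existing tool. It assumes each frame has already been symbolised into the usual function+offset/size form, possibly with leading indentation, and it treats only io_schedule and io_schedule_timeout frames as evidence of an I/O wait. frame_is() and trace_shows_io_wait() are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if a symbolised frame ("name+0xoff/0xsize") names exactly func. */
static bool frame_is(const char *frame, const char *func)
{
	size_t n = strlen(func);

	/* Skip the indentation printed in front of each frame. */
	while (*frame == ' ' || *frame == '\t')
		frame++;
	/* Require '+' or end of string after the name, so prefixes don't match. */
	return strncmp(frame, func, n) == 0 &&
	       (frame[n] == '+' || frame[n] == '\0');
}

/* Rough heuristic: does the blocked task's trace pass through io_schedule? */
static bool trace_shows_io_wait(const char *const *frames, size_t nframes)
{
	assert(frames != NULL || nframes == 0);
	for (size_t i = 0; i < nframes; i++) {
		if (frame_is(frames[i], "io_schedule") ||
		    frame_is(frames[i], "io_schedule_timeout"))
			return true;
	}
	return false;
}
```

A heuristic like this only works if the reader already knows which frames matter. That knowledge is what the proposed label tries to spare first-line readers.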

Re: [PATCH] hung_task: Differentiate between I/O and Lock/Resource waits

Posted by Aaron Tomlin 1 week, 5 days ago

On Mon, Jan 26, 2026 at 10:30:22AM +0800, Lance Yang wrote:
> Why do we need this?
> 
> It's rather obvious that the stack trace already shows whether it
> is in "D (Disk I/O)" or "D (Lock/Resource)" or "D (...)".

Hi Lance,

Thank you for your review.

While I agree that a seasoned kernel developer can often deduce the root
cause by inspecting the stack trace, that level of analysis is not always
accessible to system administrators or first-line support engineers.

The primary benefit of this patch is to provide high-level clarity at a
glance.

By explicitly distinguishing between "Disk I/O" and "Lock/Resource"
contention in the initial log message, we allow administrators and
support engineers to route the incident quickly to the appropriate team
(e.g., storage/network vs. kernel) without needing to parse or
understand the nuances of kernel stack traces.

Kind regards,
-- 
Aaron Tomlin
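
The routing workflow described above could look like the following on the consuming side. This is a hypothetical triage helper, not part of the patch or of any existing tool. route_hung_task_line() and the triage_team values are made-up names, and the helper assumes the "D (Disk I/O)" and "D (Lock/Resource)" labels from the original patch. A kernel using Masami's suggested format would need the " in I/O wait" suffix matched instead.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical routing targets for a first-line triage script. */
enum triage_team { TEAM_UNKNOWN, TEAM_STORAGE_NETWORK, TEAM_KERNEL };

/*
 * Route a hung-task report by its first line alone. strstr() is used so
 * that a leading dmesg timestamp does not matter. Lines that are not
 * hung-task reports, or carry no label, are left as TEAM_UNKNOWN.
 */
static enum triage_team route_hung_task_line(const char *line)
{
	assert(line != NULL);
	if (!strstr(line, "INFO: task ") || !strstr(line, " blocked in "))
		return TEAM_UNKNOWN;
	if (strstr(line, "D (Disk I/O)"))
		return TEAM_STORAGE_NETWORK;
	if (strstr(line, "D (Lock/Resource)"))
		return TEAM_KERNEL;
	return TEAM_UNKNOWN;
}
```

Because in_iowait is set by any code that sleeps through io_schedule(), not only block-device I/O, a label like this is best treated as a first routing hint and confirmed against the stack trace.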