Currently, the hung task reporting mechanism does not differentiate
between the underlying causes of a D state, labelling all such tasks
merely as "blocked". Consequently, administrators must perform manual
stack trace inspection to ascertain if the delay stems from an I/O wait
(indicative of hardware or filesystem issues) or a lock wait (indicative
of software contention).
This change utilises the in_iowait field from struct task_struct to
distinguish between two distinct failure modes in the log output:
1. D state "Disk I/O": The task is waiting in io_schedule().
This typically implies a storage device, filesystem, or
network filesystem (e.g., NFS) is unresponsive.
2. D state "Lock/Resource": The task is waiting on a kernel
primitive (e.g., mutex). This typically implies a software
bug, deadlock, or resource starvation.
It is safe to read in_iowait in this manner because
check_hung_uninterruptible_tasks() holds the RCU read lock, preserving
the task structure. Moreover, the task is effectively quiescent (in a
persistent TASK_UNINTERRUPTIBLE state) and thus cannot update its own
in_iowait status, guaranteeing a stable, race-free value.
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
kernel/hung_task.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 350093de0535..608731c7ccba 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -250,8 +250,9 @@ static void hung_task_info(struct task_struct *t, unsigned long timeout,
if (sysctl_hung_task_warnings || hung_task_call_panic) {
if (sysctl_hung_task_warnings > 0)
sysctl_hung_task_warnings--;
- pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
- t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
+ pr_err("INFO: task %s:%d blocked in %s state for more than %ld seconds.\n",
+ t->comm, t->pid, t->in_iowait ? "D (Disk I/O)" : "D (Lock/Resource)",
+ (jiffies - t->last_switch_time) / HZ);
pr_err(" %s %s %.*s\n",
print_tainted(), init_utsname()->release,
(int)strcspn(init_utsname()->version, " "),
--
2.51.0
On Sun, 25 Jan 2026 15:39:05 -0500
Aaron Tomlin <atomlin@atomlin.com> wrote:
> Currently, the hung task reporting mechanism does not differentiate
> between the underlying causes of a D state, labelling all such tasks
> merely as "blocked". Consequently, administrators must perform manual
> stack trace inspection to ascertain if the delay stems from an I/O wait
> (indicative of hardware or filesystem issues) or a lock wait (indicative
> of software contention).
>
> This change utilises the in_iowait field from struct task_struct to
> distinguish between two distinct failure modes in the log output:
>
> 1. D state "Disk I/O": The task is waiting in io_schedule().
> This typically implies a storage device, filesystem, or
> network filesystem (e.g., NFS) is unresponsive.
>
> 2. D state "Lock/Resource": The task is waiting on a kernel
> primitive (e.g., mutex). This typically implies a software
> bug, deadlock, or resource starvation.
>
> It is safe to read in_iowait in this manner because
> check_hung_uninterruptible_tasks() holds the RCU read lock, preserving
> the task structure. Moreover, the task is effectively quiescent (in a
> persistent TASK_UNINTERRUPTIBLE state) and thus cannot update its own
> in_iowait status, guaranteeing a stable, race-free value.
>
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
> kernel/hung_task.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 350093de0535..608731c7ccba 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -250,8 +250,9 @@ static void hung_task_info(struct task_struct *t, unsigned long timeout,
> if (sysctl_hung_task_warnings || hung_task_call_panic) {
> if (sysctl_hung_task_warnings > 0)
> sysctl_hung_task_warnings--;
> - pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> - t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
> + pr_err("INFO: task %s:%d blocked in %s state for more than %ld seconds.\n",
> + t->comm, t->pid, t->in_iowait ? "D (Disk I/O)" : "D (Lock/Resource)",
If this is only for human readability, I rather like just adding
"in iowait" at the end. "D" state seems redundant, and "Lock/Resource"
can mislead. What about something like below?
pr_err("INFO: task %s:%d blocked for more than %ld seconds%s.\n",
..., t->in_iowait ? " in I/O wait" : "");
Thank you,
> + (jiffies - t->last_switch_time) / HZ);
> pr_err(" %s %s %.*s\n",
> print_tainted(), init_utsname()->release,
> (int)strcspn(init_utsname()->version, " "),
> --
> 2.51.0
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
On Wed, Jan 28, 2026 at 05:17:49PM +0900, Masami Hiramatsu wrote:
> If this is only for human readability, I rather like just adding
> "in iowait" at the end. "D" state seems redundant, and "Lock/Resource"
> can mislead. What about something like below?
>
> pr_err("INFO: task %s:%d blocked for more than %ld seconds%s.\n",
> ..., t->in_iowait ? " in I/O wait" : "");
>
Hi Masami,
Thank you for your feedback.
I concur. I'll incorporate the suggestion.
Kind regards,
--
Aaron Tomlin
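
The format agreed on above can be tried out in userspace. The following is a minimal sketch, not the kernel code: struct fake_task and format_hung_msg() are hypothetical names introduced only for illustration, and the trailing newline of the real pr_err() call is left out. It assumes only what the thread states, namely that the base message stays unchanged and " in I/O wait" is appended when in_iowait is set.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the few task_struct fields the message uses.
 * This is NOT the kernel's struct task_struct.
 */
struct fake_task {
	char comm[16];
	int pid;
	unsigned int in_iowait;	/* mirrors the in_iowait flag discussed above */
};

/*
 * Build the report line in the suggested form: the original wording is
 * kept, and " in I/O wait" is appended only when in_iowait is non-zero.
 * Returns what snprintf() returns (the length the full string needs).
 */
int format_hung_msg(char *buf, size_t len, const struct fake_task *t,
		    long secs)
{
	return snprintf(buf, len,
			"INFO: task %s:%d blocked for more than %ld seconds%s.",
			t->comm, t->pid, secs,
			t->in_iowait ? " in I/O wait" : "");
}
```

For a task with comm "dd", pid 1234 and in_iowait set, a 120 second stall formats as "INFO: task dd:1234 blocked for more than 120 seconds in I/O wait."; with in_iowait clear the line is identical to the existing message, so tools that match on the current wording keep working for that case.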
On 2026/1/28 16:17, Masami Hiramatsu (Google) wrote:
> On Sun, 25 Jan 2026 15:39:05 -0500
> Aaron Tomlin <atomlin@atomlin.com> wrote:
>
>> Currently, the hung task reporting mechanism does not differentiate
>> between the underlying causes of a D state, labelling all such tasks
>> merely as "blocked". Consequently, administrators must perform manual
>> stack trace inspection to ascertain if the delay stems from an I/O wait
>> (indicative of hardware or filesystem issues) or a lock wait (indicative
>> of software contention).
>>
>> This change utilises the in_iowait field from struct task_struct to
>> distinguish between two distinct failure modes in the log output:
>>
>> 1. D state "Disk I/O": The task is waiting in io_schedule().
>> This typically implies a storage device, filesystem, or
>> network filesystem (e.g., NFS) is unresponsive.
>>
>> 2. D state "Lock/Resource": The task is waiting on a kernel
>> primitive (e.g., mutex). This typically implies a software
>> bug, deadlock, or resource starvation.
>>
>> It is safe to read in_iowait in this manner because
>> check_hung_uninterruptible_tasks() holds the RCU read lock, preserving
>> the task structure. Moreover, the task is effectively quiescent (in a
>> persistent TASK_UNINTERRUPTIBLE state) and thus cannot update its own
>> in_iowait status, guaranteeing a stable, race-free value.
>>
>> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
>> ---
>> kernel/hung_task.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>> index 350093de0535..608731c7ccba 100644
>> --- a/kernel/hung_task.c
>> +++ b/kernel/hung_task.c
>> @@ -250,8 +250,9 @@ static void hung_task_info(struct task_struct *t, unsigned long timeout,
>> if (sysctl_hung_task_warnings || hung_task_call_panic) {
>> if (sysctl_hung_task_warnings > 0)
>> sysctl_hung_task_warnings--;
>> - pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
>> - t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
>> + pr_err("INFO: task %s:%d blocked in %s state for more than %ld seconds.\n",
>> + t->comm, t->pid, t->in_iowait ? "D (Disk I/O)" : "D (Lock/Resource)",
>
> If this is only for human readability, I rather like just adding
> "in iowait" at the end. "D" state seems redundant, and "Lock/Resource"
> can mislead. What about something like below?
>
> pr_err("INFO: task %s:%d blocked for more than %ld seconds%s.\n",
> ..., t->in_iowait ? " in I/O wait" : "");
That would be better, looks good to me ;)
On 2026/1/26 04:39, Aaron Tomlin wrote:
> Currently, the hung task reporting mechanism does not differentiate
> between the underlying causes of a D state, labelling all such tasks
> merely as "blocked". Consequently, administrators must perform manual
> stack trace inspection to ascertain if the delay stems from an I/O wait
> (indicative of hardware or filesystem issues) or a lock wait (indicative
> of software contention).
>
> This change utilises the in_iowait field from struct task_struct to
> distinguish between two distinct failure modes in the log output:
>
> 1. D state "Disk I/O": The task is waiting in io_schedule().
> This typically implies a storage device, filesystem, or
> network filesystem (e.g., NFS) is unresponsive.
>
> 2. D state "Lock/Resource": The task is waiting on a kernel
> primitive (e.g., mutex). This typically implies a software
> bug, deadlock, or resource starvation.
>
> It is safe to read in_iowait in this manner because
> check_hung_uninterruptible_tasks() holds the RCU read lock, preserving
> the task structure. Moreover, the task is effectively quiescent (in a
> persistent TASK_UNINTERRUPTIBLE state) and thus cannot update its own
> in_iowait status, guaranteeing a stable, race-free value.
>
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
> kernel/hung_task.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 350093de0535..608731c7ccba 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -250,8 +250,9 @@ static void hung_task_info(struct task_struct *t, unsigned long timeout,
> if (sysctl_hung_task_warnings || hung_task_call_panic) {
> if (sysctl_hung_task_warnings > 0)
> sysctl_hung_task_warnings--;
> - pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> - t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
> + pr_err("INFO: task %s:%d blocked in %s state for more than %ld seconds.\n",
> + t->comm, t->pid, t->in_iowait ? "D (Disk I/O)" : "D (Lock/Resource)",
> + (jiffies - t->last_switch_time) / HZ);
> pr_err(" %s %s %.*s\n",
> print_tainted(), init_utsname()->release,
> (int)strcspn(init_utsname()->version, " "),
Why do we need this?
It's rather obvious that the stack trace already shows whether it
is in "D (Disk I/O)" or "D (Lock/Resource)" or "D (...)".
On Mon, Jan 26, 2026 at 10:30:22AM +0800, Lance Yang wrote:
> Why do we need this?
>
> It's rather obvious that the stack trace already shows whether it
> is in "D (Disk I/O)" or "D (Lock/Resource)" or "D (...)".

Hi Lance,

Thank you for your review.

While I agree that a seasoned kernel developer can often deduce the root
cause by inspecting the stack trace, this level of analysis is not always
immediately accessible to some system administrators or first-line support
engineers.

The primary benefit of this patch is to provide high-level clarity at a
glance. By explicitly distinguishing between "Disk I/O" and "Lock/Resource"
contention in the initial log message, we allow some administrators or
support engineers to rapidly route the incident to the appropriate team
(e.g., storage/network vs. kernel etc.) without needing to parse or
understand the nuances of kernel stack traces.

Kind regards,
--
Aaron Tomlin