Introduce a write-only sysfs attribute,
/sys/kernel/hung_task_detect_count_reset, to reset the total count of
detected hung tasks at runtime.
The attribute requires writing the value "1" to trigger the reset and returns
-EINVAL for any other input, ensuring robustness.
This addition is primarily justified by the need for enhanced
administrative control and improved diagnostics workflow in a running
production environment. It addresses a key limitation of the existing
mechanism: the inability to clear persistent state without resorting to a
full system reboot. The sysfs interface provides a non-disruptive, runtime
method to manage the diagnostic state.
After a system administrator investigates a potential hang and corrects the issue,
it is now possible to clear the history back to zero. This provides a clean
slate for subsequent monitoring, ensuring that any new recurrence of a hung
task is immediately reflected by a counter value greater than zero,
streamlining the post-mortem diagnostic phase.
Aaron Tomlin (2):
hung_task: Consolidate hung task warning into an atomic log block
hung_task: Provide runtime reset interface for hung task detector
.../sysfs-kernel-hung_task_detect_count_reset | 8 +++
kernel/hung_task.c | 68 ++++++++++++++++---
2 files changed, 66 insertions(+), 10 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-kernel-hung_task_detect_count_reset
--
2.51.0