hw_protection_shutdown() kicks off an orderly shutdown; if that takes
longer than a configurable amount of time, an emergency shutdown
occurs.

hw_protection_reboot() was recently added for systems that don't
implement a proper shutdown and are better served by rebooting and
letting the boot firmware deal with the critical condition.

If the orderly reboot started by hw_protection_reboot() timed out, the
system would shut down instead of rebooting. This is wrong, as a
shutdown was explicitly not requested.

Fix this by always doing an emergency reboot if hw_protection_reboot()
is called and the orderly reboot takes too long.
Fixes: 79fa723ba84c ("reboot: Introduce thermal_zone_device_critical_reboot()")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
kernel/reboot.c | 46 +++++++++++++++++++++++++++++++++++-----------
1 file changed, 35 insertions(+), 11 deletions(-)
diff --git a/kernel/reboot.c b/kernel/reboot.c
index f92aa66cbfec0f57ded43ba352a39c54d0c24a25..8e3680d36654587b57db44806a3d7b0228b10f67 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -932,6 +932,20 @@ void orderly_reboot(void)
}
EXPORT_SYMBOL_GPL(orderly_reboot);
+static const char *hw_protection_action_str(enum hw_protection_action action)
+{
+ switch (action) {
+ case HWPROT_ACT_SHUTDOWN:
+ return "shutdown";
+ case HWPROT_ACT_REBOOT:
+ return "reboot";
+ default:
+ return "undefined";
+ }
+}
+
+static enum hw_protection_action hw_failure_emergency_action;
+
/**
* hw_failure_emergency_poweroff_func - emergency poweroff work after a known delay
* @work: work_struct associated with the emergency poweroff function
@@ -941,21 +955,29 @@ EXPORT_SYMBOL_GPL(orderly_reboot);
*/
static void hw_failure_emergency_poweroff_func(struct work_struct *work)
{
+ const char *action_str = hw_protection_action_str(hw_failure_emergency_action);
+
+ pr_emerg("Hardware protection timed-out. Trying forced %s\n",
+ action_str);
+
/*
- * We have reached here after the emergency shutdown waiting period has
- * expired. This means orderly_poweroff has not been able to shut off
- * the system for some reason.
+ * We have reached here after the emergency action waiting period has
+ * expired. This means orderly_poweroff/reboot has not been able to
+ * shut off the system for some reason.
*
- * Try to shut down the system immediately using kernel_power_off
- * if populated
+ * Try to shut off the system immediately if possible
*/
- pr_emerg("Hardware protection timed-out. Trying forced poweroff\n");
- kernel_power_off();
+
+ if (hw_failure_emergency_action == HWPROT_ACT_REBOOT)
+ kernel_restart(NULL);
+ else
+ kernel_power_off();
/*
* Worst of the worst case trigger emergency restart
*/
- pr_emerg("Hardware protection shutdown failed. Trying emergency restart\n");
+ pr_emerg("Hardware protection %s failed. Trying emergency restart\n",
+ action_str);
emergency_restart();
}
@@ -963,15 +985,17 @@ static DECLARE_DELAYED_WORK(hw_failure_emergency_poweroff_work,
hw_failure_emergency_poweroff_func);
/**
- * hw_failure_emergency_poweroff - Trigger an emergency system poweroff
+ * hw_failure_emergency_schedule - Schedule an emergency system shutdown or reboot
*
* This may be called from any critical situation to trigger a system shutdown
* after a given period of time. If time is negative this is not scheduled.
*/
-static void hw_failure_emergency_poweroff(int poweroff_delay_ms)
+static void hw_failure_emergency_schedule(enum hw_protection_action action,
+ int poweroff_delay_ms)
{
if (poweroff_delay_ms <= 0)
return;
+ hw_failure_emergency_action = action;
schedule_delayed_work(&hw_failure_emergency_poweroff_work,
msecs_to_jiffies(poweroff_delay_ms));
}
@@ -1009,7 +1033,7 @@ void __hw_protection_shutdown(const char *reason, int ms_until_forced,
* Queue a backup emergency shutdown in the event of
* orderly_poweroff failure
*/
- hw_failure_emergency_poweroff(ms_until_forced);
+ hw_failure_emergency_schedule(action, ms_until_forced);
if (action == HWPROT_ACT_REBOOT)
orderly_reboot();
else
--
2.39.5
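To make the timeout behaviour described in the commit message easier to follow, here is a minimal userspace sketch of the decision the emergency work function takes after this patch. It is only a model: the enum definition is an assumption (the real one comes from elsewhere in the series), and enum forced_step and emergency_step_on_timeout() are hypothetical names that stand in for the calls to kernel_power_off() and kernel_restart(NULL).

```c
#include <assert.h>
#include <stdio.h>

/* Assumed stand-in for the kernel enum; its real definition is not part of this patch. */
enum hw_protection_action {
	HWPROT_ACT_SHUTDOWN,
	HWPROT_ACT_REBOOT,
};

/* Hypothetical: records which forced step the kernel code would attempt. */
enum forced_step {
	FORCED_POWER_OFF,	/* models kernel_power_off() */
	FORCED_RESTART,		/* models kernel_restart(NULL) */
};

/* Same mapping as the helper added by the patch. */
static const char *hw_protection_action_str(enum hw_protection_action action)
{
	switch (action) {
	case HWPROT_ACT_SHUTDOWN:
		return "shutdown";
	case HWPROT_ACT_REBOOT:
		return "reboot";
	default:
		return "undefined";
	}
}

/*
 * Model of the work function once the waiting period has expired:
 * log the forced action, then pick the forced step matching the
 * action that was originally requested.
 */
static enum forced_step emergency_step_on_timeout(enum hw_protection_action action)
{
	printf("Hardware protection timed-out. Trying forced %s\n",
	       hw_protection_action_str(action));

	if (action == HWPROT_ACT_REBOOT)
		return FORCED_RESTART;

	return FORCED_POWER_OFF;
}
```

In this model, assert(emergency_step_on_timeout(HWPROT_ACT_REBOOT) == FORCED_RESTART) holds, which is the behaviour the patch is after. The final emergency_restart() fallback in the kernel code is left out of the sketch.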
Hi Ahmad,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 78d4f34e2115b517bcbfe7ec0d018bbbb6f9b0b8]
url: https://github.com/intel-lab-lkp/linux/commits/Ahmad-Fatoum/reboot-replace-__hw_protection_shutdown-bool-action-parameter-with-an-enum/20241219-155416
base: 78d4f34e2115b517bcbfe7ec0d018bbbb6f9b0b8
patch link: https://lore.kernel.org/r/20241219-hw_protection-reboot-v1-2-263a0c1df802%40pengutronix.de
patch subject: [PATCH 02/11] reboot: reboot, not shutdown, on hw_protection_reboot timeout
config: i386-buildonly-randconfig-003-20241220 (https://download.01.org/0day-ci/archive/20241220/202412201310.JWkUQ9qf-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241220/202412201310.JWkUQ9qf-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412201310.JWkUQ9qf-lkp@intel.com/
All warnings (new ones prefixed by >>):
kernel/reboot.c:241: warning: Function parameter or struct member 'cmd' not described in 'do_kernel_restart'
>> kernel/reboot.c:995: warning: Function parameter or struct member 'action' not described in 'hw_failure_emergency_schedule'
>> kernel/reboot.c:995: warning: Function parameter or struct member 'poweroff_delay_ms' not described in 'hw_failure_emergency_schedule'
kernel/reboot.c:1023: warning: Function parameter or struct member 'action' not described in '__hw_protection_shutdown'
kernel/reboot.c:1023: warning: Excess function parameter 'shutdown' description in '__hw_protection_shutdown'
vim +995 kernel/reboot.c
dfa19b11385d4c Matti Vaittinen 2021-06-03 983
dfa19b11385d4c Matti Vaittinen 2021-06-03 984 static DECLARE_DELAYED_WORK(hw_failure_emergency_poweroff_work,
dfa19b11385d4c Matti Vaittinen 2021-06-03 985 hw_failure_emergency_poweroff_func);
dfa19b11385d4c Matti Vaittinen 2021-06-03 986
dfa19b11385d4c Matti Vaittinen 2021-06-03 987 /**
595ab92650cc28 Ahmad Fatoum 2024-12-19 988 * hw_failure_emergency_schedule - Schedule an emergency system shutdown or reboot
dfa19b11385d4c Matti Vaittinen 2021-06-03 989 *
dfa19b11385d4c Matti Vaittinen 2021-06-03 990 * This may be called from any critical situation to trigger a system shutdown
dfa19b11385d4c Matti Vaittinen 2021-06-03 991 * after a given period of time. If time is negative this is not scheduled.
dfa19b11385d4c Matti Vaittinen 2021-06-03 992 */
595ab92650cc28 Ahmad Fatoum 2024-12-19 993 static void hw_failure_emergency_schedule(enum hw_protection_action action,
595ab92650cc28 Ahmad Fatoum 2024-12-19 994 int poweroff_delay_ms)
dfa19b11385d4c Matti Vaittinen 2021-06-03 @995 {
dfa19b11385d4c Matti Vaittinen 2021-06-03 996 if (poweroff_delay_ms <= 0)
dfa19b11385d4c Matti Vaittinen 2021-06-03 997 return;
595ab92650cc28 Ahmad Fatoum 2024-12-19 998 hw_failure_emergency_action = action;
dfa19b11385d4c Matti Vaittinen 2021-06-03 999 schedule_delayed_work(&hw_failure_emergency_poweroff_work,
dfa19b11385d4c Matti Vaittinen 2021-06-03 1000 msecs_to_jiffies(poweroff_delay_ms));
dfa19b11385d4c Matti Vaittinen 2021-06-03 1001 }
dfa19b11385d4c Matti Vaittinen 2021-06-03 1002
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On 20.12.24 07:12, kernel test robot wrote:
> Hi Ahmad,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on 78d4f34e2115b517bcbfe7ec0d018bbbb6f9b0b8]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Ahmad-Fatoum/reboot-replace-__hw_protection_shutdown-bool-action-parameter-with-an-enum/20241219-155416
> base: 78d4f34e2115b517bcbfe7ec0d018bbbb6f9b0b8
> patch link: https://lore.kernel.org/r/20241219-hw_protection-reboot-v1-2-263a0c1df802%40pengutronix.de
> patch subject: [PATCH 02/11] reboot: reboot, not shutdown, on hw_protection_reboot timeout
> config: i386-buildonly-randconfig-003-20241220 (https://download.01.org/0day-ci/archive/20241220/202412201310.JWkUQ9qf-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241220/202412201310.JWkUQ9qf-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202412201310.JWkUQ9qf-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> kernel/reboot.c:241: warning: Function parameter or struct member 'cmd' not described in 'do_kernel_restart'
>>> kernel/reboot.c:995: warning: Function parameter or struct member 'action' not described in 'hw_failure_emergency_schedule'
>>> kernel/reboot.c:995: warning: Function parameter or struct member 'poweroff_delay_ms' not described in 'hw_failure_emergency_schedule'
I will fix the kernel-doc issues in v2.
> kernel/reboot.c:1023: warning: Function parameter or struct member 'action' not described in '__hw_protection_shutdown'
> kernel/reboot.c:1023: warning: Excess function parameter 'shutdown' description in '__hw_protection_shutdown'
>
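For illustration only, one possible shape of a kernel-doc comment that would describe both parameters flagged in the warnings above. The wording of the @action and @poweroff_delay_ms descriptions is an assumption, not the text of v2. The function body is a userspace stand-in: emergency_work_scheduled and emergency_work_delay_ms are hypothetical variables that replace the schedule_delayed_work() call so the sketch compiles on its own, and the enum definition is likewise assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed stand-in for the kernel enum. */
enum hw_protection_action {
	HWPROT_ACT_SHUTDOWN,
	HWPROT_ACT_REBOOT,
};

static enum hw_protection_action hw_failure_emergency_action;

/* Hypothetical: record the request instead of queueing delayed work. */
static bool emergency_work_scheduled;
static int emergency_work_delay_ms;

/**
 * hw_failure_emergency_schedule - Schedule an emergency system shutdown or reboot
 * @action: The hardware protection action to force once the delay expires
 * @poweroff_delay_ms: Time in milliseconds to wait before forcing @action
 *
 * This may be called from any critical situation to trigger a forced
 * shutdown or reboot after a given period of time. Nothing is scheduled
 * if @poweroff_delay_ms is zero or negative.
 */
static void hw_failure_emergency_schedule(enum hw_protection_action action,
					  int poweroff_delay_ms)
{
	if (poweroff_delay_ms <= 0)
		return;

	hw_failure_emergency_action = action;

	/* The kernel code calls schedule_delayed_work() at this point. */
	emergency_work_scheduled = true;
	emergency_work_delay_ms = poweroff_delay_ms;
}
```

Because the function returns early for poweroff_delay_ms <= 0, the sketch says "zero or negative" where the existing comment says "negative". After hw_failure_emergency_schedule(HWPROT_ACT_REBOOT, 100), assert(emergency_work_scheduled) holds in this model.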
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |