hw_protection_shutdown() will kick off an orderly shutdown and, if that
takes longer than a configurable amount of time, an emergency shutdown
will occur.
Recently, hw_protection_reboot() was added for systems that don't
implement a proper shutdown and are better served by rebooting and
letting the boot firmware deal with the critical condition.
If the orderly reboot started by hw_protection_reboot() timed out, the
system would fall back to a shutdown instead of a reboot. This is not a
good idea, as a shutdown was explicitly not asked for.
Fix this by always doing an emergency reboot if hw_protection_reboot()
is called and the orderly reboot takes too long.
Fixes: 79fa723ba84c ("reboot: Introduce thermal_zone_device_critical_reboot()")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
kernel/reboot.c | 70 ++++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 49 insertions(+), 21 deletions(-)
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 847ac5d17a659981c6765699eac323f5e87f48c1..222b63dfd31020d0e2bc1b1402dbfa82adc71990 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -932,48 +932,76 @@ void orderly_reboot(void)
}
EXPORT_SYMBOL_GPL(orderly_reboot);
+static const char *hw_protection_action_str(enum hw_protection_action action)
+{
+ switch (action) {
+ case HWPROT_ACT_SHUTDOWN:
+ return "shutdown";
+ case HWPROT_ACT_REBOOT:
+ return "reboot";
+ default:
+ return "undefined";
+ }
+}
+
+static enum hw_protection_action hw_failure_emergency_action;
+
/**
- * hw_failure_emergency_poweroff_func - emergency poweroff work after a known delay
- * @work: work_struct associated with the emergency poweroff function
+ * hw_failure_emergency_action_func - emergency action work after a known delay
+ * @work: work_struct associated with the emergency action function
*
* This function is called in very critical situations to force
- * a kernel poweroff after a configurable timeout value.
+ * a kernel poweroff or reboot after a configurable timeout value.
*/
-static void hw_failure_emergency_poweroff_func(struct work_struct *work)
+static void hw_failure_emergency_action_func(struct work_struct *work)
{
+ const char *action_str = hw_protection_action_str(hw_failure_emergency_action);
+
+ pr_emerg("Hardware protection timed-out. Trying forced %s\n",
+ action_str);
+
/*
- * We have reached here after the emergency shutdown waiting period has
- * expired. This means orderly_poweroff has not been able to shut off
- * the system for some reason.
+ * We have reached here after the emergency action waiting period has
+ * expired. This means orderly_poweroff()/orderly_reboot() was unable to
+ * power off or reboot the system for some reason.
*
- * Try to shut down the system immediately using kernel_power_off
- * if populated
+ * Try to power off or restart the system immediately, if possible
*/
- pr_emerg("Hardware protection timed-out. Trying forced poweroff\n");
- kernel_power_off();
+
+ if (hw_failure_emergency_action == HWPROT_ACT_REBOOT)
+ kernel_restart(NULL);
+ else
+ kernel_power_off();
/*
* Worst of the worst case trigger emergency restart
*/
- pr_emerg("Hardware protection shutdown failed. Trying emergency restart\n");
+ pr_emerg("Hardware protection %s failed. Trying emergency restart\n",
+ action_str);
emergency_restart();
}
-static DECLARE_DELAYED_WORK(hw_failure_emergency_poweroff_work,
- hw_failure_emergency_poweroff_func);
+static DECLARE_DELAYED_WORK(hw_failure_emergency_action_work,
+ hw_failure_emergency_action_func);
/**
- * hw_failure_emergency_poweroff - Trigger an emergency system poweroff
+ * hw_failure_emergency_schedule - Schedule an emergency system shutdown or reboot
+ *
+ * @action: The hardware protection action to be taken
+ * @action_delay_ms: Time in milliseconds to elapse before triggering action
*
* This may be called from any critical situation to trigger a system shutdown
- * after a given period of time. If time is negative this is not scheduled.
+ * or reboot after a given period of time.
+ * If the delay is zero or negative, nothing is scheduled.
*/
-static void hw_failure_emergency_poweroff(int poweroff_delay_ms)
+static void hw_failure_emergency_schedule(enum hw_protection_action action,
+ int action_delay_ms)
{
- if (poweroff_delay_ms <= 0)
+ if (action_delay_ms <= 0)
return;
- schedule_delayed_work(&hw_failure_emergency_poweroff_work,
- msecs_to_jiffies(poweroff_delay_ms));
+ hw_failure_emergency_action = action;
+ schedule_delayed_work(&hw_failure_emergency_action_work,
+ msecs_to_jiffies(action_delay_ms));
}
/**
@@ -1006,7 +1034,7 @@ void __hw_protection_shutdown(const char *reason, int ms_until_forced,
* Queue a backup emergency shutdown in the event of
* orderly_poweroff failure
*/
- hw_failure_emergency_poweroff(ms_until_forced);
+ hw_failure_emergency_schedule(action, ms_until_forced);
if (action == HWPROT_ACT_REBOOT)
orderly_reboot();
else
--
2.39.5
On 13/01/2025 18:25, Ahmad Fatoum wrote:
> [...]
> +static enum hw_protection_action hw_failure_emergency_action;
nit: Is there a (theoretical) possibility that two emergency restarts
get scheduled with different actions? Should the action be allocated
for each caller (maybe not), or should there be a check whether an
operation with a conflicting action is already scheduled?
If this was already considered and deemed a non-issue:
Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>
Yours,
-- Matti
Hello Matti,
On 22.01.25 12:28, Matti Vaittinen wrote:
> On 13/01/2025 18:25, Ahmad Fatoum wrote:
>> [...]
>> +static enum hw_protection_action hw_failure_emergency_action;
>
> nit: Do we have a (theoretical) possibility that two emergency restarts get scheduled with different actions? Should the action be allocated (maybe not) for each caller, or should there be a check if an operation with conflicting action is already scheduled?
>
> If this was already considered and thought it is not an issue:
>
> Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>
__hw_protection_trigger (née __hw_protection_shutdown) has this at its start:
	static atomic_t allow_proceed = ATOMIC_INIT(1);

	/* Shutdown should be initiated only once. */
	if (!atomic_dec_and_test(&allow_proceed))
		return;
It's thus not possible to have a later emergency restart race against the first.
Thanks for your R-b,
Ahmad
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
On 17/02/2025 22:22, Ahmad Fatoum wrote:
> Hello Matti,
>
> On 22.01.25 12:28, Matti Vaittinen wrote:
>> On 13/01/2025 18:25, Ahmad Fatoum wrote:
>>> [...]
>>> +static enum hw_protection_action hw_failure_emergency_action;
>>
>> nit: Do we have a (theoretical) possibility that two emergency restarts get scheduled with different actions? Should the action be allocated (maybe not) for each caller, or should there be a check if an operation with conflicting action is already scheduled?
>>
>> If this was already considered and thought it is not an issue:
>>
>> Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>
>
> __hw_protection_trigger (née __hw_protection_shutdown) has this at its start:
>
> static atomic_t allow_proceed = ATOMIC_INIT(1);
>
> /* Shutdown should be initiated only once. */
> if (!atomic_dec_and_test(&allow_proceed))
> return;
>
> It's thus not possible to have a later emergency restart race against the first.
>
Ah, indeed. I missed this. Thanks for the clarification! :)
Yours,
-- Matti
On Mon, Jan 13, 2025 at 05:25:27PM +0100, Ahmad Fatoum wrote:
> [...]
> Fixes: 79fa723ba84c ("reboot: Introduce thermal_zone_device_critical_reboot()")
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>