[PATCH v4 7/7] thermal: core: Record PSCR before hw_protection_shutdown()

Oleksij Rempel posted 7 patches 11 months, 1 week ago
There is a newer version of this series
[PATCH v4 7/7] thermal: core: Record PSCR before hw_protection_shutdown()
Posted by Oleksij Rempel 11 months, 1 week ago
Enhance the thermal core to record the Power State Change Reason (PSCR)
prior to invoking hw_protection_shutdown(). This change integrates the
PSCR framework with the thermal subsystem, ensuring that reasons for
power state changes, such as overtemperature events, are stored in a
dedicated non-volatile memory (NVMEM) cell.

This 'black box' recording is crucial for post-mortem analysis, enabling
a deeper understanding of system failures and abrupt shutdowns,
especially in scenarios where PMICs or watchdog timers are incapable of
logging such events.  The recorded data can be utilized during system
recovery routines in the bootloader or early kernel stages of subsequent
boots, significantly enhancing system diagnostics, reliability, and
debugging capabilities.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
---
 drivers/thermal/thermal_core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 2328ac0d8561..af4e9cf22bf6 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -16,6 +16,7 @@
 #include <linux/kdev_t.h>
 #include <linux/idr.h>
 #include <linux/thermal.h>
+#include <linux/pscrr.h>
 #include <linux/reboot.h>
 #include <linux/string.h>
 #include <linux/of.h>
@@ -380,6 +381,8 @@ static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdo
 
 	dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type);
 
+	set_power_state_change_reason(PSCR_OVERTEMPERATURE);
+
 	if (shutdown)
 		hw_protection_shutdown(msg, poweroff_delay_ms);
 	else
-- 
2.39.5
Re: [PATCH v4 7/7] thermal: core: Record PSCR before hw_protection_shutdown()
Posted by Daniel Lezcano 11 months ago
Hi Oleksij,


On 06/03/2025 10:38, Oleksij Rempel wrote:
> Enhance the thermal core to record the Power State Change Reason (PSCR)
> prior to invoking hw_protection_shutdown(). This change integrates the
> PSCR framework with the thermal subsystem, ensuring that reasons for
> power state changes, such as overtemperature events, are stored in a
> dedicated non-volatile memory (NVMEM) cell.
> 
> This 'black box' recording is crucial for post-mortem analysis, enabling
> a deeper understanding of system failures and abrupt shutdowns,
> especially in scenarios where PMICs or watchdog timers are incapable of
> logging such events.  The recorded data can be utilized during system
> recovery routines in the bootloader or early kernel stages of subsequent
> boots, significantly enhancing system diagnostics, reliability, and
> debugging capabilities.
> 
> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
> ---
>   drivers/thermal/thermal_core.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 2328ac0d8561..af4e9cf22bf6 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -16,6 +16,7 @@
>   #include <linux/kdev_t.h>
>   #include <linux/idr.h>
>   #include <linux/thermal.h>
> +#include <linux/pscrr.h>
>   #include <linux/reboot.h>
>   #include <linux/string.h>
>   #include <linux/of.h>
> @@ -380,6 +381,8 @@ static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdo
>   
>   	dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type);
>   
> +	set_power_state_change_reason(PSCR_OVERTEMPERATURE);
> +
>   	if (shutdown)
>   		hw_protection_shutdown(msg, poweroff_delay_ms);
>   	else

In the future could you add me as recipient to the series instead of 
this one ? so I can get more context.

Given there are no so much hw_protection_shutdown() users in the kernel, 
it could be more interesting to change the function to receive a enum 
pscr_reason and then in the hw_protection_shutdown() call 
pscrr_reason_to_str().



-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Re: [PATCH v4 7/7] thermal: core: Record PSCR before hw_protection_shutdown()
Posted by Oleksij Rempel 11 months ago
Hi Daniel,

On Wed, Mar 12, 2025 at 04:35:51PM +0100, Daniel Lezcano wrote:
> > @@ -380,6 +381,8 @@ static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdo
> >   	dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type);
> > +	set_power_state_change_reason(PSCR_OVERTEMPERATURE);
> > +
> >   	if (shutdown)
> >   		hw_protection_shutdown(msg, poweroff_delay_ms);
> >   	else
> 
> In the future could you add me as recipient to the series instead of this
> one ? so I can get more context.

ack.

> Given there are no so much hw_protection_shutdown() users in the kernel, it
> could be more interesting to change the function to receive a enum
> pscr_reason and then in the hw_protection_shutdown() call
> pscrr_reason_to_str().
 
Do you mean, make it work with CONFIG_PSCRR=n?

Beside, the latest version is v5:
https://lore.kernel.org/all/20250310103732.423542-1-o.rempel@pengutronix.de/

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
Re: [PATCH v4 7/7] thermal: core: Record PSCR before hw_protection_shutdown()
Posted by Daniel Lezcano 11 months ago
On 12/03/2025 17:51, Oleksij Rempel wrote:
> Hi Daniel,
> 
> On Wed, Mar 12, 2025 at 04:35:51PM +0100, Daniel Lezcano wrote:
>>> @@ -380,6 +381,8 @@ static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdo
>>>    	dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type);
>>> +	set_power_state_change_reason(PSCR_OVERTEMPERATURE);
>>> +
>>>    	if (shutdown)
>>>    		hw_protection_shutdown(msg, poweroff_delay_ms);
>>>    	else
>>
>> In the future could you add me as recipient to the series instead of this
>> one ? so I can get more context.
> 
> ack.
> 
>> Given there are no so much hw_protection_shutdown() users in the kernel, it
>> could be more interesting to change the function to receive a enum
>> pscr_reason and then in the hw_protection_shutdown() call
>> pscrr_reason_to_str().
>   
> Do you mean, make it work with CONFIG_PSCRR=n?

No I meant instead of doing:

+	set_power_state_change_reason(PSCR_OVERTEMPERATURE);
+
  	if (shutdown)
  		hw_protection_shutdown(msg, poweroff_delay_ms);

Replace it by:

  	if (shutdown)
  		hw_protection_shutdown(PSCR_OVERTEMPERATURE, poweroff_delay_ms);


and in hw_protection_shutdown() use pscrr_reason_to_str() to display a msg.

That can work with CONFIG_PSCRR=n


> Beside, the latest version is v5:
> https://lore.kernel.org/all/20250310103732.423542-1-o.rempel@pengutronix.de/

Ah ok thanks for the pointer


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog