[PATCH v5 7/7] thermal: core: Record PSCR before hw_protection_shutdown()

Oleksij Rempel posted 7 patches 11 months ago
There is a newer version of this series
Posted by Oleksij Rempel 11 months ago
Enhance the thermal core to record the Power State Change Reason (PSCR)
prior to invoking hw_protection_shutdown(). This change integrates the
PSCR framework with the thermal subsystem, ensuring that reasons for
power state changes, such as overtemperature events, are stored in a
dedicated non-volatile memory (NVMEM) cell.

This 'black box' recording is crucial for post-mortem analysis, enabling
a deeper understanding of system failures and abrupt shutdowns,
especially in scenarios where PMICs or watchdog timers are incapable of
logging such events.  The recorded data can be utilized during system
recovery routines in the bootloader or early kernel stages of subsequent
boots, significantly enhancing system diagnostics, reliability, and
debugging capabilities.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
---
 drivers/thermal/thermal_core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 2328ac0d8561..af4e9cf22bf6 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -16,6 +16,7 @@
 #include <linux/kdev_t.h>
 #include <linux/idr.h>
 #include <linux/thermal.h>
+#include <linux/pscrr.h>
 #include <linux/reboot.h>
 #include <linux/string.h>
 #include <linux/of.h>
@@ -380,6 +381,8 @@ static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdo
 
 	dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type);
 
+	set_power_state_change_reason(PSCR_OVERTEMPERATURE);
+
 	if (shutdown)
 		hw_protection_shutdown(msg, poweroff_delay_ms);
 	else
-- 
2.39.5
Re: [PATCH v5 7/7] thermal: core: Record PSCR before hw_protection_shutdown()
Posted by kernel test robot 10 months, 4 weeks ago
Hi Oleksij,

kernel test robot noticed the following build errors:

[auto build test ERROR on sre-power-supply/for-next]
[also build test ERROR on broonie-regulator/for-next rafael-pm/thermal linus/master v6.14-rc7]
[cannot apply to next-20250318]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Oleksij-Rempel/power-Extend-power_on_reason-h-for-upcoming-PSCRR-framework/20250310-184319
base:   https://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git for-next
patch link:    https://lore.kernel.org/r/20250310103732.423542-8-o.rempel%40pengutronix.de
patch subject: [PATCH v5 7/7] thermal: core: Record PSCR before hw_protection_shutdown()
config: m68k-randconfig-r073-20250314 (https://download.01.org/0day-ci/archive/20250319/202503190248.stJdS2ru-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250319/202503190248.stJdS2ru-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202503190248.stJdS2ru-lkp@intel.com/

All errors (new ones prefixed by >>):

   m68k-linux-ld: drivers/thermal/thermal_core.o: in function `thermal_zone_device_halt':
>> drivers/thermal/thermal_core.c:384:(.text.unlikely+0x24): undefined reference to `set_power_state_change_reason'
   m68k-linux-ld: drivers/regulator/core.o: in function `regulator_handle_critical':
   drivers/regulator/core.c:5270:(.text+0x20c6): undefined reference to `set_power_state_change_reason'


vim +384 drivers/thermal/thermal_core.c

   372	
   373	static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdown)
   374	{
   375		/*
   376		 * poweroff_delay_ms must be a carefully profiled positive value.
   377		 * Its a must for forced_emergency_poweroff_work to be scheduled.
   378		 */
   379		int poweroff_delay_ms = CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS;
   380		const char *msg = "Temperature too high";
   381	
   382		dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type);
   383	
 > 384		set_power_state_change_reason(PSCR_OVERTEMPERATURE);
   385	
   386		if (shutdown)
   387			hw_protection_shutdown(msg, poweroff_delay_ms);
   388		else
   389			hw_protection_reboot(msg, poweroff_delay_ms);
   390	}
   391	
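
Both link errors above have the same shape: a caller in always-built code (thermal_core.c, regulator/core.c) references set_power_state_change_reason(), but in this randconfig no object file defines it. One common kernel pattern for that situation is a static inline no-op stub in the header, used when the framework is disabled. The following is only a sketch of that pattern, not the fix adopted in the series: the CONFIG_PSCRR symbol name, the enum name and the PSCR_UNKNOWN value are assumptions made for illustration, and demo_halt() is a hypothetical stand-in for a caller such as thermal_zone_device_halt().

```c
/*
 * Sketch of a header-level stub. CONFIG_PSCRR, enum pscr_reason and
 * PSCR_UNKNOWN are placeholders, not names confirmed by the series.
 */
enum pscr_reason {
	PSCR_UNKNOWN = 0,		/* hypothetical default value */
	PSCR_OVERTEMPERATURE,		/* value used by the patch above */
};

#ifdef CONFIG_PSCRR
/* Framework enabled: the definition is provided by the framework core. */
void set_power_state_change_reason(enum pscr_reason reason);
#else
/* Framework disabled: a no-op stub keeps every caller linkable. */
static inline void set_power_state_change_reason(enum pscr_reason reason)
{
	(void)reason;
}
#endif

/* Hypothetical caller, standing in for thermal_zone_device_halt(). */
static int demo_halt(void)
{
	set_power_state_change_reason(PSCR_OVERTEMPERATURE);
	return 0;	/* reached only if the call above resolved */
}
```

With the stub in place, callers need no #ifdef of their own. The alternative is to make each caller's Kconfig entry depend on or select the framework, which is harder to justify for core code such as the thermal and regulator subsystems.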

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki