[PATCH mainline-linux] thermal/core: Populate max_state before setting up cooling dev sysfs

Ovidiu Panait posted 1 patch 4 days, 4 hours ago
drivers/thermal/thermal_core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
[PATCH mainline-linux] thermal/core: Populate max_state before setting up cooling dev sysfs
Posted by Ovidiu Panait 4 days, 4 hours ago
Since commit 13f4e660a126 ("thermal/core:
Split __thermal_cooling_device_register() into two functions"),
thermal_cooling_device_setup_sysfs() is called before the
->get_max_state() callback in thermal_cooling_device_add().
However, cooling_device_stats_setup() allocates space based on
cdev->max_state, which is not initialized at that point.

With CONFIG_THERMAL_STATISTICS=y, an out of bounds access happens
inside thermal_cooling_device_stats_update(), followed by a kernel
crash:

Unable to handle kernel paging request at virtual address ffff800081329e60
Call trace:
 queued_spin_lock_slowpath+0x1cc/0x320 (P)
 thermal_cooling_device_stats_update+0x28/0xa4
 __thermal_cdev_update+0x74/0x88
 thermal_cdev_update+0x44/0x58
 step_wise_manage+0x1b8/0x300
 __thermal_zone_device_update+0x270/0x414
 thermal_zone_device_check+0x28/0x40
 process_one_work+0x150/0x290
 worker_thread+0x18c/0x300
 kthread+0x114/0x120
 ret_from_fork+0x10/0x20

To fix this, restore the original ordering of ->get_max_state() and
thermal_cooling_device_setup_sysfs(). Note that with this reordering,
the dev_set_name() and ->get_max_state() error paths now reach
thermal_cdev_release() without setup_sysfs() having run. This is safe
because cdev->stats is NULL in that case and destroy_sysfs() is a no-op.

Fixes: 13f4e660a126 ("thermal/core: Split __thermal_cooling_device_register() into two functions")
Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
---
 drivers/thermal/thermal_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 4e2a17fdb6a7..9b9fa51067bd 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1015,8 +1015,6 @@ static int thermal_cooling_device_add(struct thermal_cooling_device *cdev, void
 	device_initialize(&cdev->device);
 	cdev->devdata = devdata;
 
-	thermal_cooling_device_setup_sysfs(cdev);
-
 	ret = dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
 	if (ret)
 		goto out_put_device;
@@ -1037,6 +1035,8 @@ static int thermal_cooling_device_add(struct thermal_cooling_device *cdev, void
 	if (ret)
 		current_state = ULONG_MAX;
 
+	thermal_cooling_device_setup_sysfs(cdev);
+
 	ret = device_add(&cdev->device);
 	if (ret)
 		goto out_put_device;
-- 
2.34.1
Re: [PATCH mainline-linux] thermal/core: Populate max_state before setting up cooling dev sysfs
Posted by Daniel Lezcano 4 days, 3 hours ago
On 5/20/26 18:58, Ovidiu Panait wrote:
> Since commit 13f4e660a126 ("thermal/core:
> Split __thermal_cooling_device_register() into two functions"),
> thermal_cooling_device_setup_sysfs() is called before the
> ->get_max_state() callback in thermal_cooling_device_add().
> However, cooling_device_stats_setup() allocates space based on
> cdev->max_state, which is not initialized at that point.
> 
> With CONFIG_THERMAL_STATISTICS=y, an out of bounds access happens
> inside thermal_cooling_device_stats_update(), followed by a kernel
> crash:
> 
> Unable to handle kernel paging request at virtual address ffff800081329e60
> Call trace:
>   queued_spin_lock_slowpath+0x1cc/0x320 (P)
>   thermal_cooling_device_stats_update+0x28/0xa4
>   __thermal_cdev_update+0x74/0x88
>   thermal_cdev_update+0x44/0x58
>   step_wise_manage+0x1b8/0x300
>   __thermal_zone_device_update+0x270/0x414
>   thermal_zone_device_check+0x28/0x40
>   process_one_work+0x150/0x290
>   worker_thread+0x18c/0x300
>   kthread+0x114/0x120
>   ret_from_fork+0x10/0x20
> 
> To fix this, restore the original ordering of ->get_max_state() and
> thermal_cooling_device_setup_sysfs(). Note that with this reordering,
> the dev_set_name() and ->get_max_state() error paths now reach
> thermal_cdev_release() without setup_sysfs() having run. This is safe
> because cdev->stats is NULL in that case and destroy_sysfs() is a no-op.
> 
> Fixes: 13f4e660a126 ("thermal/core: Split __thermal_cooling_device_register() into two functions")
> Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
> ---

Thanks for fixing this !

Reviewed-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Re: [PATCH mainline-linux] thermal/core: Populate max_state before setting up cooling dev sysfs
Posted by Rafael J. Wysocki 4 days, 3 hours ago
On Wed, May 20, 2026 at 6:58 PM Ovidiu Panait
<ovidiu.panait.rb@renesas.com> wrote:
>
> Since commit 13f4e660a126 ("thermal/core:
> Split __thermal_cooling_device_register() into two functions"),
> thermal_cooling_device_setup_sysfs() is called before the
> ->get_max_state() callback in thermal_cooling_device_add().
> However, cooling_device_stats_setup() allocates space based on
> cdev->max_state, which is not initialized at that point.
>
> With CONFIG_THERMAL_STATISTICS=y, an out of bounds access happens
> inside thermal_cooling_device_stats_update(), followed by a kernel
> crash:
>
> Unable to handle kernel paging request at virtual address ffff800081329e60
> Call trace:
>  queued_spin_lock_slowpath+0x1cc/0x320 (P)
>  thermal_cooling_device_stats_update+0x28/0xa4
>  __thermal_cdev_update+0x74/0x88
>  thermal_cdev_update+0x44/0x58
>  step_wise_manage+0x1b8/0x300
>  __thermal_zone_device_update+0x270/0x414
>  thermal_zone_device_check+0x28/0x40
>  process_one_work+0x150/0x290
>  worker_thread+0x18c/0x300
>  kthread+0x114/0x120
>  ret_from_fork+0x10/0x20
>
> To fix this, restore the original ordering of ->get_max_state() and
> thermal_cooling_device_setup_sysfs(). Note that with this reordering,
> the dev_set_name() and ->get_max_state() error paths now reach
> thermal_cdev_release() without setup_sysfs() having run. This is safe
> because cdev->stats is NULL in that case and destroy_sysfs() is a no-op.
>
> Fixes: 13f4e660a126 ("thermal/core: Split __thermal_cooling_device_register() into two functions")
> Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
> ---
>  drivers/thermal/thermal_core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 4e2a17fdb6a7..9b9fa51067bd 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1015,8 +1015,6 @@ static int thermal_cooling_device_add(struct thermal_cooling_device *cdev, void
>         device_initialize(&cdev->device);
>         cdev->devdata = devdata;
>
> -       thermal_cooling_device_setup_sysfs(cdev);
> -
>         ret = dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
>         if (ret)
>                 goto out_put_device;
> @@ -1037,6 +1035,8 @@ static int thermal_cooling_device_add(struct thermal_cooling_device *cdev, void
>         if (ret)
>                 current_state = ULONG_MAX;
>
> +       thermal_cooling_device_setup_sysfs(cdev);
> +
>         ret = device_add(&cdev->device);
>         if (ret)
>                 goto out_put_device;
> --

The fixed commit is in linux-next only ATM, not in the mainline, but
the problem addressed by the patch is genuine AFAICS.

Applied, thanks!