drivers/thermal/thermal_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
Since commit 13f4e660a126 ("thermal/core:
Split __thermal_cooling_device_register() into two functions"),
thermal_cooling_device_setup_sysfs() is called before the
->get_max_state() callback in thermal_cooling_device_add().
However, cooling_device_stats_setup() allocates space based on
cdev->max_state, which is not initialized at that point.
With CONFIG_THERMAL_STATISTICS=y, an out of bounds access happens
inside thermal_cooling_device_stats_update(), followed by a kernel
crash:
Unable to handle kernel paging request at virtual address ffff800081329e60
Call trace:
queued_spin_lock_slowpath+0x1cc/0x320 (P)
thermal_cooling_device_stats_update+0x28/0xa4
__thermal_cdev_update+0x74/0x88
thermal_cdev_update+0x44/0x58
step_wise_manage+0x1b8/0x300
__thermal_zone_device_update+0x270/0x414
thermal_zone_device_check+0x28/0x40
process_one_work+0x150/0x290
worker_thread+0x18c/0x300
kthread+0x114/0x120
ret_from_fork+0x10/0x20
To fix this, restore the original ordering of ->get_max_state() and
thermal_cooling_device_setup_sysfs(). Note that with this reordering,
the dev_set_name() and ->get_max_state() error paths now reach
thermal_cdev_release() without setup_sysfs() having run. This is safe
because cdev->stats is NULL in that case and destroy_sysfs() is a no-op.
Fixes: 13f4e660a126 ("thermal/core: Split __thermal_cooling_device_register() into two functions")
Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
---
drivers/thermal/thermal_core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 4e2a17fdb6a7..9b9fa51067bd 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1015,8 +1015,6 @@ static int thermal_cooling_device_add(struct thermal_cooling_device *cdev, void
device_initialize(&cdev->device);
cdev->devdata = devdata;
- thermal_cooling_device_setup_sysfs(cdev);
-
ret = dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
if (ret)
goto out_put_device;
@@ -1037,6 +1035,8 @@ static int thermal_cooling_device_add(struct thermal_cooling_device *cdev, void
if (ret)
current_state = ULONG_MAX;
+ thermal_cooling_device_setup_sysfs(cdev);
+
ret = device_add(&cdev->device);
if (ret)
goto out_put_device;
--
2.34.1
On 5/20/26 18:58, Ovidiu Panait wrote:
> Since commit 13f4e660a126 ("thermal/core:
> Split __thermal_cooling_device_register() into two functions"),
> thermal_cooling_device_setup_sysfs() is called before the
> ->get_max_state() callback in thermal_cooling_device_add().
> However, cooling_device_stats_setup() allocates space based on
> cdev->max_state, which is not initialized at that point.
>
> With CONFIG_THERMAL_STATISTICS=y, an out of bounds access happens
> inside thermal_cooling_device_stats_update(), followed by a kernel
> crash:
>
> Unable to handle kernel paging request at virtual address ffff800081329e60
> Call trace:
> queued_spin_lock_slowpath+0x1cc/0x320 (P)
> thermal_cooling_device_stats_update+0x28/0xa4
> __thermal_cdev_update+0x74/0x88
> thermal_cdev_update+0x44/0x58
> step_wise_manage+0x1b8/0x300
> __thermal_zone_device_update+0x270/0x414
> thermal_zone_device_check+0x28/0x40
> process_one_work+0x150/0x290
> worker_thread+0x18c/0x300
> kthread+0x114/0x120
> ret_from_fork+0x10/0x20
>
> To fix this, restore the original ordering of ->get_max_state() and
> thermal_cooling_device_setup_sysfs(). Note that with this reordering,
> the dev_set_name() and ->get_max_state() error paths now reach
> thermal_cdev_release() without setup_sysfs() having run. This is safe
> because cdev->stats is NULL in that case and destroy_sysfs() is a no-op.
>
> Fixes: 13f4e660a126 ("thermal/core: Split __thermal_cooling_device_register() into two functions")
> Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
> ---
Thanks for fixing this !
Reviewed-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
On Wed, May 20, 2026 at 6:58 PM Ovidiu Panait
<ovidiu.panait.rb@renesas.com> wrote:
>
> Since commit 13f4e660a126 ("thermal/core:
> Split __thermal_cooling_device_register() into two functions"),
> thermal_cooling_device_setup_sysfs() is called before the
> ->get_max_state() callback in thermal_cooling_device_add().
> However, cooling_device_stats_setup() allocates space based on
> cdev->max_state, which is not initialized at that point.
>
> With CONFIG_THERMAL_STATISTICS=y, an out of bounds access happens
> inside thermal_cooling_device_stats_update(), followed by a kernel
> crash:
>
> Unable to handle kernel paging request at virtual address ffff800081329e60
> Call trace:
> queued_spin_lock_slowpath+0x1cc/0x320 (P)
> thermal_cooling_device_stats_update+0x28/0xa4
> __thermal_cdev_update+0x74/0x88
> thermal_cdev_update+0x44/0x58
> step_wise_manage+0x1b8/0x300
> __thermal_zone_device_update+0x270/0x414
> thermal_zone_device_check+0x28/0x40
> process_one_work+0x150/0x290
> worker_thread+0x18c/0x300
> kthread+0x114/0x120
> ret_from_fork+0x10/0x20
>
> To fix this, restore the original ordering of ->get_max_state() and
> thermal_cooling_device_setup_sysfs(). Note that with this reordering,
> the dev_set_name() and ->get_max_state() error paths now reach
> thermal_cdev_release() without setup_sysfs() having run. This is safe
> because cdev->stats is NULL in that case and destroy_sysfs() is a no-op.
>
> Fixes: 13f4e660a126 ("thermal/core: Split __thermal_cooling_device_register() into two functions")
> Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
> ---
> drivers/thermal/thermal_core.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 4e2a17fdb6a7..9b9fa51067bd 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1015,8 +1015,6 @@ static int thermal_cooling_device_add(struct thermal_cooling_device *cdev, void
> device_initialize(&cdev->device);
> cdev->devdata = devdata;
>
> - thermal_cooling_device_setup_sysfs(cdev);
> -
> ret = dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
> if (ret)
> goto out_put_device;
> @@ -1037,6 +1035,8 @@ static int thermal_cooling_device_add(struct thermal_cooling_device *cdev, void
> if (ret)
> current_state = ULONG_MAX;
>
> + thermal_cooling_device_setup_sysfs(cdev);
> +
> ret = device_add(&cdev->device);
> if (ret)
> goto out_put_device;
> --
The fixed commit is in linux-next only ATM, not in the mainline, but
the problem addressed by the patch is genuine AFAICS.
Applied, thanks!
© 2016 - 2026 Red Hat, Inc.