[PATCH RFC 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices

Posted by Armin Wolf 1 month ago
Drivers registering thermal zone/cooling devices are currently unable
to tell the thermal core what parent device the new thermal zone/
cooling device should have, potentially causing issues with suspend
ordering and making it impossible for user space applications to
associate a given thermal zone device with its parent device.

This patch series aims to fix this issue by extending the functions
used to register thermal zone/cooling devices to also accept a parent
device pointer. The first six patches convert all functions used for
registering cooling devices, while the functions used for registering
thermal zone devices are converted by the remaining two patches.

I tested this series on various devices containing (among others):
- ACPI thermal zones
- ACPI processor devices
- PCIe cooling devices
- Intel Wifi card
- Intel powerclamp
- Intel TCC cooling

I also compile-tested the remaining affected drivers; however, I would
still be happy if the relevant maintainers (especially those of the
mellanox ethernet switch driver) could take a quick glance at the
code and verify that I am using the correct device as the parent
device.

This work is also necessary for extending the ACPI thermal zone driver
to support the _TZD ACPI object in the future.

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
---
Armin Wolf (8):
      thermal: core: Allow setting the parent device of cooling devices
      thermal: core: Set parent device in thermal_of_cooling_device_register()
      ACPI: processor: Stop creating "device" sysfs link
      ACPI: fan: Stop creating "device" sysfs link
      ACPI: video: Stop creating "device" sysfs link
      thermal: core: Set parent device in thermal_cooling_device_register()
      ACPI: thermal: Stop creating "device" sysfs link
      thermal: core: Allow setting the parent device of thermal zone devices

 Documentation/driver-api/thermal/sysfs-api.rst     | 10 ++++-
 drivers/acpi/acpi_video.c                          |  9 +----
 drivers/acpi/fan_core.c                            | 16 ++------
 drivers/acpi/processor_thermal.c                   | 15 +------
 drivers/acpi/thermal.c                             | 33 ++++++---------
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c              |  4 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c |  4 +-
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 47 +++++++++++-----------
 drivers/net/wireless/ath/ath10k/thermal.c          |  2 +-
 drivers/net/wireless/ath/ath11k/thermal.c          |  2 +-
 drivers/net/wireless/intel/iwlwifi/mld/thermal.c   |  6 +--
 drivers/net/wireless/intel/iwlwifi/mvm/tt.c        | 12 +++---
 drivers/net/wireless/mediatek/mt76/mt7915/init.c   |  2 +-
 drivers/net/wireless/mediatek/mt76/mt7996/init.c   |  2 +-
 drivers/platform/x86/acerhdf.c                     |  4 +-
 drivers/power/supply/power_supply_core.c           |  4 +-
 drivers/thermal/armada_thermal.c                   |  2 +-
 drivers/thermal/cpufreq_cooling.c                  |  2 +-
 drivers/thermal/cpuidle_cooling.c                  |  2 +-
 drivers/thermal/da9062-thermal.c                   |  2 +-
 drivers/thermal/devfreq_cooling.c                  |  2 +-
 drivers/thermal/dove_thermal.c                     |  2 +-
 drivers/thermal/imx_thermal.c                      |  2 +-
 .../intel/int340x_thermal/int3400_thermal.c        |  2 +-
 .../intel/int340x_thermal/int3403_thermal.c        |  4 +-
 .../intel/int340x_thermal/int3406_thermal.c        |  2 +-
 .../intel/int340x_thermal/int340x_thermal_zone.c   | 13 +++---
 .../int340x_thermal/processor_thermal_device_pci.c |  7 ++--
 drivers/thermal/intel/intel_pch_thermal.c          |  2 +-
 drivers/thermal/intel/intel_powerclamp.c           |  2 +-
 drivers/thermal/intel/intel_quark_dts_thermal.c    |  2 +-
 drivers/thermal/intel/intel_soc_dts_iosf.c         |  2 +-
 drivers/thermal/intel/intel_tcc_cooling.c          |  2 +-
 drivers/thermal/intel/x86_pkg_temp_thermal.c       |  6 +--
 drivers/thermal/kirkwood_thermal.c                 |  2 +-
 drivers/thermal/pcie_cooling.c                     |  2 +-
 drivers/thermal/renesas/rcar_thermal.c             | 10 +++--
 drivers/thermal/spear_thermal.c                    |  2 +-
 drivers/thermal/tegra/soctherm.c                   |  5 +--
 drivers/thermal/testing/zone.c                     |  2 +-
 drivers/thermal/thermal_core.c                     | 23 +++++++----
 drivers/thermal/thermal_of.c                       |  9 +++--
 include/linux/thermal.h                            | 22 +++++-----
 43 files changed, 145 insertions(+), 162 deletions(-)
---
base-commit: 399fb812cd1532773e6aa985c0949859221341c4
change-id: 20251114-thermal-device-655d138824c6

Best regards,
-- 
Armin Wolf <W_Armin@gmx.de>
Re: [PATCH RFC 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices
Posted by Rafael J. Wysocki 1 month ago
On Fri, Nov 14, 2025 at 4:24 AM Armin Wolf <W_Armin@gmx.de> wrote:
>
> Drivers registering thermal zone/cooling devices are currently unable
> to tell the thermal core what parent device the new thermal zone/
> cooling device should have, potentially causing issues with suspend
> ordering

Do you have any examples of this?

> and making it impossible for user space applications to
> associate a given thermal zone device with its parent device.
>
> This patch series aims to fix this issue by extending the functions
> used to register thermal zone/cooling devices to also accept a parent
> device pointer. The first six patches convert all functions used for
> registering cooling devices, while the functions used for registering
> thermal zone devices are converted by the remaining two patches.
>
> I tested this series on various devices containing (among others):
> - ACPI thermal zones
> - ACPI processor devices
> - PCIe cooling devices
> - Intel Wifi card
> - Intel powerclamp
> - Intel TCC cooling
>
> I also compile-tested the remaining affected drivers; however, I would
> still be happy if the relevant maintainers (especially those of the
> mellanox ethernet switch driver) could take a quick glance at the
> code and verify that I am using the correct device as the parent
> device.
>
> This work is also necessary for extending the ACPI thermal zone driver
> to support the _TZD ACPI object in the future.

Can you please elaborate a bit here?

_TZD is a list of devices that belong to the given thermal zone, so
how is it connected to the thermal zone parent?

> Signed-off-by: Armin Wolf <W_Armin@gmx.de>
> ---
> Armin Wolf (8):
>       thermal: core: Allow setting the parent device of cooling devices
>       thermal: core: Set parent device in thermal_of_cooling_device_register()
>       ACPI: processor: Stop creating "device" sysfs link
>       ACPI: fan: Stop creating "device" sysfs link
>       ACPI: video: Stop creating "device" sysfs link
>       thermal: core: Set parent device in thermal_cooling_device_register()
>       ACPI: thermal: Stop creating "device" sysfs link
>       thermal: core: Allow setting the parent device of thermal zone devices

I can only see the first three patches in the series ATM as per

https://lore.kernel.org/linux-pm/20251114-thermal-device-v1-0-d8b442aae38b@gmx.de/T/#r605b23f2e27e751d8406e7949dad6f5b5b112067
Re: [PATCH RFC 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices
Posted by Rafael J. Wysocki 1 month ago
CC list trimmed and I'd rather not use such an extensive one if I were you.

On Fri, Nov 14, 2025 at 1:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Nov 14, 2025 at 4:24 AM Armin Wolf <W_Armin@gmx.de> wrote:
> >
> > Drivers registering thermal zone/cooling devices are currently unable
> > to tell the thermal core what parent device the new thermal zone/
> > cooling device should have, potentially causing issues with suspend
> > ordering
>
> Do you have any examples of this?

Especially for thermal zones.

> > and making it impossible for user space applications to
> > associate a given thermal zone device with its parent device.
> >
> > This patch series aims to fix this issue by extending the functions
> > used to register thermal zone/cooling devices to also accept a parent
> > device pointer. The first six patches convert all functions used for
> > registering cooling devices, while the functions used for registering
> > thermal zone devices are converted by the remaining two patches.
> >
> > I tested this series on various devices containing (among others):
> > - ACPI thermal zones
> > - ACPI processor devices
> > - PCIe cooling devices
> > - Intel Wifi card
> > - Intel powerclamp
> > - Intel TCC cooling
> >
> > I also compile-tested the remaining affected drivers; however, I would
> > still be happy if the relevant maintainers (especially those of the
> > mellanox ethernet switch driver) could take a quick glance at the
> > code and verify that I am using the correct device as the parent
> > device.
> >
> > This work is also necessary for extending the ACPI thermal zone driver
> > to support the _TZD ACPI object in the future.
>
> Can you please elaborate a bit here?
>
> _TZD is a list of devices that belong to the given thermal zone, so
> how is it connected to the thermal zone parent?
>
> > Signed-off-by: Armin Wolf <W_Armin@gmx.de>
> > ---
> > Armin Wolf (8):
> >       thermal: core: Allow setting the parent device of cooling devices
> >       thermal: core: Set parent device in thermal_of_cooling_device_register()
> >       ACPI: processor: Stop creating "device" sysfs link
> >       ACPI: fan: Stop creating "device" sysfs link
> >       ACPI: video: Stop creating "device" sysfs link
> >       thermal: core: Set parent device in thermal_cooling_device_register()
> >       ACPI: thermal: Stop creating "device" sysfs link

This will kind of break things because user space may rely on those, may it not?

> >       thermal: core: Allow setting the parent device of thermal zone devices

For this last change, you need to define what it means for a thermal
zone to have a parent device.  In particular, in what way would a
thermal zone depend on its parent?

> I can only see the first three patches in the series ATM as per
>
> https://lore.kernel.org/linux-pm/20251114-thermal-device-v1-0-d8b442aae38b@gmx.de/T/#r605b23f2e27e751d8406e7949dad6f5b5b112067

That's probably because of the excessive CC list.
Re: [PATCH RFC 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices
Posted by Armin Wolf 1 month ago
On 14.11.25 at 21:10, Rafael J. Wysocki wrote:

> CC list trimmed and I'd rather not use such an extensive one if I were you.
>
> On Fri, Nov 14, 2025 at 1:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Fri, Nov 14, 2025 at 4:24 AM Armin Wolf <W_Armin@gmx.de> wrote:
>>> Drivers registering thermal zone/cooling devices are currently unable
>>> to tell the thermal core what parent device the new thermal zone/
>>> cooling device should have, potentially causing issues with suspend
>>> ordering
>> Do you have any examples of this?
> Especially for thermal zones.

The device core suspends child devices before parent devices in order to avoid
child devices accessing an already suspended parent device. Since thermal zone
and cooling devices currently have no parent device set, they could potentially be
suspended after the device they depend on.

I said "potentially" because currently the thermal subsystem handles suspend/resume
using a PM notifier, something that prevents the above problem from occurring. We
should however eventually migrate to dev_pm_ops for that, so the device core needs
to know about parent-child dependencies between thermal zone/cooling devices and their
respective parent devices.
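
To illustrate the ordering rule, here is a toy user-space model (not kernel
code; struct toy_dev and toy_suspend_all() are names made up for this sketch
and do not reflect how the PM core is implemented). A device is only suspended
once none of its known children is still active, so a thermal zone without a
parent pointer is invisible to that check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device as seen by the PM core. */
struct toy_dev {
	const char *name;
	struct toy_dev *parent;	/* NULL: the core knows of no parent */
	int suspended;
};

/* Return 1 if some device that names @dev as its parent is still active. */
static int has_active_child(const struct toy_dev *devs, size_t n,
			    const struct toy_dev *dev)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].parent == dev && !devs[i].suspended)
			return 1;
	return 0;
}

/*
 * Suspend all devices, children before parents. The indices of the
 * devices are written to @order in the sequence they were suspended.
 * Returns the number of devices suspended.
 */
static size_t toy_suspend_all(struct toy_dev *devs, size_t n, size_t *order)
{
	size_t done = 0;
	int progress = 1;

	assert(devs && order);

	while (done < n && progress) {
		progress = 0;
		for (size_t i = 0; i < n; i++) {
			if (devs[i].suspended ||
			    has_active_child(devs, n, &devs[i]))
				continue;
			devs[i].suspended = 1;
			order[done++] = i;
			progress = 1;
		}
	}
	return done;
}
```

With two devices { "pci0", "thermal_zone0" } and no parent set, this model
suspends pci0 first. Once thermal_zone0's parent points at pci0, thermal_zone0
is suspended first.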

>>> and making it impossible for user space applications to
>>> associate a given thermal zone device with its parent device.
>>>
>>> This patch series aims to fix this issue by extending the functions
>>> used to register thermal zone/cooling devices to also accept a parent
>>> device pointer. The first six patches convert all functions used for
>>> registering cooling devices, while the functions used for registering
>>> thermal zone devices are converted by the remaining two patches.
>>>
>>> I tested this series on various devices containing (among others):
>>> - ACPI thermal zones
>>> - ACPI processor devices
>>> - PCIe cooling devices
>>> - Intel Wifi card
>>> - Intel powerclamp
>>> - Intel TCC cooling
>>>
>>> I also compile-tested the remaining affected drivers; however, I would
>>> still be happy if the relevant maintainers (especially those of the
>>> mellanox ethernet switch driver) could take a quick glance at the
>>> code and verify that I am using the correct device as the parent
>>> device.
>>>
>>> This work is also necessary for extending the ACPI thermal zone driver
>>> to support the _TZD ACPI object in the future.
>> Can you please elaborate a bit here?
>>
>> _TZD is a list of devices that belong to the given thermal zone, so
>> how is it connected to the thermal zone parent?

The ACPI thermal zone driver currently matches cooling devices by accessing their
private drvdata and checking if it is a pointer to the correct ACPI device. This
works well enough for ACPI fans and processors, but will likely not work for other
cooling devices (like batteries). Such cooling devices are supposed to be listed
by the _TZD ACPI object, so we need a more generic matching algorithm before adding
support for said ACPI object.

I was thinking of modifying the ACPI thermal zone driver to instead use the ACPI handle
of the parent device for matching cooling devices. This would solve the problem described
above.
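
A rough user-space sketch of the difference between the two matching
strategies follows. Everything in it is hypothetical: toy_handle stands in
for an ACPI handle, and struct toy_device, struct toy_cdev and both match
functions are invented for illustration. It is not the ACPI thermal zone
driver's code, only a model of the idea.

```c
#include <assert.h>
#include <stddef.h>

typedef void *toy_handle;	/* stand-in for an ACPI handle */

struct toy_device {
	toy_handle handle;	/* firmware node behind this device */
};

struct toy_cdev {
	struct toy_device *parent;	/* parent device, if one was set */
	void *drvdata;			/* driver-private, layout unknown */
};

/* Matching as described above: assume drvdata points at the device. */
static int match_by_drvdata(const struct toy_cdev *cdev,
			    const struct toy_device *dev)
{
	assert(cdev);
	return cdev->drvdata == (const void *)dev;
}

/* Generic alternative: compare the handle of the parent device. */
static int match_by_parent_handle(const struct toy_cdev *cdev,
				  toy_handle handle)
{
	assert(cdev);
	return cdev->parent && cdev->parent->handle == handle;
}
```

In this model a fan whose drvdata happens to be its device matches either way,
while a battery cooling device with its own private drvdata only matches via
the parent handle.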

>>> Signed-off-by: Armin Wolf <W_Armin@gmx.de>
>>> ---
>>> Armin Wolf (8):
>>>        thermal: core: Allow setting the parent device of cooling devices
>>>        thermal: core: Set parent device in thermal_of_cooling_device_register()
>>>        ACPI: processor: Stop creating "device" sysfs link
>>>        ACPI: fan: Stop creating "device" sysfs link
>>>        ACPI: video: Stop creating "device" sysfs link
>>>        thermal: core: Set parent device in thermal_cooling_device_register()
>>>        ACPI: thermal: Stop creating "device" sysfs link
> This will kind of break things because user space may rely on those, may it not?

The driver core will create the "device" sysfs link for us as soon as we populate the
parent device pointer of the thermal zone/cooling device. So user space applications
relying on those links should continue to work.

I even tested this on my devices.
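
For reference, a user space application could follow such a link roughly like
this. This is only a sketch: tz_parent_name() is a made-up helper, and it
assumes the "device" link sits directly inside the class device directory
(for example /sys/class/thermal/thermal_zone0/device).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Resolve "<zone_dir>/device" and copy the last path component of the
 * link target (the name of the parent device) into @name.
 * Returns 0 on success, -1 if there is no link or @name is too small.
 */
static int tz_parent_name(const char *zone_dir, char *name, size_t len)
{
	char link[4096], target[4096];
	const char *base;
	ssize_t n;
	int r;

	assert(zone_dir && name && len > 0);

	r = snprintf(link, sizeof(link), "%s/device", zone_dir);
	if (r < 0 || (size_t)r >= sizeof(link))
		return -1;

	n = readlink(link, target, sizeof(target) - 1);
	if (n < 0)
		return -1;	/* no "device" link, so no known parent */
	target[n] = '\0';	/* readlink() does not NUL-terminate */

	base = strrchr(target, '/');
	base = base ? base + 1 : target;
	if (strlen(base) >= len)
		return -1;
	strcpy(name, base);
	return 0;
}
```

Calling it with the path of a thermal zone or cooling device directory yields
the parent's name if the link exists, and -1 otherwise.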

>>>        thermal: core: Allow setting the parent device of thermal zone devices
> For this last change, you need to define what it means for a thermal
> zone to have a parent device.  In particular, in what way would a
> thermal zone depend on its parent?

1. A thermal zone should be suspended before the parent device is suspended. For this the
    device core needs to know the parent device of a given thermal zone device.
2. User space applications can determine the physical device behind a given thermal zone device
    if we enable the device core to automatically create a "device" sysfs link.

Other than that, no current thermal zone device depends on the parent device pointer
(because currently it is always NULL).

>> I can only see the first three patches in the series ATM as per
>>
>> https://lore.kernel.org/linux-pm/20251114-thermal-device-v1-0-d8b442aae38b@gmx.de/T/#r605b23f2e27e751d8406e7949dad6f5b5b112067
> That's probably because of the excessive CC list.

Yes, that was my mistake. I will prune the CC list and resend the series. Do you think that I should include
all maintainers of the affected drivers or only the subsystem maintainers?

Thanks,
Armin Wolf