Drivers registering thermal zone/cooling devices are currently unable
to tell the thermal core what parent device the new thermal zone/
cooling device should have, potentially causing issues with suspend
ordering and making it impossible for user space applications to
associate a given thermal zone device with its parent device.
This patch series aims to fix this issue by extending the functions
used to register thermal zone/cooling devices to also accept a parent
device pointer. The first six patches convert all functions used for
registering cooling devices, while the functions used for registering
thermal zone devices are converted by the remaining two patches.
I tested this series on various devices containing (among others):
- ACPI thermal zones
- ACPI processor devices
- PCIe cooling devices
- Intel Wifi card
- Intel powerclamp
- Intel TCC cooling
I also compile-tested the remaining affected drivers; however, I would
still be happy if the relevant maintainers (especially those of the
Mellanox Ethernet switch driver) could take a quick glance at the
code and verify that I am using the correct device as the parent
device.
This work is also necessary for extending the ACPI thermal zone driver
to support the _TZD ACPI object in the future.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
---
Armin Wolf (8):
thermal: core: Allow setting the parent device of cooling devices
thermal: core: Set parent device in thermal_of_cooling_device_register()
ACPI: processor: Stop creating "device" sysfs link
ACPI: fan: Stop creating "device" sysfs link
ACPI: video: Stop creating "device" sysfs link
thermal: core: Set parent device in thermal_cooling_device_register()
ACPI: thermal: Stop creating "device" sysfs link
thermal: core: Allow setting the parent device of thermal zone devices
Documentation/driver-api/thermal/sysfs-api.rst | 10 ++++-
drivers/acpi/acpi_video.c | 9 +----
drivers/acpi/fan_core.c | 16 ++------
drivers/acpi/processor_thermal.c | 15 +------
drivers/acpi/thermal.c | 33 ++++++---------
drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 +-
drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c | 4 +-
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 47 +++++++++++-----------
drivers/net/wireless/ath/ath10k/thermal.c | 2 +-
drivers/net/wireless/ath/ath11k/thermal.c | 2 +-
drivers/net/wireless/intel/iwlwifi/mld/thermal.c | 6 +--
drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 12 +++---
drivers/net/wireless/mediatek/mt76/mt7915/init.c | 2 +-
drivers/net/wireless/mediatek/mt76/mt7996/init.c | 2 +-
drivers/platform/x86/acerhdf.c | 4 +-
drivers/power/supply/power_supply_core.c | 4 +-
drivers/thermal/armada_thermal.c | 2 +-
drivers/thermal/cpufreq_cooling.c | 2 +-
drivers/thermal/cpuidle_cooling.c | 2 +-
drivers/thermal/da9062-thermal.c | 2 +-
drivers/thermal/devfreq_cooling.c | 2 +-
drivers/thermal/dove_thermal.c | 2 +-
drivers/thermal/imx_thermal.c | 2 +-
.../intel/int340x_thermal/int3400_thermal.c | 2 +-
.../intel/int340x_thermal/int3403_thermal.c | 4 +-
.../intel/int340x_thermal/int3406_thermal.c | 2 +-
.../intel/int340x_thermal/int340x_thermal_zone.c | 13 +++---
.../int340x_thermal/processor_thermal_device_pci.c | 7 ++--
drivers/thermal/intel/intel_pch_thermal.c | 2 +-
drivers/thermal/intel/intel_powerclamp.c | 2 +-
drivers/thermal/intel/intel_quark_dts_thermal.c | 2 +-
drivers/thermal/intel/intel_soc_dts_iosf.c | 2 +-
drivers/thermal/intel/intel_tcc_cooling.c | 2 +-
drivers/thermal/intel/x86_pkg_temp_thermal.c | 6 +--
drivers/thermal/kirkwood_thermal.c | 2 +-
drivers/thermal/pcie_cooling.c | 2 +-
drivers/thermal/renesas/rcar_thermal.c | 10 +++--
drivers/thermal/spear_thermal.c | 2 +-
drivers/thermal/tegra/soctherm.c | 5 +--
drivers/thermal/testing/zone.c | 2 +-
drivers/thermal/thermal_core.c | 23 +++++++----
drivers/thermal/thermal_of.c | 9 +++--
include/linux/thermal.h | 22 +++++-----
43 files changed, 145 insertions(+), 162 deletions(-)
---
base-commit: 653ef66b2c04bcdecaf3d13ea5069c4b1f27d5da
change-id: 20251114-thermal-device-655d138824c6
Best regards,
--
Armin Wolf <W_Armin@gmx.de>
On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@gmx.de> wrote:
>
> Drivers registering thermal zone/cooling devices are currently unable
> to tell the thermal core what parent device the new thermal zone/
> cooling device should have, potentially causing issues with suspend
> ordering

This is one potential class of problems that may arise, but I would
like to see a real example of this.

As it stands today, thermal_class has no PM callbacks, so there are no
callback execution ordering issues with devices in that class and what
other suspend/resume ordering issues are there?

Also, the suspend and resume of thermal zones is handled via PM
notifiers. Is there a problem with this?

> and making it impossible for user space applications to
> associate a given thermal zone device with its parent device.

Why does user space need to know the parent of a given cooling device
or thermal zone?

> This patch series aims to fix this issue by extending the functions
> used to register thermal zone/cooling devices to also accept a parent
> device pointer. The first six patches convert all functions used for
> registering cooling devices, while the functions used for registering
> thermal zone devices are converted by the remaining two patches.
>
> I tested this series on various devices containing (among others):
> - ACPI thermal zones
> - ACPI processor devices
> - PCIe cooling devices
> - Intel Wifi card
> - Intel powerclamp
> - Intel TCC cooling

What exactly did you do to test it?

> I also compile-tested the remaining affected drivers, however i would
> still be happy if the relevant maintainers (especially those of the
> mellanox ethernet switch driver) could take a quick glance at the
> code and verify that i am using the correct device as the parent
> device.

I think that the above paragraph is not relevant any more?

> This work is also necessary for extending the ACPI thermal zone driver
> to support the _TZD ACPI object in the future.

I'm still unsure why _TZD support requires the ability to set a
thermal zone parent device.

> Signed-off-by: Armin Wolf <W_Armin@gmx.de>
> ---
> Armin Wolf (8):
> thermal: core: Allow setting the parent device of cooling devices
> thermal: core: Set parent device in thermal_of_cooling_device_register()
> ACPI: processor: Stop creating "device" sysfs link

That link is not to the cooling devices' parent, but to the ACPI
device object (a struct acpi_device) that corresponds to the parent.
The parent of the cooling device should be the processor device, not
its ACPI companion, so I'm not sure why there would be a conflict.

> ACPI: fan: Stop creating "device" sysfs link
> ACPI: video: Stop creating "device" sysfs link

Analogously in the above two cases AFAICS.

The parent of a cooling device should be a "physical" device object,
like a platform device or a PCI device or similar, not a struct
acpi_device (which in fact is not a device even).

> thermal: core: Set parent device in thermal_cooling_device_register()
> ACPI: thermal: Stop creating "device" sysfs link

And this link is to the struct acpi_device representing the thermal zone itself.

> thermal: core: Allow setting the parent device of thermal zone devices

I'm not sure if this is a good idea, at least until it is clear what
the role of a thermal zone parent device should be.
On 21.11.25 at 21:35, Rafael J. Wysocki wrote:
> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@gmx.de> wrote:
>> Drivers registering thermal zone/cooling devices are currently unable
>> to tell the thermal core what parent device the new thermal zone/
>> cooling device should have, potentially causing issues with suspend
>> ordering
> This is one potential class of problems that may arise, but I would
> like to see a real example of this.
>
> As it stands today, thermal_class has no PM callbacks, so there are no
> callback execution ordering issues with devices in that class and what
> other suspend/resume ordering issues are there?
Correct, that is why I said "potentially".
>
> Also, the suspend and resume of thermal zones is handled via PM
> notifiers. Is there a problem with this?
The problem with PM notifiers is that thermal zones stop working even before
user space is frozen. Freezing user space might take a lot of time, so having
no thermal management during this period is less than ideal.
This problem would not occur when using dev_pm_ops, as thermal zones would be
suspended after user space has been frozen successfully. Additionally, when using
dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates
that no new devices (including thermal zones and cooling devices) be registered during
a suspend/resume cycle.
Replacing the PM notifiers with dev_pm_ops would of course be an optimization that
belongs in its own patch series.
>> and making it impossible for user space applications to
>> associate a given thermal zone device with its parent device.
> Why does user space need to know the parent of a given cooling device
> or thermal zone?
Let's say that we have two thermal zones registered by two instances of the
Intel Wifi driver. User space is currently unable to find out which thermal zone
belongs to which Wifi adapter, as both thermal zones have (nearly) the same type string ("iwlwifi[0-X]").
This problem would be solved once we populate the parent device pointer inside the thermal zone
device, as user space can then simply follow the "device" symlink to determine the parent device behind
a given thermal zone device.
Additionally, being able to access the acpi_handle of the parent device will be necessary for the
ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors.
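
As a rough illustration of the user space side, here is a minimal C sketch that resolves a symlink such as "device" inside a thermal zone directory. It assumes that the series is applied and that the zone actually has a parent; the helper name resolve_sysfs_link() and the example path /sys/class/thermal/thermal_zone0 are only placeholders, not part of any existing tool.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Hypothetical helper: resolve "<dir>/<name>" to its canonical path.
 * For a thermal zone, dir would be something like
 * "/sys/class/thermal/thermal_zone0" and name would be "device".
 * Returns 0 on success and -1 if the link does not exist (for example
 * because the zone has no parent device) or the result does not fit.
 */
int resolve_sysfs_link(const char *dir, const char *name,
		       char *out, size_t len)
{
	char link[PATH_MAX];
	char resolved[PATH_MAX];
	int n;

	assert(dir && name && out);

	n = snprintf(link, sizeof(link), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(link))
		return -1;

	/* realpath() follows symlinks and canonicalizes the result. */
	if (!realpath(link, resolved))
		return -1;

	n = snprintf(out, len, "%s", resolved);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

With two iwlwifi thermal zones, calling this once per zone with name set to "device" would be expected to yield two different parent paths, which is enough to tell the adapters apart.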
>> This patch series aims to fix this issue by extending the functions
>> used to register thermal zone/cooling devices to also accept a parent
>> device pointer. The first six patches convert all functions used for
>> registering cooling devices, while the functions used for registering
>> thermal zone devices are converted by the remaining two patches.
>>
>> I tested this series on various devices containing (among others):
>> - ACPI thermal zones
>> - ACPI processor devices
>> - PCIe cooling devices
>> - Intel Wifi card
>> - Intel powerclamp
>> - Intel TCC cooling
> What exactly did you do to test it?
I tested:
- the thermal zone temperature readout
- correctness of the new sysfs links
- suspend/resume
I also verified that ACPI thermal zones still bind with the ACPI fans.
>> I also compile-tested the remaining affected drivers, however i would
>> still be happy if the relevant maintainers (especially those of the
>> mellanox ethernet switch driver) could take a quick glance at the
>> code and verify that i am using the correct device as the parent
>> device.
> I think that the above paragraph is not relevant any more?
You are right; however, I originally meant to CC the Mellanox maintainers, as
I was a bit unsure about the changes I made to their driver. I will rework
this section in the next revision and CC the Mellanox maintainers.
>
>> This work is also necessary for extending the ACPI thermal zone driver
>> to support the _TZD ACPI object in the future.
> I'm still unsure why _TZD support requires the ability to set a
> thermal zone parent device.
_TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans
and ACPI processors, like ACPI batteries. This however will currently not work as
the ACPI thermal zone driver uses the private drvdata of the cooling device to
determine if said cooling device should bind. This only works for ACPI fans and
processors due to the fact that those drivers store an ACPI device pointer inside
drvdata, something the ACPI thermal zone expects.
As we cannot require all cooling devices to store an ACPI device pointer inside
their drvdata field in order to support ACPI, we must use a more generic approach.
I was thinking about using the acpi_handle of the parent device instead of messing
with the drvdata field, but this only works if the parent device pointer of the
cooling device is populated.
(Cooling devices without a parent device would then be ignored by the ACPI thermal
zone driver, as such cooling devices cannot be linked to ACPI).
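
The matching rule I have in mind can be modelled in plain user space C as shown below. This is only a simplified model under my own assumptions: struct model_device and cdev_matches_handle() are invented for this mail and are not kernel definitions. In the kernel, the handle would presumably be obtained from the parent with something like ACPI_HANDLE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a kernel device; not the real definition. */
struct model_device {
	struct model_device *parent;	/* NULL if the device has no parent */
	void *acpi_handle;		/* NULL if there is no ACPI companion */
};

/*
 * Decide whether a cooling device should bind to a thermal zone that
 * references 'wanted' (for example through _TZD). Only the parent of
 * the cooling device is consulted, never its private drvdata.
 */
bool cdev_matches_handle(const struct model_device *cdev, const void *wanted)
{
	assert(cdev != NULL);

	/* No parent: the cooling device cannot be linked to ACPI. */
	if (!cdev->parent)
		return false;

	/* A parent without an ACPI companion never matches. */
	if (!cdev->parent->acpi_handle)
		return false;

	return cdev->parent->acpi_handle == wanted;
}
```

The point of the model is that the check no longer depends on what a particular cooling device driver stores in its drvdata.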
>
>> Signed-off-by: Armin Wolf <W_Armin@gmx.de>
>> ---
>> Armin Wolf (8):
>> thermal: core: Allow setting the parent device of cooling devices
>> thermal: core: Set parent device in thermal_of_cooling_device_register()
>> ACPI: processor: Stop creating "device" sysfs link
> That link is not to the cooling devices' parent, but to the ACPI
> device object (a struct acpi_device) that corresponds to the parent.
> The parent of the cooling device should be the processor device, not
> its ACPI companion, so I'm not sure why there would be a conflict.
From the perspective of the Linux device core, a parent device does not have to be
a "physical" device. In the case of the ACPI processor driver, the ACPI device is used,
so the cooling device registered by said driver belongs to the ACPI device. I agree
that using the Linux processor device would make more sense, but this will require
changes inside the ACPI processor driver.
As for the "device" symlink: The conflict would be a naming conflict, as both "device" symlinks
(the one created by the ACPI processor driver and the one created by the device core) will
be created in the same directory (which is the directory of the cooling device).
>> ACPI: fan: Stop creating "device" sysfs link
>> ACPI: video: Stop creating "device" sysfs link
> Analogously in the above two cases AFAICS.
>
> The parent of a cooling device should be a "physical" device object,
> like a platform device or a PCI device or similar, not a struct
> acpi_device (which in fact is not a device even).
From the perspective of the Linux device core, an ACPI device is a perfectly valid device.
I agree that using a platform device or PCI device is better, but this already happens
inside the ACPI fan driver (platform device).
Only the ACPI video driver created a "device" sysfs link that points to the ACPI device
instead of the PCI device. I just noticed that I accidentally changed this by using the
PCI device as the parent device for the cooling device.
If you want, we can keep this change.
>> thermal: core: Set parent device in thermal_cooling_device_register()
>> ACPI: thermal: Stop creating "device" sysfs link
> And this link is to the struct acpi_device representing the thermal zone itself.
Correct, the ACPI thermal zone driver is an ACPI driver, meaning that it binds to
ACPI devices. Because of this, all (thermal zone) devices created by an instance of
said driver are descendants of the ACPI device said instance is bound to.
We can of course convert the ACPI thermal zone driver into a platform driver, but
this would be a separate patch series.
>> thermal: core: Allow setting the parent device of thermal zone devices
> I'm not sure if this is a good idea, at least until it is clear what
> the role of a thermal zone parent device should be.
Take a look at my explanation with the Intel Wifi driver.
Thanks,
Armin Wolf
On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf <W_Armin@gmx.de> wrote:
>
> Am 21.11.25 um 21:35 schrieb Rafael J. Wysocki:
>
> > On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@gmx.de> wrote:

[...]

> >> ---
> >> Armin Wolf (8):
> >> thermal: core: Allow setting the parent device of cooling devices
> >> thermal: core: Set parent device in thermal_of_cooling_device_register()
> >> ACPI: processor: Stop creating "device" sysfs link
> >
> > That link is not to the cooling devices' parent, but to the ACPI
> > device object (a struct acpi_device) that corresponds to the parent.
> > The parent of the cooling device should be the processor device, not
> > its ACPI companion, so I'm not sure why there would be a conflict.
>
> From the perspective of the Linux device core, a parent device does not have to be
> a "physical" device. In the case of the ACPI processor driver, the ACPI device is used,
> so the cooling device registered by said driver belongs to the ACPI device.

Well, that's a problem. A struct acpi_device should not be a parent
of anything other than a struct acpi_device.

> I agree that using the Linux processor device would make more sense, but this will require
> changes inside the ACPI processor driver.

So be it.

> As for the "device" symlink: The conflict would be a naming conflict, as both "device" symlinks
> (the one created by the ACPI processor driver and the one created by the device core) will
> be created in the same directory (which is the directory of the cooling device).

I see.

But why is the new symlink needed in the first place? If the device
has a parent, it will appear under that parent in /sys/devices/, won't
it?

Currently, all of the thermal class devices appear under
/sys/devices/virtual/thermal/ because they have no parents and they
all get a class parent kobject under /sys/devices/virtual/, as that's
what get_device_parent() does.

If they have real parents, they will appear under those parents, so
why will the parents need to be pointed to additionally?

BTW, this means that the layout of /sys/devices/ will change when
thermal devices get real parents. I'm not sure if this is a problem,
but certainly something to note.

> >> ACPI: fan: Stop creating "device" sysfs link
> >> ACPI: video: Stop creating "device" sysfs link
> > Analogously in the above two cases AFAICS.
> >
> > The parent of a cooling device should be a "physical" device object,
> > like a platform device or a PCI device or similar, not a struct
> > acpi_device (which in fact is not a device even).
>
> From the perspective of the Linux device core, a ACPI device is a perfectly valid device.

The driver core is irrelevant here.

As I said before, a struct acpi_device object should not be a parent
of anything other than a struct acpi_device object. Those things are
not devices and they cannot be used for representing PM dependencies,
for example.

> I agree that using a platform device or PCI device is better, but this already happens
> inside the ACPI fan driver (platform device).

So it should not happen there.

> Only the ACPI video driver created a "device" sysfs link that points to the ACPI device
> instead of the PCI device. I just noticed that i accidentally changed this by using the
> PCI device as the parent device for the cooling device.
>
> If you want then we can keep this change.

The PCI device should be its parent.

> >> thermal: core: Set parent device in thermal_cooling_device_register()
> >> ACPI: thermal: Stop creating "device" sysfs link
> > And this link is to the struct acpi_device representing the thermal zone itself.
>
> Correct, the ACPI thermal zone driver is a ACPI driver, meaning that he binds to
> ACPI devices. Because of this all (thermal zone) devices created by an instance of
> said driver are descendants of the ACPI device said instance is bound to.
>
> We can of course convert the ACPI thermal zone driver into a platform driver, but
> this would be a separate patch series.

If you want parents, this needs to be done first, but I'm still not
sure what the parent of a thermal zone would represent.

In the ACPI case it is kind of easy - it would be the (platform)
device corresponding to a given ThermalZone object in the ACPI
namespace - but it only has a practical meaning if that device has a
specific parent. For example, if the corresponding ThermalZone object
is present in the \_SB scope, the presence of the thermal zone parent
won't provide any additional information.

Unfortunately, the language in the specification isn't particularly
helpful here: "Thermal zone objects should appear in the namespace
under the portion of the system that comprises the thermal zone. For
example, a thermal zone that is isolated to a docking station should
be defined within the scope of the docking station device." To me
"the portion of the system" is not too meaningful unless it is just
one device without children. That's why _TZD has been added AFAICS.

> >> thermal: core: Allow setting the parent device of thermal zone devices
> >
> > I'm not sure if this is a good idea, at least until it is clear what
> > the role of a thermal zone parent device should be.
>
> Take a look at my explanation with the Intel Wifi driver.

I did and I think that you want the parent to be a device somehow
associated with the thermal zone, but how exactly? What should that
be in the Wifi driver case, the PCI device or something else?

And what if the thermal zone affects multiple devices? Which of them
(if any) would be its parent? And would it be consistent with the
ACPI case described above?

All of that needs consideration IMV.
On 27.11.25 at 19:22, Rafael J. Wysocki wrote:
> On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf <W_Armin@gmx.de> wrote:
>> Am 21.11.25 um 21:35 schrieb Rafael J. Wysocki:
>>
>>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@gmx.de> wrote:
> [...]
>
>>>> ---
>>>> Armin Wolf (8):
>>>> thermal: core: Allow setting the parent device of cooling devices
>>>> thermal: core: Set parent device in thermal_of_cooling_device_register()
>>>> ACPI: processor: Stop creating "device" sysfs link
>>> That link is not to the cooling devices' parent, but to the ACPI
>>> device object (a struct acpi_device) that corresponds to the parent.
>>> The parent of the cooling device should be the processor device, not
>>> its ACPI companion, so I'm not sure why there would be a conflict.
>> From the perspective of the Linux device core, a parent device does not have to be
>> a "physical" device. In the case of the ACPI processor driver, the ACPI device is used,
>> so the cooling device registered by said driver belongs to the ACPI device.
> Well, that's a problem. A struct acpi_device should not be a parent
> of anything other than a struct acpi_device.

Understandable. In this case we should indeed use the CPU device, especially since the fwnode
associated with it already points to the correct ACPI processor object (at least on my machine).

>> I agree that using the Linux processor device would make more sense, but this will require
>> changes inside the ACPI processor driver.
> So be it.

OK.

>> As for the "device" symlink: The conflict would be a naming conflict, as both "device" symlinks
>> (the one created by the ACPI processor driver and the one created by the device core) will
>> be created in the same directory (which is the directory of the cooling device).
> I see.
>
> But why is the new symlink needed in the first place? If the device
> has a parent, it will appear under that parent in /sys/devices/, won't
> it?
>
> Currently, all of the thermal class devices appear under
> /sys/devices/virtual/thermal/ because they have no parents and they
> all get a class parent kobject under /sys/devices/virtual/, as that's
> what get_device_parent() does.
>
> If they have real parents, they will appear under those parents, so
> why will the parents need to be pointed to additionally?

The "device" symlink is a comfort feature provided by the device core itself to allow user space
applications to traverse the device tree from bottom to top, like a doubly linked list. We cannot
disable the creation of this symlink, nor should we.

> BTW, this means that the layout of /sys/devices/ will change when
> thermal devices get real parents. I'm not sure if this is a problem,
> but certainly something to note.

I know. Most applications likely use /sys/class/thermal/, so they are not impacted by this. I will
note this in the cover letter of the next revision.

>>>> ACPI: fan: Stop creating "device" sysfs link
>>>> ACPI: video: Stop creating "device" sysfs link
>>> Analogously in the above two cases AFAICS.
>>>
>>> The parent of a cooling device should be a "physical" device object,
>>> like a platform device or a PCI device or similar, not a struct
>>> acpi_device (which in fact is not a device even).
>> From the perspective of the Linux device core, a ACPI device is a perfectly valid device.
> The driver core is irrelevant here.
>
> As I said before, a struct acpi_device object should not be a parent
> of anything other than a struct acpi_device object. Those things are
> not devices and they cannot be used for representing PM dependencies,
> for example.
>
>> I agree that using a platform device or PCI device is better, but this already happens
>> inside the ACPI fan driver (platform device).
> So it should not happen there.

I meant that the ACPI fan driver already uses the platform device as the parent device of the
cooling device, so the ACPI device is only used for interacting with the ACPI control methods
(and registering sysfs attributes, I think).

>> Only the ACPI video driver created a "device" sysfs link that points to the ACPI device
>> instead of the PCI device. I just noticed that i accidentally changed this by using the
>> PCI device as the parent device for the cooling device.
>>
>> If you want then we can keep this change.
> The PCI device should be its parent.

Alright, I will note this in the patch description.

>>>> thermal: core: Set parent device in thermal_cooling_device_register()
>>>> ACPI: thermal: Stop creating "device" sysfs link
>>> And this link is to the struct acpi_device representing the thermal zone itself.
>> Correct, the ACPI thermal zone driver is a ACPI driver, meaning that he binds to
>> ACPI devices. Because of this all (thermal zone) devices created by an instance of
>> said driver are descendants of the ACPI device said instance is bound to.
>>
>> We can of course convert the ACPI thermal zone driver into a platform driver, but
>> this would be a separate patch series.
> If you want parents, this needs to be done first, but I'm still not
> sure what the parent of a thermal zone would represent.
>
> In the ACPI case it is kind of easy - it would be the (platform)
> device corresponding to a given ThermalZone object in the ACPI
> namespace - but it only has a practical meaning if that device has a
> specific parent. For example, if the corresponding ThermalZone object
> is present in the \_SB scope, the presence of the thermal zone parent
> won't provide any additional information.

To the device core it will, as the platform device will need to be suspended after the thermal
zone device has been suspended, among other things.

> Unfortunately, the language in the specification isn't particularly
> helpful here: "Thermal zone objects should appear in the namespace
> under the portion of the system that comprises the thermal zone. For
> example, a thermal zone that is isolated to a docking station should
> be defined within the scope of the docking station device." To me
> "the portion of the system" is not too meaningful unless it is just
> one device without children. That's why _TZD has been added AFAICS.

I think you are confusing the parent device of the ThermalZone ACPI device with the parent device
of the struct thermal_zone_device. I begin to wonder if mentioning the ACPI ThermalZone device
together with the struct thermal_zone_device was a bad idea on my side xd.

>>>> thermal: core: Allow setting the parent device of thermal zone devices
>>> I'm not sure if this is a good idea, at least until it is clear what
>>> the role of a thermal zone parent device should be.
>> Take a look at my explanation with the Intel Wifi driver.
> I did and I think that you want the parent to be a device somehow
> associated with the thermal zone, but how exactly? What should that
> be in the Wifi driver case, the PCI device or something else?
>
> And what if the thermal zone affects multiple devices? Which of them
> (if any) would be its parent? And would it be consistent with the
> ACPI case described above?
>
> All of that needs consideration IMV.

I agree, but there is a difference between "this struct thermal_zone_device depends on device X
to be operational" and "this thermal zone affects device X, device Y and device Z".

This patch series exclusively deals with telling the driver core that "this struct
thermal_zone_device depends on device X to be operational".

Thanks,
Armin Wolf
On Thu, Nov 27, 2025 at 9:29 PM Armin Wolf <W_Armin@gmx.de> wrote:
>
> Am 27.11.25 um 19:22 schrieb Rafael J. Wysocki:
>
> > On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf <W_Armin@gmx.de> wrote:
> >> Am 21.11.25 um 21:35 schrieb Rafael J. Wysocki:
> >>
> >>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@gmx.de> wrote:
> > [...]
> >
> >>>> ---
> >>>> Armin Wolf (8):
> >>>> thermal: core: Allow setting the parent device of cooling devices
> >>>> thermal: core: Set parent device in thermal_of_cooling_device_register()
> >>>> ACPI: processor: Stop creating "device" sysfs link
> >>> That link is not to the cooling devices' parent, but to the ACPI
> >>> device object (a struct acpi_device) that corresponds to the parent.
> >>> The parent of the cooling device should be the processor device, not
> >>> its ACPI companion, so I'm not sure why there would be a conflict.
> >> From the perspective of the Linux device core, a parent device does not have to be
> >> a "physical" device. In the case of the ACPI processor driver, the ACPI device is used,
> >> so the cooling device registered by said driver belongs to the ACPI device.
> > Well, that's a problem. A struct acpi_device should not be a parent
> > of anything other than a struct acpi_device.
>
> Understandable, in this case we should indeed use the CPU device, especially since the fwnode
> associated with it already points to the correct ACPI processor object (at least on my machine).
>
> >> I agree that using the Linux processor device would make more sense, but this will require
> >> changes inside the ACPI processor driver.
> > So be it.
>
> OK.
>
> >> As for the "device" symlink: The conflict would be a naming conflict, as both "device" symlinks
> >> (the one created by the ACPI processor driver and the one created by the device core) will
> >> be created in the same directory (which is the directory of the cooling device).
> > I see.
> >
> > But why is the new symlink needed in the first place? If the device
> > has a parent, it will appear under that parent in /sys/devices/, won't
> > it?
> >
> > Currently, all of the thermal class devices appear under
> > /sys/devices/virtual/thermal/ because they have no parents and they
> > all get a class parent kobject under /sys/devices/virtual/, as that's
> > what get_device_parent() does.
> >
> > If they have real parents, they will appear under those parents, so
> > why will the parents need to be pointed to additionally?
>
> The "device" symlink is a comfort feature provided by the device core itself to allow user space
> applications to traverse the device tree from bottom to top, like a doubly linked list. We cannot
> disable the creation of this symlink, nor should we.
I think you mean device_add_class_symlinks(), but that's just for
class devices. Of course, thermal devices are class devices, so
they'll get those links if they get parents. Fair enough.
> > BTW, this means that the layout of /sys/devices/ will change when
> > thermal devices get real parents. I'm not sure if this is a problem,
> > but certainly something to note.
>
> I know, most applications likely use /sys/class/thermal/, so they are not impacted by this. I will
> note this in the cover letter of the next revision.
>
> >>>> ACPI: fan: Stop creating "device" sysfs link
> >>>> ACPI: video: Stop creating "device" sysfs link
> >>> Analogously in the above two cases AFAICS.
> >>>
> >>> The parent of a cooling device should be a "physical" device object,
> >>> like a platform device or a PCI device or similar, not a struct
> >>> acpi_device (which in fact is not a device even).
> >> From the perspective of the Linux device core, an ACPI device is a perfectly valid device.
> > The driver core is irrelevant here.
> >
> > As I said before, a struct acpi_device object should not be a parent
> > of anything other than a struct acpi_device object. Those things are
> > not devices and they cannot be used for representing PM dependencies,
> > for example.
> >
> >> I agree that using a platform device or PCI device is better, but this already happens
> >> inside the ACPI fan driver (platform device).
> > So it should not happen there.
>
> I meant that the ACPI fan driver already uses the platform device as the parent device of the
> cooling device, so the ACPI device is only used for interacting with the ACPI control methods
> (and registering sysfs attributes I think).
OK
> >> Only the ACPI video driver created a "device" sysfs link that points to the ACPI device
> >> instead of the PCI device. I just noticed that I accidentally changed this by using the
> >> PCI device as the parent device for the cooling device.
> >>
> >> If you want then we can keep this change.
> > The PCI device should be its parent.
>
> Alright, I will note this in the patch description.
>
> >>>> thermal: core: Set parent device in thermal_cooling_device_register()
> >>>> ACPI: thermal: Stop creating "device" sysfs link
> >>> And this link is to the struct acpi_device representing the thermal zone itself.
> >> Correct, the ACPI thermal zone driver is an ACPI driver, meaning that it binds to
> >> ACPI devices. Because of this all (thermal zone) devices created by an instance of
> >> said driver are descendants of the ACPI device said instance is bound to.
> >>
> >> We can of course convert the ACPI thermal zone driver into a platform driver, but
> >> this would be a separate patch series.
> > If you want parents, this needs to be done first, but I'm still not
> > sure what the parent of a thermal zone would represent.
> >
> > In the ACPI case it is kind of easy - it would be the (platform)
> > device corresponding to a given ThermalZone object in the ACPI
> > namespace - but it only has a practical meaning if that device has a
> > specific parent. For example, if the corresponding ThermalZone object
> > is present in the \_SB scope, the presence of the thermal zone parent
> > won't provide any additional information.
>
> To the device core it will, as the platform device will need to be suspended
> after the thermal zone device has been suspended, among other things.
Let's set suspend aside for now, I think I've explained my viewpoint
on this enough elsewhere.
> > Unfortunately, the language in the specification isn't particularly
> > helpful here: "Thermal zone objects should appear in the namespace
> > under the portion of the system that comprises the thermal zone. For
> > example, a thermal zone that is isolated to a docking station should
> > be defined within the scope of the docking station device." To me
> > "the portion of the system" is not too meaningful unless it is just
> > one device without children. That's why _TZD has been added AFAICS.
>
> I think you are confusing the parent device of the ThermalZone ACPI device
> with the parent device of the struct thermal_zone_device.
No, I'm not.
> I begin to wonder if mentioning the ACPI ThermalZone device together with the
> struct thermal_zone_device was a bad idea on my side xd.
Maybe.
> >>>> thermal: core: Allow setting the parent device of thermal zone devices
> >>> I'm not sure if this is a good idea, at least until it is clear what
> >>> the role of a thermal zone parent device should be.
> >> Take a look at my explanation with the Intel Wifi driver.
> > I did and I think that you want the parent to be a device somehow
> > associated with the thermal zone, but how exactly? What should that
> > be in the Wifi driver case, the PCI device or something else?
> >
> > And what if the thermal zone affects multiple devices? Which of them
> > (if any) would be its parent? And would it be consistent with the
> > ACPI case described above?
> >
> > All of that needs consideration IMV.
>
> I agree, but there is a difference between "this struct thermal_zone_device depends on
> device X to be operational" and "this thermal zone affects device X, device Y and device Z".
Yes, there is.
> This patch series exclusively deals with telling the driver core that "this struct thermal_zone_device
> depends on device X to be operational".
Maybe let's take care of cooling devices first and get back to this later?
Am 27.11.25 um 23:14 schrieb Rafael J. Wysocki:
> On Thu, Nov 27, 2025 at 9:29 PM Armin Wolf <W_Armin@gmx.de> wrote:
>> Am 27.11.25 um 19:22 schrieb Rafael J. Wysocki:
>>
>>> On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf <W_Armin@gmx.de> wrote:
>>>> Am 21.11.25 um 21:35 schrieb Rafael J. Wysocki:
>>>>
>>>>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@gmx.de> wrote:
>>> [...]
>>>
>>>>>> ---
>>>>>> Armin Wolf (8):
>>>>>> thermal: core: Allow setting the parent device of cooling devices
>>>>>> thermal: core: Set parent device in thermal_of_cooling_device_register()
>>>>>> ACPI: processor: Stop creating "device" sysfs link
>>>>> That link is not to the cooling devices' parent, but to the ACPI
>>>>> device object (a struct acpi_device) that corresponds to the parent.
>>>>> The parent of the cooling device should be the processor device, not
>>>>> its ACPI companion, so I'm not sure why there would be a conflict.
>>>> From the perspective of the Linux device core, a parent device does not have to be
>>>> a "physical" device. In the case of the ACPI processor driver, the ACPI device is used,
>>>> so the cooling device registered by said driver belongs to the ACPI device.
>>> Well, that's a problem. A struct acpi_device should not be a parent
>>> of anything other than a struct acpi_device.
>> Understandable, in this case we should indeed use the CPU device, especially since the fwnode
>> associated with it already points to the correct ACPI processor object (at least on my machine).
>>
>>>> I agree that using the Linux processor device would make more sense, but this will require
>>>> changes inside the ACPI processor driver.
>>> So be it.
>> OK.
>>
>>>> As for the "device" symlink: The conflict would be a naming conflict, as both "device" symlinks
>>>> (the one created by the ACPI processor driver and the one created by the device core) will
>>>> be created in the same directory (which is the directory of the cooling device).
>>> I see.
>>>
>>> But why is the new symlink needed in the first place? If the device
>>> has a parent, it will appear under that parent in /sys/devices/, won't
>>> it?
>>>
>>> Currently, all of the thermal class devices appear under
>>> /sys/devices/virtual/thermal/ because they have no parents and they
>>> all get a class parent kobject under /sys/devices/virtual/, as that's
>>> what get_device_parent() does.
>>>
>>> If they have real parents, they will appear under those parents, so
>>> why will the parents need to be pointed to additionally?
>> The "device" symlink is a comfort feature provided by the device core itself to allow user space
>> applications to traverse the device tree from bottom to top, like a doubly linked list. We cannot
>> disable the creation of this symlink, nor should we.
> I think you mean device_add_class_symlinks(), but that's just for
> class devices. Of course, thermal devices are class devices, so
> they'll get those links if they get parents. Fair enough.
>
>>> BTW, this means that the layout of /sys/devices/ will change when
>>> thermal devices get real parents. I'm not sure if this is a problem,
>>> but certainly something to note.
>> I know, most applications likely use /sys/class/thermal/, so they are not impacted by this. I will
>> note this in the cover letter of the next revision.
>>
>>>>>> ACPI: fan: Stop creating "device" sysfs link
>>>>>> ACPI: video: Stop creating "device" sysfs link
>>>>> Analogously in the above two cases AFAICS.
>>>>>
>>>>> The parent of a cooling device should be a "physical" device object,
>>>>> like a platform device or a PCI device or similar, not a struct
>>>>> acpi_device (which in fact is not a device even).
>>>> From the perspective of the Linux device core, an ACPI device is a perfectly valid device.
>>> The driver core is irrelevant here.
>>>
>>> As I said before, a struct acpi_device object should not be a parent
>>> of anything other than a struct acpi_device object. Those things are
>>> not devices and they cannot be used for representing PM dependencies,
>>> for example.
>>>
>>>> I agree that using a platform device or PCI device is better, but this already happens
>>>> inside the ACPI fan driver (platform device).
>>> So it should not happen there.
>> I meant that the ACPI fan driver already uses the platform device as the parent device of the
>> cooling device, so the ACPI device is only used for interacting with the ACPI control methods
>> (and registering sysfs attributes I think).
> OK
>
>>>> Only the ACPI video driver created a "device" sysfs link that points to the ACPI device
>>>> instead of the PCI device. I just noticed that I accidentally changed this by using the
>>>> PCI device as the parent device for the cooling device.
>>>>
>>>> If you want then we can keep this change.
>>> The PCI device should be its parent.
>> Alright, I will note this in the patch description.
>>
>>>>>> thermal: core: Set parent device in thermal_cooling_device_register()
>>>>>> ACPI: thermal: Stop creating "device" sysfs link
>>>>> And this link is to the struct acpi_device representing the thermal zone itself.
>>>> Correct, the ACPI thermal zone driver is an ACPI driver, meaning that it binds to
>>>> ACPI devices. Because of this all (thermal zone) devices created by an instance of
>>>> said driver are descendants of the ACPI device said instance is bound to.
>>>>
>>>> We can of course convert the ACPI thermal zone driver into a platform driver, but
>>>> this would be a separate patch series.
>>> If you want parents, this needs to be done first, but I'm still not
>>> sure what the parent of a thermal zone would represent.
>>>
>>> In the ACPI case it is kind of easy - it would be the (platform)
>>> device corresponding to a given ThermalZone object in the ACPI
>>> namespace - but it only has a practical meaning if that device has a
>>> specific parent. For example, if the corresponding ThermalZone object
>>> is present in the \_SB scope, the presence of the thermal zone parent
>>> won't provide any additional information.
>> To the device core it will, as the platform device will need to be suspended
>> after the thermal zone device has been suspended, among other things.
> Let's set suspend aside for now, I think I've explained my viewpoint
> on this enough elsewhere.
>
Agreed.
>>> Unfortunately, the language in the specification isn't particularly
>>> helpful here: "Thermal zone objects should appear in the namespace
>>> under the portion of the system that comprises the thermal zone. For
>>> example, a thermal zone that is isolated to a docking station should
>>> be defined within the scope of the docking station device." To me
>>> "the portion of the system" is not too meaningful unless it is just
>>> one device without children. That's why _TZD has been added AFAICS.
>> I think you are confusing the parent device of the ThermalZone ACPI device
>> with the parent device of the struct thermal_zone_device.
> No, I'm not.
>
>> I begin to wonder if mentioning the ACPI ThermalZone device together with the
>> struct thermal_zone_device was a bad idea on my side xd.
> Maybe.
>
>>>>>> thermal: core: Allow setting the parent device of thermal zone devices
>>>>> I'm not sure if this is a good idea, at least until it is clear what
>>>>> the role of a thermal zone parent device should be.
>>>> Take a look at my explanation with the Intel Wifi driver.
>>> I did and I think that you want the parent to be a device somehow
>>> associated with the thermal zone, but how exactly? What should that
>>> be in the Wifi driver case, the PCI device or something else?
>>>
>>> And what if the thermal zone affects multiple devices? Which of them
>>> (if any) would be its parent? And would it be consistent with the
>>> ACPI case described above?
>>>
>>> All of that needs consideration IMV.
>> I agree, but there is a difference between "this struct thermal_zone_device depends on
>> device X to be operational" and "this thermal zone affects device X, device Y and device Z".
> Yes, there is.
>
>> This patch series exclusively deals with telling the driver core that "this struct thermal_zone_device
>> depends on device X to be operational".
> Maybe let's take care of cooling devices first and get back to this later?
>
Agreed.
On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf <W_Armin@gmx.de> wrote:
>
> Am 21.11.25 um 21:35 schrieb Rafael J. Wysocki:
>
> > On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@gmx.de> wrote:
> >> Drivers registering thermal zone/cooling devices are currently unable
> >> to tell the thermal core what parent device the new thermal zone/
> >> cooling device should have, potentially causing issues with suspend
> >> ordering
> > This is one potential class of problems that may arise, but I would
> > like to see a real example of this.
> >
> > As it stands today, thermal_class has no PM callbacks, so there are no
> > callback execution ordering issues with devices in that class and what
> > other suspend/resume ordering issues are there?
>
> Correct, that is why I said "potentially".
>
> >
> > Also, the suspend and resume of thermal zones is handled via PM
> > notifiers. Is there a problem with this?
>
> The problem with PM notifiers is that thermal zones stop working even before
> user space is frozen. Freezing user space might take a lot of time, so having
> no thermal management during this period is less than ideal.
This can be addressed by doing thermal zone suspend after freezing
tasks and before starting to suspend devices. Accordingly, thermal
zones could be resumed after resuming devices and before thawing
tasks. That should not be an overly complex change to make.
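To make the ordering concrete, here is a toy user-space model (not kernel
code; the enum and the helper are made-up names used only for illustration)
of where thermal zone suspend sits relative to task freezing today, and
where it would sit with the change described above:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified suspend sequence in execution order; resume runs it in reverse. */
enum suspend_step {
	STEP_PM_NOTIFIERS,	/* PM_SUSPEND_PREPARE notifiers: thermal zones suspend here today */
	STEP_FREEZE_TASKS,	/* user space gets frozen, which may take a while */
	STEP_THERMAL_SUSPEND,	/* hypothetical new step, as proposed above */
	STEP_SUSPEND_DEVICES,	/* device suspend callbacks (dev_pm_ops) run */
};

/* True if thermal zones are still operational while tasks are being frozen. */
static bool thermal_covers_freezing(enum suspend_step thermal_suspended_at)
{
	assert(thermal_suspended_at <= STEP_SUSPEND_DEVICES);
	return thermal_suspended_at > STEP_FREEZE_TASKS;
}
```

In this model both the hypothetical extra step and a dev_pm_ops based
approach keep thermal management alive during freezing; only the current
notifier-based placement does not.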
> This problem would not occur when using dev_pm_ops, as thermal zones would be
> suspended after user space has been frozen successfully. Additionally, when using
> dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates
> that no new devices (including thermal zones and cooling devices) be registered during
> a suspend/resume cycle.
>
> Replacing the PM notifiers with dev_pm_ops would of course be an optimization with
> its own patch series.
Honestly, I don't see much benefit from using dev_pm_ops for thermal
zone devices and cooling devices. Moreover, I actually think that
they could be "no PM" devices that are not even put on the
suspend-resume device list. Technically, they are just interfaces on
top of some other devices allowing the user space to interact with the
latter and combining different pieces described by the platform
firmware. They by themselves have no PM capabilities.
> >> and making it impossible for user space applications to
> >> associate a given thermal zone device with its parent device.
> > Why does user space need to know the parent of a given cooling device
> > or thermal zone?
>
> Let's say that we have two thermal zones registered by two instances of the
> Intel Wifi driver. User space is currently unable to find out which thermal zone
> belongs to which Wifi adapter, as both thermal zones have (nearly) the same type string ("iwlwifi[0-X]").
But the "belong" part is not quite well defined here. I think that
what user space needs to know is what devices are located in a given
thermal zone, isn't it? Knowing the parent doesn't necessarily
address this.
> This problem would be solved once we populate the parent device pointer inside the thermal zone
> device, as user space can simply look at the "device" symlink to determine the parent device behind
> a given thermal zone device.
I'm not convinced about this.
> Additionally, being able to access the acpi_handle of the parent device will be necessary for the
> ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors.
I guess by the "parent" you mean the device represented in the ACPI
namespace by a ThermalZone object, right? But this is not the same as
the "parent" in the Wifi driver context, is it?
> >> This patch series aims to fix this issue by extending the functions
> >> used to register thermal zone/cooling devices to also accept a parent
> >> device pointer. The first six patches convert all functions used for
> >> registering cooling devices, while the functions used for registering
> >> thermal zone devices are converted by the remaining two patches.
> >>
> >> I tested this series on various devices containing (among others):
> >> - ACPI thermal zones
> >> - ACPI processor devices
> >> - PCIe cooling devices
> >> - Intel Wifi card
> >> - Intel powerclamp
> >> - Intel TCC cooling
> > What exactly did you do to test it?
>
> I tested:
> - the thermal zone temperature readout
> - correctness of the new sysfs links
> - suspend/resume
>
> I also verified that ACPI thermal zones still bind with the ACPI fans.
I see, thanks.
> >> I also compile-tested the remaining affected drivers, however I would
> >> still be happy if the relevant maintainers (especially those of the
> >> Mellanox ethernet switch driver) could take a quick glance at the
> >> code and verify that I am using the correct device as the parent
> >> device.
> > I think that the above paragraph is not relevant any more?
>
> You are right, however I originally meant to CC the Mellanox maintainers as
> I was a bit unsure about the changes I made to their driver. I will rework
> this section in the next revision and CC the Mellanox maintainers.
>
> >
> >> This work is also necessary for extending the ACPI thermal zone driver
> >> to support the _TZD ACPI object in the future.
> > I'm still unsure why _TZD support requires the ability to set a
> > thermal zone parent device.
>
> _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans
> and ACPI processors, like ACPI batteries.
No, it is not for cooling devices if my reading of the specification
is correct. It says:
"_TZD (Thermal Zone Devices)
This optional object evaluates to a package of device names. Each name
corresponds to a device in the ACPI namespace that is associated with
the thermal zone. The temperature reported by the thermal zone is
roughly correspondent to that of each of the devices."
And then
"The list of devices returned by the control method need not be a
complete and absolute list of devices affected by the thermal zone.
However, the package should at least contain the devices that would
uniquely identify where this thermal zone is located in the machine.
For example, a thermal zone in a docking station should include a
device in the docking station, a thermal zone for the CD-ROM bay,
should include the CD-ROM."
So IIUC this is a list of devices allowing the location of the thermal
zone to be figured out. There's nothing about cooling in this
definition.
> This however will currently not work as
> the ACPI thermal zone driver uses the private drvdata of the cooling device to
> determine if said cooling device should bind. This only works for ACPI fans and
> processors due to the fact that those drivers store an ACPI device pointer inside
> drvdata, something the ACPI thermal zone expects.
I'm not sure I understand the above.
There is a list of ACPI device handles per trip point, as returned by
either _PSL or _ALx. Devices whose handles are in that list will be
bound to the thermal zone, so long as there are struct acpi_device
objects representing them which is verified with the help of the
devdata field in struct thermal_cooling_device.
IOW, cooling device drivers that create struct thermal_cooling_device
objects representing them are expected to set devdata in those objects
to point to struct acpi_device objects corresponding to their ACPI
handles, but in principle acpi_thermal_should_bind_cdev() might as
well just use the handles themselves. It just needs to know that
there is a cooling driver on the other side of the ACPI handle.
The point is that a cooling device to be bound to an ACPI thermal zone
needs an ACPI handle in the first place to be listed in _PSL or _ALx.
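As a rough illustration of the handle-based check, here is a user-space
sketch. It is not the actual acpi_thermal_should_bind_cdev() code; handle_t,
struct cooling_dev, struct trip and should_bind() are stand-ins invented for
this example, with handle_t playing the role of an opaque ACPI handle:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an opaque ACPI handle. */
typedef void *handle_t;

/* Hypothetical cooling device: the driver stores its ACPI handle, or NULL. */
struct cooling_dev {
	handle_t handle;
};

/* Hypothetical trip point with the handles returned by _PSL or _ALx. */
struct trip {
	const handle_t *handles;
	size_t count;
};

/* Bind only if the cooling device's handle is listed for this trip point. */
static bool should_bind(const struct trip *trip, const struct cooling_dev *cdev)
{
	size_t i;

	assert(trip && cdev);

	/* A device without ACPI representation can never be listed. */
	if (!cdev->handle)
		return false;

	for (i = 0; i < trip->count; i++) {
		if (trip->handles[i] == cdev->handle)
			return true;
	}

	return false;
}
```

Whether the handle comes from devdata or from somewhere else does not change
the comparison itself, only where the value is looked up.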
> As we cannot require all cooling devices to store an ACPI device pointer inside
> their drvdata field in order to support ACPI,
Cooling devices don't store ACPI device pointers in struct
thermal_cooling_device objects, ACPI cooling drivers do, and there are
two reasons to do that: (1) to associate a given struct
thermal_cooling_device with an ACPI handle and (2) to let
acpi_thermal_should_bind_cdev() know that the cooling device is
present and functional.
This can be changed to store an ACPI handle in struct
thermal_cooling_device and acpi_thermal_should_bind_cdev() may just
verify that the device is there by itself.
> we must use a more generic approach.
I'm not sure what use case you are talking about.
Surely, devices with no representation in the ACPI namespace cannot be
bound to ACPI thermal zones. For devices that have a representation
in the ACPI namespace, storing an ACPI handle in devdata should not be
a problem.
> I was thinking about using the acpi_handle of the parent device instead of messing
> with the drvdata field, but this only works if the parent device pointer of the
> cooling device is populated.
>
> (Cooling devices without a parent device would then be ignored by the ACPI thermal
> zone driver, as such cooling devices cannot be linked to ACPI).
It can be arranged this way, but what's the practical difference?
Anyone who creates a struct thermal_cooling_device and can set its
parent pointer to a device with an ACPI companion, may as well set its
devdata to point to that companion directly - or to its ACPI handle if
that's preferred.
Am 27.11.25 um 18:41 schrieb Rafael J. Wysocki:
> On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf <W_Armin@gmx.de> wrote:
>> Am 21.11.25 um 21:35 schrieb Rafael J. Wysocki:
>>
>>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@gmx.de> wrote:
>>>> Drivers registering thermal zone/cooling devices are currently unable
>>>> to tell the thermal core what parent device the new thermal zone/
>>>> cooling device should have, potentially causing issues with suspend
>>>> ordering
>>> This is one potential class of problems that may arise, but I would
>>> like to see a real example of this.
>>>
>>> As it stands today, thermal_class has no PM callbacks, so there are no
>>> callback execution ordering issues with devices in that class and what
>>> other suspend/resume ordering issues are there?
>> Correct, that is why I said "potentially".
>>
>>> Also, the suspend and resume of thermal zones is handled via PM
>>> notifiers. Is there a problem with this?
>> The problem with PM notifiers is that thermal zones stop working even before
>> user space is frozen. Freezing user space might take a lot of time, so having
>> no thermal management during this period is less than ideal.
> This can be addressed by doing thermal zone suspend after freezing
> tasks and before starting to suspend devices. Accordingly, thermal
> zones could be resumed after resuming devices and before thawing
> tasks. That should not be an overly complex change to make.
AFAIK this is only possible by using dev_pm_ops, the PM notifier is triggered before
tasks are frozen during suspend and after they are thawed during resume.
Using dev_pm_ops would also ensure that thermal zone devices are resumed after their
parent devices, so no additional changes inside the pm core would be needed.
>> This problem would not occur when using dev_pm_ops, as thermal zones would be
>> suspended after user space has been frozen successfully. Additionally, when using
>> dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates
>> that no new devices (including thermal zones and cooling devices) be registered during
>> a suspend/resume cycle.
>>
>> Replacing the PM notifiers with dev_pm_ops would of course be an optimization with
>> its own patch series.
> Honestly, I don't see much benefit from using dev_pm_ops for thermal
> zone devices and cooling devices. Moreover, I actually think that
> they could be "no PM" devices that are not even put on the
> suspend-resume device list. Technically, they are just interfaces on
> top of some other devices allowing the user space to interact with the
> latter and combining different pieces described by the platform
> firmware. They by themselves have no PM capabilities.
Correct, thermal zone devices are virtual devices representing thermal management
aspects of the underlying parent device. This however does not mean that thermal zone
devices have no PM capabilities, because they contain state. Some part of this state
(namely TZ_STATE_FLAG_SUSPENDED and TZ_STATE_FLAG_RESUMING) is affected by power management,
so we should tell the device core about this by using dev_pm_ops instead of the PM notifier.
>>>> and making it impossible for user space applications to
>>>> associate a given thermal zone device with its parent device.
>>> Why does user space need to know the parent of a given cooling device
>>> or thermal zone?
>> Let's say that we have two thermal zones registered by two instances of the
>> Intel Wifi driver. User space is currently unable to find out which thermal zone
>> belongs to which Wifi adapter, as both thermal zones have (nearly) the same type string ("iwlwifi[0-X]").
> But the "belong" part is not quite well defined here. I think that
> what user space needs to know is what devices are located in a given
> thermal zone, isn't it? Knowing the parent doesn't necessarily
> address this.
The device exposing a given thermal zone device is not always a member of the thermal zone itself.
In case of the Intel Wifi adapters, the individual Wifi adapters are indeed members of the thermal zone
associated with their thermal zone device. But thermal zones created through a system management controller
for example might only cover devices like the CPUs and GPUs, not the system management controller device itself.
The parent device of a child device is the upstream device of the child device. The connection between parent
and child can be physical (SMBus controller (parent) -> i2c device (child)) or purely logical
(PCI device (parent) -> thermal zone device (child)). There exists a parent-child dependency between a parent
and a child device (the child device cannot function without its parent being operational), and user space
might want to be able to discover such dependencies.
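As a sketch of how user space could discover such a dependency, the helper
below reads the "device" symlink in a thermal zone's sysfs directory and
returns the last component of the link target as the parent's name. The
helper name thermal_zone_parent_name() is made up for this example, and it
assumes that the link exists, which is only the case once the thermal zone
has a parent device:

```c
#define _XOPEN_SOURCE 700 /* for readlink() */
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Store the name of the parent device of the thermal zone at zone_dir
 * (for example /sys/class/thermal/thermal_zone0) in name.
 * Returns 0 on success and -1 if there is no "device" link or name is
 * too small.
 */
static int thermal_zone_parent_name(const char *zone_dir, char *name, size_t len)
{
	char link[PATH_MAX], target[PATH_MAX];
	const char *base;
	ssize_t n;
	int ret;

	assert(zone_dir && name && len > 0);

	ret = snprintf(link, sizeof(link), "%s/device", zone_dir);
	if (ret < 0 || (size_t)ret >= sizeof(link))
		return -1;

	/* readlink() does not NUL-terminate, so leave room and do it here. */
	n = readlink(link, target, sizeof(target) - 1);
	if (n < 0)
		return -1;
	target[n] = '\0';

	/* The parent's name is the last component of the link target. */
	base = strrchr(target, '/');
	base = base ? base + 1 : target;
	if (strlen(base) >= len)
		return -1;

	strcpy(name, base);
	return 0;
}
```

For a thermal zone registered by a Wifi adapter this would yield the name of
the PCI device, which is enough to tell two "iwlwifi" zones apart.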
>> This problem would be solved once we populate the parent device pointer inside the thermal zone
>> device, as user space can simply look at the "device" symlink to determine the parent device behind
>> a given thermal zone device.
> I'm not convinced about this.
>
>> Additionally, being able to access the acpi_handle of the parent device will be necessary for the
>> ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors.
> I guess by the "parent" you mean the device represented in the ACPI
> namespace by a ThermalZone object, right? But this is not the same as
> the "parent" in the Wifi driver context, is it?
In the context of an ACPI ThermalZone, the parent device of the thermal cooling device would currently
be the ACPI device bound to the "thermal" ACPI driver. In the context of the Intel Wifi card, the parent
device would be the PCI device bound to the corresponding Intel Wifi driver.
I think you misunderstood what kind of parent device I was referring to. You likely thought that I was referring
to the parent device of the ACPI ThermalZone, right? That however is not the case: with "parent device" I was
referring to the device responsible for creating a given struct thermal_zone_device instance.
>>>> This patch series aims to fix this issue by extending the functions
>>>> used to register thermal zone/cooling devices to also accept a parent
>>>> device pointer. The first six patches convert all functions used for
>>>> registering cooling devices, while the functions used for registering
>>>> thermal zone devices are converted by the remaining two patches.
>>>>
>>>> I tested this series on various devices containing (among others):
>>>> - ACPI thermal zones
>>>> - ACPI processor devices
>>>> - PCIe cooling devices
>>>> - Intel Wifi card
>>>> - Intel powerclamp
>>>> - Intel TCC cooling
>>> What exactly did you do to test it?
>> I tested:
>> - the thermal zone temperature readout
>> - correctness of the new sysfs links
>> - suspend/resume
>>
>> I also verified that ACPI thermal zones still bind with the ACPI fans.
> I see, thanks.
>
>>>> I also compile-tested the remaining affected drivers, however I would
>>>> still be happy if the relevant maintainers (especially those of the
>>>> Mellanox ethernet switch driver) could take a quick glance at the
>>>> code and verify that I am using the correct device as the parent
>>>> device.
>>> I think that the above paragraph is not relevant any more?
>> You are right, however I originally meant to CC the Mellanox maintainers as
>> I was a bit unsure about the changes I made to their driver. I will rework
>> this section in the next revision and CC the Mellanox maintainers.
>>
>>>> This work is also necessary for extending the ACPI thermal zone driver
>>>> to support the _TZD ACPI object in the future.
>>> I'm still unsure why _TZD support requires the ability to set a
>>> thermal zone parent device.
>> _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans
>> and ACPI processors, like ACPI batteries.
> No, it is not for cooling devices if my reading of the specification
> is correct. It says:
>
> "_TZD (Thermal Zone Devices)
>
> This optional object evaluates to a package of device names. Each name
> corresponds to a device in the ACPI namespace that is associated with
> the thermal zone. The temperature reported by the thermal zone is
> roughly correspondent to that of each of the devices."
>
> And then
>
> "The list of devices returned by the control method need not be a
> complete and absolute list of devices affected by the thermal zone.
> However, the package should at least contain the devices that would
> uniquely identify where this thermal zone is located in the machine.
> For example, a thermal zone in a docking station should include a
> device in the docking station, a thermal zone for the CD-ROM bay,
> should include the CD-ROM."
>
> So IIUC this is a list of devices allowing the location of the thermal
> zone to be figured out. There's nothing about cooling in this
> definition.

Using _TZD to figure out the location of a given thermal zone is another use
of this ACPI control method, but let's take a look at section 11.6:

- If _PSV is defined then either the _PSL or _TZD objects must exist. The _PSL and _TZD objects may both exist.
- If _PSV is defined and _PSL is not defined then at least one device in thermal zone, as indicated by either the
  _TZD device list or devices’ _TZM objects, must support device performance states.

So according to my understanding, _TZD can also be used to discover additional cooling devices used for passive cooling.

This makes sense as _PSL is defined to only contain processor objects (see section 11.4.10), so _TZD can act like an
extension of _PSL for things like ACPI control method batteries (see 10.2.2.12).

Microsoft also follows this approach (see https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide
section "Thermally managed devices" paragraph "Processor aggregator").

>> This however will currently not work as
>> the ACPI thermal zone driver uses the private drvdata of the cooling device to
>> determine if said cooling device should bind. This only works for ACPI fans and
>> processors due to the fact that those drivers store a ACPI device pointer inside
>> drvdata, something the ACPI thermal zone expects.
> I'm not sure I understand the above.
>
> There is a list of ACPI device handles per trip point, as returned by
> either _PSL or _ALx. Devices whose handles are in that list will be
> bound to the thermal zone, so long as there are struct acpi_device
> objects representing them which is verified with the help of the
> devdata field in struct thermal_cooling_device.

AFAIK devdata is meant to be used by the thermal zone device callbacks to access the state
container struct of the associated device driver instance. Assuming that a given device driver
will populate devdata with a pointer to its ACPI companion device is an implementation-specific
detail that does not apply to all cooling device implementations. It just so happens that the
ACPI processor and fan drivers do this, likely because they were designed specifically to work
with the ACPI thermal zone driver.

The documentation of thermal_cooling_device_register() even describes devdata as "device private data",
so any meaning of devdata purely depends on the given device driver.

> IOW, cooling device drivers that create struct thermal_cooling_device
> objects representing them are expected to set devdata in those objects
> to point to struct acpi_device objects corresponding to their ACPI
> handles, but in principle acpi_thermal_should_bind_cdev() might as
> well just use the handles themselves. It just needs to know that
> there is a cooling driver on the other side of the ACPI handle.
>
> The point is that a cooling device to be bound to an ACPI thermal zone
> needs an ACPI handle in the first place to be listed in _PSL or _ALx.

Correct, I merely change the way the ACPI thermal zone driver retrieves the
ACPI handle associated with a given cooling device.

>> As we cannot require all cooling devices to store an ACPI device pointer inside
>> their drvdata field in order to support ACPI,
> Cooling devices don't store ACPI device pointers in struct
> thermal_cooling_device objects, ACPI cooling drivers do, and there are
> two reasons to do that: (1) to associate a given struct
> thermal_cooling_device with an ACPI handle and (2) to let
> acpi_thermal_should_bind_cdev() know that the cooling device is
> present and functional.
>
> This can be changed to store an ACPI handle in struct
> thermal_cooling_device and acpi_thermal_should_bind_cdev() may just
> verify that the device is there by itself.

I can of course extend thermal_cooling_device_register() to accept a fwnode_handle that
can be used for both ACPI and OF-based cooling device identification, if this is what you
prefer.

This patch series would then turn into a cleanup series, focusing on properly adding
thermal zone devices and cooling devices into the global device hierarchy.

>> we must use a more generic approach.
> I'm not sure what use case you are talking about.
>
> Surely, devices with no representation in the ACPI namespace cannot be
> bound to ACPI thermal zones. For devices that have a representation
> in the ACPI namespace, storing an ACPI handle in devdata should not be
> a problem.

See my explanations above for details: devdata is defined to hold device private data,
nothing more.

>> I was thinking about using the acpi_handle of the parent device instead of messing
>> with the drvdata field, but this only works if the parent device pointer of the
>> cooling device is populated.
>>
>> (Cooling devices without a parent device would then be ignored by the ACPI thermal
>> zone driver, as such cooling devices cannot be linked to ACPI).
> It can be arranged this way, but what's the practical difference?
> Anyone who creates a struct thermal_cooling_device and can set its
> parent pointer to a device with an ACPI companion, may as well set its
> devdata to point to that companion directly - or to its ACPI handle if
> that's preferred.

Yes, but this would require explicit support for ACPI in every driver that registers cooling devices.
Using the parent device to retrieve the acpi_handle or allowing all drivers to just submit a fwnode_handle
of their choice when creating a cooling device will fix this.

Thanks,
Armin Wolf

On Thu, Nov 27, 2025 at 9:06 PM Armin Wolf <W_Armin@gmx.de> wrote:
>
> On 27.11.25 at 18:41, Rafael J. Wysocki wrote:
>
> > On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf <W_Armin@gmx.de> wrote:
> >> On 21.11.25 at 21:35, Rafael J. Wysocki wrote:
> >>
> >>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@gmx.de> wrote:
> >>>> Drivers registering thermal zone/cooling devices are currently unable
> >>>> to tell the thermal core what parent device the new thermal zone/
> >>>> cooling device should have, potentially causing issues with suspend
> >>>> ordering
> >>> This is one potential class of problems that may arise, but I would
> >>> like to see a real example of this.
> >>>
> >>> As it stands today, thermal_class has no PM callbacks, so there are no
> >>> callback execution ordering issues with devices in that class and what
> >>> other suspend/resume ordering issues are there?
> >> Correct, that is why i said "potentially".
> >>
> >>> Also, the suspend and resume of thermal zones is handled via PM
> >>> notifiers. Is there a problem with this?
> >> The problem with PM notifiers is that thermal zones stop working even before
> >> user space is frozen. Freezing user space might take a lot of time, so having
> >> no thermal management during this period is less than ideal.
> > This can be addressed by doing thermal zone suspend after freezing
> > tasks and before starting to suspend devices. Accordingly, thermal
> > zones could be resumed after resuming devices and before thawing
> > tasks. That should not be an overly complex change to make.
>
> AFAIK this is only possible by using dev_pm_ops,

Of course it is not the case.

For example, thermal_pm_notify_prepare() could be called directly from
dpm_prepare() and thermal_pm_notify_complete() could be called
directly from dpm_complete() (which would require switching over
thermal to a non-freezable workqueue).

> the PM notifier is triggered before tasks are frozen during suspend and after they are thawed during resume.

I know that.

> Using dev_pm_ops would also ensure that thermal zone devices are resumed after their
> parent devices, so no additional changes inside the pm core would be needed.

Not really. thermal_pm_suspended needs to be set and cleared from somewhere.

> >> This problem would not occur when using dev_pm_ops, as thermal zones would be
> >> suspended after user space has been frozen successfully. Additionally, when using
> >> dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates
> >> that no new devices (including thermal zones and cooling devices) be registered during
> >> a suspend/resume cycle.
> >>
> >> Replacing the PM notifiers with dev_pm_ops would of course be an optimization with
> >> its own patch series.
> >
> > Honestly, I don't see much benefit from using dev_pm_ops for thermal
> > zone devices and cooling devices. Moreover, I actually think that
> > they could be "no PM" devices that are not even put on the
> > suspend-resume device list. Technically, they are just interfaces on
> > top of some other devices allowing the user space to interact with the
> > latter and combining different pieces described by the platform
> > firmware. They by themselves have no PM capabilities.
>
> Correct, thermal zone devices are virtual devices representing thermal management
> aspects of the underlying parent device. This however does not mean that thermal zone
> devices have no PM capabilities, because they contain state. Some part of this state
> (namely TZ_STATE_FLAG_SUSPENDED and TZ_STATE_FLAG_RESUMING) is affected by power management,
> so we should tell the device core about this by using dev_pm_ops instead of the PM notifier.

Changing the zone state to anything different from TZ_STATE_READY
causes __thermal_zone_device_update() to do nothing and this is the
whole "suspend". It does not need to be done from a PM callback and I
see no reason why doing it from a PM callback would be desirable.
Sorry.

Apart from the above, TZ_STATE_FLAG_SUSPENDED and
TZ_STATE_FLAG_RESUMING are only used for coordination between
thermal_zone_pm_prepare(), thermal_zone_device_resume() and
thermal_zone_pm_complete(), so this is not a state anything other than
the specific thermal zone in question cares about.
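
As a rough illustration of the point above, here is a userspace mock of
the gating. It is a sketch only: the mock_* names, the flag values and
the struct layout are assumptions made for this example and are not the
thermal core's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace mock, not kernel code.  The names mirror the flags discussed
 * above, but the values and the struct layout are assumptions.
 */
#define MOCK_TZ_STATE_READY		0u
#define MOCK_TZ_STATE_FLAG_SUSPENDED	(1u << 0)
#define MOCK_TZ_STATE_FLAG_RESUMING	(1u << 1)

struct mock_zone {
	unsigned int state;	/* MOCK_TZ_STATE_READY or a set of flags */
	int updates;		/* how many times the zone was actually polled */
};

/* Returns true if the zone was polled, false if the update was skipped. */
static bool mock_zone_update(struct mock_zone *tz)
{
	if (tz->state != MOCK_TZ_STATE_READY)
		return false;	/* suspended or resuming: do nothing */

	tz->updates++;
	return true;
}

/* In this model the whole "suspend" is setting one flag. */
static void mock_zone_pm_prepare(struct mock_zone *tz)
{
	tz->state |= MOCK_TZ_STATE_FLAG_SUSPENDED;
}

/* Clearing the flags makes the zone ready again, so updates resume. */
static void mock_zone_pm_complete(struct mock_zone *tz)
{
	tz->state &= ~(MOCK_TZ_STATE_FLAG_SUSPENDED | MOCK_TZ_STATE_FLAG_RESUMING);
}
```

Nothing outside the zone has to look at the flags in this model; callers
just invoke the update and it becomes a no-op while the zone is not ready.
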
Moreover, resuming a thermal zone before resuming any cooling devices
bound to it would almost certainly break things and I'm not sure how
you would make that work with dev_pm_ops. BTW, using device links for
this is not an option as far as I'm concerned.
> >>>> and making it impossible for user space applications to
> >>>> associate a given thermal zone device with its parent device.
> >>> Why does user space need to know the parent of a given cooling device
> >>> or thermal zone?
> >> Let's say that we have two thermal zones registered by two instances of the
> >> Intel Wifi driver. User space is currently unable to find out which thermal zone
> >> belongs to which Wifi adapter, as both thermal zones have the (nearly) same type string ("iwlwifi[0-X]").
> > But the "belong" part is not quite well defined here. I think that
> > what user space needs to know is what devices are located in a given
> > thermal zone, isn't it? Knowing the parent doesn't necessarily
> > address this.
>
> The device exposing a given thermal zone device is not always a member of the thermal zone itself.
> In case of the Intel Wifi adapters, the individual Wifi adapters are indeed members of the thermal zone
> associated with their thermal zone device. But thermal zones created thru a system management controller
> for example might only cover devices like the CPUs and GPUs, not the system management controller device itself.
Well, exactly.
> The parent device of a child device is the upstream device of the child device. The connection between parent
> and child can be physical (SMBus controller (parent) -> i2c device (child)) or purely logical
> (PCI device (parent) -> thermal zone device (child)). There exists a parent-child dependency between a parent
> and a child device (the child device cannot function without its parent being operational), and user space
> might want to be able to discover such dependencies.
But this needs to be consistent.
If the parent of one thermal zone represents the device affected by it
and the parent of another thermal zone represents something else, user
space will need platform-specific knowledge to figure this out, which
is the case today. Without consistency, this is just not useful.
> >> This problem would be solved once we populate the parent device pointer inside the thermal zone
> >> device, as user space can simply look at the "device" symlink to determine the parent device behind
> >> a given thermal zone device.
> > I'm not convinced about this.
> >
> >> Additionally, being able to access the acpi_handle of the parent device will be necessary for the
> >> ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors.
> > I guess by the "parent" you mean the device represented in the ACPI
> > namespace by a ThermalZone object, right? But this is not the same as
> > the "parent" in the Wifi driver context, is it?
>
> In the context of an ACPI ThermalZone, the parent device of the thermal cooling device would currently
> be the ACPI device bound to the "thermal" ACPI driver. In the context of the Intel Wifi card, the parent
> device would be the PCI device bound to the corresponding Intel Wifi driver.
>
> I think you misunderstood what kind of parent device I was referring to. You likely thought that I was referring
> to the parent device of the ACPI ThermalZone, right?

No. I thought that you were referring to the ACPI ThermalZone itself.
Or rather, a platform device associated with the ACPI ThermalZone
(that is, the device the ACPI ThermalZone is the ACPI_COMPANION() of).

> That however is not the case; with "parent device" I was
> referring to the device responsible for creating a given struct thermal_zone_device instance.

So I was not confused.

> >>>> This patch series aims to fix this issue by extending the functions
> >>>> used to register thermal zone/cooling devices to also accept a parent
> >>>> device pointer. The first six patches convert all functions used for
> >>>> registering cooling devices, while the functions used for registering
> >>>> thermal zone devices are converted by the remaining two patches.
> >>>>
> >>>> I tested this series on various devices containing (among others):
> >>>> - ACPI thermal zones
> >>>> - ACPI processor devices
> >>>> - PCIe cooling devices
> >>>> - Intel Wifi card
> >>>> - Intel powerclamp
> >>>> - Intel TCC cooling
> >>> What exactly did you do to test it?
> >> I tested:
> >> - the thermal zone temperature readout
> >> - correctness of the new sysfs links
> >> - suspend/resume
> >>
> >> I also verified that ACPI thermal zones still bind with the ACPI fans.
> > I see, thanks.
> >
> >>>> I also compile-tested the remaining affected drivers, however i would
> >>>> still be happy if the relevant maintainers (especially those of the
> >>>> mellanox ethernet switch driver) could take a quick glance at the
> >>>> code and verify that i am using the correct device as the parent
> >>>> device.
> >>> I think that the above paragraph is not relevant any more?
> >> You are right, however i originally meant to CC the mellanox maintainers as
> >> i was a bit unsure about the changes i made to their driver. I will rework
> >> this section in the next revision and CC the mellanox maintainers.
> >>
> >>>> This work is also necessary for extending the ACPI thermal zone driver
> >>>> to support the _TZD ACPI object in the future.
> >>> I'm still unsure why _TZD support requires the ability to set a
> >>> thermal zone parent device.
> >> _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans
> >> and ACPI processors, like ACPI batteries.
> > No, it is not for cooling devices if my reading of the specification
> > is correct. It says:
> >
> > "_TZD (Thermal Zone Devices)
> >
> > This optional object evaluates to a package of device names. Each name
> > corresponds to a device in the ACPI namespace that is associated with
> > the thermal zone. The temperature reported by the thermal zone is
> > roughly correspondent to that of each of the devices."
> >
> > And then
> >
> > "The list of devices returned by the control method need not be a
> > complete and absolute list of devices affected by the thermal zone.
> > However, the package should at least contain the devices that would
> > uniquely identify where this thermal zone is located in the machine.
> > For example, a thermal zone in a docking station should include a
> > device in the docking station, a thermal zone for the CD-ROM bay,
> > should include the CD-ROM."
> >
> > So IIUC this is a list of devices allowing the location of the thermal
> > zone to be figured out. There's nothing about cooling in this
> > definition.
>
> Using _TZD to figure out the location of a given thermal zone is another use
> of this ACPI control method, but let's take a look at section 11.6:
>
> - If _PSV is defined then either the _PSL or _TZD objects must exist. The _PSL and _TZD objects may both exist.
> - If _PSV is defined and _PSL is not defined then at least one device in thermal zone, as indicated by either the
> _TZD device list or devices’ _TZM objects, must support device performance states.
>
> So according to my understanding, _TZD can also be used to discover additional cooling devices used for passive cooling.

But it doesn't actually say how those "device performance states" are
supposed to be used for cooling, does it?

> This makes sense as _PSL is defined to only contain processor objects (see section 11.4.10), so _TZD can act like an
> extension of _PSL for things like ACPI control method batteries (see 10.2.2.12).

But not everything in _TZD needs to be a potential "cooling device",
so how will you decide which ones are?

> Microsoft also follows this approach (see https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide
> section "Thermally managed devices" paragraph "Processor aggregator").

Interesting.

I agree that it would make sense to follow them because there will be
platform dependencies on that, if there aren't already.

> >> This however will currently not work as
> >> the ACPI thermal zone driver uses the private drvdata of the cooling device to
> >> determine if said cooling device should bind. This only works for ACPI fans and
> >> processors due to the fact that those drivers store a ACPI device pointer inside
> >> drvdata, something the ACPI thermal zone expects.
> > I'm not sure I understand the above.
> >
> > There is a list of ACPI device handles per trip point, as returned by
> > either _PSL or _ALx. Devices whose handles are in that list will be
> > bound to the thermal zone, so long as there are struct acpi_device
> > objects representing them which is verified with the help of the
> > devdata field in struct thermal_cooling_device.
>
> AFAIK devdata is meant to be used by the thermal zone device callbacks to access the state
> container struct of the associated device driver instance. Assuming that a given device driver
> will populate devdata with a pointer to its ACPI companion device is an implementation-specific
> detail that does not apply to all cooling device implementations. It just so happens that the
> ACPI processor and fan drivers do this, likely because they were designed specifically to work
> with the ACPI thermal zone driver.
>
> The documentation of thermal_cooling_device_register() even describes devdata as "device private data", so any meaning of devdata purely depends on the
> given device driver.

Yes, and these particular drivers decide to store a pointer to struct
acpi_device in it.

But this is not super important, they might as well set the
ACPI_COMPANION() of the cooling device to the corresponding struct
acpi_device and the ACPI thermal driver might use that information.

I'm not opposed to using parents for this purpose, but it doesn't
change the big picture that the ACPI thermal driver will need to know
the ACPI handle corresponding to each cooling device.

If you want to use _TZD instead of or in addition to _PSL for this, it
doesn't change much here, it's just another list of ACPI handles, so
saying that parents are needed for supporting this is not exactly
accurate IMV.

> > IOW, cooling device drivers that create struct thermal_cooling_device
> > objects representing them are expected to set devdata in those objects
> > to point to struct acpi_device objects corresponding to their ACPI
> > handles, but in principle acpi_thermal_should_bind_cdev() might as
> > well just use the handles themselves. It just needs to know that
> > there is a cooling driver on the other side of the ACPI handle.
> >
> > The point is that a cooling device to be bound to an ACPI thermal zone
> > needs an ACPI handle in the first place to be listed in _PSL or _ALx.
>
> Correct, i merely change the way the ACPI thermal zone driver retrieves the
> ACPI handle associated with a given cooling device.
Right.
> >> As we cannot require all cooling devices to store an ACPI device pointer inside
> >> their drvdata field in order to support ACPI,
> > Cooling devices don't store ACPI device pointers in struct
> > thermal_cooling_device objects, ACPI cooling drivers do, and there are
> > two reasons to do that: (1) to associate a given struct
> > thermal_cooling_device with an ACPI handle and (2) to let
> > acpi_thermal_should_bind_cdev() know that the cooling device is
> > present and functional.
> >
> > This can be changed to store an ACPI handle in struct
> > thermal_cooling_device and acpi_thermal_should_bind_cdev() may just
> > verify that the device is there by itself.
>
> I can of course extend thermal_cooling_device_register() to accept a fwnode_handle that
> can be used for both ACPI and OF based cooling device identification, if this is what you
> prefer.

I'm not sure about this ATM and see below.

> This patch series would then turn into a cleanup series, focusing on properly adding
> thermal zone devices and cooling devices into the global device hierarchy.

I'd prefer to do one thing at a time though.

If you want cooling devices to get parents, fine. I'm not
fundamentally opposed to that idea, but let's have clear rules for
device drivers on how to set those parents for the sake of
consistency.

As for the ACPI case, one rule that I want to be followed (as already
stated multiple times) is that a struct acpi_device can only be a
parent of another struct acpi_device. This means that the parent of a
cooling device needs to be a platform device or similar representing
the actual device that will be used for implementing the cooling.

A separate question is how acpi_thermal_should_bind_cdev() will match
cooling devices with the ACPI handles coming from _PSL, _ALx, _TZD
etc. and the rule can be that it will look at the ACPI_COMPANION() of
the parent of the given cooling device.
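
To make that matching rule concrete, here is a minimal userspace
sketch. Everything in it is hypothetical: the mock_* types merely stand
in for struct device, acpi_handle and struct thermal_cooling_device, and
the function is not the existing acpi_thermal_should_bind_cdev().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's real types. */
typedef const void *mock_handle;	/* plays the role of acpi_handle */

struct mock_device {
	struct mock_device *parent;
	mock_handle companion;	/* handle of the ACPI companion, NULL if none */
};

struct mock_cdev {
	struct mock_device dev;	/* embedded device, as in a cooling device */
};

/*
 * Bind only if the cooling device has a parent, that parent has an ACPI
 * companion, and the companion's handle appears in the list obtained from
 * the firmware (_PSL, _ALx, _TZD and so on).
 */
static bool mock_should_bind(const struct mock_cdev *cdev,
			     const mock_handle *handles, size_t count)
{
	const struct mock_device *parent = cdev->dev.parent;
	size_t i;

	if (!parent || !parent->companion)
		return false;	/* cannot be linked to ACPI at all */

	for (i = 0; i < count; i++) {
		if (handles[i] == parent->companion)
			return true;
	}

	return false;
}
```

In this model devdata is not consulted at all, so a driver would not
need any ACPI-specific code beyond passing the right parent.
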
> >> we must use a more generic approach.
> > I'm not sure what use case you are talking about.
> >
> > Surely, devices with no representation in the ACPI namespace cannot be
> > bound to ACPI thermal zones. For devices that have a representation
> > in the ACPI namespace, storing an ACPI handle in devdata should not be
> > a problem.
>
> See my explanations above for details: devdata is defined to hold device private data,
> nothing more.
This is related to the discussion below.
> >> I was thinking about using the acpi_handle of the parent device instead of messing
> >> with the drvdata field, but this only works if the parent device pointer of the
> >> cooling device is populated.
> >>
> >> (Cooling devices without a parent device would then be ignored by the ACPI thermal
> >> zone driver, as such cooling devices cannot be linked to ACPI).
> > It can be arranged this way, but what's the practical difference?
> > Anyone who creates a struct thermal_cooling_device and can set its
> > parent pointer to a device with an ACPI companion, may as well set its
> > devdata to point to that companion directly - or to its ACPI handle if
> > that's preferred.
>
> Yes, but this would require explicit support for ACPI in every driver that registers cooling devices.

So you want to have generic drivers that may work on ACPI platforms
and on DT platforms to be able to create cooling devices for use with
ACPI thermal zones. Well, had you started the whole discussion with
this statement, it would have been much easier to understand your
point.

> Using the parent device to retrieve the acpi_handle or allowing all drivers to just submit a fwnode_handle
> of their choice when creating a cooling device will fix this.

If you go the parents route, this is an important consideration for
the rules on how to set those parents. Namely, they would need to be
set so that the fwnode_handle of the parent could be used for binding
the cooling device to a thermal zone either on ACPI or on DT systems.

Of course, there are also cooling devices whose parents will not have
an fwnode_handle and they would still need to work in this brave new
world.

On 27.11.25 at 22:46, Rafael J. Wysocki wrote:
> On Thu, Nov 27, 2025 at 9:06 PM Armin Wolf <W_Armin@gmx.de> wrote:
>> On 27.11.25 at 18:41, Rafael J. Wysocki wrote:
>>
>>> On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf <W_Armin@gmx.de> wrote:
>>>> On 21.11.25 at 21:35, Rafael J. Wysocki wrote:
>>>>
>>>>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@gmx.de> wrote:
>>>>>> Drivers registering thermal zone/cooling devices are currently unable
>>>>>> to tell the thermal core what parent device the new thermal zone/
>>>>>> cooling device should have, potentially causing issues with suspend
>>>>>> ordering
>>>>> This is one potential class of problems that may arise, but I would
>>>>> like to see a real example of this.
>>>>>
>>>>> As it stands today, thermal_class has no PM callbacks, so there are no
>>>>> callback execution ordering issues with devices in that class and what
>>>>> other suspend/resume ordering issues are there?
>>>> Correct, that is why i said "potentially".
>>>>
>>>>> Also, the suspend and resume of thermal zones is handled via PM
>>>>> notifiers. Is there a problem with this?
>>>> The problem with PM notifiers is that thermal zones stop working even before
>>>> user space is frozen. Freezing user space might take a lot of time, so having
>>>> no thermal management during this period is less than ideal.
>>> This can be addressed by doing thermal zone suspend after freezing
>>> tasks and before starting to suspend devices. Accordingly, thermal
>>> zones could be resumed after resuming devices and before thawing
>>> tasks. That should not be an overly complex change to make.
>> AFAIK this is only possible by using dev_pm_ops,
> Of course it is not the case.
>
> For example, thermal_pm_notify_prepare() could be called directly from
> dpm_prepare() and thermal_pm_notify_complete() could be called
> directly from dpm_complete() (which would require switching over
> thermal to a non-freezable workqueue).
>
>> the PM notifier is triggered before tasks are frozen during suspend and after they are thawed during resume.
> I know that.
>
>> Using dev_pm_ops would also ensure that thermal zone devices are resumed after their
>> parent devices, so no additional changes inside the pm core would be needed.
> Not really. thermal_pm_suspended needs to be set and cleared from somewhere.

thermal_pm_suspended is only used for initializing the state of thermal zone devices registered
during a suspend transition. This is currently needed because user space tasks are still operational
when the PM notifier callback is called, so we have to be prepared for new thermal zone devices
being registered in the middle of a suspend transition.

When using dev_pm_ops, new thermal zone devices cannot appear in the middle of a suspend transition,
as this would violate the constraints of the device core regarding device registrations. Because of
this, thermal_pm_suspended can be removed once we use dev_pm_ops.

>>>> This problem would not occur when using dev_pm_ops, as thermal zones would be
>>>> suspended after user space has been frozen successfully. Additionally, when using
>>>> dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates
>>>> that no new devices (including thermal zones and cooling devices) be registered during
>>>> a suspend/resume cycle.
>>>>
>>>> Replacing the PM notifiers with dev_pm_ops would of course be an optimization with
>>>> its own patch series.
>>> Honestly, I don't see much benefit from using dev_pm_ops for thermal
>>> zone devices and cooling devices. Moreover, I actually think that
>>> they could be "no PM" devices that are not even put on the
>>> suspend-resume device list. Technically, they are just interfaces on
>>> top of some other devices allowing the user space to interact with the
>>> latter and combining different pieces described by the platform
>>> firmware. They by themselves have no PM capabilities.
>> Correct, thermal zone devices are virtual devices representing thermal management
>> aspects of the underlying parent device. This however does not mean that thermal zone
>> devices have no PM capabilities, because they contain state. Some part of this state
>> (namely TZ_STATE_FLAG_SUSPENDED and TZ_STATE_FLAG_RESUMING) is affected by power management,
>> so we should tell the device core about this by using dev_pm_ops instead of the PM notifier.
> Changing the zone state to anything different from TZ_STATE_READY
> causes __thermal_zone_device_update() to do nothing and this is the
> whole "suspend". It does not need to be done from a PM callback and I
> see no reason why doing it from a PM callback would be desirable.
> Sorry.
>
> Apart from the above, TZ_STATE_FLAG_SUSPENDED and
> TZ_STATE_FLAG_RESUMING are only used for coordination between
> thermal_zone_pm_prepare(), thermal_zone_device_resume() and
> thermal_zone_pm_complete(), so this is not a state anything other than
> the specific thermal zone in question cares about.

AFAIK this is not completely true. Once TZ_STATE_FLAG_SUSPENDED is set,
__thermal_zone_device_update() will stop polling said device (as you said).
This is not only important for the thermal zone device itself, but also for
the underlying device driver, as it has to make sure that the thermal zone
callbacks do not access an already suspended hardware device.

> Moreover, resuming a thermal zone before resuming any cooling devices
> bound to it would almost certainly break things and I'm not sure how
> you would make that work with dev_pm_ops. BTW, using device links for
> this is not an option as far as I'm concerned.

We could simply resume the thermal zones inside the .complete callback.
The cooling devices will already be operational when that callback
is called by the PM core, because the resume phase has already been
completed by then.

>>>>>> and making it impossible for user space applications to
>>>>>> associate a given thermal zone device with its parent device.
>>>>> Why does user space need to know the parent of a given cooling device
>>>>> or thermal zone?
>>>> Let's say that we have two thermal zones registered by two instances of the
>>>> Intel Wifi driver. User space is currently unable to find out which thermal zone
>>>> belongs to which Wifi adapter, as both thermal zones have the (nearly) same type string ("iwlwifi[0-X]").
>>> But the "belong" part is not quite well defined here. I think that
>>> what user space needs to know is what devices are located in a given
>>> thermal zone, isn't it? Knowing the parent doesn't necessarily
>>> address this.
>> The device exposing a given thermal zone device is not always a member of the thermal zone itself.
>> In case of the Intel Wifi adapters, the individual Wifi adapters are indeed members of the thermal zone
>> associated with their thermal zone device. But thermal zones created thru a system management controller
>> for example might only cover devices like the CPUs and GPUs, not the system management controller device itself.
> Well, exactly.
>
>> The parent device of a child device is the upstream device of the child device. The connection between parent
>> and child can be physical (SMBus controller (parent) -> i2c device (child)) or purely logical
>> (PCI device (parent) -> thermal zone device (child)). There exists a parent-child dependency between a parent
>> and a child device (the child device cannot function without its parent being operational), and user space
>> might want to be able to discover such dependencies.
> But this needs to be consistent.
>
> If the parent of one thermal zone represents the device affected by it
> and the parent of another thermal zone represents something else, user
> space will need platform-specific knowledge to figure this out, which
> is the case today. Without consistency, this is just not useful.
I think there is a misunderstanding here: describing the devices affected by a given thermal zone
has nothing to do with the parent-child dependency between a thermal zone device and its parent device.
This parent-child dependency only states that:
"This thermal zone device is descended from this parent device. It might thus depend on
said parent device to be operational."
>>>> This problem would be solved once we populate the parent device pointer inside the thermal zone
>>>> device, as user space can simply look at the "device" symlink to determine the parent device behind
>>>> a given thermal zone device.
>>> I'm not convinced about this.
>>>
>>>> Additionally, being able to access the acpi_handle of the parent device will be necessary for the
>>>> ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors.
>>> I guess by the "parent" you mean the device represented in the ACPI
>>> namespace by a ThermalZone object, right? But this is not the same as
>>> the "parent" in the Wifi driver context, is it?
>> In the context of an ACPI ThermalZone, the parent device of the thermal cooling device would currently
>> be the ACPI device bound to the "thermal" ACPI driver. In the context of the Intel Wifi card, the parent
>> device would be the PCI device bound to the corresponding Intel Wifi driver.
>>
>> I think you misunderstood what kind of parent device i was referring to. You likely thought that i was referring
>> to the parent device of the ACPI ThermalZone, right?
> No. I thought that you were referring to the ACPI ThermalZone itself.
> Or rather, a platform device associated with the ACPI ThermalZone
> (that is, the device the ACPI ThermalZone is the ACPI_COMPANION() of).
That is correct.
>> That however is not the case; with "parent device" i was
>> referring to the device responsible for creating a given struct thermal_zone_device instance.
> So I was not confused.
>
>>>>>> This patch series aims to fix this issue by extending the functions
>>>>>> used to register thermal zone/cooling devices to also accept a parent
>>>>>> device pointer. The first six patches convert all functions used for
>>>>>> registering cooling devices, while the functions used for registering
>>>>>> thermal zone devices are converted by the remaining two patches.
>>>>>>
>>>>>> I tested this series on various devices containing (among others):
>>>>>> - ACPI thermal zones
>>>>>> - ACPI processor devices
>>>>>> - PCIe cooling devices
>>>>>> - Intel Wifi card
>>>>>> - Intel powerclamp
>>>>>> - Intel TCC cooling
>>>>> What exactly did you do to test it?
>>>> I tested:
>>>> - the thermal zone temperature readout
>>>> - correctness of the new sysfs links
>>>> - suspend/resume
>>>>
>>>> I also verified that ACPI thermal zones still bind with the ACPI fans.
>>> I see, thanks.
>>>
>>>>>> I also compile-tested the remaining affected drivers, however i would
>>>>>> still be happy if the relevant maintainers (especially those of the
>>>>>> mellanox ethernet switch driver) could take a quick glance at the
>>>>>> code and verify that i am using the correct device as the parent
>>>>>> device.
>>>>> I think that the above paragraph is not relevant any more?
>>>> You are right, however i originally meant to CC the mellanox maintainers as
>>>> i was a bit unsure about the changes i made to their driver. I will rework
>>>> this section in the next revision and CC the mellanox maintainers.
>>>>
>>>>>> This work is also necessary for extending the ACPI thermal zone driver
>>>>>> to support the _TZD ACPI object in the future.
>>>>> I'm still unsure why _TZD support requires the ability to set a
>>>>> thermal zone parent device.
>>>> _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans
>>>> and ACPI processors, like ACPI batteries.
>>> No, it is not for cooling devices if my reading of the specification
>>> is correct. It says:
>>>
>>> "_TZD (Thermal Zone Devices)
>>>
>>> This optional object evaluates to a package of device names. Each name
>>> corresponds to a device in the ACPI namespace that is associated with
>>> the thermal zone. The temperature reported by the thermal zone is
>>> roughly correspondent to that of each of the devices."
>>>
>>> And then
>>>
>>> "The list of devices returned by the control method need not be a
>>> complete and absolute list of devices affected by the thermal zone.
>>> However, the package should at least contain the devices that would
>>> uniquely identify where this thermal zone is located in the machine.
>>> For example, a thermal zone in a docking station should include a
>>> device in the docking station, a thermal zone for the CD-ROM bay,
>>> should include the CD-ROM."
>>>
>>> So IIUC this is a list of devices allowing the location of the thermal
>>> zone to be figured out. There's nothing about cooling in this
>>> definition.
>> Using _TZD to figure out the location of a given thermal zone is another usage
>> of this ACPI control method, but let's take a look at section 11.6:
>>
>> - If _PSV is defined then either the _PSL or _TZD objects must exist. The _PSL and _TZD objects may both exist.
>> - If _PSV is defined and _PSL is not defined then at least one device in thermal zone, as indicated by either the
>> _TZD device list or devices’ _TZM objects, must support device performance states.
>>
>> So according to my understanding, _TZD can also be used to discover additional cooling devices used for passive cooling.
> But it doesn't actually say how those "device performance states" are
> supposed to be used for cooling, does it?
Well, ACPI specifies how passive cooling should be done using percentage values between 0% and 100%,
so this part is actually specified.
>> This makes sense as _PSL is defined to only contain processor objects (see section 11.4.10), so _TZD can act like an
>> extension of _PSL for things like ACPI control method batteries (see 10.2.2.12).
> But not everything in _TZD needs to be a potential "cooling device"
> and how will you decide which ones are?
Devices in _TZD that have no cooling capability will simply never register any cooling devices. This means that
the .should_bind callback of the ACPI thermal zone will never see those devices. Only devices in _TZD that also
have the ability for (passive) cooling will register a cooling device, so only those devices will end up with
the .should_bind callback of the ACPI thermal zone.
The ACPI thermal zone treats _TZD as a list of ACPI handles. If some of those handles are unused, then this is
totally fine.
>> Microsoft also follows this approach (see https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide
>> section "Thermally managed devices" paragraph "Processor aggregator").
> Interesting.
>
> I agree that it would make sense to follow them because there will be
> platform dependencies on that, if there aren't already.
My primary goal is to improve the Linux thermal subsystem to be as powerful as
the Windows thermal subsystem. This means that we must stop viewing _PSL, _ALx and _TZD
as something that only works with a predefined set of devices. Instead we must view
_PSL, _ALx and _TZD as something similar to the cooling-maps used for connecting
thermal zones and cooling devices on OF-based systems.
>>>> This however will currently not work as
>>>> the ACPI thermal zone driver uses the private drvdata of the cooling device to
>>>> determine if said cooling device should bind. This only works for ACPI fans and
>>>> processors due to the fact that those drivers store an ACPI device pointer inside
>>>> drvdata, something the ACPI thermal zone expects.
>>> I'm not sure I understand the above.
>>>
>>> There is a list of ACPI device handles per trip point, as returned by
>>> either _PSL or _ALx. Devices whose handles are in that list will be
>>> bound to the thermal zone, so long as there are struct acpi_device
>>> objects representing them which is verified with the help of the
>>> devdata field in struct thermal_cooling_device.
>> AFAIK devdata is meant to be used by the thermal zone device callbacks to access the state
>> container struct of the associated device driver instance. Assuming that a given device driver
>> will populate devdata with a pointer to its ACPI companion device is an implementation-specific
>> detail that does not apply to all cooling device implementations. It just so happens that the
>> ACPI processor and fan driver do this, likely because they were designed specifically to work
>> with the ACPI thermal zone driver.
>>
>> The documentation of thermal_cooling_device_register() even describes devdata as "device private data", so any meaning of devdata purely depends on the
>> given device driver.
> Yes, and these particular drivers decide to store a pointer to struct
> acpi_device in it.
>
> But this is not super important, they might as well set the
> ACPI_COMPANION() of the cooling device to the corresponding struct
> acpi_device and the ACPI thermal driver might use that information.
>
> I'm not opposed to using parents for this purpose, but it doesn't
> change the big picture that the ACPI thermal driver will need to know
> the ACPI handle corresponding to each cooling device.
>
> If you want to use _TZD instead of or in addition to _PSL for this, it
> doesn't change much here, it's just another list of ACPI handles, so
> saying that parents are needed for supporting this is not exactly
> accurate IMV.
My idea was something like this:
	/* Cooling devices without a parent device cannot be referenced using ACPI */
	if (!cdev->device.parent)
		return false;

	/* Not all devices are described inside the ACPI tables */
	acpi_handle cdev_handle = ACPI_HANDLE(cdev->device.parent);
	if (!cdev_handle)
		return false;

	for (i = 0; i < acpi_trip->devices.count; i++) {
		acpi_handle handle = acpi_trip->devices.handles[i];

		if (handle == cdev_handle)
			return true;
	}
This only works if the parent device pointer of the cooling device is populated.
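For illustration only, here is a user-space model of the matching idea above. It is a sketch, not kernel code: model_handle, struct model_device and struct model_trip are hypothetical stand-ins for acpi_handle, struct device and the per-trip handle list, and the function name is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
typedef void *model_handle;

struct model_device {
	struct model_device *parent;
	model_handle acpi_handle;	/* NULL if not described in the ACPI tables */
};

struct model_trip {
	size_t count;
	const model_handle *handles;	/* e.g. as returned by _PSL, _ALx or _TZD */
};

/* Returns true if the parent of the cooling device is listed for this trip. */
static bool model_should_bind(const struct model_trip *trip,
			      const struct model_device *cdev)
{
	assert(trip && cdev);

	/* Cooling devices without a parent device cannot be referenced using ACPI */
	if (!cdev->parent)
		return false;

	/* Not all devices are described inside the ACPI tables */
	if (!cdev->parent->acpi_handle)
		return false;

	for (size_t i = 0; i < trip->count; i++) {
		if (trip->handles[i] == cdev->parent->acpi_handle)
			return true;
	}

	return false;
}
```

In this model a cooling device with no parent, or with a parent that has no handle, is simply never bound, which mirrors the two early returns in the snippet above.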
>>> IOW, cooling device drivers that create struct thermal_cooling_device
>>> objects representing them are expected to set devdata in those objects
>>> to point to struct acpi_device objects corresponding to their ACPI
>>> handles, but in principle acpi_thermal_should_bind_cdev() might as
>>> well just use the handles themselves. It just needs to know that
>>> there is a cooling driver on the other side of the ACPI handle.
>>>
>>> The point is that a cooling device to be bound to an ACPI thermal zone
>>> needs an ACPI handle in the first place to be listed in _PSL or _ALx.
>> Correct, i merely change the way the ACPI thermal zone driver retrieves the
>> ACPI handle associated with a given cooling device.
> Right.
>
>>>> As we cannot require all cooling devices to store an ACPI device pointer inside
>>>> their drvdata field in order to support ACPI,
>>> Cooling devices don't store ACPI device pointers in struct
>>> thermal_cooling_device objects, ACPI cooling drivers do, and there are
>>> two reasons to do that: (1) to associate a given struct
>>> thermal_cooling_device with an ACPI handle and (2) to let
>>> acpi_thermal_should_bind_cdev() know that the cooling device is
>>> present and functional.
>>>
>>> This can be changed to store an ACPI handle in struct
>>> thermal_cooling_device and acpi_thermal_should_bind_cdev() may just
>>> verify that the device is there by itself.
>> I can of course extend thermal_cooling_device_register() to accept a fwnode_handle that
>> can be used for both ACPI and OF based cooling device identification, if this is what you
>> prefer.
> I'm not sure about this ATM and see below.
>
>> This patch series would then turn into a cleanup series, focusing on properly adding
>> thermal zone devices and cooling devices into the global device hierarchy.
> I'd prefer to do one thing at a time though.
>
> If you want cooling devices to get parents, fine. I'm not
> fundamentally opposed to that idea, but let's have clear rules for
> device drivers on how to set those parents for the sake of
> consistency.
>
> As for the ACPI case, one rule that I want to be followed (as already
> stated multiple times) is that a struct acpi_device can only be a
> parent of another struct acpi_device. This means that the parent of a
> cooling device needs to be a platform device or similar representing
> the actual device that will be used for implementing the cooling.
OK.
> A separate question is how acpi_thermal_should_bind_cdev() will match
> cooling devices with the ACPI handles coming from _PSL, _ALx, _TZD
> etc. and the rule can be that it will look at the ACPI_COMPANION() of
> the parent of the given cooling device.
See the example code i pasted above, the whole matching is done using ACPI handles,
so we can completely leave ACPI_COMPANION() out of this.
>>>> we must use a more generic approach.
>>> I'm not sure what use case you are talking about.
>>>
>>> Surely, devices with no representation in the ACPI namespace cannot be
>>> bound to ACPI thermal zones. For devices that have a representation
>>> in the ACPI namespace, storing an ACPI handle in devdata should not be
>>> a problem.
>> See my above explanations for details, drvdata is defined to hold device private data,
>> nothing more.
> This is related to the discussion below.
>
>>>> I was thinking about using the acpi_handle of the parent device instead of messing
>>>> with the drvdata field, but this only works if the parent device pointer of the
>>>> cooling device is populated.
>>>>
>>>> (Cooling devices without a parent device would then be ignored by the ACPI thermal
>>>> zone driver, as such cooling devices cannot be linked to ACPI).
>>> It can be arranged this way, but what's the practical difference?
>>> Anyone who creates a struct thermal_cooling_device and can set its
>>> parent pointer to a device with an ACPI companion, may as well set its
>>> devdata to point to that companion directly - or to its ACPI handle if
>>> that's preferred.
>> Yes, but this would require explicit support for ACPI in every driver that registers cooling devices.
> So you want to have generic drivers that may work on ACPI platforms
> and on DT platforms to be able to create cooling devices for use with
> ACPI thermal zones. Well, had you started the whole discussion with
> this statement, it would have been much easier to understand your
> point.
Sorry for the messy discussion, i intended to have two separate patch series. This one was meant to
simply be a preparation, with the important changes inside the ACPI thermal zone driver being implemented
with the second patch series.
That was also the reason why i sent this series as an RFC.
>> Using the parent device to retrieve the acpi_handle or allowing all drivers to just submit a fwnode_handle
>> of their choice when creating a cooling device will fix this.
> If you go the parents route, this is an important consideration for
> the rules on how to set those parents. Namely, they would need to be
> set so that the fwnode_handle of the parent could be used for binding
> the cooling device to a thermal zone either on ACPI or on DT systems.
>
> Of course, there are also cooling devices whose parents will not have
> an fwnode_handle and they would still need to work in this brave new
> world.
>
True, i did not think of that. In this case extending thermal_of_cooling_device_register() and friends to accept
a generic fwnode_handle instead of an OF-specific device_node would make more sense. Most drivers can simply
pass the result of dev_fwnode() instead of dev->of_node, only those that support multiple cooling device child
nodes would need additional work to also support ACPI.
Basically, thermal_of_get_cooling_spec() could handle the fwnode_handle in the following manner:
	if (cooling_spec.np->fwnode != cdev->fwnode)
		return false;
And the ACPI thermal zone driver could then simply use ACPI_HANDLE_FWNODE() to retrieve the ACPI handle from
the fwnode_handle (together with a NULL check of course).
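As a rough user-space sketch of that second step (assumptions: struct model_fwnode, struct model_cdev and both helper names are hypothetical and only model the idea, they are not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real kernel types look different. */
typedef void *model_acpi_handle;

struct model_fwnode {
	model_acpi_handle acpi_handle;	/* NULL for non-ACPI (e.g. OF) nodes */
};

struct model_cdev {
	const struct model_fwnode *fwnode;	/* may be NULL */
};

/* Models "retrieve the ACPI handle from the fwnode, with a NULL check". */
static model_acpi_handle model_handle_from_fwnode(const struct model_fwnode *fwnode)
{
	return fwnode ? fwnode->acpi_handle : NULL;
}

/* True if the fwnode of the cooling device resolves to a listed handle. */
static bool model_fwnode_should_bind(const model_acpi_handle *handles,
				     size_t count,
				     const struct model_cdev *cdev)
{
	model_acpi_handle cdev_handle;

	assert(cdev);

	cdev_handle = model_handle_from_fwnode(cdev->fwnode);
	if (!cdev_handle)
		return false;

	for (size_t i = 0; i < count; i++) {
		if (handles[i] == cdev_handle)
			return true;
	}

	return false;
}
```

Cooling devices without any fwnode, or with a fwnode that has no ACPI handle behind it, fall out at the NULL check and are ignored by the ACPI side in this model.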
If you are OK with this approach, i will forget about the whole parent device stuff for now and focus on extending
(devm_)thermal_of_cooling_device_register(). There are some additional changes needed for reliably associating
cooling devices to ACPI trip points using fwnode handles, but those are not that intrusive.
What do you think?
Thanks,
Armin Wolf
On Fri, Nov 28, 2025 at 12:50 AM Armin Wolf <W_Armin@gmx.de> wrote:
>
> On 27.11.25 at 22:46, Rafael J. Wysocki wrote:
>
> > On Thu, Nov 27, 2025 at 9:06 PM Armin Wolf <W_Armin@gmx.de> wrote:
> >> On 27.11.25 at 18:41, Rafael J. Wysocki wrote:
> >>
> >>> On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf <W_Armin@gmx.de> wrote:
> >>>> On 21.11.25 at 21:35, Rafael J. Wysocki wrote:
> >>>>
> >>>>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@gmx.de> wrote:
> >>>>>> Drivers registering thermal zone/cooling devices are currently unable
> >>>>>> to tell the thermal core what parent device the new thermal zone/
> >>>>>> cooling device should have, potentially causing issues with suspend
> >>>>>> ordering
> >>>>> This is one potential class of problems that may arise, but I would
> >>>>> like to see a real example of this.
> >>>>>
> >>>>> As it stands today, thermal_class has no PM callbacks, so there are no
> >>>>> callback execution ordering issues with devices in that class and what
> >>>>> other suspend/resume ordering issues are there?
> >>>> Correct, that is why i said "potentially".
> >>>>
> >>>>> Also, the suspend and resume of thermal zones is handled via PM
> >>>>> notifiers. Is there a problem with this?
> >>>> The problem with PM notifiers is that thermal zones stop working even before
> >>>> user space is frozen. Freezing user space might take a lot of time, so having
> >>>> no thermal management during this period is less than ideal.
> >>> This can be addressed by doing thermal zone suspend after freezing
> >>> tasks and before starting to suspend devices. Accordingly, thermal
> >>> zones could be resumed after resuming devices and before thawing
> >>> tasks. That should not be an overly complex change to make.
> >> AFAIK this is only possible by using dev_pm_ops,
> > Of course it is not the case.
> >
> > For example, thermal_pm_notify_prepare() could be called directly from
> > dpm_prepare() and thermal_pm_notify_complete() could be called
> > directly from dpm_complete() (which would require switching over
> > thermal to a non-freezable workqueue).
> >
> >> the PM notifier is triggered before tasks are frozen during suspend and after they are thawed during resume.
> > I know that.
> >
> >> Using dev_pm_ops would also ensure that thermal zone devices are resumed after their
> >> parent devices, so no additional changes inside the pm core would be needed.
> > Not really. thermal_pm_suspended needs to be set and cleared from somewhere.
>
> thermal_pm_suspended is only used for initializing the state of thermal zone devices registered
> during a suspend transition. This is currently needed because user space tasks are still operational
> when the PM notifier callback is called, so we have to be prepared for new thermal zone devices
> being registered in the middle of a suspend transition.
>
> When using dev_pm_ops, new thermal zone devices cannot appear in the middle of a suspend transition,
> as this would violate the constraints of the device core regarding device registrations. Because of
> this thermal_pm_suspended can be removed once we use dev_pm_ops.
No, we are not going to use dev_pm_ops for thermal zone suspend. That
would be adding complexity just for the sake of it IMV.
> >>>> This problem would not occur when using dev_pm_ops, as thermal zones would be
> >>>> suspended after user space has been frozen successfully. Additionally, when using
> >>>> dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates
> >>>> that no new devices (including thermal zones and cooling devices) be registered during
> >>>> a suspend/resume cycle.
> >>>>
> >>>> Replacing the PM notifiers with dev_pm_ops would of course be an optimization with
> >>>> its own patch series.
> >>> Honestly, I don't see much benefit from using dev_pm_ops for thermal
> >>> zone devices and cooling devices. Moreover, I actually think that
> >>> they could be "no PM" devices that are not even put on the
> >>> suspend-resume device list. Technically, they are just interfaces on
> >>> top of some other devices allowing the user space to interact with the
> >>> latter and combining different pieces described by the platform
> >>> firmware. They by themselves have no PM capabilities.
> >> Correct, thermal zone devices are virtual devices representing thermal management
> >> aspects of the underlying parent device. This however does not mean that thermal zone
> >> devices have no PM capabilities, because they contain state. Some part of this state
> >> (namely TZ_STATE_FLAG_SUSPENDED and TZ_STATE_FLAG_RESUMING) is affected by power management,
> >> so we should tell the device core about this by using dev_pm_ops instead of the PM notifier.
> > Changing the zone state to anything different from TZ_STATE_READY
> > causes __thermal_zone_device_update() to do nothing and this is the
> > whole "suspend". It does not need to be done from a PM callback and I
> > see no reason why doing it from a PM callback would be desirable.
> > Sorry.
> >
> > Apart from the above, TZ_STATE_FLAG_SUSPENDED and
> > TZ_STATE_FLAG_RESUMING are only used for coordination between
> > thermal_zone_pm_prepare(), thermal_zone_device_resume() and
> > thermal_zone_pm_complete(), so this is not a state anything other than
> > the specific thermal zone in question cares about.
>
> AFAIK this is not completely true: once TZ_STATE_FLAG_SUSPENDED is set,
> __thermal_zone_device_update() will stop polling said device (as you said).
> This is not only important for the thermal zone device itself, but also for
> the underlying device driver, as it has to make sure that the thermal zone
> callbacks do not access an already suspended hardware device.
Which callbacks in particular do you mean? That would need to be
something that is not called from either
__thermal_zone_device_update() because it is going to bail out early
or user space because it is frozen. So what is left?
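To make the bail-out concrete, a minimal user-space sketch follows. It is a model only: the MODEL_* flag values and struct model_zone are assumptions for illustration (the real definitions live in the thermal core), and temp_reads merely stands in for "a driver callback was invoked".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed flag layout for this model; see the thermal core for the real one. */
#define MODEL_TZ_STATE_FLAG_SUSPENDED	(1u << 0)
#define MODEL_TZ_STATE_FLAG_RESUMING	(1u << 1)
#define MODEL_TZ_STATE_READY		0u

struct model_zone {
	unsigned int state;
	unsigned int temp_reads;	/* counts calls into the driver callback */
};

/* Returns true if the zone was actually polled. */
static bool model_zone_update(struct model_zone *tz)
{
	assert(tz != NULL);

	/* Any state other than READY means: do nothing, touch no hardware. */
	if (tz->state != MODEL_TZ_STATE_READY)
		return false;

	tz->temp_reads++;
	return true;
}
```

In this model, once either flag is set the update path never reaches the driver, so the only remaining callers would have to come from somewhere else.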
Seriously, if the only problem with the existing thermal zone suspend
and resume is that they are done from a PM notifier, I don't think
addressing this requires involving dev_pm_ops and it will be very hard
to convince me otherwise.
> > Moreover, resuming a thermal zone before resuming any cooling devices
> > bound to it would almost certainly break things and I'm not sure how
> > you would make that work with dev_pm_ops. BTW, using device links for
> > this is not an option as far as I'm concerned.
>
> We could simply resume the thermal zones inside the .complete callback.
> The cooling devices will already be operational when said complete callback
> is being called by the PM core, due to the resume phase having been completed
> already.
But then it would be synchronous, wouldn't it? Or if you want to
start async handling from a .complete callback then I don't see a
point.
> >>>>>> and making it impossible for user space applications to
> >>>>>> associate a given thermal zone device with its parent device.
> >>>>> Why does user space need to know the parent of a given cooling device
> >>>>> or thermal zone?
> >>>> Let's say that we have two thermal zones registered by two instances of the
> >>>> Intel Wifi driver. User space is currently unable to find out which thermal zone
> >>>> belongs to which Wifi adapter, as both thermal zones have the (nearly) same type string ("iwlwifi[0-X]").
> >>> But the "belong" part is not quite well defined here. I think that
> >>> what user space needs to know is what devices are located in a given
> >>> thermal zone, isn't it? Knowing the parent doesn't necessarily
> >>> address this.
> >> The device exposing a given thermal zone device is not always a member of the thermal zone itself.
> >> In case of the Intel Wifi adapters, the individual Wifi adapters are indeed members of the thermal zone
> >> associated with their thermal zone device. But thermal zones created through a system management controller
> >> for example might only cover devices like the CPUs and GPUs, not the system management controller device itself.
> > Well, exactly.
> >
> >> The parent device of a child device is the upstream device of the child device. The connection between parent
> >> and child can be physical (SMBus controller (parent) -> i2c device (child)) or purely logical
> >> (PCI device (parent) -> thermal zone device (child)). There exists a parent-child dependency between a parent
> >> and a child device (the child device cannot function without its parent being operational), and user space
> >> might want to be able to discover such dependencies.
> > But this needs to be consistent.
> >
> > If the parent of one thermal zone represents the device affected by it
> > and the parent of another thermal zone represents something else, user
> > space will need platform-specific knowledge to figure this out, which
> > is the case today. Without consistency, this is just not useful.
>
> I think there is a misunderstanding here: describing the devices affected by a given thermal zone
> has nothing to do with the parent-child dependency between a thermal zone device and its parent device.
> This parent-child dependency only states that:
>
> "This thermal zone device is descended from this parent device. It might thus depend on
> said parent device to be operational."
So you are postulating that the parent of a thermal zone should be the
device providing the thermal sensor or otherwise a mechanism allowing
temperature to be read. That is precise enough as far as I'm
concerned.
> >>>> This problem would be solved once we populate the parent device pointer inside the thermal zone
> >>>> device, as user space can simply look at the "device" symlink to determine the parent device behind
> >>>> a given thermal zone device.
> >>> I'm not convinced about this.
> >>>
> >>>> Additionally, being able to access the acpi_handle of the parent device will be necessary for the
> >>>> ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors.
> >>> I guess by the "parent" you mean the device represented in the ACPI
> >>> namespace by a ThermalZone object, right? But this is not the same as
> >>> the "parent" in the Wifi driver context, is it?
> >> In the context of an ACPI ThermalZone, the parent device of the thermal cooling device would currently
> >> be the ACPI device bound to the "thermal" ACPI driver. In the context of the Intel Wifi card, the parent
> >> device would be the PCI device bound to the corresponding Intel Wifi driver.
> >>
> >> I think you misunderstood what kind of parent device i was referring to. You likely thought that i was referring
> >> to the parent device of the ACPI ThermalZone, right?
> > No. I thought that you were referring to the ACPI ThermalZone itself.
> > Or rather, a platform device associated with the ACPI ThermalZone
> > (that is, the device the ACPI ThermalZone is the ACPI_COMPANION() of).
>
> That is correct.
>
> >> That however is not the case; with "parent device" i was
> >> referring to the device responsible for creating a given struct thermal_zone_device instance.
> > So I was not confused.
> >
> >>>>>> This patch series aims to fix this issue by extending the functions
> >>>>>> used to register thermal zone/cooling devices to also accept a parent
> >>>>>> device pointer. The first six patches convert all functions used for
> >>>>>> registering cooling devices, while the functions used for registering
> >>>>>> thermal zone devices are converted by the remaining two patches.
> >>>>>>
> >>>>>> I tested this series on various devices containing (among others):
> >>>>>> - ACPI thermal zones
> >>>>>> - ACPI processor devices
> >>>>>> - PCIe cooling devices
> >>>>>> - Intel Wifi card
> >>>>>> - Intel powerclamp
> >>>>>> - Intel TCC cooling
> >>>>> What exactly did you do to test it?
> >>>> I tested:
> >>>> - the thermal zone temperature readout
> >>>> - correctness of the new sysfs links
> >>>> - suspend/resume
> >>>>
> >>>> I also verified that ACPI thermal zones still bind with the ACPI fans.
> >>> I see, thanks.
> >>>
> >>>>>> I also compile-tested the remaining affected drivers, however i would
> >>>>>> still be happy if the relevant maintainers (especially those of the
> >>>>>> mellanox ethernet switch driver) could take a quick glance at the
> >>>>>> code and verify that i am using the correct device as the parent
> >>>>>> device.
> >>>>> I think that the above paragraph is not relevant any more?
> >>>> You are right, however i originally meant to CC the mellanox maintainers as
> >>>> i was a bit unsure about the changes i made to their driver. I will rework
> >>>> this section in the next revision and CC the mellanox maintainers.
> >>>>
> >>>>>> This work is also necessary for extending the ACPI thermal zone driver
> >>>>>> to support the _TZD ACPI object in the future.
> >>>>> I'm still unsure why _TZD support requires the ability to set a
> >>>>> thermal zone parent device.
> >>>> _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans
> >>>> and ACPI processors, like ACPI batteries.
> >>> No, it is not for cooling devices if my reading of the specification
> >>> is correct. It says:
> >>>
> >>> "_TZD (Thermal Zone Devices)
> >>>
> >>> This optional object evaluates to a package of device names. Each name
> >>> corresponds to a device in the ACPI namespace that is associated with
> >>> the thermal zone. The temperature reported by the thermal zone is
> >>> roughly correspondent to that of each of the devices."
> >>>
> >>> And then
> >>>
> >>> "The list of devices returned by the control method need not be a
> >>> complete and absolute list of devices affected by the thermal zone.
> >>> However, the package should at least contain the devices that would
> >>> uniquely identify where this thermal zone is located in the machine.
> >>> For example, a thermal zone in a docking station should include a
> >>> device in the docking station, a thermal zone for the CD-ROM bay,
> >>> should include the CD-ROM."
> >>>
> >>> So IIUC this is a list of devices allowing the location of the thermal
> >>> zone to be figured out. There's nothing about cooling in this
> >>> definition.
> >> Using _TZD to figure out the location of a given thermal zone is another usage
> >> of this ACPI control method, but let's take a look at section 11.6:
> >>
> >> - If _PSV is defined then either the _PSL or _TZD objects must exist. The _PSL and _TZD objects may both exist.
> >> - If _PSV is defined and _PSL is not defined then at least one device in thermal zone, as indicated by either the
> >> _TZD device list or devices’ _TZM objects, must support device performance states.
> >>
> >> So according to my understanding, _TZD can also be used to discover additional cooling devices used for passive cooling.
> > But it doesn't actually say how those "device performance states" are
> > supposed to be used for cooling, does it?
>
> Well, ACPI specifies how passive cooling should be done using percentage values between 0% and 100%,
> so this part is actually specified.
If you refer to Section 11.1.5, this is based on _TC1 and _TC2 and has
limitations. So you are saying that Section 11.1.5 should be extended
to _TZD devices. Is this also there in the MSFT document?
> >> This makes sense as _PSL is defined to only contain processor objects (see section 11.4.10), so _TZD can act like an
> >> extension of _PSL for things like ACPI control method batteries (see 10.2.2.12).
> > But not everything in _TZD needs to be a potential "cooling device"
> > and how will you decide which ones are?
>
> Devices in _TZD that have no cooling capability will simply never register any cooling devices. This means that
> the .should_bind callback of the ACPI thermal zone will never see those devices. Only devices in _TZD that also
> have the ability for (passive) cooling will register a cooling device, so only those devices will end up with
> the .should_bind callback of the ACPI thermal zone.
>
> The ACPI thermal zone treats _TZD as a list of ACPI handles. If some of those handles are unused, then this is
> totally fine.
>
> >> Microsoft also follows this approach (see https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide
> >> section "Thermally managed devices" paragraph "Processor aggregator").
> > Interesting.
> >
> > I agree that it would make sense to follow them because there will be
> > platform dependencies on that, if there aren't already.
>
> My primary goal is to improve the Linux thermal subsystem to be as powerful as
> the Windows thermal subsystem. This means that we must stop viewing _PSL, _ALx and _TZD
> as something that only works with a predefined set of devices. Instead we must view
> _PSL, _ALx and _TZD as something similar to the cooling-maps used for connecting
> thermal zones and cooling devices on OF-based systems.
>
> >>>> This however will currently not work as
> >>>> the ACPI thermal zone driver uses the private drvdata of the cooling device to
> >>>> determine if said cooling device should bind. This only works for ACPI fans and
> >>>> processors due to the fact that those drivers store an ACPI device pointer inside
> >>>> drvdata, something the ACPI thermal zone expects.
> >>> I'm not sure I understand the above.
> >>>
> >>> There is a list of ACPI device handles per trip point, as returned by
> >>> either _PSL or _ALx. Devices whose handles are in that list will be
> >>> bound to the thermal zone, so long as there are struct acpi_device
> >>> objects representing them which is verified with the help of the
> >>> devdata field in struct thermal_cooling_device.
> >> AFAIK devdata is meant to be used by the thermal zone device callbacks to access the state
> >> container struct of the associated device driver instance. Assuming that a given device driver
> >> will populate devdata with a pointer to its ACPI companion device is an implementation-specific
> >> detail that does not apply to all cooling device implementations. It just so happens that the
> >> ACPI processor and fan driver do this, likely because they were designed specifically to work
> >> with the ACPI thermal zone driver.
> >>
> >> The documentation of thermal_cooling_device_register() even describes devdata as "device private data", so any meaning of devdata purely depends on the
> >> given device driver.
> > Yes, and these particular drivers decide to store a pointer to struct
> > acpi_device in it.
> >
> > But this is not super important, they might as well set the
> > ACPI_COMPANION() of the cooling device to the corresponding struct
> > acpi_device and the ACPI thermal driver might use that information.
> >
> > I'm not opposed to using parents for this purpose, but it doesn't
> > change the big picture that the ACPI thermal driver will need to know
> > the ACPI handle corresponding to each cooling device.
> >
> > If you want to use _TZD instead of or in addition to _PSL for this, it
> > doesn't change much here, it's just another list of ACPI handles, so
> > saying that parents are needed for supporting this is not exactly
> > accurate IMV.
>
> My idea was something like this:
>
> /* Cooling devices without a parent device cannot be referenced using ACPI */
> if (!cdev->device.parent)
> 	return false;
>
> /* Not all devices are described inside the ACPI tables */
> acpi_handle cdev_handle = ACPI_HANDLE(cdev->device.parent);
> if (!cdev_handle)
> 	return false;
>
> for (i = 0; i < acpi_trip->devices.count; i++) {
> 	acpi_handle handle = acpi_trip->devices.handles[i];
>
> 	if (handle == cdev_handle)
> 		return true;
> }
>
> This only works if the parent device pointer of the cooling device is populated.
Sure, but it looks reasonable to me.
> >>> IOW, cooling device drivers that create struct thermal_cooling_device
> >>> objects representing them are expected to set devdata in those objects
> >>> to point to struct acpi_device objects corresponding to their ACPI
> >>> handles, but in principle acpi_thermal_should_bind_cdev() might as
> >>> well just use the handles themselves. It just needs to know that
> >>> there is a cooling driver on the other side of the ACPI handle.
> >>>
> >>> The point is that a cooling device to be bound to an ACPI thermal zone
> >>> needs an ACPI handle in the first place to be listed in _PSL or _ALx.
> >> Correct, i merely change the way the ACPI thermal zone driver retrieves the
> >> ACPI handle associated with a given cooling device.
> > Right.
> >
> >>>> As we cannot require all cooling devices to store an ACPI device pointer inside
> >>>> their drvdata field in order to support ACPI,
> >>> Cooling devices don't store ACPI device pointers in struct
> >>> thermal_cooling_device objects, ACPI cooling drivers do, and there are
> >>> two reasons to do that: (1) to associate a given struct
> >>> thermal_cooling_device with an ACPI handle and (2) to let
> >>> acpi_thermal_should_bind_cdev() know that the cooling device is
> >>> present and functional.
> >>>
> >>> This can be changed to store an ACPI handle in struct
> >>> thermal_cooling_device and acpi_thermal_should_bind_cdev() may just
> >>> verify that the device is there by itself.
> >> I can of course extend thermal_cooling_device_register() to accept a fwnode_handle that
> >> can be used for both ACPI and OF based cooling device identification, if this is what you
> >> prefer.
> > I'm not sure about this ATM and see below.
> >
> >> This patch series would then turn into a cleanup series, focusing on properly adding
> >> thermal zone devices and cooling devices into the global device hierarchy.
> > I'd prefer to do one thing at a time though.
> >
> > If you want cooling devices to get parents, fine. I'm not
> > fundamentally opposed to that idea, but let's have clear rules for
> > device drivers on how to set those parents for the sake of
> > consistency.
> >
> > As for the ACPI case, one rule that I want to be followed (as already
> > stated multiple times) is that a struct acpi_device can only be a
> > parent of another struct acpi_device. This means that the parent of a
> > cooling device needs to be a platform device or similar representing
> > the actual device that will be used for implementing the cooling.
>
> OK.
>
> > A separate question is how acpi_thermal_should_bind_cdev() will match
> > cooling devices with the ACPI handles coming from _PSL, _ALx, _TZD
> > etc. and the rule can be that it will look at the ACPI_COMPANION() of
> > the parent of the given cooling device.
>
> See the example code i pasted above, the whole matching is done using ACPI handles,
> so we can completely leave ACPI_COMPANION() out of this.
ACPI_HANDLE() is a wrapper around ACPI_COMPANION() so your code
effectively does what I said above.
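To make the equivalence concrete, here is a minimal user-space sketch of the matching rule, not kernel code. Every name in it (mock_device, mock_acpi_handle_of(), mock_should_bind()) is a hypothetical stand-in; the only thing taken from the discussion is that the handle is obtained through the ACPI companion of the cooling device's parent and then compared against the per-trip handle list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for acpi_handle, struct acpi_device and struct device. */
typedef void *mock_acpi_handle;

struct mock_acpi_device {
	mock_acpi_handle handle;
};

struct mock_device {
	const struct mock_device *parent;
	const struct mock_acpi_device *companion;	/* models ACPI_COMPANION(dev) */
};

/* Models ACPI_HANDLE(dev): the companion's handle, or NULL without a companion. */
static mock_acpi_handle mock_acpi_handle_of(const struct mock_device *dev)
{
	return dev->companion ? dev->companion->handle : NULL;
}

/* Bind only if the ACPI handle of the cooling device's parent is in the list. */
static bool mock_should_bind(const struct mock_device *cdev,
			     const mock_acpi_handle *handles, size_t count)
{
	mock_acpi_handle cdev_handle;
	size_t i;

	assert(cdev != NULL);

	/* No parent: the cooling device cannot be referenced through ACPI. */
	if (!cdev->parent)
		return false;

	/* Parent without an ACPI companion: nothing to match against. */
	cdev_handle = mock_acpi_handle_of(cdev->parent);
	if (!cdev_handle)
		return false;

	for (i = 0; i < count; i++) {
		if (handles[i] == cdev_handle)
			return true;
	}

	return false;
}
```

Whether the lookup is spelled as "handle of the parent" or "companion of the parent", the result is the same in this model, because the handle is only reachable through the companion.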
> >>>> we must use a more generic approach.
> >>> I'm not sure what use case you are talking about.
> >>>
> >>> Surely, devices with no representation in the ACPI namespace cannot be
> >>> bound to ACPI thermal zones. For devices that have a representation
> >>> in the ACPI namespace, storing an ACPI handle in devdata should not be
> >>> a problem.
> >> See my above explanations for details, drvdata is defined to hold device private data,
> >> nothing more.
> > This is related to the discussion below.
> >
> >>>> I was thinking about using the acpi_handle of the parent device instead of messing
> >>>> with the drvdata field, but this only works if the parent device pointer of the
> >>>> cooling device is populated.
> >>>>
> >>>> (Cooling devices without a parent device would then be ignored by the ACPI thermal
> >>>> zone driver, as such cooling devices cannot be linked to ACPI).
> >>> It can be arranged this way, but what's the practical difference?
> >>> Anyone who creates a struct thermal_cooling_device and can set its
> >>> parent pointer to a device with an ACPI companion, may as well set its
> >>> devdata to point to that companion directly - or to its ACPI handle if
> >>> that's preferred.
> >> Yes, but this would require explicit support for ACPI in every driver that registers cooling devices.
> > So you want to have generic drivers that may work on ACPI platforms
> > and on DT platforms to be able to create cooling devices for use with
> > ACPI thermal zones. Well, had you started the whole discussion with
> > this statement, it would have been much easier to understand your
> > point.
>
> Sorry for the messy discussion, i intended to have two separate patch series. This one was meant to
> simply be a preparation, with the important changes inside the ACPI thermal zone driver being implemented
> with the second patch series.
>
> That was also the reason why i sent this series as an RFC.
>
> >> Using the parent device to retrieve the acpi_handle or allowing all drivers to just submit a fwnode_handle
> >> of their choice when creating a cooling device will fix this.
> > If you go the parents route, this is an important consideration for
> > the rules on how to set those parents. Namely, they would need to be
> > set so that the fwnode_handle of the parent could be used for binding
> > the cooling device to a thermal zone either on ACPI or on DT systems.
> >
> > Of course, there are also cooling devices whose parents will not have
> > an fwnode_handle and they would still need to work in this brave new
> > world.
> >
> True, i did not think of that. In this case extending thermal_of_cooling_device_register() and friends to accept
> a generic fwnode_handle instead of an OF-specific device_node would make more sense. Most drivers can simply
> pass the result of dev_fwnode() instead of dev->of_node, only those that support multiple cooling device child
> nodes would need additional work to also support ACPI.
>
> Basically, thermal_of_get_cooling_spec() could handle the fwnode_handle in the following manner:
>
> if (cooling_spec.np->fwnode != cdev->fwnode)
> 	return false;
>
> And the ACPI thermal zone driver could then simply use ACPI_HANDLE_FWNODE() to retrieve the ACPI handle from
> the fwnode_handle (together with a NULL check of course).
>
> If you are OK with this approach, i will forget about the whole parent device stuff for now and focus on extending
> (devm_)thermal_of_cooling_device_register(). There are some additional changes needed for reliably associating
> cooling devices to ACPI trip points using fwnode handles, but those are not that intrusive.
>
> What do you think?
One advantage of using parents is that it will help user space to
figure out connections between the abstract cooling devices and the
associated hardware or firmware entities. I think that this is an
important one.
It also doesn't prevent fwnode_handle from being used because the
fwnode_handle may just be stored in the parent. I like this more than
associating fwnode_handles directly with abstract cooling devices.
If the cooling device parent (that is, the provider of the cooling
mechanism used by it) does not have an fwnode_handle, then either it
needs to be driven directly from user space, or the driver creating a
thermal zone device needs to provide a specific .should_bind()
callback that will know what to look for.
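As a rough illustration of that fallback, here is a user-space sketch, not kernel code. The types and functions (mock_fwnode, mock_zone, mock_zone_wants(), mock_driver_should_bind()) are hypothetical; the sketch only assumes that the fwnode lives in the cooling device's parent and that a zone without a matching fwnode list entry can fall back to a driver-specific callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fwnode_handle; only its address matters. */
struct mock_fwnode {
	int unused;
};

struct mock_device {
	const struct mock_device *parent;
	const struct mock_fwnode *fwnode;	/* NULL if firmware does not describe it */
};

struct mock_zone {
	const struct mock_fwnode *const *wanted;	/* nodes listed by firmware */
	size_t count;
	/* Optional driver-specific rule, modelling a custom .should_bind(). */
	bool (*should_bind)(const struct mock_device *cdev);
};

/* Example driver-specific rule: this driver knows its own cooling devices. */
static bool mock_driver_should_bind(const struct mock_device *cdev)
{
	(void)cdev;
	return true;
}

static bool mock_zone_wants(const struct mock_zone *tz,
			    const struct mock_device *cdev)
{
	const struct mock_fwnode *fw;
	size_t i;

	assert(tz != NULL && cdev != NULL);

	/* The fwnode is stored in the parent, not in the cooling device. */
	fw = cdev->parent ? cdev->parent->fwnode : NULL;
	if (fw) {
		for (i = 0; i < tz->count; i++) {
			if (tz->wanted[i] == fw)
				return true;
		}
		return false;
	}

	/* No fwnode on the provider: only a driver-specific rule can decide. */
	return tz->should_bind ? tz->should_bind(cdev) : false;
}
```

A cooling device that matches neither path stays unbound in this model, which corresponds to the "driven directly from user space" case.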
On 28.11.25 at 12:40, Rafael J. Wysocki wrote:
> On Fri, Nov 28, 2025 at 12:50 AM Armin Wolf <W_Armin@gmx.de> wrote:
>> On 27.11.25 at 22:46, Rafael J. Wysocki wrote:
>>
>>> On Thu, Nov 27, 2025 at 9:06 PM Armin Wolf <W_Armin@gmx.de> wrote:
>>>> On 27.11.25 at 18:41, Rafael J. Wysocki wrote:
>>>>
>>>>> On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf <W_Armin@gmx.de> wrote:
>>>>>> On 21.11.25 at 21:35, Rafael J. Wysocki wrote:
>>>>>>
>>>>>>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@gmx.de> wrote:
>>>>>>>> Drivers registering thermal zone/cooling devices are currently unable
>>>>>>>> to tell the thermal core what parent device the new thermal zone/
>>>>>>>> cooling device should have, potentially causing issues with suspend
>>>>>>>> ordering
>>>>>>> This is one potential class of problems that may arise, but I would
>>>>>>> like to see a real example of this.
>>>>>>>
>>>>>>> As it stands today, thermal_class has no PM callbacks, so there are no
>>>>>>> callback execution ordering issues with devices in that class and what
>>>>>>> other suspend/resume ordering issues are there?
>>>>>> Correct, that is why i said "potentially".
>>>>>>
>>>>>>> Also, the suspend and resume of thermal zones is handled via PM
>>>>>>> notifiers. Is there a problem with this?
>>>>>> The problem with PM notifiers is that thermal zones stop working even before
>>>>>> user space is frozen. Freezing user space might take a lot of time, so having
>>>>>> no thermal management during this period is less than ideal.
>>>>> This can be addressed by doing thermal zone suspend after freezing
>>>>> tasks and before starting to suspend devices. Accordingly, thermal
>>>>> zones could be resumed after resuming devices and before thawing
>>>>> tasks. That should not be an overly complex change to make.
>>>> AFAIK this is only possible by using dev_pm_ops,
>>> Of course it is not the case.
>>>
>>> For example, thermal_pm_notify_prepare() could be called directly from
>>> dpm_prepare() and thermal_pm_notify_complete() could be called
>>> directly from dpm_complete() (which would require switching over
>>> thermal to a non-freezable workqueue).
>>>
>>>> the PM notifier is triggered before tasks are frozen during suspend and after they are thawed during resume.
>>> I know that.
>>>
>>>> Using dev_pm_ops would also ensure that thermal zone devices are resumed after their
>>>> parent devices, so no additional changes inside the pm core would be needed.
>>> Not really. thermal_pm_suspended needs to be set and cleared from somewhere.
>> thermal_pm_suspended is only used for initializing the state of thermal zone devices registered
>> during a suspend transition. This is currently needed because user space tasks are still operational
>> when the PM notifier callback is called, so we have to be prepared for new thermal zone devices
>> being registered in the middle of a suspend transition.
>>
>> When using dev_pm_ops, new thermal zone devices cannot appear in the middle of a suspend transition,
>> as this would violate the constraints of the device core regarding device registrations. Because of
>> this thermal_pm_suspended can be removed once we use dev_pm_ops.
> No, we are not going to use dev_pm_ops for thermal zone suspend. That
> would be adding complexity just for the sake of it IMV.
OK, fine. I will forget about using dev_pm_ops for the thermal subsystem.
>>>>>> This problem would not occur when using dev_pm_ops, as thermal zones would be
>>>>>> suspended after user space has been frozen successfully. Additionally, when using
>>>>>> dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates
>>>>>> that no new devices (including thermal zones and cooling devices) be registered during
>>>>>> a suspend/resume cycle.
>>>>>>
>>>>>> Replacing the PM notifiers with dev_pm_ops would of course be an optimization with
>>>>>> its own patch series.
>>>>> Honestly, I don't see much benefit from using dev_pm_ops for thermal
>>>>> zone devices and cooling devices. Moreover, I actually think that
>>>>> they could be "no PM" devices that are not even put on the
>>>>> suspend-resume device list. Technically, they are just interfaces on
>>>>> top of some other devices allowing the user space to interact with the
>>>>> latter and combining different pieces described by the platform
>>>>> firmware. They by themselves have no PM capabilities.
>>>> Correct, thermal zone devices are virtual devices representing thermal management
>>>> aspects of the underlying parent device. This however does not mean that thermal zone
>>>> devices have no PM capabilities, because they contain state. Some part of this state
>>>> (namely TZ_STATE_FLAG_SUSPENDED and TZ_STATE_FLAG_RESUMING) is affected by power management,
>>>> so we should tell the device core about this by using dev_pm_ops instead of the PM notifier.
>>> Changing the zone state to anything different from TZ_STATE_READY
>>> causes __thermal_zone_device_update() to do nothing and this is the
>>> whole "suspend". It does not need to be done from a PM callback and I
>>> see no reason why doing it from a PM callback would be desirable.
>>> Sorry.
>>>
>>> Apart from the above, TZ_STATE_FLAG_SUSPENDED and
>>> TZ_STATE_FLAG_RESUMING are only used for coordination between
>>> thermal_zone_pm_prepare(), thermal_zone_device_resume() and
>>> thermal_zone_pm_complete(), so this is not a state anything other than
>>> the specific thermal zone in question cares about.
>> AFAIK this is not completely true, once TZ_STATE_FLAG_SUSPENDED is set,
>> __thermal_zone_device_update() will stop polling said device (as you said).
>> This is not only important for the thermal zone device itself, but also for
>> the underlying device driver as it has to make sure that the thermal zone
>> callbacks do not access an already suspended hardware device.
> Which callbacks in particular do you mean? That would need to be
> something that is not called from either
> __thermal_zone_device_update() because it is going to bail out early
> or user space because it is frozen. So what is left?
>
> Seriously, if the only problem with the existing thermal zone suspend
> and resume is that they are done from a PM notifier, I don't think
> addressing this requires involving dev_pm_ops and it will be very hard
> to convince me otherwise.
I was referring to the callbacks inside struct thermal_zone_device_ops, but
those are indeed already covered by the current approach using the PM notifier.
Since you are happy with the current approach, i say that we forget about the
suggestion with the dev_pm_ops for now.
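For reference, the behaviour discussed above can be modelled with a small user-space sketch, not kernel code. All names carry a mock_ prefix and are hypothetical, and the flag bit values are assumptions chosen for illustration. The sketch only encodes what was stated in this thread: any state other than "ready" makes the update path bail out early, so no zone callback is reached while the zone is suspended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration; only "ready == no flags set" matters here. */
#define MOCK_TZ_STATE_READY		0u
#define MOCK_TZ_STATE_FLAG_SUSPENDED	(1u << 0)
#define MOCK_TZ_STATE_FLAG_RESUMING	(1u << 1)

struct mock_zone {
	unsigned int state;
	unsigned int sensor_reads;	/* counts calls into the driver callback */
};

/* Models the update path: do nothing unless the zone is ready. */
static bool mock_zone_update(struct mock_zone *tz)
{
	assert(tz != NULL);

	if (tz->state != MOCK_TZ_STATE_READY)
		return false;

	tz->sensor_reads++;	/* stands in for a driver temperature callback */
	return true;
}

/* Models the "prepare" side of the PM notifier. */
static void mock_zone_pm_prepare(struct mock_zone *tz)
{
	tz->state |= MOCK_TZ_STATE_FLAG_SUSPENDED;
}

/* Models the "complete" side: clear the PM-related flags again. */
static void mock_zone_pm_complete(struct mock_zone *tz)
{
	tz->state &= ~(MOCK_TZ_STATE_FLAG_SUSPENDED | MOCK_TZ_STATE_FLAG_RESUMING);
}
```

In this model the driver callback is never reached between prepare and complete, which is why the callbacks in struct thermal_zone_device_ops are already covered.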
>>> Moreover, resuming a thermal zone before resuming any cooling devices
>>> bound to it would almost certainly break things and I'm not sure how
>>> you would make that work with dev_pm_ops. BTW, using device links for
>>> this is not an option as far as I'm concerned.
>> We could simply resume the thermal zones inside the .complete callback.
>> The cooling devices will already be operational when said complete callback
>> is being called by the PM core, due to the resume phase having been completed
>> already.
> But then it would be synchronous, wouldn't it? Or if you want to
> start async handling from a .complete callback then I don't see a
> point.
>
>>>>>>>> and making it impossible for user space applications to
>>>>>>>> associate a given thermal zone device with its parent device.
>>>>>>> Why does user space need to know the parent of a given cooling device
>>>>>>> or thermal zone?
>>>>>> Let's say that we have two thermal zones registered by two instances of the
>>>>>> Intel Wifi driver. User space is currently unable to find out which thermal zone
>>>>>> belongs to which Wifi adapter, as both thermal zones have the (nearly) same type string ("iwlwifi[0-X]").
>>>>> But the "belong" part is not quite well defined here. I think that
>>>>> what user space needs to know is what devices are located in a given
>>>>> thermal zone, isn't it? Knowing the parent doesn't necessarily
>>>>> address this.
>>>> The device exposing a given thermal zone device is not always a member of the thermal zone itself.
>>>> In case of the Intel Wifi adapters, the individual Wifi adapters are indeed members of the thermal zone
>>>> associated with their thermal zone device. But thermal zones created through a system management controller
>>>> for example might only cover devices like the CPUs and GPUs, not the system management controller device itself.
>>> Well, exactly.
>>>
>>>> The parent device of a child device is the upstream device of the child device. The connection between parent
>>>> and child can be physical (SMBus controller (parent) -> i2c device (child)) or purely logical
>>>> (PCI device (parent) -> thermal zone device (child)). There exists a parent-child dependency between a parent
>>>> and a child device (the child device cannot function without its parent being operational), and user space
>>>> might want to be able to discover such dependencies.
>>> But this needs to be consistent.
>>>
>>> If the parent of one thermal zone represents the device affected by it
>>> and the parent of another thermal zone represents something else, user
>>> space will need platform-specific knowledge to figure this out, which
>>> is the case today. Without consistency, this is just not useful.
>> I think there is a misunderstanding here, describing the devices affected by a given thermal zone
>> has nothing to do with the parent-child dependency between a thermal zone device and its parent device.
>> This parent-child dependency only states that:
>>
>> "This thermal zone device is descended from this parent device. It might thus depend on
>> said parent device to be operational."
> So you are postulating that the parent of a thermal zone should be the
> device providing the thermal sensor or otherwise a mechanism allowing
> temperature to be read. That is precise enough as far as I'm
> concerned.
Correct.
>>>>>> This problem would be solved once we populate the parent device pointer inside the thermal zone
>>>>>> device, as user space can simply look at the "device" symlink to determine the parent device behind
>>>>>> a given thermal zone device.
>>>>> I'm not convinced about this.
>>>>>
>>>>>> Additionally, being able to access the acpi_handle of the parent device will be necessary for the
>>>>>> ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors.
>>>>> I guess by the "parent" you mean the device represented in the ACPI
>>>>> namespace by a ThermalZone object, right? But this is not the same as
>>>>> the "parent" in the Wifi driver context, is it?
>>>> In the context of an ACPI ThermalZone, the parent device of the thermal cooling device would currently
>>>> be the ACPI device bound to the "thermal" ACPI driver. In the context of the Intel Wifi card, the parent
>>>> device would be the PCI device bound to the corresponding Intel Wifi driver.
>>>>
>>>> I think you misunderstood what kind of parent device i was referring to. You likely thought that i was referring
>>>> to the parent device of the ACPI ThermalZone, right?
>>> No. I thought that you were referring to the ACPI ThermalZone itself.
>>> Or rather, a platform device associated with the ACPI ThermalZone
>>> (that is, the device the ACPI ThermalZone is the ACPI_COMPANION() of).
>> That is correct.
>>
>>>> That however is not the case; with "parent device" i was
>>>> referring to the device responsible for creating a given struct thermal_zone_device instance.
>>> So I was not confused.
>>>
>>>>>>>> This patch series aims to fix this issue by extending the functions
>>>>>>>> used to register thermal zone/cooling devices to also accept a parent
>>>>>>>> device pointer. The first six patches convert all functions used for
>>>>>>>> registering cooling devices, while the functions used for registering
>>>>>>>> thermal zone devices are converted by the remaining two patches.
>>>>>>>>
>>>>>>>> I tested this series on various devices containing (among others):
>>>>>>>> - ACPI thermal zones
>>>>>>>> - ACPI processor devices
>>>>>>>> - PCIe cooling devices
>>>>>>>> - Intel Wifi card
>>>>>>>> - Intel powerclamp
>>>>>>>> - Intel TCC cooling
>>>>>>> What exactly did you do to test it?
>>>>>> I tested:
>>>>>> - the thermal zone temperature readout
>>>>>> - correctness of the new sysfs links
>>>>>> - suspend/resume
>>>>>>
>>>>>> I also verified that ACPI thermal zones still bind with the ACPI fans.
>>>>> I see, thanks.
>>>>>
>>>>>>>> I also compile-tested the remaining affected drivers, however i would
>>>>>>>> still be happy if the relevant maintainers (especially those of the
>>>>>>>> mellanox ethernet switch driver) could take a quick glance at the
>>>>>>>> code and verify that i am using the correct device as the parent
>>>>>>>> device.
>>>>>>> I think that the above paragraph is not relevant any more?
>>>>>> You are right, however i originally meant to CC the mellanox maintainers as
>>>>>> i was a bit unsure about the changes i made to their driver. I will rework
>>>>>> this section in the next revision and CC the mellanox maintainers.
>>>>>>
>>>>>>>> This work is also necessary for extending the ACPI thermal zone driver
>>>>>>>> to support the _TZD ACPI object in the future.
>>>>>>> I'm still unsure why _TZD support requires the ability to set a
>>>>>>> thermal zone parent device.
>>>>>> _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans
>>>>>> and ACPI processors, like ACPI batteries.
>>>>> No, it is not for cooling devices if my reading of the specification
>>>>> is correct. It says:
>>>>>
>>>>> "_TZD (Thermal Zone Devices)
>>>>>
>>>>> This optional object evaluates to a package of device names. Each name
>>>>> corresponds to a device in the ACPI namespace that is associated with
>>>>> the thermal zone. The temperature reported by the thermal zone is
>>>>> roughly correspondent to that of each of the devices."
>>>>>
>>>>> And then
>>>>>
>>>>> "The list of devices returned by the control method need not be a
>>>>> complete and absolute list of devices affected by the thermal zone.
>>>>> However, the package should at least contain the devices that would
>>>>> uniquely identify where this thermal zone is located in the machine.
>>>>> For example, a thermal zone in a docking station should include a
>>>>> device in the docking station, a thermal zone for the CD-ROM bay,
>>>>> should include the CD-ROM."
>>>>>
>>>>> So IIUC this is a list of devices allowing the location of the thermal
>>>>> zone to be figured out. There's nothing about cooling in this
>>>>> definition.
>>>> Using _TZD to figure out the location of a given thermal zone is another usage
>>>> of this ACPI control method, but let's take a look at section 11.6:
>>>>
>>>> - If _PSV is defined then either the _PSL or _TZD objects must exist. The _PSL and _TZD objects may both exist.
>>>> - If _PSV is defined and _PSL is not defined then at least one device in thermal zone, as indicated by either the
>>>> _TZD device list or devices’ _TZM objects, must support device performance states.
>>>>
>>>> So according to my understanding, _TZD can also be used to discover additional cooling devices used for passive cooling.
>>> But it doesn't actually say how those "device performance states" are
>>> supposed to be used for cooling, does it?
>> Well, ACPI specifies how passive cooling should be done using percentage values between 0% and 100%,
>> so this part is actually specified.
> If you refer to Section 11.1.5, this is based on _TC1 and _TC2 and has
> limitations. So you are saying that Section 11.1.5 should be extended
> to _TZD devices. Is this also there in the MSFT document?
Looking at https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide
section "Thermal policy control" paragraph "Thermal manager in kernel", it seems that the NT kernel
uses the passive cooling algorithm defined by the ACPI specification for all passive cooling devices.
So when using Windows, _TZD is indeed treated like an extension for _PSL.
>>>> This makes sense as _PSL is defined to only contain processor objects (see section 11.4.10), so _TZD can act like an
>>>> extension of _PSL for things like ACPI control method batteries (see 10.2.2.12).
>>> But not everything in _TZD needs to be a potential "cooling device"
>>> and how will you decide which ones are?
>> Devices in _TZD that have no cooling capability will simply never register any cooling devices. This means that
>> the .should_bind callback of the ACPI thermal zone will never see those devices. Only devices in _TZD that also
>> have the ability for (passive) cooling will register a cooling device, so only those devices will end up with
>> the .should_bind callback of the ACPI thermal zone.
>>
>> The ACPI thermal zone treats _TZD as a list of ACPI handles. If some of those handles are unused, then this is
>> totally fine.
>>
>>>> Microsoft also follows this approach (see https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide
>>>> section "Thermally managed devices" paragraph "Processor aggregator").
>>> Interesting.
>>>
>>> I agree that it would make sense to follow them because there will be
>>> platform dependencies on that, if there aren't already.
>> My primary goal is to improve the Linux thermal subsystem to be as powerful as
>> the Windows thermal subsystem. This means that we must stop viewing _PSL, _ALx and _TZD
>> as something that only works with a predefined set of devices. Instead we must view
>> _PSL, _ALx and _TZD as something similar to the cooling-maps used for connecting
>> thermal zones and cooling devices on OF-based systems.
>>
>>>>>> This however will currently not work as
>>>>>> the ACPI thermal zone driver uses the private drvdata of the cooling device to
>>>>>> determine if said cooling device should bind. This only works for ACPI fans and
>>>>>> processors due to the fact that those drivers store an ACPI device pointer inside
>>>>>> drvdata, something the ACPI thermal zone expects.
>>>>> I'm not sure I understand the above.
>>>>>
>>>>> There is a list of ACPI device handles per trip point, as returned by
>>>>> either _PSL or _ALx. Devices whose handles are in that list will be
>>>>> bound to the thermal zone, so long as there are struct acpi_device
>>>>> objects representing them which is verified with the help of the
>>>>> devdata field in struct thermal_cooling_device.
>>>> AFAIK devdata is meant to be used by the thermal zone device callbacks to access the state
>>>> container struct of the associated device driver instance. Assuming that a given device driver
>>>> will populate devdata with a pointer to its ACPI companion device is an implementation-specific
>>>> detail that does not apply to all cooling device implementations. It just so happens that the
>>>> ACPI processor and fan driver do this, likely because they were designed specifically to work
>>>> with the ACPI thermal zone driver.
>>>>
>>>> The documentation of thermal_cooling_device_register() even describes devdata as "device private data", so any meaning of devdata purely depends on the
>>>> given device driver.
>>> Yes, and these particular drivers decide to store a pointer to struct
>>> acpi_device in it.
>>>
>>> But this is not super important, they might as well set the
>>> ACPI_COMPANION() of the cooling device to the corresponding struct
>>> acpi_device and the ACPI thermal driver might use that information.
>>>
>>> I'm not opposed to using parents for this purpose, but it doesn't
>>> change the big picture that the ACPI thermal driver will need to know
>>> the ACPI handle corresponding to each cooling device.
>>>
>>> If you want to use _TZD instead of or in addition to _PSL for this, it
>>> doesn't change much here, it's just another list of ACPI handles, so
>>> saying that parents are needed for supporting this is not exactly
>>> accurate IMV.
>> My idea was something like this:
>>
>> 	/* Cooling devices without a parent device cannot be referenced using ACPI */
>> 	if (!cdev->device.parent)
>> 		return false;
>>
>> 	/* Not all devices are described inside the ACPI tables */
>> 	acpi_handle cdev_handle = ACPI_HANDLE(cdev->device.parent);
>> 	if (!cdev_handle)
>> 		return false;
>>
>> 	for (i = 0; i < acpi_trip->devices.count; i++) {
>> 		acpi_handle handle = acpi_trip->devices.handles[i];
>>
>> 		if (handle == cdev_handle)
>> 			return true;
>> 	}
>>
>> This only works if the parent device pointer of the cooling device is populated.
> Sure, but it looks reasonable to me.
>
>>>>> IOW, cooling device drivers that create struct thermal_cooling_device
>>>>> objects representing them are expected to set devdata in those objects
>>>>> to point to struct acpi_device objects corresponding to their ACPI
>>>>> handles, but in principle acpi_thermal_should_bind_cdev() might as
>>>>> well just use the handles themselves. It just needs to know that
>>>>> there is a cooling driver on the other side of the ACPI handle.
>>>>>
>>>>> The point is that a cooling device to be bound to an ACPI thermal zone
>>>>> needs an ACPI handle in the first place to be listed in _PSL or _ALx.
>>>> Correct, I merely change the way the ACPI thermal zone driver retrieves the
>>>> ACPI handle associated with a given cooling device.
>>> Right.
>>>
>>>>>> As we cannot require all cooling devices to store an ACPI device pointer inside
>>>>>> their drvdata field in order to support ACPI,
>>>>> Cooling devices don't store ACPI device pointers in struct
>>>>> thermal_cooling_device objects, ACPI cooling drivers do, and there are
>>>>> two reasons to do that: (1) to associate a given struct
>>>>> thermal_cooling_device with an ACPI handle and (2) to let
>>>>> acpi_thermal_should_bind_cdev() know that the cooling device is
>>>>> present and functional.
>>>>>
>>>>> This can be changed to store an ACPI handle in struct
>>>>> thermal_cooling_device and acpi_thermal_should_bind_cdev() may just
>>>>> verify that the device is there by itself.
>>>> I can of course extend thermal_cooling_device_register() to accept a fwnode_handle that
>>>> can be used for both ACPI and OF based cooling device identification, if this is what you
>>>> prefer.
>>> I'm not sure about this ATM and see below.
>>>
>>>> This patch series would then turn into a cleanup series, focusing on properly adding
>>>> thermal zone devices and cooling devices into the global device hierarchy.
>>> I'd prefer to do one thing at a time though.
>>>
>>> If you want cooling devices to get parents, fine. I'm not
>>> fundamentally opposed to that idea, but let's have clear rules for
>>> device drivers on how to set those parents for the sake of
>>> consistency.
>>>
>>> As for the ACPI case, one rule that I want to be followed (as already
>>> stated multiple times) is that a struct acpi_device can only be a
>>> parent of another struct acpi_device. This means that the parent of a
>>> cooling device needs to be a platform device or similar representing
>>> the actual device that will be used for implementing the cooling.
>> OK.
>>
>>> A separate question is how acpi_thermal_should_bind_cdev() will match
>>> cooling devices with the ACPI handles coming from _PSL, _ALx, _TZD
>>> etc. and the rule can be that it will look at the ACPI_COMPANION() of
>>> the parent of the given cooling device.
>> See the example code I pasted above; the whole matching is done using ACPI handles,
>> so we can completely leave ACPI_COMPANION() out of this.
> ACPI_HANDLE() is a wrapper around ACPI_COMPANION() so your code
> effectively does what I said above.
True, I forgot about that.
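
To make the point above concrete, here is a minimal userspace sketch of the
parent-based matching, showing that a lookup via ACPI_HANDLE() necessarily
goes through the ACPI companion. It is not kernel code: every type and every
model_*() helper below is a hypothetical stand-in invented for illustration,
and the companion pointer is stored directly in the mock struct device rather
than being derived from a fwnode as the real macros do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; all of this is illustrative. */
typedef void *acpi_handle;

struct acpi_device {
	acpi_handle handle;
};

struct device {
	struct device *parent;
	struct acpi_device *companion;	/* what ACPI_COMPANION() would return */
};

struct cooling_device {
	struct device device;
};

/* Handles listed for one trip point (as returned by _PSL or _ALx). */
struct trip_devices {
	size_t count;
	const acpi_handle *handles;
};

/* Models ACPI_COMPANION(): NULL when the device has no ACPI companion. */
static struct acpi_device *model_acpi_companion(const struct device *dev)
{
	return dev ? dev->companion : NULL;
}

/* Models ACPI_HANDLE(): the handle is obtained through the companion. */
static acpi_handle model_acpi_handle(const struct device *dev)
{
	struct acpi_device *adev = model_acpi_companion(dev);

	return adev ? adev->handle : NULL;
}

/* Models the proposed check: match the parent's handle against the list. */
static bool model_should_bind(const struct trip_devices *trip,
			      const struct cooling_device *cdev)
{
	acpi_handle cdev_handle;
	size_t i;

	assert(trip && cdev);

	/* Cooling devices without a parent cannot be referenced using ACPI. */
	if (!cdev->device.parent)
		return false;

	/* Not all devices are described inside the ACPI tables. */
	cdev_handle = model_acpi_handle(cdev->device.parent);
	if (!cdev_handle)
		return false;

	for (i = 0; i < trip->count; i++) {
		if (trip->handles[i] == cdev_handle)
			return true;
	}

	return false;
}
```

In this model, a cooling device binds only if its parent has a companion whose
handle appears in the trip point's list; a missing parent or a parent without
a companion both end in "do not bind".
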
>>>>>> we must use a more generic approach.
>>>>> I'm not sure what use case you are talking about.
>>>>>
>>>>> Surely, devices with no representation in the ACPI namespace cannot be
>>>>> bound to ACPI thermal zones. For devices that have a representation
>>>>> in the ACPI namespace, storing an ACPI handle in devdata should not be
>>>>> a problem.
>>>> See my above explanations for details, drvdata is defined to hold device private data,
>>>> nothing more.
>>> This is related to the discussion below.
>>>
>>>>>> I was thinking about using the acpi_handle of the parent device instead of messing
>>>>>> with the drvdata field, but this only works if the parent device pointer of the
>>>>>> cooling device is populated.
>>>>>>
>>>>>> (Cooling devices without a parent device would then be ignored by the ACPI thermal
>>>>>> zone driver, as such cooling devices cannot be linked to ACPI).
>>>>> It can be arranged this way, but what's the practical difference?
>>>>> Anyone who creates a struct thermal_cooling_device and can set its
>>>>> parent pointer to a device with an ACPI companion, may as well set its
>>>>> devdata to point to that companion directly - or to its ACPI handle if
>>>>> that's preferred.
>>>> Yes, but this would require explicit support for ACPI in every driver that registers cooling devices.
>>> So you want to have generic drivers that may work on ACPI platforms
>>> and on DT platforms to be able to create cooling devices for use with
>>> ACPI thermal zones. Well, had you started the whole discussion with
>>> this statement, it would have been much easier to understand your
>>> point.
>> Sorry for the messy discussion, I intended to have two separate patch series. This one was meant to
>> simply be a preparation, with the important changes inside the ACPI thermal zone driver being implemented
>> in the second patch series.
>>
>> That was also the reason why I sent this series as an RFC.
>>
>>>> Using the parent device to retrieve the acpi_handle or allowing all drivers to just submit a fwnode_handle
>>>> of their choice when creating a cooling device will fix this.
>>> If you go the parents route, this is an important consideration for
>>> the rules on how to set those parents. Namely, they would need to be
>>> set so that the fwnode_handle of the parent could be used for binding
>>> the cooling device to a thermal zone either on ACPI or on DT systems.
>>>
>>> Of course, there are also cooling devices whose parents will not have
>>> an fwnode_handle and they would still need to work in this brave new
>>> world.
>>>
>> True, I did not think of that. In this case extending thermal_of_cooling_device_register() and friends to accept
>> a generic fwnode_handle instead of an OF-specific device_node would make more sense. Most drivers can simply
>> pass the result of dev_fwnode() instead of dev->of_node; only those that support multiple cooling device child
>> nodes would need additional work to also support ACPI.
>>
>> Basically, thermal_of_get_cooling_spec() could handle the fwnode_handle in the following manner:
>>
>> 	if (cooling_spec.np->fwnode != cdev->fwnode)
>> 		return false;
>>
>> And the ACPI thermal zone driver could then simply use ACPI_HANDLE_FWNODE() to retrieve the ACPI handle from
>> the fwnode_handle (together with a NULL check of course).
>>
>> If you are OK with this approach, I will forget about the whole parent device stuff for now and focus on extending
>> (devm_)thermal_of_cooling_device_register(). There are some additional changes needed for reliably associating
>> cooling devices with ACPI trip points using fwnode handles, but those are not that intrusive.
>>
>> What do you think?
> One advantage of using parents is that it will help user space to
> figure out connections between the abstract cooling devices and the
> associated hardware or firmware entities. I think that this is an
> important one.
>
> It also doesn't prevent fwnode_handle from being used because the
> fwnode_handle may just be stored in the parent. I like this more than
> associating fwnode_handles directly with abstract cooling devices.
>
> If the cooling device parent (that is, the provider of the cooling
> mechanism used by it) does not have an fwnode_handle, then either it
> needs to be driven directly from user space, or the driver creating a
> thermal zone device needs to provide a specific .should_bind()
> callback that will know what to look for.
>
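
The arrangement described in the quoted text can be sketched in userspace as
follows: the fwnode is taken from the cooling device's parent, and a
driver-specific callback is consulted only when the parent has none. This is
an assumption-laden model, not the thermal core's implementation: the struct
layouts, model_zone_should_bind(), example_driver_should_bind() and the
known_cdev field are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; none of these layouts match the kernel's. */
struct fwnode_handle {
	int unused;
};

struct device {
	struct device *parent;
	struct fwnode_handle *fwnode;
};

struct cooling_device {
	struct device device;
};

struct thermal_zone;

typedef bool (*should_bind_fn)(const struct thermal_zone *tz,
			       const struct cooling_device *cdev);

struct thermal_zone {
	/* fwnodes the firmware description lists for this zone */
	const struct fwnode_handle *const *cooling_fwnodes;
	size_t count;
	/* optional driver-specific matcher (hypothetical) */
	should_bind_fn should_bind;
	/* hypothetical private knowledge used by the example matcher */
	const struct cooling_device *known_cdev;
};

/* Example driver callback: binds only the one device the driver knows. */
static bool example_driver_should_bind(const struct thermal_zone *tz,
				       const struct cooling_device *cdev)
{
	return tz->known_cdev == cdev;
}

static bool model_zone_should_bind(const struct thermal_zone *tz,
				   const struct cooling_device *cdev)
{
	const struct device *parent;
	size_t i;

	assert(tz && cdev);
	parent = cdev->device.parent;

	/* Parent has an fwnode: match it against the firmware-provided list. */
	if (parent && parent->fwnode) {
		for (i = 0; i < tz->count; i++) {
			if (tz->cooling_fwnodes[i] == parent->fwnode)
				return true;
		}
		return false;
	}

	/* No fwnode: only a driver-specific callback can decide. */
	if (tz->should_bind)
		return tz->should_bind(tz, cdev);

	/* Otherwise the device stays unbound (left to user space). */
	return false;
}
```

The design choice modelled here is that the abstract cooling device itself
carries no firmware identity; it inherits one from the parent that provides
the cooling mechanism, and anything without such an identity needs an explicit
decision by the thermal zone driver.
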
OK. When sending the next revision of this patch series, should I also keep
the patches for the thermal zone device or should I only keep the patches
concerning the cooling devices?
Thanks,
Armin Wolf
On Sat, Nov 29, 2025 at 12:36 PM Armin Wolf <W_Armin@gmx.de> wrote:
>
> On 28.11.25 at 12:40, Rafael J. Wysocki wrote:
>
> > On Fri, Nov 28, 2025 at 12:50 AM Armin Wolf <W_Armin@gmx.de> wrote:
> >> On 27.11.25 at 22:46, Rafael J. Wysocki wrote:

[cut]

> >> What do you think?
> > One advantage of using parents is that it will help user space to
> > figure out connections between the abstract cooling devices and the
> > associated hardware or firmware entities. I think that this is an
> > important one.
> >
> > It also doesn't prevent fwnode_handle from being used because the
> > fwnode_handle may just be stored in the parent. I like this more than
> > associating fwnode_handles directly with abstract cooling devices.
> >
> > If the cooling device parent (that is, the provider of the cooling
> > mechanism used by it) does not have an fwnode_handle, then either it
> > needs to be driven directly from user space, or the driver creating a
> > thermal zone device needs to provide a specific .should_bind()
> > callback that will know what to look for.
> >
> OK. When sending the next revision of this patch series, should I also keep
> the patches for the thermal zone device or should I only keep the patches
> concerning the cooling devices?

The cooling device changes are kind of unrelated to the thermal zone
device changes, so it would be better to send them as separate series,
but you may as well send those series at the same time as far as I'm
concerned.