[PATCH v6 4/6] arm64: dts: mediatek: mt8186: add default thermal zones

Julien Panis posted 6 patches 3 months, 3 weeks ago
There is a newer version of this series
[PATCH v6 4/6] arm64: dts: mediatek: mt8186: add default thermal zones
Posted by Julien Panis 3 months, 3 weeks ago
From: Nicolas Pitre <npitre@baylibre.com>

Inspired by the vendor kernel but adapted to the upstream thermal
driver version.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Julien Panis <jpanis@baylibre.com>
---
 arch/arm64/boot/dts/mediatek/mt8186.dtsi | 297 +++++++++++++++++++++++++++++++
 1 file changed, 297 insertions(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt8186.dtsi b/arch/arm64/boot/dts/mediatek/mt8186.dtsi
index caec83f5eece..95fe5a05f0d7 100644
--- a/arch/arm64/boot/dts/mediatek/mt8186.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8186.dtsi
@@ -13,6 +13,8 @@
 #include <dt-bindings/power/mt8186-power.h>
 #include <dt-bindings/phy/phy.h>
 #include <dt-bindings/reset/mt8186-resets.h>
+#include <dt-bindings/thermal/thermal.h>
+#include <dt-bindings/thermal/mediatek,lvts-thermal.h>
 
 / {
 	compatible = "mediatek,mt8186";
@@ -2197,4 +2199,299 @@ larb19: smi@1c10f000 {
 			power-domains = <&spm MT8186_POWER_DOMAIN_IPE>;
 		};
 	};
+
+	thermal_zones: thermal-zones {
+		cpu-little0-thermal {
+			polling-delay = <1000>;
+			polling-delay-passive = <150>;
+			thermal-sensors = <&lvts MT8186_LITTLE_CPU0>;
+
+			trips {
+				cpu_little0_alert0: trip-alert0 {
+					temperature = <85000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+
+				cpu_little0_alert1: trip-alert1 {
+					temperature = <95000>;
+					hysteresis = <2000>;
+					type = "hot";
+				};
+
+				cpu_little0_crit: trip-crit {
+					temperature = <100000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_little0_alert0>;
+					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu4 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu5 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+				};
+			};
+		};
+
+		cpu-little1-thermal {
+			polling-delay = <1000>;
+			polling-delay-passive = <150>;
+			thermal-sensors = <&lvts MT8186_LITTLE_CPU1>;
+
+			trips {
+				cpu_little1_alert0: trip-alert0 {
+					temperature = <85000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+
+				cpu_little1_alert1: trip-alert1 {
+					temperature = <95000>;
+					hysteresis = <2000>;
+					type = "hot";
+				};
+
+				cpu_little1_crit: trip-crit {
+					temperature = <100000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_little1_alert0>;
+					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu4 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu5 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+				};
+			};
+		};
+
+		cpu-little2-thermal {
+			polling-delay = <1000>;
+			polling-delay-passive = <150>;
+			thermal-sensors = <&lvts MT8186_LITTLE_CPU2>;
+
+			trips {
+				cpu_little2_alert0: trip-alert0 {
+					temperature = <85000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+
+				cpu_little2_alert1: trip-alert1 {
+					temperature = <95000>;
+					hysteresis = <2000>;
+					type = "hot";
+				};
+
+				cpu_little2_crit: trip-crit {
+					temperature = <100000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_little2_alert0>;
+					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu4 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu5 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+				};
+			};
+		};
+
+		cam-thermal {
+			polling-delay = <1000>;
+			polling-delay-passive = <250>;
+			thermal-sensors = <&lvts MT8186_CAM>;
+
+			trips {
+				cam_alert0: trip-alert0 {
+					temperature = <85000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+
+				cam_alert1: trip-alert1 {
+					temperature = <95000>;
+					hysteresis = <2000>;
+					type = "hot";
+				};
+
+				cam_crit: trip-crit {
+					temperature = <100000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+		};
+
+		nna-thermal {
+			polling-delay = <1000>;
+			polling-delay-passive = <250>;
+			thermal-sensors = <&lvts MT8186_NNA>;
+
+			trips {
+				nna_alert0: trip-alert0 {
+					temperature = <85000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+
+				nna_alert1: trip-alert1 {
+					temperature = <95000>;
+					hysteresis = <2000>;
+					type = "hot";
+				};
+
+				nna_crit: trip-crit {
+					temperature = <100000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+		};
+
+		adsp-thermal {
+			polling-delay = <1000>;
+			polling-delay-passive = <250>;
+			thermal-sensors = <&lvts MT8186_ADSP>;
+
+			trips {
+				adsp_alert0: trip-alert0 {
+					temperature = <85000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+
+				adsp_alert1: trip-alert1 {
+					temperature = <95000>;
+					hysteresis = <2000>;
+					type = "hot";
+				};
+
+				adsp_crit: trip-crit {
+					temperature = <100000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+		};
+
+		gpu-thermal {
+			polling-delay = <1000>;
+			polling-delay-passive = <250>;
+			thermal-sensors = <&lvts MT8186_GPU>;
+
+			trips {
+				gpu_alert0: trip-alert0 {
+					temperature = <85000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+
+				gpu_alert1: trip-alert1 {
+					temperature = <95000>;
+					hysteresis = <2000>;
+					type = "hot";
+				};
+
+				gpu_crit: trip-crit {
+					temperature = <100000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&gpu_alert0>;
+					cooling-device = <&gpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+				};
+			};
+		};
+
+		cpu-big0-thermal {
+			polling-delay = <1000>;
+			polling-delay-passive = <100>;
+			thermal-sensors = <&lvts MT8186_BIG_CPU0>;
+
+			trips {
+				cpu_big0_alert0: trip-alert0 {
+					temperature = <85000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+
+				cpu_big0_alert1: trip-alert1 {
+					temperature = <95000>;
+					hysteresis = <2000>;
+					type = "hot";
+				};
+
+				cpu_big0_crit: trip-crit {
+					temperature = <100000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_big0_alert0>;
+					cooling-device = <&cpu6 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu7 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+				};
+			};
+		};
+
+		cpu-big1-thermal {
+			polling-delay = <1000>;
+			polling-delay-passive = <100>;
+			thermal-sensors = <&lvts MT8186_BIG_CPU1>;
+
+			trips {
+				cpu_big1_alert0: trip-alert0 {
+					temperature = <85000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+
+				cpu_big1_alert1: trip-alert1 {
+					temperature = <95000>;
+					hysteresis = <2000>;
+					type = "hot";
+				};
+
+				cpu_big1_crit: trip-crit {
+					temperature = <100000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_big1_alert0>;
+					cooling-device = <&cpu6 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu7 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+				};
+			};
+		};
+	};
 };

-- 
2.37.3
Re: [PATCH v6 4/6] arm64: dts: mediatek: mt8186: add default thermal zones
Posted by AngeloGioacchino Del Regno 3 months, 3 weeks ago
Il 29/05/24 07:57, Julien Panis ha scritto:
> From: Nicolas Pitre <npitre@baylibre.com>
> 
> Inspired by the vendor kernel but adapted to the upstream thermal
> driver version.
> 
> Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
> Signed-off-by: Julien Panis <jpanis@baylibre.com>

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Re: [PATCH v6 4/6] arm64: dts: mediatek: mt8186: add default thermal zones
Posted by Chen-Yu Tsai 3 months, 3 weeks ago
On Wed, May 29, 2024 at 4:17 PM AngeloGioacchino Del Regno
<angelogioacchino.delregno@collabora.com> wrote:
>
> Il 29/05/24 07:57, Julien Panis ha scritto:
> > From: Nicolas Pitre <npitre@baylibre.com>
> >
> > Inspired by the vendor kernel but adapted to the upstream thermal
> > driver version.
> >
> > Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
> > Signed-off-by: Julien Panis <jpanis@baylibre.com>
>
> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>

I'm getting some crazy readings which would cause the machine to
immediately shutdown during boot. Anyone else see this? Or maybe
my device has bad calibration data?

gpu_thermal-virtual-0
Adapter: Virtual device
temp1:       +229.7 C

nna_thermal-virtual-0
Adapter: Virtual device
temp1:       +229.7 C

cpu_big0_thermal-virtual-0
Adapter: Virtual device
temp1:         -7.2 C

cpu_little2_thermal-virtual-0
Adapter: Virtual device
temp1:       +157.2 C

cpu_little0_thermal-virtual-0
Adapter: Virtual device
temp1:       -277.1 C

adsp_thermal-virtual-0
Adapter: Virtual device
temp1:       +229.7 C

cpu_big1_thermal-virtual-0
Adapter: Virtual device
temp1:       +229.7 C

cam_thermal-virtual-0
Adapter: Virtual device
temp1:        +45.4 C

cpu_little1_thermal-virtual-0
Adapter: Virtual device
temp1:       -241.8 C
Re: [PATCH v6 4/6] arm64: dts: mediatek: mt8186: add default thermal zones
Posted by Julien Panis 3 months, 3 weeks ago
On 5/29/24 10:33, Chen-Yu Tsai wrote:
> On Wed, May 29, 2024 at 4:17 PM AngeloGioacchino Del Regno
> <angelogioacchino.delregno@collabora.com> wrote:
>> Il 29/05/24 07:57, Julien Panis ha scritto:
>>> From: Nicolas Pitre <npitre@baylibre.com>
>>>
>>> Inspired by the vendor kernel but adapted to the upstream thermal
>>> driver version.
>>>
>>> Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
>>> Signed-off-by: Julien Panis <jpanis@baylibre.com>
>> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
> I'm getting some crazy readings which would cause the machine to
> immediately shutdown during boot. Anyone else see this? Or maybe
> my device has bad calibration data?
>
> gpu_thermal-virtual-0
> Adapter: Virtual device
> temp1:       +229.7 C
>
> nna_thermal-virtual-0
> Adapter: Virtual device
> temp1:       +229.7 C
>
> cpu_big0_thermal-virtual-0
> Adapter: Virtual device
> temp1:         -7.2 C
>
> cpu_little2_thermal-virtual-0
> Adapter: Virtual device
> temp1:       +157.2 C
>
> cpu_little0_thermal-virtual-0
> Adapter: Virtual device
> temp1:       -277.1 C
>
> adsp_thermal-virtual-0
> Adapter: Virtual device
> temp1:       +229.7 C
>
> cpu_big1_thermal-virtual-0
> Adapter: Virtual device
> temp1:       +229.7 C
>
> cam_thermal-virtual-0
> Adapter: Virtual device
> temp1:        +45.4 C
>
> cpu_little1_thermal-virtual-0
> Adapter: Virtual device
> temp1:       -241.8 C

It's likely that your device has bad calibration data indeed. We observed the same
behavior on the mt8186 device we used (a Corsola) and finally realized that the
golden temperature was 0 (device not properly calibrated).

To make a comparison, we run chromiumos v5.15 and dmesg output was:
'This sample is not calibrated, fake !!'
Additional debugging revealed that the golden temp was actually 0. As a result,
chromiumos v5.15 does not use the calibration data. It uses some default values
instead. That's why you can observe good temperatures with chromiumos v5.15
even with a device that is not calibrated.

This feature is not implemented in the driver upstream, so you need a device
properly calibrated to get good temperatures with it. When we forced this
driver using the default values used by chromiumos v5.15 instead of real calib
data (temporarily, just for testing), the temperatures were good.

Please make sure your device is properly calibrated: 0 < golden temp < 62.

Julien

Re: [PATCH v6 4/6] arm64: dts: mediatek: mt8186: add default thermal zones
Posted by AngeloGioacchino Del Regno 3 months, 3 weeks ago
Il 29/05/24 11:12, Julien Panis ha scritto:
> On 5/29/24 10:33, Chen-Yu Tsai wrote:
>> On Wed, May 29, 2024 at 4:17 PM AngeloGioacchino Del Regno
>> <angelogioacchino.delregno@collabora.com> wrote:
>>> Il 29/05/24 07:57, Julien Panis ha scritto:
>>>> From: Nicolas Pitre <npitre@baylibre.com>
>>>>
>>>> Inspired by the vendor kernel but adapted to the upstream thermal
>>>> driver version.
>>>>
>>>> Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
>>>> Signed-off-by: Julien Panis <jpanis@baylibre.com>
>>> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
>> I'm getting some crazy readings which would cause the machine to
>> immediately shutdown during boot. Anyone else see this? Or maybe
>> my device has bad calibration data?
>>
>> gpu_thermal-virtual-0
>> Adapter: Virtual device
>> temp1:       +229.7 C
>>
>> nna_thermal-virtual-0
>> Adapter: Virtual device
>> temp1:       +229.7 C
>>
>> cpu_big0_thermal-virtual-0
>> Adapter: Virtual device
>> temp1:         -7.2 C
>>
>> cpu_little2_thermal-virtual-0
>> Adapter: Virtual device
>> temp1:       +157.2 C
>>
>> cpu_little0_thermal-virtual-0
>> Adapter: Virtual device
>> temp1:       -277.1 C
>>
>> adsp_thermal-virtual-0
>> Adapter: Virtual device
>> temp1:       +229.7 C
>>
>> cpu_big1_thermal-virtual-0
>> Adapter: Virtual device
>> temp1:       +229.7 C
>>
>> cam_thermal-virtual-0
>> Adapter: Virtual device
>> temp1:        +45.4 C
>>
>> cpu_little1_thermal-virtual-0
>> Adapter: Virtual device
>> temp1:       -241.8 C
> 
> It's likely that your device has bad calibration data indeed. We observed the same
> behavior on the mt8186 device we used (a Corsola) and finally realized that the
> golden temperature was 0 (device not properly calibrated).
> 
> To make a comparison, we run chromiumos v5.15 and dmesg output was:
> 'This sample is not calibrated, fake !!'
> Additional debugging revealed that the golden temp was actually 0. As a result,
> chromiumos v5.15 does not use the calibration data. It uses some default values
> instead. That's why you can observe good temperatures with chromiumos v5.15
> even with a device that is not calibrated.
> 
> This feature is not implemented in the driver upstream, so you need a device
> properly calibrated to get good temperatures with it. When we forced this
> driver using the default values used by chromiumos v5.15 instead of real calib
> data (temporarily, just for testing), the temperatures were good.
> 
> Please make sure your device is properly calibrated: 0 < golden temp < 62.
> 

Wait wait wait wait.

What's up with that calibration data stuff?

If there's any device that cannot use the calibration data, we need a way to
recognize whether the provided data (read from efuse, of course) is valid,
otherwise we're creating an important regression here.

"This device is unlucky" is not a good reason to have this kind of regression.

Since - as far as I understand - downstream can recognize that, upstream should
do the same.
I'd be okay with refusing to even probe this driver on such devices for the
moment being, as those are things that could be eventually handled on a second
part series, even though I would prefer a kind of on-the-fly calibration or
anyway something that would still make the unlucky ones to actually have good
readings *right now*.

Though, the fact that you assert that you observed this behavior on one of your
devices and *still decided to send that upstream* is, in my opinion, unacceptable.

Regards,
Angelo
Re: [PATCH v6 4/6] arm64: dts: mediatek: mt8186: add default thermal zones
Posted by Julien Panis 3 months, 2 weeks ago
On 5/29/24 14:06, AngeloGioacchino Del Regno wrote:
> Il 29/05/24 11:12, Julien Panis ha scritto:
>> On 5/29/24 10:33, Chen-Yu Tsai wrote:
>>> On Wed, May 29, 2024 at 4:17 PM AngeloGioacchino Del Regno
>>> <angelogioacchino.delregno@collabora.com> wrote:
>>>> Il 29/05/24 07:57, Julien Panis ha scritto:
>>>>> From: Nicolas Pitre <npitre@baylibre.com>
>>>>>
>>>>> Inspired by the vendor kernel but adapted to the upstream thermal
>>>>> driver version.
>>>>>
>>>>> Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
>>>>> Signed-off-by: Julien Panis <jpanis@baylibre.com>
>>>> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
>>> I'm getting some crazy readings which would cause the machine to
>>> immediately shutdown during boot. Anyone else see this? Or maybe
>>> my device has bad calibration data?
>>>
>>> gpu_thermal-virtual-0
>>> Adapter: Virtual device
>>> temp1:       +229.7 C
>>>
>>> nna_thermal-virtual-0
>>> Adapter: Virtual device
>>> temp1:       +229.7 C
>>>
>>> cpu_big0_thermal-virtual-0
>>> Adapter: Virtual device
>>> temp1:         -7.2 C
>>>
>>> cpu_little2_thermal-virtual-0
>>> Adapter: Virtual device
>>> temp1:       +157.2 C
>>>
>>> cpu_little0_thermal-virtual-0
>>> Adapter: Virtual device
>>> temp1:       -277.1 C
>>>
>>> adsp_thermal-virtual-0
>>> Adapter: Virtual device
>>> temp1:       +229.7 C
>>>
>>> cpu_big1_thermal-virtual-0
>>> Adapter: Virtual device
>>> temp1:       +229.7 C
>>>
>>> cam_thermal-virtual-0
>>> Adapter: Virtual device
>>> temp1:        +45.4 C
>>>
>>> cpu_little1_thermal-virtual-0
>>> Adapter: Virtual device
>>> temp1:       -241.8 C
>>
>> It's likely that your device has bad calibration data indeed. We observed the same
>> behavior on the mt8186 device we used (a Corsola) and finally realized that the
>> golden temperature was 0 (device not properly calibrated).
>>
>> To make a comparison, we run chromiumos v5.15 and dmesg output was:
>> 'This sample is not calibrated, fake !!'
>> Additional debugging revealed that the golden temp was actually 0. As a result,
>> chromiumos v5.15 does not use the calibration data. It uses some default values
>> instead. That's why you can observe good temperatures with chromiumos v5.15
>> even with a device that is not calibrated.
>>
>> This feature is not implemented in the driver upstream, so you need a device
>> properly calibrated to get good temperatures with it. When we forced this
>> driver using the default values used by chromiumos v5.15 instead of real calib
>> data (temporarily, just for testing), the temperatures were good.
>>
>> Please make sure your device is properly calibrated: 0 < golden temp < 62.
>>
>
> Wait wait wait wait.
>
> What's up with that calibration data stuff?
>
> If there's any device that cannot use the calibration data, we need a way to
> recognize whether the provided data (read from efuse, of course) is valid,
> otherwise we're creating an important regression here.
>
> "This device is unlucky" is not a good reason to have this kind of regression.
>
> Since - as far as I understand - downstream can recognize that, upstream should
> do the same.
> I'd be okay with refusing to even probe this driver on such devices for the
> moment being, as those are things that could be eventually handled on a second
> part series, even though I would prefer a kind of on-the-fly calibration or
> anyway something that would still make the unlucky ones to actually have good
> readings *right now*.
>
> Though, the fact that you assert that you observed this behavior on one of your
> devices and *still decided to send that upstream* is, in my opinion, unacceptable.
>
> Regards,
> Angelo

I've been trying to find some more information about the criteria
"device calibrated VS device not calibrated" because there's a
confusing comment in downstream code (the comment does not
match what I observe on my device). I'll send a separate patch
to add this feature over the next few days, when I get additional
information from MTK about this criteria.
Re: [PATCH v6 4/6] arm64: dts: mediatek: mt8186: add default thermal zones
Posted by Chen-Yu Tsai 3 months ago
Hi,

On Mon, Jun 3, 2024 at 3:58 PM Julien Panis <jpanis@baylibre.com> wrote:
>
> On 5/29/24 14:06, AngeloGioacchino Del Regno wrote:
> > Il 29/05/24 11:12, Julien Panis ha scritto:
> >> On 5/29/24 10:33, Chen-Yu Tsai wrote:
> >>> On Wed, May 29, 2024 at 4:17 PM AngeloGioacchino Del Regno
> >>> <angelogioacchino.delregno@collabora.com> wrote:
> >>>> Il 29/05/24 07:57, Julien Panis ha scritto:
> >>>>> From: Nicolas Pitre <npitre@baylibre.com>
> >>>>>
> >>>>> Inspired by the vendor kernel but adapted to the upstream thermal
> >>>>> driver version.
> >>>>>
> >>>>> Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
> >>>>> Signed-off-by: Julien Panis <jpanis@baylibre.com>
> >>>> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
> >>> I'm getting some crazy readings which would cause the machine to
> >>> immediately shutdown during boot. Anyone else see this? Or maybe
> >>> my device has bad calibration data?
> >>>
> >>> gpu_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1:       +229.7 C
> >>>
> >>> nna_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1:       +229.7 C
> >>>
> >>> cpu_big0_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1:         -7.2 C
> >>>
> >>> cpu_little2_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1:       +157.2 C
> >>>
> >>> cpu_little0_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1:       -277.1 C
> >>>
> >>> adsp_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1:       +229.7 C
> >>>
> >>> cpu_big1_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1:       +229.7 C
> >>>
> >>> cam_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1:        +45.4 C
> >>>
> >>> cpu_little1_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1:       -241.8 C
> >>
> >> It's likely that your device has bad calibration data indeed. We observed the same
> >> behavior on the mt8186 device we used (a Corsola) and finally realized that the
> >> golden temperature was 0 (device not properly calibrated).
> >>
> >> To make a comparison, we run chromiumos v5.15 and dmesg output was:
> >> 'This sample is not calibrated, fake !!'
> >> Additional debugging revealed that the golden temp was actually 0. As a result,
> >> chromiumos v5.15 does not use the calibration data. It uses some default values
> >> instead. That's why you can observe good temperatures with chromiumos v5.15
> >> even with a device that is not calibrated.
> >>
> >> This feature is not implemented in the driver upstream, so you need a device
> >> properly calibrated to get good temperatures with it. When we forced this
> >> driver using the default values used by chromiumos v5.15 instead of real calib
> >> data (temporarily, just for testing), the temperatures were good.
> >>
> >> Please make sure your device is properly calibrated: 0 < golden temp < 62.
> >>
> >
> > Wait wait wait wait.
> >
> > What's up with that calibration data stuff?
> >
> > If there's any device that cannot use the calibration data, we need a way to
> > recognize whether the provided data (read from efuse, of course) is valid,
> > otherwise we're creating an important regression here.
> >
> > "This device is unlucky" is not a good reason to have this kind of regression.
> >
> > Since - as far as I understand - downstream can recognize that, upstream should
> > do the same.
> > I'd be okay with refusing to even probe this driver on such devices for the
> > moment being, as those are things that could be eventually handled on a second
> > part series, even though I would prefer a kind of on-the-fly calibration or
> > anyway something that would still make the unlucky ones to actually have good
> > readings *right now*.
> >
> > Though, the fact that you assert that you observed this behavior on one of your
> > devices and *still decided to send that upstream* is, in my opinion, unacceptable.
> >
> > Regards,
> > Angelo
>
> I've been trying to find some more information about the criteria
> "device calibrated VS device not calibrated" because there's a
> confusing comment in downstream code (the comment does not
> match what I observe on my device). I'll send a separate patch
> to add this feature over the next few days, when I get additional
> information from MTK about this criteria.

I couldn't wait and sent a patch to provide default calibration data,
based on the values and code from the ChromeOS kernels. It seems to
work OK-ish. I get 4x degrees C on my MT8186 device.

Also, your previous patch blocking invalid efuse data has landed. So
I think this series can be relanded. What do you think, Angelo?

ChenYu