kernel/power/energy_model.c | 4 ++++ 1 file changed, 4 insertions(+)
From: Yaxiong Tian <tianyaxiong@kylinos.cn>
When the device is of a non-CPU type, table[i].performance won't be
initialized in the previous em_init_performance(), resulting in division
by zero when calculating costs in em_compute_costs().
Since the 'cost' algorithm is only used for EAS energy efficiency
calculations and is currently not utilized by other device drivers, we
should add the _is_cpu_device(dev) check to prevent this division-by-zero
issue.
Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove division")
Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
---
kernel/power/energy_model.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index d9b7e2b38c7a..41606247c277 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
unsigned long prev_cost = ULONG_MAX;
int i, ret;
+ /* This is needed only for CPUs and EAS skip other devices */
+ if (!_is_cpu_device(dev))
+ return 0;
+
/* Compute the cost of each performance state. */
for (i = nr_states - 1; i >= 0; i--) {
unsigned long power_res, cost;
--
2.25.1
On 4/17/25 02:07, Yaxiong Tian wrote:
> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
>
> When the device is of a non-CPU type, table[i].performance won't be
> initialized in the previous em_init_performance(), resulting in division
> by zero when calculating costs in em_compute_costs().
>
> Since the 'cost' algorithm is only used for EAS energy efficiency
> calculations and is currently not utilized by other device drivers, we
> should add the _is_cpu_device(dev) check to prevent this division-by-zero
> issue.
>
> Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove division")
> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
> ---
> kernel/power/energy_model.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index d9b7e2b38c7a..41606247c277 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
> unsigned long prev_cost = ULONG_MAX;
> int i, ret;
>
> + /* This is needed only for CPUs and EAS skip other devices */
> + if (!_is_cpu_device(dev))
> + return 0;
> +
> /* Compute the cost of each performance state. */
> for (i = nr_states - 1; i >= 0; i--) {
> unsigned long power_res, cost;
Please stop for a while. I have to check what happened that you
faced the issue in the first place. I have been testing the GPU
EMs and there was no issues...
Let me debug that today.
在 2025/4/17 13:57, Lukasz Luba 写道:
>
>
> On 4/17/25 02:07, Yaxiong Tian wrote:
>> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
>>
>> When the device is of a non-CPU type, table[i].performance won't be
>> initialized in the previous em_init_performance(), resulting in division
>> by zero when calculating costs in em_compute_costs().
>>
>> Since the 'cost' algorithm is only used for EAS energy efficiency
>> calculations and is currently not utilized by other device drivers, we
>> should add the _is_cpu_device(dev) check to prevent this division-by-zero
>> issue.
>>
>> Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove
>> division")
>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
>> ---
>> kernel/power/energy_model.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>> index d9b7e2b38c7a..41606247c277 100644
>> --- a/kernel/power/energy_model.c
>> +++ b/kernel/power/energy_model.c
>> @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev,
>> struct em_perf_state *table,
>> unsigned long prev_cost = ULONG_MAX;
>> int i, ret;
>> + /* This is needed only for CPUs and EAS skip other devices */
>> + if (!_is_cpu_device(dev))
>> + return 0;
>> +
>> /* Compute the cost of each performance state. */
>> for (i = nr_states - 1; i >= 0; i--) {
>> unsigned long power_res, cost;
>
>
> Please stop for a while. I have to check what happened that you
> faced the issue in the first place. I have been testing the GPU
> EMs and there was no issues...
>
> Let me debug that today.
Of course. Since I don't have actual hardware, I can only logically
deduce that this issue might exist.
On 4/17/25 08:43, Yaxiong Tian wrote:
>
>
> 在 2025/4/17 13:57, Lukasz Luba 写道:
>>
>>
>> On 4/17/25 02:07, Yaxiong Tian wrote:
>>> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
>>>
>>> When the device is of a non-CPU type, table[i].performance won't be
>>> initialized in the previous em_init_performance(), resulting in division
>>> by zero when calculating costs in em_compute_costs().
>>>
>>> Since the 'cost' algorithm is only used for EAS energy efficiency
>>> calculations and is currently not utilized by other device drivers, we
>>> should add the _is_cpu_device(dev) check to prevent this
>>> division-by-zero
>>> issue.
>>>
>>> Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove
>>> division")
>>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
>>> ---
>>> kernel/power/energy_model.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>>> index d9b7e2b38c7a..41606247c277 100644
>>> --- a/kernel/power/energy_model.c
>>> +++ b/kernel/power/energy_model.c
>>> @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev,
>>> struct em_perf_state *table,
>>> unsigned long prev_cost = ULONG_MAX;
>>> int i, ret;
>>> + /* This is needed only for CPUs and EAS skip other devices */
>>> + if (!_is_cpu_device(dev))
>>> + return 0;
>>> +
>>> /* Compute the cost of each performance state. */
>>> for (i = nr_states - 1; i >= 0; i--) {
>>> unsigned long power_res, cost;
>>
>>
>> Please stop for a while. I have to check what happened that you
>> faced the issue in the first place. I have been testing the GPU
>> EMs and there was no issues...
>>
>> Let me debug that today.
>
> Of course. Since I don't have actual hardware, I can only logically
> deduce that this issue might exist.
>
>
I have run with the GPU EM registered in the boot:
-------------------------------------------------------
[ 2.753333] panfrost ff9a0000.gpu: EM: created perf domain
[ 2.759863] panfrost ff9a0000.gpu: mali-t860 id 0x860 major 0x2 minor
0x0 status 0x0
[ 2.768530] panfrost ff9a0000.gpu: features: 00000000,00000407,
issues: 00000000,24040400
[ 2.777678] panfrost ff9a0000.gpu: Features: L2:0x07120206
Shader:0x00000000 Tiler:0x00000809 Mem:0x1 MMU:0x00002830 AS:0xff JS:0x7
[ 2.780746] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req
150000000Hz, actual 148500000HZ div = 0)
[ 2.790905] panfrost ff9a0000.gpu: shader_present=0xf l2_present=0x1
root@arm:~# cat /sys/kernel/debug/energy_model/ff9a0000.gpu/flags
0x1
root@arm:~# grep . /sys/kernel/debug/energy_model/ff9a0000.gpu/ps*/*
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/cost:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/frequency:200000
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/inefficient:1
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/performance:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/power:404250
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/cost:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/frequency:300000
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/inefficient:1
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/performance:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/power:606375
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/cost:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/frequency:400000
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/inefficient:1
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/performance:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/power:808500
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/cost:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/frequency:600000
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/inefficient:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/performance:0
/sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/power:1505790
--------------------------------------------------------
The EM for the GPU is not modified during the boot like the CPUs'
EM are, thus this code is not triggered. Although, the API is
open and in theory the GPU EM can be modified at runtime
as well and it will reach that em_compute_costs() issue
with 'performance' field having value 0.
So this v4 patch would be needed in this case.
Please re-send this v4 patch as a completely new message.
Thanks for looking at that code path and the fix for potential
issue.
You can also add my:
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Regrds,
Lukasz
在 2025/4/17 21:27, Lukasz Luba 写道:
>
>
> On 4/17/25 08:43, Yaxiong Tian wrote:
>>
>>
>> 在 2025/4/17 13:57, Lukasz Luba 写道:
>>>
>>>
>>> On 4/17/25 02:07, Yaxiong Tian wrote:
>>>> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
>>>>
>>>> When the device is of a non-CPU type, table[i].performance won't be
>>>> initialized in the previous em_init_performance(), resulting in
>>>> division
>>>> by zero when calculating costs in em_compute_costs().
>>>>
>>>> Since the 'cost' algorithm is only used for EAS energy efficiency
>>>> calculations and is currently not utilized by other device drivers, we
>>>> should add the _is_cpu_device(dev) check to prevent this
>>>> division-by-zero
>>>> issue.
>>>>
>>>> Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove
>>>> division")
>>>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
>>>> ---
>>>> kernel/power/energy_model.c | 4 ++++
>>>> 1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>>>> index d9b7e2b38c7a..41606247c277 100644
>>>> --- a/kernel/power/energy_model.c
>>>> +++ b/kernel/power/energy_model.c
>>>> @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev,
>>>> struct em_perf_state *table,
>>>> unsigned long prev_cost = ULONG_MAX;
>>>> int i, ret;
>>>> + /* This is needed only for CPUs and EAS skip other devices */
>>>> + if (!_is_cpu_device(dev))
>>>> + return 0;
>>>> +
>>>> /* Compute the cost of each performance state. */
>>>> for (i = nr_states - 1; i >= 0; i--) {
>>>> unsigned long power_res, cost;
>>>
>>>
>>> Please stop for a while. I have to check what happened that you
>>> faced the issue in the first place. I have been testing the GPU
>>> EMs and there was no issues...
>>>
>>> Let me debug that today.
>>
>> Of course. Since I don't have actual hardware, I can only logically
>> deduce that this issue might exist.
>>
>>
>
> I have run with the GPU EM registered in the boot:
>
> -------------------------------------------------------
> [ 2.753333] panfrost ff9a0000.gpu: EM: created perf domain
> [ 2.759863] panfrost ff9a0000.gpu: mali-t860 id 0x860 major 0x2 minor
> 0x0 status 0x0
> [ 2.768530] panfrost ff9a0000.gpu: features: 00000000,00000407,
> issues: 00000000,24040400
> [ 2.777678] panfrost ff9a0000.gpu: Features: L2:0x07120206
> Shader:0x00000000 Tiler:0x00000809 Mem:0x1 MMU:0x00002830 AS:0xff JS:0x7
> [ 2.780746] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req
> 150000000Hz, actual 148500000HZ div = 0)
> [ 2.790905] panfrost ff9a0000.gpu: shader_present=0xf l2_present=0x1
>
> root@arm:~# cat /sys/kernel/debug/energy_model/ff9a0000.gpu/flags
> 0x1
> root@arm:~# grep . /sys/kernel/debug/energy_model/ff9a0000.gpu/ps*/*
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/cost:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/frequency:200000
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/inefficient:1
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/performance:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/power:404250
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/cost:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/frequency:300000
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/inefficient:1
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/performance:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/power:606375
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/cost:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/frequency:400000
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/inefficient:1
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/performance:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/power:808500
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/cost:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/frequency:600000
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/inefficient:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/performance:0
> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/power:1505790
>
> --------------------------------------------------------
>
> The EM for the GPU is not modified during the boot like the CPUs'
> EM are, thus this code is not triggered. Although, the API is
> open and in theory the GPU EM can be modified at runtime
> as well and it will reach that em_compute_costs() issue
> with 'performance' field having value 0.
>
> So this v4 patch would be needed in this case.
>
> Please re-send this v4 patch as a completely new message.
>
> Thanks for looking at that code path and the fix for potential
> issue.
>
> You can also add my:
>
> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
>
> Regrds,
> Lukasz
Got it - patch resent with Reviewed-by.
https://lore.kernel.org/all/tencent_7F99ED4767C1AF7889D0D8AD50F34859CE06@qq.com/
在 2025/4/18 09:26, Yaxiong Tian 写道: > 在 2025/4/17 21:27, Lukasz Luba 写道: >> I have run with the GPU EM registered in the boot: >> >> ------------------------------------------------------- >> [ 2.753333] panfrost ff9a0000.gpu: EM: created perf domain >> [ 2.759863] panfrost ff9a0000.gpu: mali-t860 id 0x860 major 0x2 >> minor 0x0 status 0x0 >> [ 2.768530] panfrost ff9a0000.gpu: features: 00000000,00000407, >> issues: 00000000,24040400 >> [ 2.777678] panfrost ff9a0000.gpu: Features: L2:0x07120206 >> Shader:0x00000000 Tiler:0x00000809 Mem:0x1 MMU:0x00002830 AS:0xff JS:0x7 >> [ 2.780746] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot >> req 150000000Hz, actual 148500000HZ div = 0) >> [ 2.790905] panfrost ff9a0000.gpu: shader_present=0xf l2_present=0x1 >> >> root@arm:~# cat /sys/kernel/debug/energy_model/ff9a0000.gpu/flags >> 0x1 >> root@arm:~# grep . /sys/kernel/debug/energy_model/ff9a0000.gpu/ps*/* >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/cost:0 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/frequency:200000 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/inefficient:1 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/performance:0 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/power:404250 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/cost:0 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/frequency:300000 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/inefficient:1 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/performance:0 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/power:606375 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/cost:0 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/frequency:400000 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/inefficient:1 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/performance:0 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/power:808500 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/cost:0 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/frequency:600000 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/inefficient:0 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/performance:0 >> /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/power:1505790 >> >> -------------------------------------------------------- >> >> The EM for the GPU is not modified during the boot like the CPUs' >> EM are, thus this code is not triggered. Although, the API is >> open and in theory the GPU EM can be modified at runtime >> as well and it will reach that em_compute_costs() issue >> with 'performance' field having value 0. >> >> So this v4 patch would be needed in this case. >> >> Please re-send this v4 patch as a completely new message. >> >> Thanks for looking at that code path and the fix for potential >> issue. >> >> You can also add my: >> >> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> >> >> Regrds, >> Lukasz > > Got it - patch resent with Reviewed-by. > > https://lore.kernel.org/all/tencent_7F99ED4767C1AF7889D0D8AD50F34859CE06@qq.com/ > Hi Rafael: This bug appears to be stalled. Are there any alternative fixes we could consider?
© 2016 - 2025 Red Hat, Inc.