[PATCH] PM: EM: Fix potential division-by-zero error in em_compute_costs()

Yaxiong Tian posted 1 patch 8 months, 1 week ago
There is a newer version of this series
kernel/power/energy_model.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
[PATCH] PM: EM: Fix potential division-by-zero error in em_compute_costs()
Posted by Yaxiong Tian 8 months, 1 week ago
From: Yaxiong Tian <tianyaxiong@kylinos.cn>

When the device is of a non-CPU type, table[i].performance won't be
initialized in the previous em_init_performance(), resulting in division
 by zero when calculating costs in em_compute_costs().

Considering that the performance field in struct em_perf_state is defined
as "CPU performance (capacity) at a given frequency", the original
calculation method should be maintained when the device is of a non-CPU
type.

Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove division")

Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
---
 kernel/power/energy_model.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index d9b7e2b38c7a..bbd95573d91e 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -231,9 +231,11 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 			    unsigned long flags)
 {
 	unsigned long prev_cost = ULONG_MAX;
+	u64 fmax;
 	int i, ret;
 
 	/* Compute the cost of each performance state. */
+	fmax = (u64) table[nr_states - 1].frequency;
 	for (i = nr_states - 1; i >= 0; i--) {
 		unsigned long power_res, cost;
 
@@ -245,9 +247,15 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 				return -EINVAL;
 			}
 		} else {
-			/* increase resolution of 'cost' precision */
-			power_res = table[i].power * 10;
-			cost = power_res / table[i].performance;
+			if (_is_cpu_device(dev)) {
+				/* increase resolution of 'cost' precision */
+				power_res = table[i].power * 10;
+				cost = power_res / table[i].performance;
+			} else {
+				power_res = table[i].power;
+				cost = div64_u64(fmax * power_res, table[i].frequency);
+
+			}
 		}
 
 		table[i].cost = cost;
-- 
2.25.1
Re: [PATCH] PM: EM: Fix potential division-by-zero error in em_compute_costs()
Posted by Rafael J. Wysocki 8 months, 1 week ago
On Thu, Apr 10, 2025 at 7:39 AM Yaxiong Tian <iambestgod@qq.com> wrote:
>
> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
>
> When the device is of a non-CPU type, table[i].performance won't be
> initialized in the previous em_init_performance(), resulting in division
>  by zero when calculating costs in em_compute_costs().
>
> Considering that the performance field in struct em_perf_state is defined
> as "CPU performance (capacity) at a given frequency", the original
> calculation method should be maintained when the device is of a non-CPU
> type.
>
> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove division")
>
> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
> ---
>  kernel/power/energy_model.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index d9b7e2b38c7a..bbd95573d91e 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -231,9 +231,11 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
>                             unsigned long flags)
>  {
>         unsigned long prev_cost = ULONG_MAX;
> +       u64 fmax;

Why not initialize it here?  Also please retain the reverse x-mas tree
ordering of declarations.

>         int i, ret;
>
>         /* Compute the cost of each performance state. */
> +       fmax = (u64) table[nr_states - 1].frequency;

No need to cast to u64 explicitly (it will be cast anyway).

>         for (i = nr_states - 1; i >= 0; i--) {
>                 unsigned long power_res, cost;
>
> @@ -245,9 +247,15 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
>                                 return -EINVAL;
>                         }
>                 } else {
> -                       /* increase resolution of 'cost' precision */
> -                       power_res = table[i].power * 10;
> -                       cost = power_res / table[i].performance;
> +                       if (_is_cpu_device(dev)) {
> +                               /* increase resolution of 'cost' precision */
> +                               power_res = table[i].power * 10;
> +                               cost = power_res / table[i].performance;
> +                       } else {
> +                               power_res = table[i].power;
> +                               cost = div64_u64(fmax * power_res, table[i].frequency);

Why is it necessary to compute the "cost" field value for non-CPU
devices at all?

> +

An excess empty line.

> +                       }
>                 }
>
>                 table[i].cost = cost;
> --
Re: [PATCH] PM: EM: Fix potential division-by-zero error in em_compute_costs()
Posted by Yaxiong Tian 8 months, 1 week ago

在 2025/4/10 21:23, Rafael J. Wysocki 写道:
> On Thu, Apr 10, 2025 at 7:39 AM Yaxiong Tian <iambestgod@qq.com> wrote:
>>
>> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
>>
>> When the device is of a non-CPU type, table[i].performance won't be
>> initialized in the previous em_init_performance(), resulting in division
>>   by zero when calculating costs in em_compute_costs().
>>
>> Considering that the performance field in struct em_perf_state is defined
>> as "CPU performance (capacity) at a given frequency", the original
>> calculation method should be maintained when the device is of a non-CPU
>> type.
>>
>> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove division")
>>
>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
>> ---
>>   kernel/power/energy_model.c | 14 +++++++++++---
>>   1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>> index d9b7e2b38c7a..bbd95573d91e 100644
>> --- a/kernel/power/energy_model.c
>> +++ b/kernel/power/energy_model.c
>> @@ -231,9 +231,11 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
>>                              unsigned long flags)
>>   {
>>          unsigned long prev_cost = ULONG_MAX;
>> +       u64 fmax;
> 
> Why not initialize it here?  Also please retain the reverse x-mas tree
> ordering of declarations.
> 
There is indeed an issue with imperfect code style here.

>>          int i, ret;
>>
>>          /* Compute the cost of each performance state. */
>> +       fmax = (u64) table[nr_states - 1].frequency;
> 
> No need to cast to u64 explicitly (it will be cast anyway).
> 
>>          for (i = nr_states - 1; i >= 0; i--) {
>>                  unsigned long power_res, cost;
>>
>> @@ -245,9 +247,15 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
>>                                  return -EINVAL;
>>                          }
>>                  } else {
>> -                       /* increase resolution of 'cost' precision */
>> -                       power_res = table[i].power * 10;
>> -                       cost = power_res / table[i].performance;
>> +                       if (	) {
>> +                               /* increase resolution of 'cost' precision */
>> +                               power_res = table[i].power * 10;
>> +                               cost = power_res / table[i].performance;
>> +                       } else {
>> +                               power_res = table[i].power;
>> +                               cost = div64_u64(fmax * power_res, table[i].frequency);
> 
> Why is it necessary to compute the "cost" field value for non-CPU
> devices at all?
> 
Indeed, I didn't think of this issue—I was focused on ensuring the rest
of the code was correct and forgot that the 'cost' algorithm was only
for EAS energy efficiency calculations. I checked other parts of the
code, and 'cost' isn't used elsewhere. Therefore, this bug can be fixed
simply by adding a _is_cpu_device(dev) check. I'll make the change in
the next version and update the commit message accordingly.

>> +
> 
> An excess empty line.
> 
>> +                       }
>>                  }
>>
>>                  table[i].cost = cost;
>> --