[PATCH v2] sched/fair: Fix cpu_util runnable_avg arithmetic

Hongyan Xia posted 1 patch 2 days, 19 hours ago
kernel/sched/fair.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
[PATCH v2] sched/fair: Fix cpu_util runnable_avg arithmetic
Posted by Hongyan Xia 2 days, 19 hours ago
From: Hongyan Xia <hongyan.xia@transsion.com>

If we take runnable_avg in max(runnable_avg, util_avg) in cpu_util(), we
should then add or subtract task runnable_avg, but the arithmetic below
is still with task util_avg. This mixes runnable_avg with util_avg which
is incorrect.

Fix by always doing arithmetic with runnable_avg and only take
max(runnable_avg, util_avg) at the last step.

Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
Signed-off-by: Hongyan Xia <hongyan.xia@transsion.com>

---
Changed in v2:
- Rebase against the latest sched/core
---
 kernel/sched/fair.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f4ed841f766f..1b23e73f48b0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8968,25 +8968,32 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 static unsigned long
 cpu_util(int cpu, struct task_struct *p, int dst_cpu, int boost)
 {
+	bool add_task = p && task_cpu(p) != cpu && dst_cpu == cpu;
+	bool sub_task = p && task_cpu(p) == cpu && dst_cpu != cpu;
 	struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
 	unsigned long util = READ_ONCE(cfs_rq->avg.util_avg);
 	unsigned long runnable;
 
-	if (boost) {
-		runnable = READ_ONCE(cfs_rq->avg.runnable_avg);
-		util = max(util, runnable);
-	}
-
 	/*
 	 * If @dst_cpu is -1 or @p migrates from @cpu to @dst_cpu remove its
 	 * contribution. If @p migrates from another CPU to @cpu add its
 	 * contribution. In all the other cases @cpu is not impacted by the
 	 * migration so its util_avg is already correct.
 	 */
-	if (p && task_cpu(p) == cpu && dst_cpu != cpu)
-		lsub_positive(&util, task_util(p));
-	else if (p && task_cpu(p) != cpu && dst_cpu == cpu)
+	if (add_task)
 		util += task_util(p);
+	else if (sub_task)
+		lsub_positive(&util, task_util(p));
+
+	if (boost) {
+		runnable = READ_ONCE(cfs_rq->avg.runnable_avg);
+		if (add_task)
+			runnable += READ_ONCE(p->se.avg.runnable_avg);
+		else if (sub_task)
+			lsub_positive(&runnable,
+				      READ_ONCE(p->se.avg.runnable_avg));
+		util = max(util, runnable);
+	}
 
 	if (sched_feat(UTIL_EST)) {
 		unsigned long util_est;
-- 
2.47.3

Re: [PATCH v2] sched/fair: Fix cpu_util runnable_avg arithmetic
Posted by Vincent Guittot 2 days, 16 hours ago
On Fri, 5 Jun 2026 at 11:43, Hongyan Xia <hongyan.xia@transsion.com> wrote:
>
> From: Hongyan Xia <hongyan.xia@transsion.com>
>
> If we take runnable_avg in max(runnable_avg, util_avg) in cpu_util(), we
> should then add or subtract task runnable_avg, but the arithmetic below
> is still with task util_avg. This mixes runnable_avg with util_avg which
> is incorrect.
>
> Fix by always doing arithmetic with runnable_avg and only take
> max(runnable_avg, util_avg) at the last step.
>
> Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
> Signed-off-by: Hongyan Xia <hongyan.xia@transsion.com>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

>
> ---
> Changed in v2:
> - Rebase against the latest sched/core
> ---
>  kernel/sched/fair.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f4ed841f766f..1b23e73f48b0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8968,25 +8968,32 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>  static unsigned long
>  cpu_util(int cpu, struct task_struct *p, int dst_cpu, int boost)
>  {
> +       bool add_task = p && task_cpu(p) != cpu && dst_cpu == cpu;
> +       bool sub_task = p && task_cpu(p) == cpu && dst_cpu != cpu;
>         struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
>         unsigned long util = READ_ONCE(cfs_rq->avg.util_avg);
>         unsigned long runnable;
>
> -       if (boost) {
> -               runnable = READ_ONCE(cfs_rq->avg.runnable_avg);
> -               util = max(util, runnable);
> -       }
> -
>         /*
>          * If @dst_cpu is -1 or @p migrates from @cpu to @dst_cpu remove its
>          * contribution. If @p migrates from another CPU to @cpu add its
>          * contribution. In all the other cases @cpu is not impacted by the
>          * migration so its util_avg is already correct.
>          */
> -       if (p && task_cpu(p) == cpu && dst_cpu != cpu)
> -               lsub_positive(&util, task_util(p));
> -       else if (p && task_cpu(p) != cpu && dst_cpu == cpu)
> +       if (add_task)
>                 util += task_util(p);
> +       else if (sub_task)
> +               lsub_positive(&util, task_util(p));
> +
> +       if (boost) {
> +               runnable = READ_ONCE(cfs_rq->avg.runnable_avg);
> +               if (add_task)
> +                       runnable += READ_ONCE(p->se.avg.runnable_avg);
> +               else if (sub_task)
> +                       lsub_positive(&runnable,
> +                                     READ_ONCE(p->se.avg.runnable_avg));
> +               util = max(util, runnable);
> +       }
>
>         if (sched_feat(UTIL_EST)) {
>                 unsigned long util_est;
> --
> 2.47.3
>
Re: [PATCH v2] sched/fair: Fix cpu_util runnable_avg arithmetic
Posted by Dietmar Eggemann 2 days, 18 hours ago
On 05.06.26 11:43, Hongyan Xia wrote:
> From: Hongyan Xia <hongyan.xia@transsion.com>
> 
> If we take runnable_avg in max(runnable_avg, util_avg) in cpu_util(), we
> should then add or subtract task runnable_avg, but the arithmetic below
> is still with task util_avg. This mixes runnable_avg with util_avg which
> is incorrect.
> 
> Fix by always doing arithmetic with runnable_avg and only take
> max(runnable_avg, util_avg) at the last step.
> 
> Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
> Signed-off-by: Hongyan Xia <hongyan.xia@transsion.com>

Does this fix the issue in EAS energy calculation you mentioned
initially? We now add/subtract task rbl_avg from CPU rbl_avg but can we
now use this value correctly in util_avg based EAS?

How do you want to solve the power consumption regression in you
low-power use cases? Since you mentioned per-CPU tasks in those
contention scenarios (per-CPU worker vs producer *), do you plan to only
use boost in cpu_util() in case the affinity of p (worker) is not
constrained? Not sure whether the consumer (CPU affinity not
constrained) also has rbl_avg > util_avg?

*
https://lore.kernel.org/r/4adbab4d-f9e4-4354-aa1e-48f11b1fd208@transsion.com


[...]
Re: [PATCH v2] sched/fair: Fix cpu_util runnable_avg arithmetic
Posted by Hongyan Xia 2 days, 16 hours ago
On 6/5/2026 6:35 PM, Dietmar Eggemann wrote:
> On 05.06.26 11:43, Hongyan Xia wrote:
>> From: Hongyan Xia <hongyan.xia@transsion.com>
>>
>> If we take runnable_avg in max(runnable_avg, util_avg) in cpu_util(), we
>> should then add or subtract task runnable_avg, but the arithmetic below
>> is still with task util_avg. This mixes runnable_avg with util_avg which
>> is incorrect.
>>
>> Fix by always doing arithmetic with runnable_avg and only take
>> max(runnable_avg, util_avg) at the last step.
>>
>> Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
>> Signed-off-by: Hongyan Xia <hongyan.xia@transsion.com>
>
> Does this fix the issue in EAS energy calculation you mentioned
> initially? We now add/subtract task rbl_avg from CPU rbl_avg but can we
> now use this value correctly in util_avg based EAS?

It does improve things a bit. It used to occasionally pile up tasks on
the same CPU, which happens less after this patch. At least it now gets
the maths right so this fix should probably be in regardless.

> How do you want to solve the power consumption regression in you
> low-power use cases? Since you mentioned per-CPU tasks in those
> contention scenarios (per-CPU worker vs producer *), do you plan to only
> use boost in cpu_util() in case the affinity of p (worker) is not
> constrained? Not sure whether the consumer (CPU affinity not
> constrained) also has rbl_avg > util_avg?

This patch only gets the maths right but the energy regression is still
big because frequency hasn't changed. The producer-consumer is only the
worst offender, not the only one. Trouble is that runnable_avg is just a
big number to deal with in general, and you could easily double or
triple your frequency if you have many small threads around (which is
the case in our mobile cases).

We haven't found a good solution to solve it completely. I keep
wondering if there could be a better metric than raw runnable_avg. One
that is not so big in magnitude and does a much better job to tell the
true contention where boosting frequency helps.

> *
> https://lore.kernel.org/r/4adbab4d-f9e4-4354-aa1e-48f11b1fd208@transsion.com
>
>
> [...]