[PATCH 0/6 v7] sched/fair: Add push task mecansim and hadle more EAS cases

Vincent Guittot posted 6 patches 11 hours ago
kernel/sched/fair.c     | 350 +++++++++++++++++++++++++++++++++++-----
kernel/sched/sched.h    |  46 ++++--
kernel/sched/topology.c |   3 +
3 files changed, 346 insertions(+), 53 deletions(-)
[PATCH 0/6 v7] sched/fair: Add push task mecansim and hadle more EAS cases
Posted by Vincent Guittot 11 hours ago
This is a subset of [1] (sched/fair: Rework EAS to handle more cases)

[1] https://lore.kernel.org/all/20250314163614.1356125-1-vincent.guittot@linaro.org/

The current Energy Aware Scheduler has some known limitations which have
become more and more visible with features such as uclamp. This
series tries to fix some of those issues:
- tasks stacked on the same CPU of a performance domain (PD)
- tasks stuck on the wrong CPU

Patch 1 fixes the case where a CPU is wrongly classified as overloaded
when it is actually capped to a lower compute capacity. This wrong
classification can prevent the periodic load balancer from selecting a
group_misfit_task CPU because group_overloaded has a higher priority.

Patch 2 removes the need to test uclamp_min in cpu_overutilized() to
trigger the active migration of a task to another CPU.
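For reference, the kernel's fits_capacity() macro in kernel/sched/fair.c keeps a 20% margin (util * 1280 < capacity * 1024), while today's cpu_overutilized() also takes the runqueue's uclamp values into account. A minimal sketch of an overutilized test that only compares utilization with capacity, without looking at uclamp_min, could look like this; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Same 20% margin as the kernel's fits_capacity() macro. */
#define fits_capacity(cap, max) ((cap) * 1280 < (max) * 1024)

/*
 * Hypothetical simplified check: a CPU is overutilized when its
 * utilization does not fit its capacity with a 20% margin.
 * uclamp_min is intentionally not consulted.
 */
static bool sketch_cpu_overutilized(unsigned long util, unsigned long capacity)
{
	assert(capacity > 0);
	return !fits_capacity(util, capacity);
}
```

On a CPU with capacity 1024, a utilization of 800 still fits (below 80%), while 820 makes the CPU overutilized.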

Patch 3 prepares select_task_rq_fair() to be called without the TTWU, fork
or exec flags when we just want to look for a possibly better CPU.
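The kernel passes wake flags such as WF_TTWU, WF_FORK and WF_EXEC (defined in kernel/sched/sched.h) to select_task_rq_fair(). A rough model of the new case, where none of them is set and the caller only wants to know whether a better CPU exists for an already queued task, might dispatch as follows. The SK_* values, the enum and the function are hypothetical and do not reproduce the patch.

```c
/* Hypothetical flag values chosen for this sketch only. */
#define SK_WF_EXEC 0x02
#define SK_WF_FORK 0x04
#define SK_WF_TTWU 0x08

/* Which placement path a (hypothetical) caller ends up in. */
enum sketch_select_path {
	SK_PATH_WAKEUP,     /* regular wakeup placement */
	SK_PATH_FORK_EXEC,  /* new task or exec balancing */
	SK_PATH_PUSH_CHECK, /* no flag: look for a better CPU for a queued task */
};

static enum sketch_select_path sketch_select_path(int wake_flags)
{
	if (wake_flags & SK_WF_TTWU)
		return SK_PATH_WAKEUP;
	if (wake_flags & (SK_WF_FORK | SK_WF_EXEC))
		return SK_PATH_FORK_EXEC;
	/* None of the flags set: the caller only asks for a better CPU. */
	return SK_PATH_PUSH_CHECK;
}
```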

Patch 4 adds a push callback mechanism to the fair scheduler but doesn't
enable it.
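The RT scheduler already pushes tasks from a balance callback queued on the runqueue (see push_rt_tasks() in kernel/sched/rt.c). The userspace model below only illustrates the general shape of such a mechanism for fair tasks: a runqueue keeps a list of pushable tasks and a push pass migrates those for which a better CPU was found. The types, fields and helpers are hypothetical and say nothing about how patch 4 stores or selects tasks.

```c
#include <stddef.h>

/* Hypothetical task: only the fields this sketch needs. */
struct sketch_task {
	int id;
	int cpu;                  /* CPU the task is queued on */
	int better_cpu;           /* result of a CPU search, -1 if none */
	struct sketch_task *next; /* link in the pushable list */
};

/* Hypothetical runqueue holding a singly linked list of pushable tasks. */
struct sketch_rq {
	int cpu;
	struct sketch_task *pushable;
};

static void sketch_add_pushable(struct sketch_rq *rq, struct sketch_task *p)
{
	p->cpu = rq->cpu;
	p->next = rq->pushable;
	rq->pushable = p;
}

/*
 * Push pass: move every pushable task for which a different, better CPU
 * was found; tasks without one stay queued. Returns the number moved.
 */
static int sketch_push_tasks(struct sketch_rq *rq)
{
	struct sketch_task **link = &rq->pushable;
	int moved = 0;

	while (*link) {
		struct sketch_task *p = *link;

		if (p->better_cpu >= 0 && p->better_cpu != rq->cpu) {
			*link = p->next;        /* unlink from this runqueue */
			p->next = NULL;
			p->cpu = p->better_cpu; /* stands in for a real migration */
			moved++;
		} else {
			link = &p->next;
		}
	}
	return moved;
}
```

A real scheduler would presumably run such a pass with the runqueue lock held, from a callback queued after the current scheduling operation; the sketch ignores locking entirely.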

Patch 5 enables has_idle_core for !SMT systems to track whether there may
be an idle CPU in the LLC.
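In the kernel, the LLC-wide idle-core hint is the has_idle_cores field of struct sched_domain_shared, maintained by set_idle_cores() and test_idle_cores() in kernel/sched/fair.c under CONFIG_SCHED_SMT. Without SMT every CPU is its own core, so the same hint can flag a possibly idle CPU. The C11-atomics model below shows only the set/test/clear life cycle of such a hint; all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical LLC-shared state: one hint for the whole LLC. */
struct sketch_llc_shared {
	atomic_bool has_idle_cores;
};

/* Called when a CPU enters idle: with !SMT, an idle CPU is an idle core. */
static void sketch_cpu_enter_idle(struct sketch_llc_shared *sds)
{
	atomic_store_explicit(&sds->has_idle_cores, true, memory_order_relaxed);
}

/* Cheap test before deciding whether scanning the LLC is worthwhile. */
static bool sketch_test_idle_cores(struct sketch_llc_shared *sds)
{
	return atomic_load_explicit(&sds->has_idle_cores, memory_order_relaxed);
}

/* A scan that found no idle CPU clears the hint until the next idle entry. */
static void sketch_scan_found_none(struct sketch_llc_shared *sds)
{
	atomic_store_explicit(&sds->has_idle_cores, false, memory_order_relaxed);
}
```

The hint can be stale in both directions, which is why it only says there "may be" an idle CPU.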

Patch 6 adds some conditions to enable pushing runnable tasks for EAS:
- when a task is stuck on a CPU and the system is not overutilized
- when there may be an idle CPU and the system is overutilized
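The two trigger conditions listed above can be read as a small decision function. In this sketch, overutilized, task_stuck and llc_may_have_idle are hypothetical inputs standing in for whatever state the patch actually consults (for example the idle-core hint from patch 5); it only restates the cover letter's rules and is not the code of patch 6.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state consulted before pushing a task. */
struct sketch_push_ctx {
	bool overutilized;      /* system-wide overutilized state */
	bool task_stuck;        /* task would be better placed elsewhere */
	bool llc_may_have_idle; /* idle hint for the LLC, possibly stale */
};

static bool sketch_should_push(const struct sketch_push_ctx *c)
{
	if (!c->overutilized)
		return c->task_stuck;    /* EAS case: rescue a stuck task */
	return c->llc_may_have_idle;     /* overutilized: use an idle CPU */
}
```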

More test results will come later, as I wanted to send the patchset before
LPC.

Tbench on Dragonboard RB5
schedutil and EAS enabled; values are throughput (higher is better)

# procs  tip                +patchset            diff
      1    29.1 (+/-4.1%)    124.7 (+/-12.3%)   +329%
      2    60.0 (+/-0.9%)    216.1 (+/- 7.9%)   +260%
      4   255.8 (+/-1.9%)    421.4 (+/- 2.0%)    +65%
      8  1317.3 (+/-4.6%)   1396.1 (+/- 3.0%)     +6%
     16   958.2 (+/-4.6%)    979.6 (+/- 2.0%)     +2%
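The last column is the relative change of the patchset over tip, (patchset - tip) / tip * 100, rounded to the nearest integer. The helper below is only a checking aid, not part of the benchmark tooling; it recomputes that column from the table's values.

```c
/* tbench results from the table: tip vs +patchset. */
struct tbench_row {
	int nr_proc;
	double tip;
	double patched;
};

static const struct tbench_row tbench_rows[] = {
	{  1,   29.1,  124.7 },
	{  2,   60.0,  216.1 },
	{  4,  255.8,  421.4 },
	{  8, 1317.3, 1396.1 },
	{ 16,  958.2,  979.6 },
};

/* Relative change in percent, rounded half away from zero. */
static long pct_change(double tip, double patched)
{
	double pct = (patched - tip) / tip * 100.0;

	return (long)(pct >= 0 ? pct + 0.5 : pct - 0.5);
}
```

Applied to each row, it reproduces +329%, +260%, +65%, +6% and +2%.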

Hackbench didn't show any difference.


Vincent Guittot (6):
  sched/fair: Filter false overloaded_group case for EAS
  sched/fair: Update overutilized detection
  sched/fair: Prepare select_task_rq_fair() to be called for new cases
  sched/fair: Add push task mechanism for fair
  sched/fair: Enable idle core tracking for !SMT
  sched/fair: Add EAS and idle cpu push trigger

 kernel/sched/fair.c     | 350 +++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h    |  46 ++++--
 kernel/sched/topology.c |   3 +
 3 files changed, 346 insertions(+), 53 deletions(-)

Re: [PATCH 0/6 v7] sched/fair: Add push task mecansim and hadle more EAS cases
Posted by Christian Loehle 7 hours ago
On 12/1/25 09:13, Vincent Guittot wrote:
> Tbench  on dragonboard rb5
> schedutil and EAS enabled
> 
> # process     tip                   +patchset
> 1              29.1(+/-4.1%)        124.7(+/-12.3%) +329%
> 2              60.0(+/-0.9%)        216.1(+/- 7.9%) +260%
> 4             255.8(+/-1.9%)        421.4(+/- 2.0%)  +65%       
> 8            1317.3(+/-4.6%)       1396.1(+/- 3.0%)   +6%
> 16            958.2(+/-4.6%)        979.6(+/- 2.0%)   +2%

Just so I understand, there's no uclamp in the workload here?
Could you expand on the workload a little? What were the parameters/settings?
So the significant increase is really only for nr_proc < nr_cpus. Given the
observed throughput increase, it's probably something like "always running
on little CPUs" vs "always running on big CPUs"; is that what's happening?
Also, shouldn't tbench still generate plenty of wakeup events? It issues plenty
of TCP traffic anyway.

Re: [PATCH 0/6 v7] sched/fair: Add push task mecansim and hadle more EAS cases
Posted by Vincent Guittot 3 hours ago
On Mon, 1 Dec 2025 at 14:31, Christian Loehle <christian.loehle@arm.com> wrote:
>
> Just so I understand, there's no uclamp in the workload here?

Yes, no uclamp.

> Could you expand on the workload a little? What were the parameters/settings?

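#!/bin/bash
# 5 process counts x 9 runs ({0..8}), each run lasting 10 s
for g in 1 2 4 8 16; do
    for i in {0..8}; do
        sync                # flush dirty data before each run
        sleep 3.777         # pause between runs
        tbench -t 10 "$g"   # 10 s run with $g client processes
    done
done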

> So the significant increase is really only for nr_proc < nr_cpus.

Yes.

> Given the observed throughput increase, it's probably something like "always running
> on little CPUs" vs "always running on big CPUs"; is that what's happening?

I have looked at the details. These results are part of the benchmarks that
I'm running along with hackbench, but they most probably come from migrating
tasks to a better CPU.

> Also, shouldn't tbench still generate plenty of wakeup events? It issues plenty
> of TCP traffic anyway.

Yes.


Re: [PATCH 0/6 v7] sched/fair: Add push task mecansim and hadle more EAS cases
Posted by Christian Loehle 6 hours ago
Nit in the title: mechanism, handle

On 12/1/25 13:31, Christian Loehle wrote:
> Just so I understand, there's no uclamp in the workload here?
> Could you expand on the workload a little? What were the parameters/settings?
> So the significant increase is really only for nr_proc < nr_cpus. Given the
> observed throughput increase, it's probably something like "always running
> on little CPUs" vs "always running on big CPUs"; is that what's happening?
> Also, shouldn't tbench still generate plenty of wakeup events? It issues plenty
> of TCP traffic anyway.

... or, if not, why doesn't OU (the overutilized state) trigger on tip?


I can't apply this on top of yesterday's 6.18 release, nor on tip/sched/core.
What is this based on? Could I get a branch or a 6.18 rebase?
Re: [PATCH 0/6 v7] sched/fair: Add push task mecansim and hadle more EAS cases
Posted by Vincent Guittot 3 hours ago
On Mon, 1 Dec 2025 at 14:57, Christian Loehle <christian.loehle@arm.com> wrote:
>
> I can't apply this on top of yesterday's 6.18 release, nor on tip/sched/core.
> What is this based on? Could I get a branch or a 6.18 rebase?

The patchset is based on tip/sched/core commit 33cf66d88306
("sched/fair: Proportional newidle balance")