When running an ISV workload on a large system (240 cores, SMT8), it was
observed from the perf profile that newidle_balance and enqueue_task_fair
were consuming a significant number of cycles. Perf annotate showed that most
of the time was spent accessing the overutilized field of the root domain.

Aboorva was able to simulate a similar perf profile by making some changes to
stress-ng --wait. Both newidle_balance and enqueue_task_fair consume close to
5-7%. Perf annotate shows that most of the cycles are spent accessing the
rd->overutilized field.

Overutilized was added for EAS (Energy Aware Scheduling) to choose between
EAS-aware load balancing and regular load balancing. As checked, on x86 and
powerpc both overload and overutilized share the same cacheline in rd.
Updating overutilized is not required on non-EAS platforms.

Patch 1/2 is the main patch and addresses the issue above: with it applied,
neither function shows up in the profile. A before/after comparison is in the
changelog. With the patch, the problem seen in the ISV workload is also
resolved and throughput has improved.

Patch 2/2 is a code refactor that uses a helper function instead of accessing
the field directly, making it clear that the field is used only by EAS. It
depends on 1/2 being applied first.

Thanks to Aboorva Devarajan and Nysal Jan K A for helping to recreate and
debug this issue and verify the patch. A detailed perf annotate can be found
in the cover letter of v1.

v1 -> v2:
Chen Yu pointed out a minor issue in the code. Corrected that code and
updated the changelog.

v1: https://lore.kernel.org/lkml/20240223150707.410417-1-sshegde@linux.ibm.com/

Shrikanth Hegde (2):
  sched/fair: Add EAS checks before updating overutilized
  sched/fair: Use helper function to access rd->overutilized

 kernel/sched/fair.c | 49 +++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 13 deletions(-)

--
2.39.3
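The shape of both patches, roughly, is the following. This is a sketch of the
approach rather than the literal hunks; the helper names are illustrative
stand-ins for what patch 2/2 introduces, and the exact signatures may differ:

/*
 * Sketch of the approach (not the literal patch hunks); helper names are
 * illustrative.
 */
static inline bool is_rd_overutilized(struct root_domain *rd)
{
	return READ_ONCE(rd->overutilized);
}

static inline void set_rd_overutilized_status(struct root_domain *rd,
					      unsigned int status)
{
	WRITE_ONCE(rd->overutilized, status);
}

static inline void check_update_overutilized_status(struct rq *rq)
{
	/*
	 * rd->overutilized is only consumed by EAS; on non-EAS platforms
	 * skip the update so these hot paths never touch that cacheline.
	 */
	if (!sched_energy_enabled())
		return;

	if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
		set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
}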
On 02/28/24 12:46, Shrikanth Hegde wrote:
> When running an ISV workload on a large system (240 cores, SMT8), it was
> observed from the perf profile that newidle_balance and enqueue_task_fair
> were consuming a significant number of cycles. Perf annotate showed that most
> of the time was spent accessing the overutilized field of the root domain.
>
> Aboorva was able to simulate a similar perf profile by making some
> changes to stress-ng --wait. Both newidle_balance and enqueue_task_fair
> consume close to 5-7%. Perf annotate shows that most of the cycles are spent
> accessing the rd->overutilized field.
>
> Overutilized was added for EAS (Energy Aware Scheduling) to choose either
> EAS aware load balancing or regular load balance. As checked, on x86 and
It actually toggles load balance on/off (off if !overutilized).
misfit load balance used to be controlled by this but this was decoupled since
commit e5ed0550c04c ("sched/fair: unlink misfit task from cpu overutilized")
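Concretely, with EAS enabled the regular balancer bails out early when the
root domain is not marked overutilized; paraphrased from find_busiest_group()
in kernel/sched/fair.c (exact form and placement vary by kernel version):

	/* Paraphrased: with EAS enabled and a perf domain attached, skip
	 * regular load balancing unless the root domain has been marked
	 * overutilized. */
	if (sched_energy_enabled()) {
		struct root_domain *rd = env->dst_rq->rd;

		if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized))
			goto out_balanced;
	}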
> powerpc both overload and overutilized share the same cacheline in rd.
> Updating overutilized is not required for non-EAS platforms.
Is the fact that these two share the cacheline part of the problem? From patch
1 it seems the fact that overutilized is updated often on different CPUs is the
problem? Did you try to move overutilized to different places to see if this
alternatively helps?

The patches look fine to me. I am just trying to verify that the access to
overutilized is indeed the problem, and not that something else on the same
cacheline is accidentally being slowed down, which would mean the problem
could resurface in the future.
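For reference, the two flags sit back to back in struct root_domain; abridged
from kernel/sched/sched.h (field order and comments vary by kernel version, so
treat this as an approximation):

/* Abridged from kernel/sched/sched.h; surrounding fields omitted. */
struct root_domain {
	atomic_t	refcount;
	atomic_t	rto_count;
	struct rcu_head	rcu;
	cpumask_var_t	span;
	cpumask_var_t	online;

	/* Indicates pullable load on at least one CPU */
	int		overload;

	/* Indicates one or more CPUs are over-utilized (tipping point) */
	int		overutilized;

	/* ... */
};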
>
> Patch 1/2 is the main patch and addresses the issue above: with it applied,
> neither function shows up in the profile. A before/after comparison is in the
> changelog. With the patch, the problem seen in the ISV workload is also
> resolved and throughput has improved.
> Patch 2/2 is a code refactor that uses a helper function instead of accessing
> the field directly, making it clear that the field is used only by EAS. It
> depends on 1/2 being applied first.
>
> Thanks to Aboorva Devarajan and Nysal Jan K A for helping to recreate and
> debug this issue and verify the patch.
> A detailed perf annotate can be found in the cover letter of v1.
>
> v1 -> v2:
> Chen Yu pointed out a minor issue in the code. Corrected that code and
> updated the changelog.
>
> v1: https://lore.kernel.org/lkml/20240223150707.410417-1-sshegde@linux.ibm.com/
>
> Shrikanth Hegde (2):
> sched/fair: Add EAS checks before updating overutilized
> sched/fair: Use helper function to access rd->overutilized
>
> kernel/sched/fair.c | 49 +++++++++++++++++++++++++++++++++------------
> 1 file changed, 36 insertions(+), 13 deletions(-)
>
> --
> 2.39.3
>
On 2/29/24 5:38 AM, Qais Yousef wrote:
> On 02/28/24 12:46, Shrikanth Hegde wrote:
[...]
>> Overutilized was added for EAS (Energy Aware Scheduling) to choose either
>> EAS aware load balancing or regular load balance. As checked, on x86 and
>
> It actually toggles load balance on/off (off if !overutilized).
>
> misfit load balance used to be controlled by this but this was decoupled since
> commit e5ed0550c04c ("sched/fair: unlink misfit task from cpu overutilized")
>
Ok.
>> powerpc both overload and overutilized share the same cacheline in rd.
>> Updating overutilized is not required for non-EAS platforms.
>
> Is the fact that these two share the cacheline part of the problem? From patch
> 1 it seems the fact that overutilized is updated often on different CPUs is the
> problem? Did you try to move overutilized to different places to see if this
> alternatively helps?
>
> The patches look fine to me. I am just trying to verify that the access to
> overutilized is indeed the problem, and not that something else on the same
> cacheline is accidentally being slowed down, which would mean the problem
> could resurface in the future.
>
We tried explicitly cache-aligning overload. With that, newidle_balance goes away
from the perf profile, but enqueue_task_fair still remains. That is because there is
load-store tearing on the overutilized field itself, due to different CPUs accessing
and updating it at the same time.

We have also verified that accessing rq->rd->overutilized in the enqueue_task_fair
path is the reason it shows up in the perf profile.
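To make the effect concrete, here is a small standalone userspace sketch (not
kernel code; thread and iteration counts are arbitrary) showing why many CPUs
repeatedly storing to one shared word is so much more expensive than each CPU
writing its own cacheline:

/*
 * Illustration of cacheline bouncing: several threads repeatedly storing to
 * one shared word vs. each thread storing to its own cacheline-padded word.
 * Build with: gcc -O2 -pthread contention_demo.c -o contention_demo
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 8
#define ITERS    (1 << 24)

static atomic_int shared_flag;                    /* one word, all threads hit it   */

struct padded { atomic_int flag; char pad[60]; }; /* one cacheline per thread (64B) */
static struct padded per_thread[NTHREADS];

static void *hammer_shared(void *arg)
{
	(void)arg;
	for (long i = 0; i < ITERS; i++)
		atomic_store_explicit(&shared_flag, 1, memory_order_relaxed);
	return NULL;
}

static void *hammer_private(void *arg)
{
	struct padded *p = arg;
	for (long i = 0; i < ITERS; i++)
		atomic_store_explicit(&p->flag, 1, memory_order_relaxed);
	return NULL;
}

static double run(void *(*fn)(void *), int private)
{
	pthread_t tid[NTHREADS];
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, fn, private ? (void *)&per_thread[i] : NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
	printf("shared word : %.2fs\n", run(hammer_shared, 0));
	printf("padded words: %.2fs\n", run(hammer_private, 1));
	return 0;
}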
>>
[...]
>>
>> --
>> 2.39.3
>>
On 02/29/24 10:16, Shrikanth Hegde wrote:

> > Is the fact that these two share the cacheline part of the problem? From patch
> > 1 it seems the fact that overutilized is updated often on different CPUs is the
> > problem? Did you try to move overutilized to different places to see if this
> > alternatively helps?
> >
> > The patches look fine to me. I am just trying to verify that the access to
> > overutilized is indeed the problem, and not that something else on the same
> > cacheline is accidentally being slowed down, which would mean the problem
> > could resurface in the future.
> >
>
> We tried explicitly cache-aligning overload. With that, newidle_balance goes away
> from the perf profile, but enqueue_task_fair still remains.

I don't have a solution, but this accidental dependency is something to ponder
for the future.

> That is because there is load-store tearing on the overutilized field itself,
> due to different CPUs accessing and updating it at the same time.
>
> We have also verified that accessing rq->rd->overutilized in the enqueue_task_fair
> path is the reason it shows up in the perf profile.

Something to ponder as well. Maybe this is better converted to a per-cpu
variable if it can cause such cacheline bouncing. But that's probably not good
enough on its own.

Anyway. Something to ponder for the future too. I think the current definition
of overutilized is due for a revisit, and I'll keep this problem in mind if we
can introduce a friendlier pattern.

Thanks

--
Qais Yousef
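A rough sketch of the per-CPU idea being floated, purely for illustration:
the names and the aggregation scheme below are hypothetical and not part of
the posted series. Writers touch only their own cacheline, at the cost of
readers scanning the domain:

/* Hypothetical sketch, not part of the posted series: each CPU records its
 * own overutilized state; readers aggregate instead of all CPUs bouncing
 * one rd field. */
DEFINE_PER_CPU(int, cpu_overutilized_hint);

static inline void set_cpu_overutilized_hint(int cpu, int val)
{
	/* Each CPU writes only its own cacheline. */
	per_cpu(cpu_overutilized_hint, cpu) = val;
}

static inline bool rd_overutilized_slow(struct root_domain *rd)
{
	int cpu;

	/* Readers (e.g. load balancing) pay the cost of scanning instead. */
	for_each_cpu(cpu, rd->span)
		if (per_cpu(cpu_overutilized_hint, cpu))
			return true;
	return false;
}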