[PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg

Vincent Guittot posted 4 patches 4 years, 5 months ago
kernel/sched/fair.c | 113 +++++++++++++++++++++++++++++---------------
1 file changed, 75 insertions(+), 38 deletions(-)
[PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg
Posted by Vincent Guittot 4 years, 5 months ago
Rick reported performance regressions in bugzilla because of cpu
frequency being lower than before:
    https://bugzilla.kernel.org/show_bug.cgi?id=215045

He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")

More details are available in commit message of patch 1.

This patchset reverts the commit above and adds several checks when
propagating the changes in the hierarchy to make sure that we still have
coherent util_avg and util_sum.

Dietmar found a simple way to reproduce the WARN fixed by 
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
by looping on hackbench in several different sched group levels.

This patchset as run on the reproducer with success but it probably needs
more tests by people who faced the WARN before.

The changes done on util_sum have been also applied to runnable_sum and
load_sum which faces the same rounding problem although this has not been
reflected in measurable performance impact.

Changes for v3:
- split patch 1 in 2 patches
  - One to fix rick's regression
  - One to apply same changes in other places
- some typos
- move main comment so it appears in the 1st patch 

Changes for v2:
- fix wrong update of load_sum
- move a change from patch 3 to patch 2
- update patch 3 commit message

Vincent Guittot (4):
  sched/pelt: Relax the sync of util_sum with util_avg
  sched/pelt: Continue to relax the sync of util_sum with util_avg
  sched/pelt: Relax the sync of runnable_sum with runnable_avg
  sched/pelt: Relax the sync of load_sum with load_avg

 kernel/sched/fair.c | 113 +++++++++++++++++++++++++++++---------------
 1 file changed, 75 insertions(+), 38 deletions(-)

-- 
2.17.1

Re: [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg
Posted by Sachin Sant 4 years, 5 months ago
> On 11-Jan-2022, at 7:16 PM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> 
> Rick reported performance regressions in bugzilla because of cpu
> frequency being lower than before:
>    https://bugzilla.kernel.org/show_bug.cgi?id=215045
> 
> He bisected the problem to:
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> 
> More details are available in commit message of patch 1.
> 
> This patchset reverts the commit above and adds several checks when
> propagating the changes in the hierarchy to make sure that we still have
> coherent util_avg and util_sum.
> 
> Dietmar found a simple way to reproduce the WARN fixed by 
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> by looping on hackbench in several different sched group levels.
> 
> This patchset as run on the reproducer with success but it probably needs
> more tests by people who faced the WARN before.
> 

I ran scheduler regression tests(including cfg_bandwidth) from LTP
for about 6 hours. I did not observe any (new or previously reported)
kernel warn messages.

Based on this test result for ppc64le
Tested-by: Sachin Sant <sachinp@linux.ibm.com>

-Sachin
Re: [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg
Posted by Vincent Guittot 4 years, 5 months ago
On Wed, 12 Jan 2022 at 12:24, Sachin Sant <sachinp@linux.vnet.ibm.com> wrote:
>
>
> > On 11-Jan-2022, at 7:16 PM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >
> > Rick reported performance regressions in bugzilla because of cpu
> > frequency being lower than before:
> >    https://bugzilla.kernel.org/show_bug.cgi?id=215045
> >
> > He bisected the problem to:
> > commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> >
> > More details are available in commit message of patch 1.
> >
> > This patchset reverts the commit above and adds several checks when
> > propagating the changes in the hierarchy to make sure that we still have
> > coherent util_avg and util_sum.
> >
> > Dietmar found a simple way to reproduce the WARN fixed by
> > commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> > by looping on hackbench in several different sched group levels.
> >
> > This patchset as run on the reproducer with success but it probably needs
> > more tests by people who faced the WARN before.
> >
>
> I ran scheduler regression tests(including cfg_bandwidth) from LTP
> for about 6 hours. I did not observe any (new or previously reported)
> kernel warn messages.
>
> Based on this test result for ppc64le
> Tested-by: Sachin Sant <sachinp@linux.ibm.com>

Thanks

>
> -Sachin
Re: [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg
Posted by Dietmar Eggemann 4 years, 5 months ago
On 11/01/2022 14:46, Vincent Guittot wrote:
> Rick reported performance regressions in bugzilla because of cpu
> frequency being lower than before:
>     https://bugzilla.kernel.org/show_bug.cgi?id=215045
> 
> He bisected the problem to:
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> 
> More details are available in commit message of patch 1.
> 
> This patchset reverts the commit above and adds several checks when
> propagating the changes in the hierarchy to make sure that we still have
> coherent util_avg and util_sum.
> 
> Dietmar found a simple way to reproduce the WARN fixed by 
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> by looping on hackbench in several different sched group levels.
> 
> This patchset as run on the reproducer with success but it probably needs
> more tests by people who faced the WARN before.
> 
> The changes done on util_sum have been also applied to runnable_sum and
> load_sum which faces the same rounding problem although this has not been
> reflected in measurable performance impact.
> 
> Changes for v3:
> - split patch 1 in 2 patches
>   - One to fix rick's regression
>   - One to apply same changes in other places
> - some typos
> - move main comment so it appears in the 1st patch 
> 
> Changes for v2:
> - fix wrong update of load_sum
> - move a change from patch 3 to patch 2
> - update patch 3 commit message
> 
> Vincent Guittot (4):
>   sched/pelt: Relax the sync of util_sum with util_avg
>   sched/pelt: Continue to relax the sync of util_sum with util_avg
>   sched/pelt: Relax the sync of runnable_sum with runnable_avg
>   sched/pelt: Relax the sync of load_sum with load_avg
> 
>  kernel/sched/fair.c | 113 +++++++++++++++++++++++++++++---------------
>  1 file changed, 75 insertions(+), 38 deletions(-)

LGTM. Just a couple of questions on the patch header of 1/4.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>