kernel/sched/core.c | 4 +- kernel/sched/debug.c | 14 ++- kernel/sched/fair.c | 240 ++++++++++++++++++++++++------------------- kernel/sched/pelt.c | 4 +- kernel/sched/sched.h | 12 +-- 5 files changed, 153 insertions(+), 121 deletions(-)
Delayed dequeued feature keeps a sleeping sched_entitiy enqueued until its
lag has elapsed. As a result, it stays also visible in the statistics that
are used to balance the system and in particular the field h_nr_running.
This serie fixes those metrics by creating a new h_nr_runnable that tracks
only tasks that want to run. It renames h_nr_running into h_nr_runnable.
h_nr_runnable is used in several places to make decision on load balance:
- PELT runnable_avg
- deciding if a group is overloaded or has spare capacity
- numa stats
- reduced capacity management
- load balance between groups
While fixing h_nr_running, some fields have been renamed to follow the
same pattern. We now have:
- cfs.h_nr_runnable : running tasks in the hierarchy
- cfs.h_nr_queued : enqueued tasks in the hierarchy either running or
delayed dequeue
- cfs.h_nr_idle : enqueued sched idle tasks in the hierarchy
cfs.nr_running has been rename cfs.nr_queued because it includes the
delayed dequeued entities
The unused cfs.idle_nr_running has been removed
Load balance compares the number of running tasks when selecting the
busiest group or runqueue and tries to migrate a runnable task and not a
sleeping delayed dequeue one. delayed dequeue tasks are considered only
when migrating load as they continue to impact it.
It should be noticed that this serie doesn't fix the problem of delayed
dequeued tasks that can't migrate at wakeup.
Some additional cleanups have been added:
- move variable declaration at the beginning of pick_next_entity()
and dequeue_entity()
- sched_can_stop_tick() should use cfs.h_nr_queued instead of
cfs.nr_queued (previously cfs.nr_running) to know how many tasks
are running in the whole hierarchy instead of how many entities at
root level
Changes since v2:
- Fix h_nr_runnable after removing h_nr_delayed (reported by Mike and Prateek)
- Move "sched/fair: Fix sched_can_stop_tick() for fair tasks" at the
beginning of the series so it can be easily backported (asked by Prateek)
- Split "sched/fair: Add new cfs_rq.h_nr_runnable" in 2 patches. One
for the creation of h_nr_runnable and one for its use (asked by Peter)
- Fix more variable declarations (reported Prateek)
Changes since v1:
- reorder the patches
- rename fields into:
- h_nr_queued for all tasks queued both runnable and delayed dequeue
- h_nr_runnable for all runnable tasks
- h_nr_idle for all tasks with sched_idle policy
- Cleanup how h_nr_runnable is updated in enqueue_task_fair() and
dequeue_entities
Peter Zijlstra (1):
sched/eevdf: More PELT vs DELAYED_DEQUEUE
Vincent Guittot (10):
sched/fair: Fix sched_can_stop_tick() for fair tasks
sched/fair: Rename h_nr_running into h_nr_queued
sched/fair: Add new cfs_rq.h_nr_runnable
sched/fair: Use the new cfs_rq.h_nr_runnable
sched/fair: Removed unsued cfs_rq.h_nr_delayed
sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle
sched/fair: Remove unused cfs_rq.idle_nr_running
sched/fair: Rename cfs_rq.nr_running into nr_queued
sched/fair: Do not try to migrate delayed dequeue task
sched/fair: Fix variable declaration position
kernel/sched/core.c | 4 +-
kernel/sched/debug.c | 14 ++-
kernel/sched/fair.c | 240 ++++++++++++++++++++++++-------------------
kernel/sched/pelt.c | 4 +-
kernel/sched/sched.h | 12 +--
5 files changed, 153 insertions(+), 121 deletions(-)
--
2.43.0
On 12/2/24 17:45, Vincent Guittot wrote: > Delayed dequeued feature keeps a sleeping sched_entitiy enqueued until its > lag has elapsed. As a result, it stays also visible in the statistics that > are used to balance the system and in particular the field h_nr_running. > > This serie fixes those metrics by creating a new h_nr_runnable that tracks > only tasks that want to run. It renames h_nr_running into h_nr_runnable. > > h_nr_runnable is used in several places to make decision on load balance: > - PELT runnable_avg > - deciding if a group is overloaded or has spare capacity > - numa stats > - reduced capacity management > - load balance between groups > > While fixing h_nr_running, some fields have been renamed to follow the > same pattern. We now have: > - cfs.h_nr_runnable : running tasks in the hierarchy > - cfs.h_nr_queued : enqueued tasks in the hierarchy either running or > delayed dequeue > - cfs.h_nr_idle : enqueued sched idle tasks in the hierarchy > > cfs.nr_running has been rename cfs.nr_queued because it includes the > delayed dequeued entities > > The unused cfs.idle_nr_running has been removed > > Load balance compares the number of running tasks when selecting the > busiest group or runqueue and tries to migrate a runnable task and not a > sleeping delayed dequeue one. delayed dequeue tasks are considered only > when migrating load as they continue to impact it. > > It should be noticed that this serie doesn't fix the problem of delayed > dequeued tasks that can't migrate at wakeup. > > Some additional cleanups have been added: > - move variable declaration at the beginning of pick_next_entity() > and dequeue_entity() > - sched_can_stop_tick() should use cfs.h_nr_queued instead of > cfs.nr_queued (previously cfs.nr_running) to know how many tasks > are running in the whole hierarchy instead of how many entities at > root level > > Changes since v2: > - Fix h_nr_runnable after removing h_nr_delayed (reported by Mike and Prateek) > - Move "sched/fair: Fix sched_can_stop_tick() for fair tasks" at the > beginning of the series so it can be easily backported (asked by Prateek) > - Split "sched/fair: Add new cfs_rq.h_nr_runnable" in 2 patches. One > for the creation of h_nr_runnable and one for its use (asked by Peter) > - Fix more variable declarations (reported Prateek) > > > Changes since v1: > - reorder the patches > - rename fields into: > - h_nr_queued for all tasks queued both runnable and delayed dequeue > - h_nr_runnable for all runnable tasks > - h_nr_idle for all tasks with sched_idle policy > - Cleanup how h_nr_runnable is updated in enqueue_task_fair() and > dequeue_entities > > Peter Zijlstra (1): > sched/eevdf: More PELT vs DELAYED_DEQUEUE > > Vincent Guittot (10): > sched/fair: Fix sched_can_stop_tick() for fair tasks > sched/fair: Rename h_nr_running into h_nr_queued > sched/fair: Add new cfs_rq.h_nr_runnable > sched/fair: Use the new cfs_rq.h_nr_runnable > sched/fair: Removed unsued cfs_rq.h_nr_delayed > sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle > sched/fair: Remove unused cfs_rq.idle_nr_running > sched/fair: Rename cfs_rq.nr_running into nr_queued > sched/fair: Do not try to migrate delayed dequeue task > sched/fair: Fix variable declaration position > > kernel/sched/core.c | 4 +- > kernel/sched/debug.c | 14 ++- > kernel/sched/fair.c | 240 ++++++++++++++++++++++++------------------- > kernel/sched/pelt.c | 4 +- > kernel/sched/sched.h | 12 +-- > 5 files changed, 153 insertions(+), 121 deletions(-) > Sorry, it took me a bit to work the backports for 6.8/Android. I gave this series a try on the Pixel 6 with a couple workloads (Speedometer and a UI-based one) and I didn't spot any significant performance/frame metric differences or power metric differences compared to a 6.8 kernel with this series applied. Also, checking the statistics for the cores didn't show any numbers that are obviously wrong. Though the kernel version is a bit behind mainline, I hope that's useful. Tested-By: Luis Machado <luis.machado@arm.com>
On Wed, 4 Dec 2024 at 20:10, Luis Machado <luis.machado@arm.com> wrote: > > On 12/2/24 17:45, Vincent Guittot wrote: > > Delayed dequeued feature keeps a sleeping sched_entitiy enqueued until its > > lag has elapsed. As a result, it stays also visible in the statistics that > > are used to balance the system and in particular the field h_nr_running. > > > > This serie fixes those metrics by creating a new h_nr_runnable that tracks > > only tasks that want to run. It renames h_nr_running into h_nr_runnable. > > > > h_nr_runnable is used in several places to make decision on load balance: > > - PELT runnable_avg > > - deciding if a group is overloaded or has spare capacity > > - numa stats > > - reduced capacity management > > - load balance between groups > > > > While fixing h_nr_running, some fields have been renamed to follow the > > same pattern. We now have: > > - cfs.h_nr_runnable : running tasks in the hierarchy > > - cfs.h_nr_queued : enqueued tasks in the hierarchy either running or > > delayed dequeue > > - cfs.h_nr_idle : enqueued sched idle tasks in the hierarchy > > > > cfs.nr_running has been rename cfs.nr_queued because it includes the > > delayed dequeued entities > > > > The unused cfs.idle_nr_running has been removed > > > > Load balance compares the number of running tasks when selecting the > > busiest group or runqueue and tries to migrate a runnable task and not a > > sleeping delayed dequeue one. delayed dequeue tasks are considered only > > when migrating load as they continue to impact it. > > > > It should be noticed that this serie doesn't fix the problem of delayed > > dequeued tasks that can't migrate at wakeup. > > > > Some additional cleanups have been added: > > - move variable declaration at the beginning of pick_next_entity() > > and dequeue_entity() > > - sched_can_stop_tick() should use cfs.h_nr_queued instead of > > cfs.nr_queued (previously cfs.nr_running) to know how many tasks > > are running in the whole hierarchy instead of how many entities at > > root level > > > > Changes since v2: > > - Fix h_nr_runnable after removing h_nr_delayed (reported by Mike and Prateek) > > - Move "sched/fair: Fix sched_can_stop_tick() for fair tasks" at the > > beginning of the series so it can be easily backported (asked by Prateek) > > - Split "sched/fair: Add new cfs_rq.h_nr_runnable" in 2 patches. One > > for the creation of h_nr_runnable and one for its use (asked by Peter) > > - Fix more variable declarations (reported Prateek) > > > > > > Changes since v1: > > - reorder the patches > > - rename fields into: > > - h_nr_queued for all tasks queued both runnable and delayed dequeue > > - h_nr_runnable for all runnable tasks > > - h_nr_idle for all tasks with sched_idle policy > > - Cleanup how h_nr_runnable is updated in enqueue_task_fair() and > > dequeue_entities > > > > Peter Zijlstra (1): > > sched/eevdf: More PELT vs DELAYED_DEQUEUE > > > > Vincent Guittot (10): > > sched/fair: Fix sched_can_stop_tick() for fair tasks > > sched/fair: Rename h_nr_running into h_nr_queued > > sched/fair: Add new cfs_rq.h_nr_runnable > > sched/fair: Use the new cfs_rq.h_nr_runnable > > sched/fair: Removed unsued cfs_rq.h_nr_delayed > > sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle > > sched/fair: Remove unused cfs_rq.idle_nr_running > > sched/fair: Rename cfs_rq.nr_running into nr_queued > > sched/fair: Do not try to migrate delayed dequeue task > > sched/fair: Fix variable declaration position > > > > kernel/sched/core.c | 4 +- > > kernel/sched/debug.c | 14 ++- > > kernel/sched/fair.c | 240 ++++++++++++++++++++++++------------------- > > kernel/sched/pelt.c | 4 +- > > kernel/sched/sched.h | 12 +-- > > 5 files changed, 153 insertions(+), 121 deletions(-) > > > > Sorry, it took me a bit to work the backports for 6.8/Android. I gave this series a try > on the Pixel 6 with a couple workloads (Speedometer and a UI-based one) and I didn't > spot any significant performance/frame metric differences or power metric differences compared > to a 6.8 kernel with this series applied. > > Also, checking the statistics for the cores didn't show any numbers that are obviously wrong. > > Though the kernel version is a bit behind mainline, I hope that's useful. > > Tested-By: Luis Machado <luis.machado@arm.com> Thanks
On 02/12/2024 18:45, Vincent Guittot wrote:
> Delayed dequeued feature keeps a sleeping sched_entitiy enqueued until its
> lag has elapsed. As a result, it stays also visible in the statistics that
> are used to balance the system and in particular the field h_nr_running.
>
> This serie fixes those metrics by creating a new h_nr_runnable that tracks
> only tasks that want to run. It renames h_nr_running into h_nr_runnable.
>
> h_nr_runnable is used in several places to make decision on load balance:
> - PELT runnable_avg
> - deciding if a group is overloaded or has spare capacity
> - numa stats
> - reduced capacity management
> - load balance between groups
>
> While fixing h_nr_running, some fields have been renamed to follow the
> same pattern. We now have:
> - cfs.h_nr_runnable : running tasks in the hierarchy
> - cfs.h_nr_queued : enqueued tasks in the hierarchy either running or
> delayed dequeue
> - cfs.h_nr_idle : enqueued sched idle tasks in the hierarchy
>
> cfs.nr_running has been rename cfs.nr_queued because it includes the
> delayed dequeued entities
>
> The unused cfs.idle_nr_running has been removed
>
> Load balance compares the number of running tasks when selecting the
> busiest group or runqueue and tries to migrate a runnable task and not a
> sleeping delayed dequeue one. delayed dequeue tasks are considered only
> when migrating load as they continue to impact it.
>
> It should be noticed that this serie doesn't fix the problem of delayed
> dequeued tasks that can't migrate at wakeup.
>
> Some additional cleanups have been added:
> - move variable declaration at the beginning of pick_next_entity()
> and dequeue_entity()
> - sched_can_stop_tick() should use cfs.h_nr_queued instead of
> cfs.nr_queued (previously cfs.nr_running) to know how many tasks
> are running in the whole hierarchy instead of how many entities at
> root level
>
> Changes since v2:
> - Fix h_nr_runnable after removing h_nr_delayed (reported by Mike and Prateek)
> - Move "sched/fair: Fix sched_can_stop_tick() for fair tasks" at the
> beginning of the series so it can be easily backported (asked by Prateek)
> - Split "sched/fair: Add new cfs_rq.h_nr_runnable" in 2 patches. One
> for the creation of h_nr_runnable and one for its use (asked by Peter)
> - Fix more variable declarations (reported Prateek)
with the following nits:
(1) 01/11
Proposed 'Fixes:' missing:
https://lkml.kernel.org/r/c82ed217-cfe4-41a4-b39a-e7356231835f@amd.com
(2) 08/11
Would be helpful to point out that we lost the only use case for
'cfs_rq->idle_nr_running' with the advent of EEVDF with:
5e963f2bd465 - sched/fair: Commit to EEVDF
(3) Using nr_running on rq/rt_rq/dl_rq and nr_queued
for cfs_rq might look strange to the untrained eye.
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
[...]
On Tue, Dec 03, 2024 at 11:41:47AM +0100, Dietmar Eggemann wrote: > with the following nits: > > (1) 01/11 > > Proposed 'Fixes:' missing: > https://lkml.kernel.org/r/c82ed217-cfe4-41a4-b39a-e7356231835f@amd.com > > (2) 08/11 > > Would be helpful to point out that we lost the only use case for > 'cfs_rq->idle_nr_running' with the advent of EEVDF with: > > 5e963f2bd465 - sched/fair: Commit to EEVDF Done. > (3) Using nr_running on rq/rt_rq/dl_rq and nr_queued > for cfs_rq might look strange to the untrained eye. Yes, but keeping nr_running with new semantics would not be less confusing and potentially more dangerous. > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Thanks!
On Tue, 3 Dec 2024 at 12:18, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Dec 03, 2024 at 11:41:47AM +0100, Dietmar Eggemann wrote:
>
> > with the following nits:
> >
> > (1) 01/11
> >
> > Proposed 'Fixes:' missing:
> > https://lkml.kernel.org/r/c82ed217-cfe4-41a4-b39a-e7356231835f@amd.com
Yeah i forgot to add the fix tag
Fixes: 11cc374f4643b ("sched_ext: Simplify scx_can_stop_tick()
invocation in sched_can_stop_tick()")
> >
> > (2) 08/11
> >
> > Would be helpful to point out that we lost the only use case for
> > 'cfs_rq->idle_nr_running' with the advent of EEVDF with:
> >
> > 5e963f2bd465 - sched/fair: Commit to EEVDF
>
> Done.
>
> > (3) Using nr_running on rq/rt_rq/dl_rq and nr_queued
> > for cfs_rq might look strange to the untrained eye.
>
> Yes, but keeping nr_running with new semantics would not be less
> confusing and potentially more dangerous.
>
> > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
>
> Thanks!
© 2016 - 2026 Red Hat, Inc.