kernel/sched/core.c | 4 +- kernel/sched/debug.c | 14 ++- kernel/sched/fair.c | 240 ++++++++++++++++++++++++------------------- kernel/sched/pelt.c | 4 +- kernel/sched/sched.h | 12 +-- 5 files changed, 153 insertions(+), 121 deletions(-)
Delayed dequeued feature keeps a sleeping sched_entitiy enqueued until its lag has elapsed. As a result, it stays also visible in the statistics that are used to balance the system and in particular the field h_nr_running. This serie fixes those metrics by creating a new h_nr_runnable that tracks only tasks that want to run. It renames h_nr_running into h_nr_runnable. h_nr_runnable is used in several places to make decision on load balance: - PELT runnable_avg - deciding if a group is overloaded or has spare capacity - numa stats - reduced capacity management - load balance between groups While fixing h_nr_running, some fields have been renamed to follow the same pattern. We now have: - cfs.h_nr_runnable : running tasks in the hierarchy - cfs.h_nr_queued : enqueued tasks in the hierarchy either running or delayed dequeue - cfs.h_nr_idle : enqueued sched idle tasks in the hierarchy cfs.nr_running has been rename cfs.nr_queued because it includes the delayed dequeued entities The unused cfs.idle_nr_running has been removed Load balance compares the number of running tasks when selecting the busiest group or runqueue and tries to migrate a runnable task and not a sleeping delayed dequeue one. delayed dequeue tasks are considered only when migrating load as they continue to impact it. It should be noticed that this serie doesn't fix the problem of delayed dequeued tasks that can't migrate at wakeup. Some additional cleanups have been added: - move variable declaration at the beginning of pick_next_entity() and dequeue_entity() - sched_can_stop_tick() should use cfs.h_nr_queued instead of cfs.nr_queued (previously cfs.nr_running) to know how many tasks are running in the whole hierarchy instead of how many entities at root level Changes since v2: - Fix h_nr_runnable after removing h_nr_delayed (reported by Mike and Prateek) - Move "sched/fair: Fix sched_can_stop_tick() for fair tasks" at the beginning of the series so it can be easily backported (asked by Prateek) - Split "sched/fair: Add new cfs_rq.h_nr_runnable" in 2 patches. One for the creation of h_nr_runnable and one for its use (asked by Peter) - Fix more variable declarations (reported Prateek) Changes since v1: - reorder the patches - rename fields into: - h_nr_queued for all tasks queued both runnable and delayed dequeue - h_nr_runnable for all runnable tasks - h_nr_idle for all tasks with sched_idle policy - Cleanup how h_nr_runnable is updated in enqueue_task_fair() and dequeue_entities Peter Zijlstra (1): sched/eevdf: More PELT vs DELAYED_DEQUEUE Vincent Guittot (10): sched/fair: Fix sched_can_stop_tick() for fair tasks sched/fair: Rename h_nr_running into h_nr_queued sched/fair: Add new cfs_rq.h_nr_runnable sched/fair: Use the new cfs_rq.h_nr_runnable sched/fair: Removed unsued cfs_rq.h_nr_delayed sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle sched/fair: Remove unused cfs_rq.idle_nr_running sched/fair: Rename cfs_rq.nr_running into nr_queued sched/fair: Do not try to migrate delayed dequeue task sched/fair: Fix variable declaration position kernel/sched/core.c | 4 +- kernel/sched/debug.c | 14 ++- kernel/sched/fair.c | 240 ++++++++++++++++++++++++------------------- kernel/sched/pelt.c | 4 +- kernel/sched/sched.h | 12 +-- 5 files changed, 153 insertions(+), 121 deletions(-) -- 2.43.0
On 12/2/24 17:45, Vincent Guittot wrote: > Delayed dequeued feature keeps a sleeping sched_entitiy enqueued until its > lag has elapsed. As a result, it stays also visible in the statistics that > are used to balance the system and in particular the field h_nr_running. > > This serie fixes those metrics by creating a new h_nr_runnable that tracks > only tasks that want to run. It renames h_nr_running into h_nr_runnable. > > h_nr_runnable is used in several places to make decision on load balance: > - PELT runnable_avg > - deciding if a group is overloaded or has spare capacity > - numa stats > - reduced capacity management > - load balance between groups > > While fixing h_nr_running, some fields have been renamed to follow the > same pattern. We now have: > - cfs.h_nr_runnable : running tasks in the hierarchy > - cfs.h_nr_queued : enqueued tasks in the hierarchy either running or > delayed dequeue > - cfs.h_nr_idle : enqueued sched idle tasks in the hierarchy > > cfs.nr_running has been rename cfs.nr_queued because it includes the > delayed dequeued entities > > The unused cfs.idle_nr_running has been removed > > Load balance compares the number of running tasks when selecting the > busiest group or runqueue and tries to migrate a runnable task and not a > sleeping delayed dequeue one. delayed dequeue tasks are considered only > when migrating load as they continue to impact it. > > It should be noticed that this serie doesn't fix the problem of delayed > dequeued tasks that can't migrate at wakeup. > > Some additional cleanups have been added: > - move variable declaration at the beginning of pick_next_entity() > and dequeue_entity() > - sched_can_stop_tick() should use cfs.h_nr_queued instead of > cfs.nr_queued (previously cfs.nr_running) to know how many tasks > are running in the whole hierarchy instead of how many entities at > root level > > Changes since v2: > - Fix h_nr_runnable after removing h_nr_delayed (reported by Mike and Prateek) > - Move "sched/fair: Fix sched_can_stop_tick() for fair tasks" at the > beginning of the series so it can be easily backported (asked by Prateek) > - Split "sched/fair: Add new cfs_rq.h_nr_runnable" in 2 patches. One > for the creation of h_nr_runnable and one for its use (asked by Peter) > - Fix more variable declarations (reported Prateek) > > > Changes since v1: > - reorder the patches > - rename fields into: > - h_nr_queued for all tasks queued both runnable and delayed dequeue > - h_nr_runnable for all runnable tasks > - h_nr_idle for all tasks with sched_idle policy > - Cleanup how h_nr_runnable is updated in enqueue_task_fair() and > dequeue_entities > > Peter Zijlstra (1): > sched/eevdf: More PELT vs DELAYED_DEQUEUE > > Vincent Guittot (10): > sched/fair: Fix sched_can_stop_tick() for fair tasks > sched/fair: Rename h_nr_running into h_nr_queued > sched/fair: Add new cfs_rq.h_nr_runnable > sched/fair: Use the new cfs_rq.h_nr_runnable > sched/fair: Removed unsued cfs_rq.h_nr_delayed > sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle > sched/fair: Remove unused cfs_rq.idle_nr_running > sched/fair: Rename cfs_rq.nr_running into nr_queued > sched/fair: Do not try to migrate delayed dequeue task > sched/fair: Fix variable declaration position > > kernel/sched/core.c | 4 +- > kernel/sched/debug.c | 14 ++- > kernel/sched/fair.c | 240 ++++++++++++++++++++++++------------------- > kernel/sched/pelt.c | 4 +- > kernel/sched/sched.h | 12 +-- > 5 files changed, 153 insertions(+), 121 deletions(-) > Sorry, it took me a bit to work the backports for 6.8/Android. I gave this series a try on the Pixel 6 with a couple workloads (Speedometer and a UI-based one) and I didn't spot any significant performance/frame metric differences or power metric differences compared to a 6.8 kernel with this series applied. Also, checking the statistics for the cores didn't show any numbers that are obviously wrong. Though the kernel version is a bit behind mainline, I hope that's useful. Tested-By: Luis Machado <luis.machado@arm.com>
On Wed, 4 Dec 2024 at 20:10, Luis Machado <luis.machado@arm.com> wrote: > > On 12/2/24 17:45, Vincent Guittot wrote: > > Delayed dequeued feature keeps a sleeping sched_entitiy enqueued until its > > lag has elapsed. As a result, it stays also visible in the statistics that > > are used to balance the system and in particular the field h_nr_running. > > > > This serie fixes those metrics by creating a new h_nr_runnable that tracks > > only tasks that want to run. It renames h_nr_running into h_nr_runnable. > > > > h_nr_runnable is used in several places to make decision on load balance: > > - PELT runnable_avg > > - deciding if a group is overloaded or has spare capacity > > - numa stats > > - reduced capacity management > > - load balance between groups > > > > While fixing h_nr_running, some fields have been renamed to follow the > > same pattern. We now have: > > - cfs.h_nr_runnable : running tasks in the hierarchy > > - cfs.h_nr_queued : enqueued tasks in the hierarchy either running or > > delayed dequeue > > - cfs.h_nr_idle : enqueued sched idle tasks in the hierarchy > > > > cfs.nr_running has been rename cfs.nr_queued because it includes the > > delayed dequeued entities > > > > The unused cfs.idle_nr_running has been removed > > > > Load balance compares the number of running tasks when selecting the > > busiest group or runqueue and tries to migrate a runnable task and not a > > sleeping delayed dequeue one. delayed dequeue tasks are considered only > > when migrating load as they continue to impact it. > > > > It should be noticed that this serie doesn't fix the problem of delayed > > dequeued tasks that can't migrate at wakeup. > > > > Some additional cleanups have been added: > > - move variable declaration at the beginning of pick_next_entity() > > and dequeue_entity() > > - sched_can_stop_tick() should use cfs.h_nr_queued instead of > > cfs.nr_queued (previously cfs.nr_running) to know how many tasks > > are running in the whole hierarchy instead of how many entities at > > root level > > > > Changes since v2: > > - Fix h_nr_runnable after removing h_nr_delayed (reported by Mike and Prateek) > > - Move "sched/fair: Fix sched_can_stop_tick() for fair tasks" at the > > beginning of the series so it can be easily backported (asked by Prateek) > > - Split "sched/fair: Add new cfs_rq.h_nr_runnable" in 2 patches. One > > for the creation of h_nr_runnable and one for its use (asked by Peter) > > - Fix more variable declarations (reported Prateek) > > > > > > Changes since v1: > > - reorder the patches > > - rename fields into: > > - h_nr_queued for all tasks queued both runnable and delayed dequeue > > - h_nr_runnable for all runnable tasks > > - h_nr_idle for all tasks with sched_idle policy > > - Cleanup how h_nr_runnable is updated in enqueue_task_fair() and > > dequeue_entities > > > > Peter Zijlstra (1): > > sched/eevdf: More PELT vs DELAYED_DEQUEUE > > > > Vincent Guittot (10): > > sched/fair: Fix sched_can_stop_tick() for fair tasks > > sched/fair: Rename h_nr_running into h_nr_queued > > sched/fair: Add new cfs_rq.h_nr_runnable > > sched/fair: Use the new cfs_rq.h_nr_runnable > > sched/fair: Removed unsued cfs_rq.h_nr_delayed > > sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle > > sched/fair: Remove unused cfs_rq.idle_nr_running > > sched/fair: Rename cfs_rq.nr_running into nr_queued > > sched/fair: Do not try to migrate delayed dequeue task > > sched/fair: Fix variable declaration position > > > > kernel/sched/core.c | 4 +- > > kernel/sched/debug.c | 14 ++- > > kernel/sched/fair.c | 240 ++++++++++++++++++++++++------------------- > > kernel/sched/pelt.c | 4 +- > > kernel/sched/sched.h | 12 +-- > > 5 files changed, 153 insertions(+), 121 deletions(-) > > > > Sorry, it took me a bit to work the backports for 6.8/Android. I gave this series a try > on the Pixel 6 with a couple workloads (Speedometer and a UI-based one) and I didn't > spot any significant performance/frame metric differences or power metric differences compared > to a 6.8 kernel with this series applied. > > Also, checking the statistics for the cores didn't show any numbers that are obviously wrong. > > Though the kernel version is a bit behind mainline, I hope that's useful. > > Tested-By: Luis Machado <luis.machado@arm.com> Thanks
On 02/12/2024 18:45, Vincent Guittot wrote: > Delayed dequeued feature keeps a sleeping sched_entitiy enqueued until its > lag has elapsed. As a result, it stays also visible in the statistics that > are used to balance the system and in particular the field h_nr_running. > > This serie fixes those metrics by creating a new h_nr_runnable that tracks > only tasks that want to run. It renames h_nr_running into h_nr_runnable. > > h_nr_runnable is used in several places to make decision on load balance: > - PELT runnable_avg > - deciding if a group is overloaded or has spare capacity > - numa stats > - reduced capacity management > - load balance between groups > > While fixing h_nr_running, some fields have been renamed to follow the > same pattern. We now have: > - cfs.h_nr_runnable : running tasks in the hierarchy > - cfs.h_nr_queued : enqueued tasks in the hierarchy either running or > delayed dequeue > - cfs.h_nr_idle : enqueued sched idle tasks in the hierarchy > > cfs.nr_running has been rename cfs.nr_queued because it includes the > delayed dequeued entities > > The unused cfs.idle_nr_running has been removed > > Load balance compares the number of running tasks when selecting the > busiest group or runqueue and tries to migrate a runnable task and not a > sleeping delayed dequeue one. delayed dequeue tasks are considered only > when migrating load as they continue to impact it. > > It should be noticed that this serie doesn't fix the problem of delayed > dequeued tasks that can't migrate at wakeup. > > Some additional cleanups have been added: > - move variable declaration at the beginning of pick_next_entity() > and dequeue_entity() > - sched_can_stop_tick() should use cfs.h_nr_queued instead of > cfs.nr_queued (previously cfs.nr_running) to know how many tasks > are running in the whole hierarchy instead of how many entities at > root level > > Changes since v2: > - Fix h_nr_runnable after removing h_nr_delayed (reported by Mike and Prateek) > - Move "sched/fair: Fix sched_can_stop_tick() for fair tasks" at the > beginning of the series so it can be easily backported (asked by Prateek) > - Split "sched/fair: Add new cfs_rq.h_nr_runnable" in 2 patches. One > for the creation of h_nr_runnable and one for its use (asked by Peter) > - Fix more variable declarations (reported Prateek) with the following nits: (1) 01/11 Proposed 'Fixes:' missing: https://lkml.kernel.org/r/c82ed217-cfe4-41a4-b39a-e7356231835f@amd.com (2) 08/11 Would be helpful to point out that we lost the only use case for 'cfs_rq->idle_nr_running' with the advent of EEVDF with: 5e963f2bd465 - sched/fair: Commit to EEVDF (3) Using nr_running on rq/rt_rq/dl_rq and nr_queued for cfs_rq might look strange to the untrained eye. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> [...]
On Tue, Dec 03, 2024 at 11:41:47AM +0100, Dietmar Eggemann wrote: > with the following nits: > > (1) 01/11 > > Proposed 'Fixes:' missing: > https://lkml.kernel.org/r/c82ed217-cfe4-41a4-b39a-e7356231835f@amd.com > > (2) 08/11 > > Would be helpful to point out that we lost the only use case for > 'cfs_rq->idle_nr_running' with the advent of EEVDF with: > > 5e963f2bd465 - sched/fair: Commit to EEVDF Done. > (3) Using nr_running on rq/rt_rq/dl_rq and nr_queued > for cfs_rq might look strange to the untrained eye. Yes, but keeping nr_running with new semantics would not be less confusing and potentially more dangerous. > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Thanks!
On Tue, 3 Dec 2024 at 12:18, Peter Zijlstra <peterz@infradead.org> wrote: > > On Tue, Dec 03, 2024 at 11:41:47AM +0100, Dietmar Eggemann wrote: > > > with the following nits: > > > > (1) 01/11 > > > > Proposed 'Fixes:' missing: > > https://lkml.kernel.org/r/c82ed217-cfe4-41a4-b39a-e7356231835f@amd.com Yeah i forgot to add the fix tag Fixes: 11cc374f4643b ("sched_ext: Simplify scx_can_stop_tick() invocation in sched_can_stop_tick()") > > > > (2) 08/11 > > > > Would be helpful to point out that we lost the only use case for > > 'cfs_rq->idle_nr_running' with the advent of EEVDF with: > > > > 5e963f2bd465 - sched/fair: Commit to EEVDF > > Done. > > > (3) Using nr_running on rq/rt_rq/dl_rq and nr_queued > > for cfs_rq might look strange to the untrained eye. > > Yes, but keeping nr_running with new semantics would not be less > confusing and potentially more dangerous. > > > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> > > Thanks!
© 2016 - 2024 Red Hat, Inc.