Peter noticed an inconsistency in load propagation for throttled cfs_rqs,
and Ben pointed out several other special cases for throttled cfs_rqs that
may no longer be needed after the switch to the task-based throttle model.

To ease discussion and review, I've put together this follow-up series
implementing the individual changes.

Patch 1 deals with load propagation. According to Peter and Prateek's
discussion: previously, load propagation for a throttled cfs_rq happened
at unthrottle time, but with per-task throttle that is no longer the case,
so propagation has to happen immediately or we could lose the propagated
load.

Patch 2 makes update_cfs_group() continue to function for cfs_rqs in a
throttled hierarchy, so that a cfs_rq's group entity keeps an up-to-date
weight. I think this is mostly useful when a cfs_rq in a throttled
hierarchy still has tasks running: on tick/enqueue/dequeue,
update_cfs_group() can then update that cfs_rq's entity weight.

Patch 3 removes the special treatment of tasks in a throttled hierarchy
from dequeue_entities(), check_preempt_wakeup_fair() and
yield_to_task_fair().

Patch 4 inhibits load balancing towards a throttled cfs_rq, to make
hackbench happy.

I think patch 1 is needed for correctness; patches 2-4 are open for
discussion, as there are pros/cons to doing things either way. Comments
are welcome, thanks.

BTW, I also noticed the task_is_throttled sched class callback, which in
fair is task_is_throttled_fair(). IIUC, it is used by core scheduling to
find a task with a matching cookie to run on the sibling SMT CPU. Picking
a task that is about to be throttled doesn't seem very useful there, so I
kept the current implementation; then again, it cuts both ways if that
to-be-throttled task is still holding some kernel resources. Anyway, I
didn't write a patch to change it in this series, but feel free to let me
know if it should be changed.

Aaron Lu (4):
  sched/fair: Propagate load for throttled cfs_rq
  sched/fair: update_cfs_group() for throttled cfs_rqs
  sched/fair: Do not special case tasks in throttled hierarchy
  sched/fair: Do not balance task to a throttled cfs_rq

 kernel/sched/fair.c | 50 ++++++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 19 deletions(-)


base-commit: 5b726e9bf9544a349090879a513a5e00da486c14
--
2.39.5
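For readers without the patches at hand, here is a rough sketch of the
patch 1 idea, reconstructed from the description above rather than taken
from the actual diff (the function body is an approximation of older
fair.c code, shown only to illustrate where the walk used to stop):

static void propagate_entity_cfs_rq(struct sched_entity *se)
{
	struct cfs_rq *cfs_rq;

	/* Start to propagate at parent. */
	se = se->parent;

	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);

		update_load_avg(cfs_rq, se, UPDATE_TG);

		/*
		 * A check of the form
		 *	if (cfs_rq_throttled(cfs_rq))
		 *		break;
		 * used to stop the walk here, deferring the rest of the
		 * propagation to unthrottle time. With per-task throttle
		 * there is no deferred step anymore, so the walk has to
		 * continue to the root to keep the hierarchy's load sums
		 * consistent.
		 */
	}
}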
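Similarly, a sketch of the shape of patch 2, hedged the same way: the
dropped early return below is my reading of the description, not a quote
of the diff.

static void update_cfs_group(struct sched_entity *se)
{
	struct cfs_rq *gcfs_rq = group_cfs_rq(se);
	long shares;

	if (!gcfs_rq)
		return;

	/*
	 * An early bail-out along the lines of
	 *	if (throttled_hierarchy(gcfs_rq))
	 *		return;
	 * is what goes away: the group entity weight is now recomputed
	 * even while the hierarchy is throttled, so a cfs_rq that still
	 * has tasks running gets a fresh weight on tick/enqueue/dequeue.
	 */
	shares = calc_group_shares(gcfs_rq);
	if (unlikely(se->load.weight != shares))
		reweight_entity(cfs_rq_of(se), se, shares);
}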
On 10/09/25 17:50, Aaron Lu wrote:
> [full cover letter snipped]

I saw these already made it to tip/sched/core, but FWIW that all LGTM.

Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Aaron Lu <ziqianlu@bytedance.com> writes:
> [cover letter snipped]
>
> Aaron Lu (4):
>   sched/fair: Propagate load for throttled cfs_rq
>   sched/fair: update_cfs_group() for throttled cfs_rqs
>   sched/fair: Do not special case tasks in throttled hierarchy
>   sched/fair: Do not balance task to a throttled cfs_rq

Yeah, these all make sense to me (with v2 for patch 4).

Reviewed-by: Ben Segall <bsegall@google.com>
On Wed, Sep 10, 2025 at 05:50:40PM +0800, Aaron Lu wrote:

> Aaron Lu (4):
>   sched/fair: Propagate load for throttled cfs_rq
>   sched/fair: update_cfs_group() for throttled cfs_rqs
>   sched/fair: Do not special case tasks in throttled hierarchy

I stuffed these first 3 into queue/sched/core, leaving this one:

>   sched/fair: Do not balance task to a throttled cfs_rq

on the table, because the robot already hated on it.

If anybody disagrees with one of the queued patches, holler and I'll make
it go away. Otherwise they should show up in tip 'soonish'.
On Thu, Sep 11, 2025 at 12:42:22PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 10, 2025 at 05:50:40PM +0800, Aaron Lu wrote:
>
> > Aaron Lu (4):
> >   sched/fair: Propagate load for throttled cfs_rq
> >   sched/fair: update_cfs_group() for throttled cfs_rqs
> >   sched/fair: Do not special case tasks in throttled hierarchy
>
> I stuffed these first 3 into queue/sched/core, leaving this one:
>
> >   sched/fair: Do not balance task to a throttled cfs_rq
>
> on the table, because the robot already hated on it.

Should have followed Prateek's suggestion to put it behind
CONFIG_CFS_BANDWIDTH. Will fix that and send an updated version later,
thanks.
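For context, the build failure is plausible because throttled_hierarchy()
and the bandwidth state it relies on only exist when CONFIG_CFS_BANDWIDTH
is set. A sketch of what the guarded helper for patch 4 might look like;
the helper name task_dst_cfs_rq_throttled() and its placement are
illustrative assumptions, not the actual patch:

#ifdef CONFIG_CFS_BANDWIDTH
/* Hypothetical helper: is the task's cfs_rq on @cpu throttled? */
static inline bool task_dst_cfs_rq_throttled(struct task_struct *p, int cpu)
{
	/*
	 * The task's cfs_rq on the destination CPU, per its task group.
	 * CONFIG_CFS_BANDWIDTH implies CONFIG_FAIR_GROUP_SCHED, so
	 * task_group(p)->cfs_rq[] is available here.
	 */
	struct cfs_rq *cfs_rq = task_group(p)->cfs_rq[cpu];

	return throttled_hierarchy(cfs_rq);
}
#else
static inline bool task_dst_cfs_rq_throttled(struct task_struct *p, int cpu)
{
	/* No bandwidth control configured, nothing can be throttled. */
	return false;
}
#endif

can_migrate_task() would then refuse such a destination with something
like:

	if (task_dst_cfs_rq_throttled(p, env->dst_cpu))
		return 0;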