[PATCH 0/4] Task based throttle follow ups

Aaron Lu posted 4 patches 3 weeks, 1 day ago
kernel/sched/fair.c | 50 ++++++++++++++++++++++++++++-----------------
1 file changed, 31 insertions(+), 19 deletions(-)
[PATCH 0/4] Task based throttle follow ups
Posted by Aaron Lu 3 weeks, 1 day ago
Peter noticed the inconsistency in load propagation for throttled cfs_rq
and Ben pointed out several other places regarding throttled cfs_rq that
could be no longer needed after task based throttle model.

To ease discussing and reviewing, I've come up with this follow up
series which implements the individual changes.

Patch1 deals with load propagation. According to Peter and Prateek's
discussion, previously, load propagation for throttled cfs_rq happened
on unthrottle time but now with per-task throttle, it's no longer the
case so load propagation should happen immediately or we could lose this
propagated part.

Patch2 made update_cfs_group() to continue function for cfs_rqs in
throttled hierarchy so that cfs_rq's entity can get an up2date weight. I
think this is mostly useful when a cfs_rq in throttled hierarchy still
has tasks running and on tick/enqueue/dequeue, update_cfs_group() can
update this cfs_rq's entity weight.

Patch3 removed special treatment of tasks in throttled hierarchy,
including: dequeue_entities(), check_preempt_wakeup_fair() and
yield_task_to_fair().

Patch4 inhibited load balancing to a throttled cfs_rq to make hackbench
happy.

I think patch1 is needed for correctness, patch2-4 is open for
discussion as there are pros/cons doing things either way. Comments are
welcome, thanks.

BTW, I also noticed there is the task_is_throttled sched class callback
and in fair, it is task_is_throttled_fair(). IIUC, it is used by core
scheduling to find a matching cookie task to run on the sibling SMT CPU.
For this reason, it doesn't seem very useful if we find it a task that
is to be throttled so I kept the current implementation; but I guess
this is also two folded if that to be throttled task is holding some
kernel resources. Anyway, I didn't write a patch to change it in this
series, but feel free to let me know if it should be changed.

Aaron Lu (4):
  sched/fair: Propagate load for throttled cfs_rq
  sched/fair: update_cfs_group() for throttled cfs_rqs
  sched/fair: Do not special case tasks in throttled hierarchy
  sched/fair: Do not balance task to a throttled cfs_rq

 kernel/sched/fair.c | 50 ++++++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 19 deletions(-)


base-commit: 5b726e9bf9544a349090879a513a5e00da486c14
-- 
2.39.5
Re: [PATCH 0/4] Task based throttle follow ups
Posted by Valentin Schneider 1 week, 6 days ago
On 10/09/25 17:50, Aaron Lu wrote:
> Peter noticed the inconsistency in load propagation for throttled cfs_rq
> and Ben pointed out several other places regarding throttled cfs_rq that
> could be no longer needed after task based throttle model.
>
> To ease discussing and reviewing, I've come up with this follow up
> series which implements the individual changes.
>
> Patch1 deals with load propagation. According to Peter and Prateek's
> discussion, previously, load propagation for throttled cfs_rq happened
> on unthrottle time but now with per-task throttle, it's no longer the
> case so load propagation should happen immediately or we could lose this
> propagated part.
>
> Patch2 made update_cfs_group() to continue function for cfs_rqs in
> throttled hierarchy so that cfs_rq's entity can get an up2date weight. I
> think this is mostly useful when a cfs_rq in throttled hierarchy still
> has tasks running and on tick/enqueue/dequeue, update_cfs_group() can
> update this cfs_rq's entity weight.
>
> Patch3 removed special treatment of tasks in throttled hierarchy,
> including: dequeue_entities(), check_preempt_wakeup_fair() and
> yield_task_to_fair().
>
> Patch4 inhibited load balancing to a throttled cfs_rq to make hackbench
> happy.
>
> I think patch1 is needed for correctness, patch2-4 is open for
> discussion as there are pros/cons doing things either way. Comments are
> welcome, thanks.
>
> BTW, I also noticed there is the task_is_throttled sched class callback
> and in fair, it is task_is_throttled_fair(). IIUC, it is used by core
> scheduling to find a matching cookie task to run on the sibling SMT CPU.
> For this reason, it doesn't seem very useful if we find it a task that
> is to be throttled so I kept the current implementation; but I guess
> this is also two folded if that to be throttled task is holding some
> kernel resources. Anyway, I didn't write a patch to change it in this
> series, but feel free to let me know if it should be changed.
>

I saw these already made it to tip/sched/core, but FWIW that all LGTM.

Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Re: [PATCH 0/4] Task based throttle follow ups
Posted by Benjamin Segall 2 weeks, 2 days ago
Aaron Lu <ziqianlu@bytedance.com> writes:

> Peter noticed the inconsistency in load propagation for throttled cfs_rq
> and Ben pointed out several other places regarding throttled cfs_rq that
> could be no longer needed after task based throttle model.
>
> To ease discussing and reviewing, I've come up with this follow up
> series which implements the individual changes.
>
> Patch1 deals with load propagation. According to Peter and Prateek's
> discussion, previously, load propagation for throttled cfs_rq happened
> on unthrottle time but now with per-task throttle, it's no longer the
> case so load propagation should happen immediately or we could lose this
> propagated part.
>
> Patch2 made update_cfs_group() to continue function for cfs_rqs in
> throttled hierarchy so that cfs_rq's entity can get an up2date weight. I
> think this is mostly useful when a cfs_rq in throttled hierarchy still
> has tasks running and on tick/enqueue/dequeue, update_cfs_group() can
> update this cfs_rq's entity weight.
>
> Patch3 removed special treatment of tasks in throttled hierarchy,
> including: dequeue_entities(), check_preempt_wakeup_fair() and
> yield_task_to_fair().
>
> Patch4 inhibited load balancing to a throttled cfs_rq to make hackbench
> happy.
>
> I think patch1 is needed for correctness, patch2-4 is open for
> discussion as there are pros/cons doing things either way. Comments are
> welcome, thanks.
>
> BTW, I also noticed there is the task_is_throttled sched class callback
> and in fair, it is task_is_throttled_fair(). IIUC, it is used by core
> scheduling to find a matching cookie task to run on the sibling SMT CPU.
> For this reason, it doesn't seem very useful if we find it a task that
> is to be throttled so I kept the current implementation; but I guess
> this is also two folded if that to be throttled task is holding some
> kernel resources. Anyway, I didn't write a patch to change it in this
> series, but feel free to let me know if it should be changed.
>
> Aaron Lu (4):
>   sched/fair: Propagate load for throttled cfs_rq
>   sched/fair: update_cfs_group() for throttled cfs_rqs
>   sched/fair: Do not special case tasks in throttled hierarchy
>   sched/fair: Do not balance task to a throttled cfs_rq
>
>  kernel/sched/fair.c | 50 ++++++++++++++++++++++++++++-----------------
>  1 file changed, 31 insertions(+), 19 deletions(-)
>
>
> base-commit: 5b726e9bf9544a349090879a513a5e00da486c14

Yeah, these all make sense to me (with v2 for patch 4).

Reviewed-by: Ben Segall <bsegall@google.com>
Re: [PATCH 0/4] Task based throttle follow ups
Posted by Peter Zijlstra 3 weeks ago
On Wed, Sep 10, 2025 at 05:50:40PM +0800, Aaron Lu wrote:

> Aaron Lu (4):
>   sched/fair: Propagate load for throttled cfs_rq
>   sched/fair: update_cfs_group() for throttled cfs_rqs
>   sched/fair: Do not special case tasks in throttled hierarchy

I stuffed these first 3 into queue/sched/core, leaving this one:

>   sched/fair: Do not balance task to a throttled cfs_rq

on the table, because the robot already hated on it.

If anybody disagrees with one of the queued patches, holler and I'll
make it go away. Otherwise they should show up in tip 'soonish'.
Re: [PATCH 0/4] Task based throttle follow ups
Posted by Aaron Lu 3 weeks ago
On Thu, Sep 11, 2025 at 12:42:22PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 10, 2025 at 05:50:40PM +0800, Aaron Lu wrote:
> 
> > Aaron Lu (4):
> >   sched/fair: Propagate load for throttled cfs_rq
> >   sched/fair: update_cfs_group() for throttled cfs_rqs
> >   sched/fair: Do not special case tasks in throttled hierarchy
> 
> I stuffed these first 3 into queue/sched/core, leaving this one:
> 
> >   sched/fair: Do not balance task to a throttled cfs_rq
> 
> on the table, because the robot already hated on it.

Should have followed Prateek's suggestion to put it behind
CONFIG_CFS_BANDWIDTH. Will fix that and send an updated version later,
thanks.