[RFC PATCH 4/7] sched/fair: Take care of migrated task for task based throttle

Posted by Aaron Lu 9 months, 1 week ago
If a task is migrated to a new cpu, it is possible this task is not
throttled but the new cfs_rq is throttled, or vice versa. Take care of
these situations in the enqueue path.

Note that we can't handle this in migrate_task_rq_fair() because the
dst cpu's rq lock is not held there, so things like checking whether the
new cfs_rq needs throttling would be racy.

Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
---
 kernel/sched/fair.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4a95fe3785e43..9e036f18d73e6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7051,6 +7051,23 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	assert_list_leaf_cfs_rq(rq);

 	hrtick_update(rq);
+
+	if (!cfs_bandwidth_used())
+		return;
+
+	/*
+	 * This is for migrate_task_rq_fair(): the new_cpu's rq lock is not
+	 * held there, so we have to do these things at enqueue time, when
+	 * the dst cpu's rq lock is held. Doing the check at enqueue time
+	 * also takes care of newly woken tasks, e.g. a task that wakes up
+	 * into a throttled cfs_rq.
+	 *
+	 * It's possible the task already has throttle work added while this
+	 * new cfs_rq is not in a throttled hierarchy; that's OK,
+	 * throttle_cfs_rq_work() will take care of it.
+	 */
+	if (throttled_hierarchy(cfs_rq_of(&p->se)))
+		task_throttle_setup_work(p);
 }

 static void set_next_buddy(struct sched_entity *se);
-- 
2.39.5
Re: [RFC PATCH 4/7] sched/fair: Take care of migrated task for task based throttle
Posted by K Prateek Nayak 9 months, 1 week ago
Hello Aaron,

On 3/13/2025 12:51 PM, Aaron Lu wrote:
> If a task is migrated to a new cpu, it is possible this task is not
> throttled but the new cfs_rq is throttled, or vice versa. Take care of
> these situations in the enqueue path.
> 
> Note that we can't handle this in migrate_task_rq_fair() because the
> dst cpu's rq lock is not held there, so things like checking whether the
> new cfs_rq needs throttling would be racy.
> 
> Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
> ---
>   kernel/sched/fair.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4a95fe3785e43..9e036f18d73e6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7051,6 +7051,23 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>   	assert_list_leaf_cfs_rq(rq);
> 
>   	hrtick_update(rq);
> +
> +	if (!cfs_bandwidth_used())
> +		return;
> +
> +	/*
> +	 * This is for migrate_task_rq_fair(): the new_cpu's rq lock is not
> +	 * held there, so we have to do these things at enqueue time, when
> +	 * the dst cpu's rq lock is held. Doing the check at enqueue time
> +	 * also takes care of newly woken tasks, e.g. a task that wakes up
> +	 * into a throttled cfs_rq.
> +	 *
> +	 * It's possible the task already has throttle work added while this
> +	 * new cfs_rq is not in a throttled hierarchy; that's OK,
> +	 * throttle_cfs_rq_work() will take care of it.
> +	 */
> +	if (throttled_hierarchy(cfs_rq_of(&p->se)))
> +		task_throttle_setup_work(p);

Any reason we can't move this towards the top? The
throttled_hierarchy() check should be cheap enough, and we probably
don't need the cfs_bandwidth_used() guard unless there are other
concerns that I may have missed.

>   }
> 
>   static void set_next_buddy(struct sched_entity *se);

-- 
Thanks and Regards,
Prateek
Re: [External] Re: [RFC PATCH 4/7] sched/fair: Take care of migrated task for task based throttle
Posted by Aaron Lu 9 months, 1 week ago
On Fri, Mar 14, 2025 at 09:33:10AM +0530, K Prateek Nayak wrote:
> Hello Aaron,
> 
> On 3/13/2025 12:51 PM, Aaron Lu wrote:
> > If a task is migrated to a new cpu, it is possible this task is not
> > throttled but the new cfs_rq is throttled, or vice versa. Take care of
> > these situations in the enqueue path.
> > 
> > Note that we can't handle this in migrate_task_rq_fair() because the
> > dst cpu's rq lock is not held there, so things like checking whether the
> > new cfs_rq needs throttling would be racy.
> > 
> > Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
> > ---
> >   kernel/sched/fair.c | 17 +++++++++++++++++
> >   1 file changed, 17 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 4a95fe3785e43..9e036f18d73e6 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7051,6 +7051,23 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >   	assert_list_leaf_cfs_rq(rq);
> > 
> >   	hrtick_update(rq);
> > +
> > +	if (!cfs_bandwidth_used())
> > +		return;
> > +
> > +	/*
> > +	 * This is for migrate_task_rq_fair(): the new_cpu's rq lock is not
> > +	 * held there, so we have to do these things at enqueue time, when
> > +	 * the dst cpu's rq lock is held. Doing the check at enqueue time
> > +	 * also takes care of newly woken tasks, e.g. a task that wakes up
> > +	 * into a throttled cfs_rq.
> > +	 *
> > +	 * It's possible the task already has throttle work added while this
> > +	 * new cfs_rq is not in a throttled hierarchy; that's OK,
> > +	 * throttle_cfs_rq_work() will take care of it.
> > +	 */
> > +	if (throttled_hierarchy(cfs_rq_of(&p->se)))
> > +		task_throttle_setup_work(p);
> 
> Any reason we can't move this towards the top? The
> throttled_hierarchy() check should be cheap enough, and we probably
> don't need the cfs_bandwidth_used() guard unless there are other
> concerns that I may have missed.

I didn't realize the delayed dequeue case so I placed this at the
bottom, but as you have mentioned, for delayed dequeue tasks that get
re-queued, this has to be at the top.

Will change it to the top in next version.
Thanks!