sched/core: Do not migrate ineligible tasks in sched_balance_rq()

[PATCH] sched/core: Do not migrate ineligible tasks in sched_balance_rq()

Posted by Hao Jia 1 year, 2 months ago

From: Hao Jia <jiahao1@lixiang.com>

When the PLACE_LAG scheduling feature is enabled, if a task
is ineligible (lag < 0) on the source cpu runqueue, it will
also be ineligible when it is migrated to the destination
cpu runqueue.

Because we will keep the original equivalent lag of
the task in place_entity(). So if the task was ineligible
before, it will still be ineligible after migration.

Therefore, we should skip the migration of ineligible tasks
to reduce ineffective task migrations, just like the task
throttled by cfs_bandwidth, until they become eligible.

Signed-off-by: Hao Jia <jiahao1@lixiang.com>
---
 kernel/sched/fair.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fbdca89c677f..5564e16b6fdb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9358,13 +9358,14 @@ static inline int migrate_degrades_locality(struct task_struct *p,
 static
 int can_migrate_task(struct task_struct *p, struct lb_env *env)
 {
+	struct cfs_rq *cfs_rq = task_cfs_rq(p);
 	int tsk_cache_hot;
 
 	lockdep_assert_rq_held(env->src_rq);
 
 	/*
 	 * We do not migrate tasks that are:
-	 * 1) throttled_lb_pair, or
+	 * 1) throttled_lb_pair, or task ineligible, or
 	 * 2) cannot be migrated to this CPU due to cpus_ptr, or
 	 * 3) running (obviously), or
 	 * 4) are cache-hot on their current CPU.
@@ -9372,6 +9373,10 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
 		return 0;
 
+	if (sched_feat(PLACE_LAG) && cfs_rq->nr_running &&
+			!entity_eligible(cfs_rq, &p->se))
+		return 0;
+
 	/* Disregard percpu kthreads; they are where they need to be. */
 	if (kthread_is_per_cpu(p))
 		return 0;
-- 
2.34.1

Re: [PATCH] sched/core: Do not migrate ineligible tasks in sched_balance_rq()

Posted by Peter Zijlstra 1 year, 2 months ago

On Thu, Nov 28, 2024 at 04:48:58PM +0800, Hao Jia wrote:
> From: Hao Jia <jiahao1@lixiang.com>
> 
> When the PLACE_LAG scheduling feature is enabled, if a task
> is ineligible (lag < 0) on the source cpu runqueue, it will
> also be ineligible when it is migrated to the destination
> cpu runqueue.
> 
> Because we will keep the original equivalent lag of
> the task in place_entity(). So if the task was ineligible
> before, it will still be ineligible after migration.

This is not accurate, it will be eleigible, irrespective of lag, if
there are no other tasks. I think your patch tries to do this, but I'm
fairly sure it got it wrong.

> Therefore, we should skip the migration of ineligible tasks
> to reduce ineffective task migrations, just like the task
> throttled by cfs_bandwidth, until they become eligible.

And this misses an important case too -- load-balancing will try very
hard to balance load. If you disallow migrating tasks it might fail to
reach this goal. Since this is not a hard contraint it should eventually
give in and migrate it anyway.

That is, I would suggest allowing it when nr_balance_failed is non-zero.

> Signed-off-by: Hao Jia <jiahao1@lixiang.com>
> ---
>  kernel/sched/fair.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fbdca89c677f..5564e16b6fdb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9358,13 +9358,14 @@ static inline int migrate_degrades_locality(struct task_struct *p,
>  static
>  int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  {
> +	struct cfs_rq *cfs_rq = task_cfs_rq(p);
>  	int tsk_cache_hot;
>  
>  	lockdep_assert_rq_held(env->src_rq);
>  
>  	/*
>  	 * We do not migrate tasks that are:
> -	 * 1) throttled_lb_pair, or
> +	 * 1) throttled_lb_pair, or task ineligible, or
>  	 * 2) cannot be migrated to this CPU due to cpus_ptr, or
>  	 * 3) running (obviously), or
>  	 * 4) are cache-hot on their current CPU.
> @@ -9372,6 +9373,10 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
>  		return 0;
>  
> +	if (sched_feat(PLACE_LAG) && cfs_rq->nr_running &&
> +			!entity_eligible(cfs_rq, &p->se))

Your indenting it wrong, please use: cino=(0:0

> +		return 0;
> +
>  	/* Disregard percpu kthreads; they are where they need to be. */
>  	if (kthread_is_per_cpu(p))
>  		return 0;
> -- 
> 2.34.1
>

Re: [PATCH] sched/core: Do not migrate ineligible tasks in sched_balance_rq()

Posted by Hao Jia 1 year, 2 months ago


On 2024/11/28 17:19, Peter Zijlstra wrote:
> On Thu, Nov 28, 2024 at 04:48:58PM +0800, Hao Jia wrote:
>> From: Hao Jia <jiahao1@lixiang.com>
>>
>> When the PLACE_LAG scheduling feature is enabled, if a task
>> is ineligible (lag < 0) on the source cpu runqueue, it will
>> also be ineligible when it is migrated to the destination
>> cpu runqueue.
>>
>> Because we will keep the original equivalent lag of
>> the task in place_entity(). So if the task was ineligible
>> before, it will still be ineligible after migration.
> 
> This is not accurate, it will be eleigible, irrespective of lag, if
> there are no other tasks. I think your patch tries to do this, but I'm
> fairly sure it got it wrong.


Thank you for your reply. The expression in my commit message is 
inaccurate, and I will correct it in the patch v2. If I understand 
correctly, a task meeting the following conditions:

  sched_feat(PLACE_LAG) && cfs_rq->nr_running && 
!entity_eligible(cfs_rq, &p->se),

will remain ineligible both before and after migration.

If I am wrong, please correct me. Thank you!



> 
>> Therefore, we should skip the migration of ineligible tasks
>> to reduce ineffective task migrations, just like the task
>> throttled by cfs_bandwidth, until they become eligible.
> 
> And this misses an important case too -- load-balancing will try very
> hard to balance load. If you disallow migrating tasks it might fail to
> reach this goal. Since this is not a hard contraint it should eventually
> give in and migrate it anyway.
> 
> That is, I would suggest allowing it when nr_balance_failed is non-zero.
> 

Thank you for your suggestion, I will do it in the patch v2.

Thanks,
Hao

>> Signed-off-by: Hao Jia <jiahao1@lixiang.com>
>> ---
>>   kernel/sched/fair.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index fbdca89c677f..5564e16b6fdb 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -9358,13 +9358,14 @@ static inline int migrate_degrades_locality(struct task_struct *p,
>>   static
>>   int can_migrate_task(struct task_struct *p, struct lb_env *env)
>>   {
>> +	struct cfs_rq *cfs_rq = task_cfs_rq(p);
>>   	int tsk_cache_hot;
>>   
>>   	lockdep_assert_rq_held(env->src_rq);
>>   
>>   	/*
>>   	 * We do not migrate tasks that are:
>> -	 * 1) throttled_lb_pair, or
>> +	 * 1) throttled_lb_pair, or task ineligible, or
>>   	 * 2) cannot be migrated to this CPU due to cpus_ptr, or
>>   	 * 3) running (obviously), or
>>   	 * 4) are cache-hot on their current CPU.
>> @@ -9372,6 +9373,10 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>>   	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
>>   		return 0;
>>   
>> +	if (sched_feat(PLACE_LAG) && cfs_rq->nr_running &&
>> +			!entity_eligible(cfs_rq, &p->se))
> 
> Your indenting it wrong, please use: cino=(0:0

I will do it in the patch v2.

Thanks,
Hao

> 
>> +		return 0;
>> +
>>   	/* Disregard percpu kthreads; they are where they need to be. */
>>   	if (kthread_is_per_cpu(p))
>>   		return 0;
>> -- 
>> 2.34.1
>>

Re: [PATCH] sched/core: Do not migrate ineligible tasks in sched_balance_rq()

Posted by Peter Zijlstra 1 year, 2 months ago

On Thu, Nov 28, 2024 at 07:30:37PM +0800, Hao Jia wrote:
> 
> 
> On 2024/11/28 17:19, Peter Zijlstra wrote:
> > On Thu, Nov 28, 2024 at 04:48:58PM +0800, Hao Jia wrote:
> > > From: Hao Jia <jiahao1@lixiang.com>
> > > 
> > > When the PLACE_LAG scheduling feature is enabled, if a task
> > > is ineligible (lag < 0) on the source cpu runqueue, it will
> > > also be ineligible when it is migrated to the destination
> > > cpu runqueue.
> > > 
> > > Because we will keep the original equivalent lag of
> > > the task in place_entity(). So if the task was ineligible
> > > before, it will still be ineligible after migration.
> > 
> > This is not accurate, it will be eleigible, irrespective of lag, if
> > there are no other tasks. I think your patch tries to do this, but I'm
> > fairly sure it got it wrong.
> 
> Thank you for your reply. The expression in my commit message is inaccurate,
> and I will correct it in the patch v2. If I understand correctly, a task
> meeting the following conditions:
> 
>  sched_feat(PLACE_LAG) && cfs_rq->nr_running && !entity_eligible(cfs_rq,
> &p->se),
> 
> will remain ineligible both before and after migration.
> 
> If I am wrong, please correct me. Thank you!

Problem is you're checking the wrong nr_running. 

> > > @@ -9358,13 +9358,14 @@ static inline int migrate_degrades_locality(struct task_struct *p,
> > >   static
> > >   int can_migrate_task(struct task_struct *p, struct lb_env *env)
> > >   {
> > > +	struct cfs_rq *cfs_rq = task_cfs_rq(p);

This is task's current cfs_rq. What you're interested in is destination
cfs_rq. If the destination is empty, then lag is irrelevant.

You want something like:

#if CONFIG_FAIR_GROUP_SCHED
	struct task_group *tg = task_group(p);
	struct cfs_rq *dst_cfs_rq = tg->cfs_rq[env->dst_cpu];
#else
	struct cfs_rq = &env->dst_rq->cfs_rq;
#endif


Also, please add benchmark details that show this actually makes a
difference.

Notably we keep rq->cfs_tasks in MRU order; most recently ran task is
head and balancing takes from the tail, the task longest not ran.

The task longest not ran should have build up eligibility.

Re: [PATCH] sched/core: Do not migrate ineligible tasks in sched_balance_rq()

Posted by Hao Jia 1 year, 2 months ago


On 2024/11/28 19:47, Peter Zijlstra wrote:
> On Thu, Nov 28, 2024 at 07:30:37PM +0800, Hao Jia wrote:
>>
>>
>> On 2024/11/28 17:19, Peter Zijlstra wrote:
>>> On Thu, Nov 28, 2024 at 04:48:58PM +0800, Hao Jia wrote:
>>>> From: Hao Jia <jiahao1@lixiang.com>
>>>>
>>>> When the PLACE_LAG scheduling feature is enabled, if a task
>>>> is ineligible (lag < 0) on the source cpu runqueue, it will
>>>> also be ineligible when it is migrated to the destination
>>>> cpu runqueue.
>>>>
>>>> Because we will keep the original equivalent lag of
>>>> the task in place_entity(). So if the task was ineligible
>>>> before, it will still be ineligible after migration.
>>>
>>> This is not accurate, it will be eleigible, irrespective of lag, if
>>> there are no other tasks. I think your patch tries to do this, but I'm
>>> fairly sure it got it wrong.
>>
>> Thank you for your reply. The expression in my commit message is inaccurate,
>> and I will correct it in the patch v2. If I understand correctly, a task
>> meeting the following conditions:
>>
>>   sched_feat(PLACE_LAG) && cfs_rq->nr_running && !entity_eligible(cfs_rq,
>> &p->se),
>>
>> will remain ineligible both before and after migration.
>>
>> If I am wrong, please correct me. Thank you!
> 
> Problem is you're checking the wrong nr_running.

Oops, that's my mistake. Thank you very much for pointing it out.

I should be focusing on the nr_running of the destination cfs_rq.

Thanks,
Hao

> 
>>>> @@ -9358,13 +9358,14 @@ static inline int migrate_degrades_locality(struct task_struct *p,
>>>>    static
>>>>    int can_migrate_task(struct task_struct *p, struct lb_env *env)
>>>>    {
>>>> +	struct cfs_rq *cfs_rq = task_cfs_rq(p);
> 
> This is task's current cfs_rq. What you're interested in is destination
> cfs_rq. If the destination is empty, then lag is irrelevant.
> 
> You want something like:
> 
> #if CONFIG_FAIR_GROUP_SCHED
> 	struct task_group *tg = task_group(p);
> 	struct cfs_rq *dst_cfs_rq = tg->cfs_rq[env->dst_cpu];
> #else
> 	struct cfs_rq = &env->dst_rq->cfs_rq;
> #endif
> 
> 
> Also, please add benchmark details that show this actually makes a
> difference.
> 
> Notably we keep rq->cfs_tasks in MRU order; most recently ran task is
> head and balancing takes from the tail, the task longest not ran.
> 
> The task longest not ran should have build up eligibility.

Thank you for your suggestion. It might be as you said, that inefficient 
task migrations are relatively rare in load balancing. I will use some 
benchmarks from MMTests for testing.

Thanks,
Hao