[PATCH v3] sched/fair: do not scan twice in detach_tasks()

Huang Shijie posted 1 patch 2 months, 2 weeks ago
There is a newer version of this series
kernel/sched/fair.c | 9 +++++++++
1 file changed, 9 insertions(+)
[PATCH v3] sched/fair: do not scan twice in detach_tasks()
Posted by Huang Shijie 2 months, 2 weeks ago
When detach_tasks() scans the src_cpu's task list, the task list
may shrink during the scanning. For example, the task list
may have four tasks at the beginning, it may becomes to two
during the scanning in detach_tasks():
    Task list at beginning : "ABCD"
    Task list in scanning  : "CD"

    (ABCD stands for differnt tasks.)

In this scenario, the env->loop_max is still four, so
detach_tasks() may scan twice for some tasks:
    the scanning order maybe : "DCDC"

In the Specjbb test, this issue can be catched many times.
(Over 330,000 times in a 30-min Specjbb test)

The patch introduces "first_back" to record the first task which
is put back to the task list. If we get a task which is equal to
first_back, we break the loop, and avoid to scan twice for it.

Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
---
v2 --> v3:
    Fix a typo in the commit message.
    v2: https://lore.kernel.org/all/20250718054709.8781-1-shijie@os.amperecomputing.com/

v1 --> v2:
    Add more comment from Valentin Schneider
    v1: https://lore.kernel.org/all/20250707083636.38380-1-shijie@os.amperecomputing.com/
---
 kernel/sched/fair.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7e2963efe800..7cc9d50e3e11 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9443,6 +9443,7 @@ static int detach_tasks(struct lb_env *env)
 {
 	struct list_head *tasks = &env->src_rq->cfs_tasks;
 	unsigned long util, load;
+	struct task_struct *first_back = NULL;
 	struct task_struct *p;
 	int detached = 0;
 
@@ -9481,6 +9482,12 @@ static int detach_tasks(struct lb_env *env)
 		}
 
 		p = list_last_entry(tasks, struct task_struct, se.group_node);
+		/*
+		 * We're back to an already visited task that couldn't be
+		 * detached, we've seen all there is to see.
+		 */
+		if (p == first_back)
+			break;
 
 		if (!can_migrate_task(p, env))
 			goto next;
@@ -9562,6 +9569,8 @@ static int detach_tasks(struct lb_env *env)
 			schedstat_inc(p->stats.nr_failed_migrations_hot);
 
 		list_move(&p->se.group_node, tasks);
+		if (!first_back)
+			first_back = p;
 	}
 
 	/*
-- 
2.40.1
Re: [PATCH v3] sched/fair: do not scan twice in detach_tasks()
Posted by Valentin Schneider 2 months, 2 weeks ago
On 18/07/25 14:35, Huang Shijie wrote:
> When detach_tasks() scans the src_cpu's task list, the task list
> may shrink during the scanning. For example, the task list
> may have four tasks at the beginning, it may becomes to two
> during the scanning in detach_tasks():
>     Task list at beginning : "ABCD"
>     Task list in scanning  : "CD"
>
>     (ABCD stands for differnt tasks.)
>
> In this scenario, the env->loop_max is still four, so
> detach_tasks() may scan twice for some tasks:
>     the scanning order maybe : "DCDC"
>

How about something like so:
"""
detach_tasks() uses struct lb_env.loop_max as an env.src_rq->cfs_tasks
iteration count limit. It is however set without the source RQ lock held,
and besides detach_tasks() can be re-invoked after releasing and
re-acquiring the RQ lock per LBF_NEED_BREAK.

This means that env.loop_max and the actual length of env.src_rq->cfs_tasks
as observed within detach_tasks() can differ. This can cause some tasks to
be unnecessarily iterated over more than once, for instance:

  env.loop_max := 4
  detach_tasks()
    // Here env.src->cfs_tasks only contains two tasks which can't be
    // migrated anywhere, so they're put back in the list each time.
    env.src->cfs_tasks := [p1, p0]
    // The iteration goes:
    p0; cfs_tasks := [p0, p1]
    p1; cfs_tasks := [p1, p0]
    p0; cfs_tasks := [p0, p1]
    p1; cfs_tasks := [p1, p0]

    // IOW we iterate over each task twice
"""

> In the Specjbb test, this issue can be catched many times.
                                         ^^^^^^^
                                         caught

> (Over 330,000 times in a 30-min Specjbb test)
>
> The patch introduces "first_back" to record the first task which
> is put back to the task list. If we get a task which is equal to
> first_back, we break the loop, and avoid to scan twice for it.
>
> Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>

Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Re: [PATCH v3] sched/fair: do not scan twice in detach_tasks()
Posted by Shijie Huang 2 months, 2 weeks ago
On 2025/7/19 2:49, Valentin Schneider wrote:
> On 18/07/25 14:35, Huang Shijie wrote:
>> When detach_tasks() scans the src_cpu's task list, the task list
>> may shrink during the scanning. For example, the task list
>> may have four tasks at the beginning, it may becomes to two
>> during the scanning in detach_tasks():
>>      Task list at beginning : "ABCD"
>>      Task list in scanning  : "CD"
>>
>>      (ABCD stands for differnt tasks.)
>>
>> In this scenario, the env->loop_max is still four, so
>> detach_tasks() may scan twice for some tasks:
>>      the scanning order maybe : "DCDC"
>>
> How about something like so:
> """
> detach_tasks() uses struct lb_env.loop_max as an env.src_rq->cfs_tasks
> iteration count limit. It is however set without the source RQ lock held,
> and besides detach_tasks() can be re-invoked after releasing and
> re-acquiring the RQ lock per LBF_NEED_BREAK.
>
> This means that env.loop_max and the actual length of env.src_rq->cfs_tasks
> as observed within detach_tasks() can differ. This can cause some tasks to
> be unnecessarily iterated over more than once, for instance:
>
>    env.loop_max := 4
>    detach_tasks()
>      // Here env.src->cfs_tasks only contains two tasks which can't be
>      // migrated anywhere, so they're put back in the list each time.
>      env.src->cfs_tasks := [p1, p0]
>      // The iteration goes:
>      p0; cfs_tasks := [p0, p1]
>      p1; cfs_tasks := [p1, p0]
>      p0; cfs_tasks := [p0, p1]
>      p1; cfs_tasks := [p1, p0]
>
>      // IOW we iterate over each task twice
> """
Okay, I will change the commit message later..
>> In the Specjbb test, this issue can be catched many times.
>                                           ^^^^^^^
>                                           caught
>
>> (Over 330,000 times in a 30-min Specjbb test)
>>
>> The patch introduces "first_back" to record the first task which
>> is put back to the task list. If we get a task which is equal to
>> first_back, we break the loop, and avoid to scan twice for it.
>>
>> Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>


Thanks

Huang Shijie