[PATCH v2 03/12] sched: Optimize ttwu() / select_task_rq()

Peter Zijlstra posted 12 patches 3 months, 1 week ago
Optimize ttwu() by pushing select_idle_siblings() up above waiting for
on_cpu(). This allows making use of the cycles otherwise spent waiting
to search for an idle CPU.

One little detail is that since the task we're looking for an idle CPU
for might still be on the CPU, that CPU won't report as running the
idle task, and thus won't find its own CPU idle, even when it is.

To compensate, remove the 'rq->curr == rq->idle' condition from
idle_cpu() -- it doesn't really make sense anyway.

Additionally, Chris found (concurrently) that perf-c2c reported that
test as being a cache-miss monster.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250520101727.620602459@infradead.org
---
 kernel/sched/core.c     |    5 +++--
 kernel/sched/syscalls.c |    3 ---
 2 files changed, 3 insertions(+), 5 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3593,7 +3593,7 @@ int select_task_rq(struct task_struct *p
 		cpu = p->sched_class->select_task_rq(p, cpu, *wake_flags);
 		*wake_flags |= WF_RQ_SELECTED;
 	} else {
-		cpu = cpumask_any(p->cpus_ptr);
+		cpu = task_cpu(p);
 	}
 
 	/*
@@ -4309,6 +4309,8 @@ int try_to_wake_up(struct task_struct *p
 		    ttwu_queue_wakelist(p, task_cpu(p), wake_flags))
 			break;
 
+		cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
+
 		/*
 		 * If the owning (remote) CPU is still in the middle of schedule() with
 		 * this task as prev, wait until it's done referencing the task.
@@ -4320,7 +4322,6 @@ int try_to_wake_up(struct task_struct *p
 		 */
 		smp_cond_load_acquire(&p->on_cpu, !VAL);
 
-		cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
 		if (task_cpu(p) != cpu) {
 			if (p->in_iowait) {
 				delayacct_blkio_end(p);
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -203,9 +203,6 @@ int idle_cpu(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 
-	if (rq->curr != rq->idle)
-		return 0;
-
 	if (rq->nr_running)
 		return 0;
Re: [PATCH v2 03/12] sched: Optimize ttwu() / select_task_rq()
Posted by Mel Gorman 2 months, 3 weeks ago
On Wed, Jul 02, 2025 at 01:49:27PM +0200, Peter Zijlstra wrote:
> Optimize ttwu() by pushing select_idle_siblings() up above waiting for
> on_cpu(). This allows making use of the cycles otherwise spent waiting
> to search for an idle CPU.
> 
> One little detail is that since the task we're looking for an idle CPU
> for might still be on the CPU, that CPU won't report as running the
> idle task, and thus won't find its own CPU idle, even when it is.
> 
> To compensate, remove the 'rq->curr == rq->idle' condition from
> idle_cpu() -- it doesn't really make sense anyway.
> 
> Additionally, Chris found (concurrently) that perf-c2c reported that
> test as being a cache-miss monster.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20250520101727.620602459@infradead.org

*facepalm*

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs
Re: [PATCH v2 03/12] sched: Optimize ttwu() / select_task_rq()
Posted by Vincent Guittot 2 months, 4 weeks ago
On Wed, 2 Jul 2025 at 14:12, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Optimize ttwu() by pushing select_idle_siblings() up above waiting for
> on_cpu(). This allows making use of the cycles otherwise spent waiting
> to search for an idle CPU.
>
> One little detail is that since the task we're looking for an idle CPU
> for might still be on the CPU, that CPU won't report as running the
> idle task, and thus won't find its own CPU idle, even when it is.
>
> To compensate, remove the 'rq->curr == rq->idle' condition from
> idle_cpu() -- it doesn't really make sense anyway.
>
> Additionally, Chris found (concurrently) that perf-c2c reported that
> test as being a cache-miss monster.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20250520101727.620602459@infradead.org

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
