[PATCH v2] sched_ext: improve WAKE_SYNC behavior for default idle CPU selection

Andrea Righi posted 1 patch 1 month, 1 week ago
kernel/sched/ext.c | 46 +++++++++++++++++++++++++++++++++-------------
1 file changed, 33 insertions(+), 13 deletions(-)
Posted by Andrea Righi 1 month, 1 week ago
In the sched_ext built-in idle CPU selection logic, when handling a
WF_SYNC wakeup, we always attempt to migrate the task to the waker's
CPU, as the waker is expected to yield the CPU after waking the task.

However, it may be preferable to keep the task on its previous CPU if
that CPU is idle and shares a cache with the waker's CPU.

The same approach is also used by the fair class and in other scx
schedulers, like scx_rusty and scx_bpfland.

Therefore, apply the same logic to the built-in idle CPU selection
policy as well.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
---
 kernel/sched/ext.c | 46 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 13 deletions(-)

ChangeLog v1 -> v2:
  - correctly return prev_cpu in the cache affine case

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 6eae3b69bf6e..a34af6df2f98 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3087,20 +3087,40 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 	*found = false;
 
 	/*
-	 * If WAKE_SYNC, the waker's local DSQ is empty, and the system is
-	 * under utilized, wake up @p to the local DSQ of the waker. Checking
-	 * only for an empty local DSQ is insufficient as it could give the
-	 * wakee an unfair advantage when the system is oversaturated.
-	 * Checking only for the presence of idle CPUs is also insufficient as
-	 * the local DSQ of the waker could have tasks piled up on it even if
-	 * there is an idle core elsewhere on the system.
-	 */
-	cpu = smp_processor_id();
-	if ((wake_flags & SCX_WAKE_SYNC) &&
-	    !cpumask_empty(idle_masks.cpu) && !(current->flags & PF_EXITING) &&
-	    cpu_rq(cpu)->scx.local_dsq.nr == 0) {
-		if (cpumask_test_cpu(cpu, p->cpus_ptr))
+	 * If WAKE_SYNC, try to migrate the wakee to the waker's CPU.
+	 */
+	if (wake_flags & SCX_WAKE_SYNC) {
+		cpu = smp_processor_id();
+
+		/*
+		 * If the waker's CPU is cache affine and prev_cpu is idle,
+		 * then avoid a migration.
+		 */
+		if (cpus_share_cache(cpu, prev_cpu) &&
+		    test_and_clear_cpu_idle(prev_cpu)) {
+			cpu = prev_cpu;
 			goto cpu_found;
+		}
+
+		/*
+		 * If the waker's local DSQ is empty, and the system is under
+		 * utilized, try to wake up @p to the local DSQ of the waker.
+		 *
+		 * Checking only for an empty local DSQ is insufficient as it
+		 * could give the wakee an unfair advantage when the system is
+		 * oversaturated.
+		 *
+		 * Checking only for the presence of idle CPUs is also
+		 * insufficient as the local DSQ of the waker could have tasks
+		 * piled up on it even if there is an idle core elsewhere on
+		 * the system.
+		 */
+		if (!cpumask_empty(idle_masks.cpu) &&
+		    !(current->flags & PF_EXITING) &&
+		    cpu_rq(cpu)->scx.local_dsq.nr == 0) {
+			if (cpumask_test_cpu(cpu, p->cpus_ptr))
+				goto cpu_found;
+		}
 	}
 
 	/*
-- 
2.47.0
Re: [PATCH v2] sched_ext: improve WAKE_SYNC behavior for default idle CPU selection
Posted by Tejun Heo 1 month, 1 week ago
On Fri, Oct 18, 2024 at 08:09:51PM +0200, Andrea Righi wrote:
> In the sched_ext built-in idle CPU selection logic, when handling a
> WF_SYNC wakeup, we always attempt to migrate the task to the waker's
> CPU, as the waker is expected to yield the CPU after waking the task.
> 
> However, it may be preferable to keep the task on its previous CPU if
> that CPU is idle and shares a cache with the waker's CPU.
> 
> The same approach is also used by the fair class and in other scx
> schedulers, like scx_rusty and scx_bpfland.
> 
> Therefore, apply the same logic to the built-in idle CPU selection
> policy as well.
> 
> Signed-off-by: Andrea Righi <andrea.righi@linux.dev>

Applied to sched_ext/for-6.13.

Thanks.

-- 
tejun