Hello,
We have observed an issue in the CFS scheduler's task placement logic
that is present in kernel v6.14 and in the latest v6.18-rc3.
The function select_idle_sibling() in kernel/sched/fair.c does not take
the WF_SYNC wakeup flag into account, leading to suboptimal placement of
newly woken child tasks.
The core issue lies in a logical contradiction between wake_affine_idle()
and select_idle_sibling() when sync is true.
1. Intended Behavior (wake_affine_idle): During a sync wakeup (WF_SYNC is
set), the scheduler intends to place the child task on this_cpu (the
parent's current CPU) when the parent is the only runnable task there. The
rationale is that the parent is expected to go to sleep almost immediately,
making its CPU available; this keeps the child task on a cache-hot CPU.
static int wake_affine_idle(int this_cpu, int prev_cpu, int sync)
{
	...
	if (sync) {
		struct rq *rq = cpu_rq(this_cpu);

		if ((rq->nr_running - cfs_h_nr_delayed(rq)) == 1)
			return this_cpu;
	}
	...
}
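Here, cfs_h_nr_delayed() accounts for tasks that remain queued only because
of delayed dequeue, so the check asks whether the waker is the only genuinely
runnable task on this_cpu. For reference, the helper in recent mainline
kernel/sched/fair.c reads roughly as follows (quoted from memory; the exact
form may vary by version):

static inline unsigned int cfs_h_nr_delayed(struct rq *rq)
{
	/* Entities still queued but delayed at dequeue: not truly runnable. */
	return (rq->cfs.h_nr_queued - rq->cfs.h_nr_runnable);
}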
2. Flawed Behavior (select_idle_sibling): When select_task_rq_fair() later
calls select_idle_sibling() with this_cpu as the target, the fast path
rejects the target because it is not currently idle (the parent is still
running on it) and instead searches for an actually idle sibling CPU within
the same LLC domain. During a sync wakeup, however, the parent is assumed
to sleep immediately, so its CPU should be treated as effectively available
when the parent is its only runnable task.
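For reference, this is the fast-path check in select_idle_sibling() that
the sync target fails (condition excerpted from mainline; comment ours):

	/*
	 * The fast path only accepts @target if it is already idle (or
	 * runs only SCHED_IDLE tasks); a parent in the middle of a sync
	 * wakeup fails available_idle_cpu() here.
	 */
	if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
	    asym_fits_cpu(task_util, util_min, util_max, target))
		return target;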
3. The Consequence: The wakee is placed on a remote idle sibling rather
than on the parent's soon-to-be-idle CPU, losing cache locality. The remote
CPU may also have been idling in a deeper C-state and/or running at a lower
frequency, further hurting the child's performance.
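To make the effect easy to observe, below is a minimal illustrative sketch
(hypothetical, not the exact workload we measured): pipe writes issue sync
wakeups, so a parent that writes to a pipe and immediately blocks on a read
approximates the pattern above, and sched_getcpu() shows where each task
lands.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int to_child[2], to_parent[2];
	char byte = 0;

	if (pipe(to_child) || pipe(to_parent))
		return 1;

	if (fork() == 0) {
		/* Child: block until the parent's sync wakeup, then report. */
		read(to_child[0], &byte, 1);
		printf("child woke on CPU %d\n", sched_getcpu());
		write(to_parent[1], &byte, 1);
		_exit(0);
	}

	printf("parent runs on CPU %d\n", sched_getcpu());
	write(to_child[1], &byte, 1);	/* sync wakeup of the child */
	read(to_parent[0], &byte, 1);	/* parent goes to sleep right away */
	wait(NULL);
	return 0;
}

Without the fix, the child typically lands on a different idle CPU than the
parent; with the patch below it should stay on the parent's CPU whenever the
parent is the only runnable task there.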
Kernel Info
===============
Host OS: Ubuntu 24.04, running QEMU with "-smp cpus=3,cores=3"
Processor: Two Intel Xeon Silver 4114 10-core CPUs at 2.20 GHz
Kernel Version: v6.14 and latest v6.18-rc3
===============
We have attached a patch below that fixes this issue. Thank you for your time!
Best,
Tingjia
From 8a2c855768add2eb53c2496a5230a8c1d7c677bf Mon Sep 17 00:00:00 2001
From: Tingjia-0v0 <tjcao980311@gmail.com>
Date: Tue, 28 Oct 2025 02:30:57 -0500
Subject: [PATCH] sched/fair: consider sync wakeup when selecting idle sibling
---
kernel/sched/fair.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cee1793e8277..01420a481c38 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1064,7 +1064,7 @@ static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
 #include "pelt.h"
 
-static int select_idle_sibling(struct task_struct *p, int prev_cpu, int cpu);
+static int select_idle_sibling(struct task_struct *p, int prev_cpu, int cpu, int sync);
 static unsigned long task_h_load(struct task_struct *p);
 static unsigned long capacity_of(int cpu);
 
@@ -7810,12 +7810,13 @@ static inline bool asym_fits_cpu(unsigned long util,
 /*
  * Try and locate an idle core/thread in the LLC cache domain.
  */
-static int select_idle_sibling(struct task_struct *p, int prev, int target)
+static int select_idle_sibling(struct task_struct *p, int prev, int target, int sync)
 {
 	bool has_idle_core = false;
 	struct sched_domain *sd;
 	unsigned long task_util, util_min, util_max;
 	int i, recent_used_cpu, prev_aff = -1;
+	struct rq *rq = cpu_rq(target);
 
 	/*
 	 * On asymmetric system, update task utilization because we will check
@@ -7837,6 +7838,9 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	    asym_fits_cpu(task_util, util_min, util_max, target))
 		return target;
 
+	if (sync && (rq->nr_running - cfs_h_nr_delayed(rq)) == 1)
+		return target;
+
 	/*
 	 * If the previous CPU is cache affine and idle, don't be stupid:
 	 */
@@ -8618,7 +8622,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 		new_cpu = sched_balance_find_dst_cpu(sd, p, cpu, prev_cpu, sd_flag);
 	} else if (wake_flags & WF_TTWU) { /* XXX always ? */
 		/* Fast path */
-		new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
+		new_cpu = select_idle_sibling(p, prev_cpu, new_cpu, sync);
 	}
 
 	rcu_read_unlock();
--
2.43.0