The fast path in select_idle_sibling() can place tasks on CPUs without
considering core scheduling constraints, potentially causing immediate
force idle when the sibling runs an incompatible task.
Add cookie compatibility checks before selecting a CPU in the fast path.
This prevents placing waking tasks on CPUs where the sibling is running
an incompatible task, reducing force idle occurrences.
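
For reference, sched_core_cookie_match() accepts a CPU either when its
whole core is idle or when the core's cookie matches the waking task's.
A simplified paraphrase of the helper follows (see kernel/sched/sched.h
for the real implementation, which may differ in detail):

static inline bool sched_core_cookie_match(struct rq *rq, struct task_struct *p)
{
	int idle_core = -1;
	int cpu;

	/* Without core scheduling enabled, any placement is compatible. */
	if (!sched_core_enabled(rq))
		return true;

	/* A fully idle core can be taken over by any cookie. */
	for_each_cpu(cpu, cpu_smt_mask(cpu_of(rq))) {
		if (!available_idle_cpu(cpu)) {
			idle_core = -1;
			break;
		}
		idle_core = cpu;
	}

	/* Otherwise the task's cookie must match the core's cookie. */
	return idle_core != -1 || rq->core->core_cookie == p->core_cookie;
}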
Testing
=======
Performance testing uses 10 threads on 8 CPUs, on a platform where each
core has two hyperthreads.
Each thread runs 1200 iterations of a cycle consisting of a random
period of work (0-10ms) followed by a random period of sleep (0-30ms).

The goal of this configuration is to apply a light load on the system
(33%), with the randomization creating placement opportunities across
all cases: idle cores, partially idle cores, and fully busy cores.
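
A minimal sketch of one benchmark worker, reconstructed from the
description above (the actual test harness may differ):

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define NR_THREADS	10
#define ITERATIONS	1200

/* Busy-loop for approximately 'ms' milliseconds of work. */
static void burn_cpu_ms(long ms)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000 +
		 (now.tv_nsec - start.tv_nsec) / 1000000 < ms);
}

static void *worker(void *arg)
{
	unsigned int seed = (unsigned int)(uintptr_t)arg;

	for (int i = 0; i < ITERATIONS; i++) {
		/* Random period of work between 0-10ms... */
		burn_cpu_ms(rand_r(&seed) % 11);

		/* ...followed by a random period of sleep 0-30ms. */
		struct timespec ts = {
			.tv_nsec = (rand_r(&seed) % 31) * 1000000L,
		};
		nanosleep(&ts, NULL);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_THREADS];

	for (long i = 0; i < NR_THREADS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)(i + 1));
	for (int i = 0; i < NR_THREADS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}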
Each test configuration is run 3 times and the time to work completion
is measured and averaged. The first configuration doesn't use cookies;
the second groups the 10 threads into 5 cookie pairs; the third also
groups the 10 threads into 5 cookie pairs, but with this patch applied.
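
The cookie pairs can be set up with the core scheduling prctl()
interface, e.g. from a coordinator thread as in the sketch below
(illustrative only, not the actual harness; the constants are
redefined in case userspace headers predate PR_SCHED_CORE):

#include <sys/prctl.h>
#include <sys/types.h>

#ifndef PR_SCHED_CORE
#define PR_SCHED_CORE			62
#define PR_SCHED_CORE_CREATE		1
#define PR_SCHED_CORE_SHARE_TO		2
#define PR_SCHED_CORE_SHARE_FROM	3
#endif
#define SCOPE_THREAD			0	/* PIDTYPE_PID */

/* Give threads 'a' and 'b' the same (new) core scheduling cookie. */
static int pair_threads(pid_t a, pid_t b)
{
	/* Create a fresh cookie for thread 'a'. */
	if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, a, SCOPE_THREAD, 0))
		return -1;
	/* Pull a's cookie into the calling (coordinator) thread... */
	if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_FROM, a, SCOPE_THREAD, 0))
		return -1;
	/* ...and push it onto thread 'b'. Note the coordinator itself
	 * stays tagged with the pair's cookie in this sketch. */
	return prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, b, SCOPE_THREAD, 0);
}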
Test Results (seconds)
Configuration Run 1 Run 2 Run 3 Average Overhead
No cookie 24.653 24.454 24.586 24.564 0.000
Cookie 25.896 26.256 25.979 26.044 1.480
Cookie + patch 25.346 25.007 25.042 25.132 0.568
The patch reduces the cookie overhead by 61.6% (0.568/1.480 = 0.384, so
only 38.4% of the original overhead remains).
Signed-off-by: Fernand Sieber <sieberf@amazon.com>
---
kernel/sched/fair.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5b752324270b..90ceb3da2251 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7647,7 +7647,7 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
*/
if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
continue;
- if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+ if (__select_idle_cpu(cpu, p) != -1)
return cpu;
}
@@ -7841,6 +7841,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
lockdep_assert_irqs_disabled();
if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
+ sched_core_cookie_match(cpu_rq(target), p) &&
asym_fits_cpu(task_util, util_min, util_max, target))
return target;
@@ -7849,6 +7850,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
*/
if (prev != target && cpus_share_cache(prev, target) &&
(available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
+ sched_core_cookie_match(cpu_rq(prev), p) &&
asym_fits_cpu(task_util, util_min, util_max, prev)) {
if (!static_branch_unlikely(&sched_cluster_active) ||
@@ -7881,6 +7883,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
recent_used_cpu != target &&
cpus_share_cache(recent_used_cpu, target) &&
(available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
+ sched_core_cookie_match(cpu_rq(recent_used_cpu), p) &&
cpumask_test_cpu(recent_used_cpu, p->cpus_ptr) &&
asym_fits_cpu(task_util, util_min, util_max, recent_used_cpu)) {
--
2.34.1
Hello Fernand,
On 11/20/2025 3:49 PM, Fernand Sieber wrote:
> The fast path in select_idle_sibling() can place tasks on CPUs without
> considering core scheduling constraints, potentially causing immediate
> force idle when the sibling runs an incompatible task.
>
> Add cookie compatibility checks before selecting a CPU in the fast path.
> This prevents placing waking tasks on CPUs where the sibling is running
> an incompatible task, reducing force idle occurrences.
>
> Testing
> =======
>
> Performance testing uses 10 threads on 8 CPUs, on a platform where each
> core has two hyperthreads.
> Each thread runs 1200 iterations of a cycle consisting of a random
> period of work (0-10ms) followed by a random period of sleep (0-30ms).
>
> The goal of this configuration is to apply a light load on the system
> (33%), with the randomization creating placement opportunities across
> all cases: idle cores, partially idle cores, and fully busy cores.
>
> Each test configuration is run 3 times and the time to work completion
> is measured and averaged. The first configuration doesn't use cookies;
> the second groups the 10 threads into 5 cookie pairs; the third also
> groups the 10 threads into 5 cookie pairs, but with this patch applied.
>
> Test Results (seconds)
>
> Configuration Run 1 Run 2 Run 3 Average Overhead
> No cookie 24.653 24.454 24.586 24.564 0.000
> Cookie 25.896 26.256 25.979 26.044 1.480
> Cookie + patch 25.346 25.007 25.042 25.132 0.568
>
> The patch reduces the cookie overhead by 61.6% (0.568/1.480 = 0.384, so
> only 38.4% of the original overhead remains).
Thank you for including the benchmark numbers.
>
> Signed-off-by: Fernand Sieber <sieberf@amazon.com>
> ---
> kernel/sched/fair.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5b752324270b..90ceb3da2251 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7647,7 +7647,7 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
> */
> if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
> continue;
> - if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> + if (__select_idle_cpu(cpu, p) != -1)
> return cpu;
> }
>
> @@ -7841,6 +7841,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> lockdep_assert_irqs_disabled();
>
> if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
> + sched_core_cookie_match(cpu_rq(target), p) &&
nit.

You can replace the whole

    (available_idle_cpu() || sched_idle_cpu()) && sched_core_cookie_match()

with "__select_idle_cpu() != -1", but since this pattern keeps repeating,
you can perhaps extract it into a helper __is_idle_cpu() that returns a
boolean, and you can add a follow-up cleanup to convert all the:

    idle_cpu = __select_idle_cpu(cpu, p);
    if ((unsigned int)idle_cpu < nr_cpumask_bits)
        return idle_cpu;

pattern to:

    if (__is_idle_cpu(cpu, p))
        return cpu;

after which select_idle_core() for !CONFIG_SCHED_SMT would be the only
user of __select_idle_cpu(), and you can move that entire logic into the
select_idle_core() function. Thoughts?
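
For illustration, such a helper might be something along these lines
(a completely untested sketch mirroring the current __select_idle_cpu()
condition):

    static inline bool __is_idle_cpu(int cpu, struct task_struct *p)
    {
    	return (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) &&
    	       sched_cpu_cookie_match(cpu_rq(cpu), p);
    }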
Apart from that, these changes look good to me since they make all the
idle CPU checks along the wakeup path consistent with those in
select_idle_cpu(), so feel free to add:
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
> asym_fits_cpu(task_util, util_min, util_max, target))
> return target;
>
> @@ -7849,6 +7850,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> */
> if (prev != target && cpus_share_cache(prev, target) &&
> (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
> + sched_core_cookie_match(cpu_rq(prev), p) &&
> asym_fits_cpu(task_util, util_min, util_max, prev)) {
>
> if (!static_branch_unlikely(&sched_cluster_active) ||
> @@ -7881,6 +7883,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> recent_used_cpu != target &&
> cpus_share_cache(recent_used_cpu, target) &&
> (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
> + sched_core_cookie_match(cpu_rq(recent_used_cpu), p) &&
> cpumask_test_cpu(recent_used_cpu, p->cpus_ptr) &&
> asym_fits_cpu(task_util, util_min, util_max, recent_used_cpu)) {
>
--
Thanks and Regards,
Prateek
Hi Prateek,

Thank you for taking the time to review my patch.

On 11/21/2025 2:54 AM, K Prateek Nayak wrote:
> nit.
>
> You can replace the whole
>
> (available_idle_cpu() || sched_idle_cpu()) && sched_core_cookie_match()
>
> with "__select_idle_cpu() != -1" [...]

I'm happy to do some code cleanup, however note that __select_idle_cpu()
uses sched_cpu_cookie_match(), whereas at this point in
select_idle_sibling() we don't know whether a fully idle core is
available, so I am proposing to use sched_core_cookie_match().

We might be able to do the following alternatives:

* use __select_idle_cpu() (i.e. sched_cpu_cookie_match()) and ignore the
  case where the CPU might be idle while one of its siblings runs an
  incompatible cookie. Perhaps simpler code, but it could yield more
  force idle.
* use test_idle_cores() before deciding whether sched_cpu_cookie_match()
  or sched_core_cookie_match() is appropriate. This is the most correct
  but perhaps overly complex.

The approach I picked has the disadvantage that it might fail some fast
heuristics unnecessarily if no fully idle core is available anyway, but
I thought this was acceptable since the select_idle_cpu() fallback at
the end of the function seems reasonable.

Let me know what you think.

Best regards,
Fernand
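
P.S. Roughly, that second alternative would look something like this
for the target check (an untested sketch, modulo the CONFIG_SCHED_SMT
guards around test_idle_cores()):

    bool cookie_ok;

    /* If a fully idle core may exist, insist on an exact cookie match
     * here and let the select_idle_cpu() scan find the idle core. */
    if (test_idle_cores(target))
    	cookie_ok = sched_cpu_cookie_match(cpu_rq(target), p);
    else
    	cookie_ok = sched_core_cookie_match(cpu_rq(target), p);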