Load imbalance is observed when the workload frequently forks new threads.
Due to CPU affinity, the workload can run on CPU 0-7 in the first
group, and only on CPU 8-11 in the second group. CPU 12-15 are always idle.
{ 0 1 2 3 4 5 6 7 } {8 9 10 11 12 13 14 15}
  * * * * * * * *    * * *  *
When looking for the dst group for newly forked threads,
update_sg_wakeup_stats() often reports that the second group has more
idle CPUs than the first group, since the count includes CPU 12-15,
which the workload is not allowed to run on. The scheduler then
considers the second group less busy and selects the least busy CPUs
among CPU 8-11. As a result, CPU 8-11 can be crowded with newly forked
threads while CPU 0-7 are idle.

Because of CPU affinity, a task may not be able to use all the CPUs in
a sched group. Only update the sched group statistics for the CPUs the
task is allowed to run on.
Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
Resending this patch from the patchset:
https://lore.kernel.org/lkml/20250717062036.432243-2-adamli@os.amperecomputing.com/
Only changed commit message. The single patch may be easier for reviewing.
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bc0b7ce8a65d..d5ec15050ebc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10671,7 +10671,7 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 	if (sd->flags & SD_ASYM_CPUCAPACITY)
 		sgs->group_misfit_task_load = 1;
 
-	for_each_cpu(i, sched_group_span(group)) {
+	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
 		struct rq *rq = cpu_rq(i);
 		unsigned int local;
--
2.34.1
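
The effect of restricting the iteration to allowed CPUs can be illustrated
with a small userspace model. This is only a sketch, not kernel code: the
16-bit mask type, cpu_bit(), cpu_range() and count_idle() are hypothetical
helpers invented for this illustration, and the idle snapshot used in the
note below is an assumption chosen to match the 16-CPU example.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 16		/* hypothetical: matches the 16-CPU example */

typedef uint16_t mask16_t;	/* hypothetical stand-in for struct cpumask */

/* Bit for one CPU; cpu must be in [0, NR_CPUS). */
static mask16_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (mask16_t)(1u << cpu);
}

/* Mask covering CPUs first..last, inclusive. */
static mask16_t cpu_range(int first, int last)
{
	mask16_t m = 0;

	for (int cpu = first; cpu <= last; cpu++)
		m |= cpu_bit(cpu);
	return m;
}

/*
 * Count idle CPUs in a group, visiting only CPUs that are set in both
 * 'span' and 'allowed'. Passing an all-ones 'allowed' mask models the
 * unrestricted iteration over the whole group span.
 */
static int count_idle(mask16_t span, mask16_t allowed, mask16_t idle)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		mask16_t bit = cpu_bit(cpu);

		if ((span & allowed & bit) && (idle & bit))
			n++;
	}
	return n;
}
```

Assume a snapshot in which CPU 0-2 and CPU 12-15 are idle. With an all-ones
allowed mask, count_idle() gives 3 for the first group and 4 for the second,
so the second group looks less busy. With the allowed mask set to CPU 0-11,
it gives 3 and 0, and the first group is correctly seen as the idler one.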
On Sat, Oct 11, 2025 at 06:43:22AM +0000, Adam Li wrote:
> Load imbalance is observed when the workload frequently forks new threads.
> Due to CPU affinity, the workload can run on CPU 0-7 in the first
> group, and only on CPU 8-11 in the second group. CPU 12-15 are always idle.
>
> { 0 1 2 3 4 5 6 7 } {8 9 10 11 12 13 14 15}
>   * * * * * * * *    * * *  *
>
> When looking for the dst group for newly forked threads,
> update_sg_wakeup_stats() often reports that the second group has more
> idle CPUs than the first group, since the count includes CPU 12-15,
> which the workload is not allowed to run on. The scheduler then
> considers the second group less busy and selects the least busy CPUs
> among CPU 8-11. As a result, CPU 8-11 can be crowded with newly forked
> threads while CPU 0-7 are idle.
>
> Because of CPU affinity, a task may not be able to use all the CPUs in
> a sched group. Only update the sched group statistics for the CPUs the
> task is allowed to run on.
>
> Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
> ---
> Resending this patch from the patchset:
> https://lore.kernel.org/lkml/20250717062036.432243-2-adamli@os.amperecomputing.com/
>
Right, let's start with this then ;-)
No need to do the cpumask_and() thing, that's just more changes vs
update_sg_lb_stats().
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 82d6e01a0699800efd8b048eb584c907ccb47b7a
Gitweb: https://git.kernel.org/tip/82d6e01a0699800efd8b048eb584c907ccb47b7a
Author: Adam Li <adamli@os.amperecomputing.com>
AuthorDate: Sat, 11 Oct 2025 06:43:22
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 16 Oct 2025 11:13:50 +02:00
sched/fair: Only update stats for allowed CPUs when looking for dst group
Load imbalance is observed when the workload frequently forks new threads.
Due to CPU affinity, the workload can run on CPU 0-7 in the first
group, and only on CPU 8-11 in the second group. CPU 12-15 are always idle.
{ 0 1 2 3 4 5 6 7 } {8 9 10 11 12 13 14 15}
  * * * * * * * *    * * *  *
When looking for the dst group for newly forked threads,
update_sg_wakeup_stats() often reports that the second group has more
idle CPUs than the first group, since the count includes CPU 12-15,
which the workload is not allowed to run on. The scheduler then
considers the second group less busy and selects the least busy CPUs
among CPU 8-11. As a result, CPU 8-11 can be crowded with newly forked
threads while CPU 0-7 are idle.

Because of CPU affinity, a task may not be able to use all the CPUs in
a sched group. Only update the sched group statistics for the CPUs the
task is allowed to run on.
Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 00f9d6c..ac881df 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10683,7 +10683,7 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 	if (sd->flags & SD_ASYM_CPUCAPACITY)
 		sgs->group_misfit_task_load = 1;
 
-	for_each_cpu(i, sched_group_span(group)) {
+	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
 		struct rq *rq = cpu_rq(i);
 		unsigned int local;