Load imbalance is observed when the workload frequently forks new threads.
Due to CPU affinity, the workload can run on CPUs 0-7 in the first group,
but only on CPUs 8-11 in the second group. CPUs 12-15 are always idle.

  { 0 1 2 3 4 5 6 7 }   { 8 9 10 11 12 13 14 15 }
    * * * * * * * *       * *  *  *

When looking for a dst group for newly forked threads,
update_sg_wakeup_stats() often reports that the second group has more
idle CPUs than the first group. The scheduler therefore considers the
second group less busy and selects the least busy CPUs among CPUs 8-11.
So CPUs 8-11 can become crowded with newly forked threads while CPUs
0-7 sit idle.

The first patch, 'Only update stats of allowed CPUs when looking for
dst group', can fix this imbalance issue on its own.

I think the second patch also makes sense in this scenario: if the
group weight includes CPUs a task cannot use, group classification can
be incorrect. Please comment.

Adam Li (2):
  sched/fair: Only update stats of allowed CPUs when looking for dst
    group
  sched/fair: Only count group weight for allowed CPUs when looking for
    dst group

 kernel/sched/fair.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

-- 
2.34.1
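For readers following along in kernel/sched/fair.c, the changes the two
patch titles describe presumably look something like the sketch below.
This is an illustration of the approach against the current shape of
update_sg_wakeup_stats(), not the posted hunks; the hunk placement and
the helper choices (for_each_cpu_and(), cpumask_weight_and()) are
assumptions.

static inline void update_sg_wakeup_stats(struct sched_domain *sd,
					  struct sched_group *group,
					  struct sg_lb_stats *sgs,
					  struct task_struct *p)
{
	int i, nr_running;

	memset(sgs, 0, sizeof(*sgs));

	/*
	 * Patch 1 (presumed shape): walk only the CPUs in this group
	 * that @p is allowed to run on, so idle CPUs outside
	 * p->cpus_ptr (CPUs 12-15 in the example above) no longer
	 * inflate sgs->idle_cpus for the second group.
	 */
	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
		struct rq *rq = cpu_rq(i);
		unsigned int local = task_running_on_cpu(i, p);

		nr_running = rq->nr_running - local;
		if (!nr_running && idle_cpu_without(i, p))
			sgs->idle_cpus++;

		/* load/util/runnable accumulation elided ... */
	}

	/*
	 * Patch 2 (presumed shape): classify the group by the number
	 * of CPUs @p may use, not the full group weight, so a group
	 * where the task can use only 4 of 8 CPUs is not judged as if
	 * all 8 were available.
	 */
	sgs->group_weight = cpumask_weight_and(sched_group_span(group),
					       p->cpus_ptr);

	sgs->group_type = group_classify(sd->imbalance_pct, group, sgs);
}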
On Tue, Jul 01, 2025 at 02:45:47AM +0000, Adam Li wrote:
> Load imbalance is observed when the workload frequently forks new threads.
> Due to CPU affinity, the workload can run on CPUs 0-7 in the first group,
> but only on CPUs 8-11 in the second group. CPUs 12-15 are always idle.
>
>   { 0 1 2 3 4 5 6 7 }   { 8 9 10 11 12 13 14 15 }
>     * * * * * * * *       * *  *  *
>
> When looking for a dst group for newly forked threads,
> update_sg_wakeup_stats() often reports that the second group has more
> idle CPUs than the first group. The scheduler therefore considers the
> second group less busy and selects the least busy CPUs among CPUs 8-11.
> So CPUs 8-11 can become crowded with newly forked threads while CPUs
> 0-7 sit idle.
>
> The first patch, 'Only update stats of allowed CPUs when looking for
> dst group', can fix this imbalance issue on its own.
>
> I think the second patch also makes sense in this scenario: if the
> group weight includes CPUs a task cannot use, group classification can
> be incorrect. Please comment.
>
> Adam Li (2):
>   sched/fair: Only update stats of allowed CPUs when looking for dst
>     group
>   sched/fair: Only count group weight for allowed CPUs when looking for
>     dst group
>
>  kernel/sched/fair.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)

Hurm... so the thing I noticed is that update_sg_wakeup_stats() and
update_sg_lb_stats() are *very* similar.

Specifically, the first patch does something to wakeup_stats that
lb_stats already does, while the second patch seems to do something
that might also apply to lb_stats.

Is there no way we can unify this?
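For context on "lb_stats already does" this: update_sg_lb_stats()
restricts its per-CPU walk to the CPUs usable by the current
load-balance invocation, roughly as follows (an excerpt with most of
the accounting elided):

	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
		struct rq *rq = cpu_rq(i);
		unsigned int nr_running = rq->nr_running;

		sgs->group_load += cpu_load(rq);
		/* util/runnable/h_nr_running accumulation elided ... */

		if (nr_running == 0 && idle_cpu(i))
			sgs->idle_cpus++;
	}

The wakeup-stats path, by contrast, currently walks the full group
span, which is exactly what patch 1 changes.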
On 7/4/2025 5:17 PM, Peter Zijlstra wrote:
> On Tue, Jul 01, 2025 at 02:45:47AM +0000, Adam Li wrote:
>> Load imbalance is observed when the workload frequently forks new threads.
>> Due to CPU affinity, the workload can run on CPUs 0-7 in the first group,
>> but only on CPUs 8-11 in the second group. CPUs 12-15 are always idle.
>>
>>   { 0 1 2 3 4 5 6 7 }   { 8 9 10 11 12 13 14 15 }
>>     * * * * * * * *       * *  *  *
>>
>> When looking for a dst group for newly forked threads,
>> update_sg_wakeup_stats() often reports that the second group has more
>> idle CPUs than the first group. The scheduler therefore considers the
>> second group less busy and selects the least busy CPUs among CPUs 8-11.
>> So CPUs 8-11 can become crowded with newly forked threads while CPUs
>> 0-7 sit idle.
>>
>> The first patch, 'Only update stats of allowed CPUs when looking for
>> dst group', can fix this imbalance issue on its own.
>>
>> I think the second patch also makes sense in this scenario: if the
>> group weight includes CPUs a task cannot use, group classification can
>> be incorrect. Please comment.
>>
>> Adam Li (2):
>>   sched/fair: Only update stats of allowed CPUs when looking for dst
>>     group
>>   sched/fair: Only count group weight for allowed CPUs when looking for
>>     dst group
>>
>>  kernel/sched/fair.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)

Hi Peter,

> Hurm... so the thing I noticed is that update_sg_wakeup_stats() and
> update_sg_lb_stats() are *very* similar.
>
> Specifically, the first patch does something to wakeup_stats that
> lb_stats already does, while the second patch seems to do something
> that might also apply to lb_stats.

Thanks for the idea. I am testing this.

> Is there no way we can unify this?

I will try to unify the common logic in update_sg_wakeup_stats() and
update_sg_lb_stats() into a single function. It does not look easy :).

Thanks,
-adam
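One hypothetical shape such a unification could take is sketched below.
The helper name and signature are invented for illustration, and
nothing like this was posted in the thread: a single per-CPU walker
parameterized on the cpumask to honor, with the wakeup path passing the
task so its own contribution can be discounted.

/*
 * Hypothetical shared helper (name and signature invented).
 * update_sg_lb_stats() would pass env->cpus as @allowed with
 * @p == NULL; update_sg_wakeup_stats() would pass p->cpus_ptr and a
 * non-NULL @p so the task's own load can be discounted via the
 * *_without() variants.
 */
static void __update_sg_stats(struct sched_group *group,
			      struct sg_lb_stats *sgs,
			      const struct cpumask *allowed,
			      struct task_struct *p)
{
	int i;

	for_each_cpu_and(i, sched_group_span(group), allowed) {
		struct rq *rq = cpu_rq(i);

		/* shared load/util/runnable/idle accounting here */
	}

	/* count only the CPUs this caller may actually use */
	sgs->group_weight = cpumask_weight_and(sched_group_span(group),
					       allowed);
}

The difficulty Adam alludes to is that the two callers differ in more
than the mask: the wakeup path subtracts the waking task from every
per-CPU statistic, while the lb path tracks extra state (misfit,
overload) the wakeup path does not need.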