[Patch v2 0/6] Enable Cluster Scheduling for x86 Hybrid CPUs

Tim Chen posted 6 patches 2 years, 8 months ago
There is a newer version of this series
This is the second version of the patch series that fixes issues
blocking cluster scheduling on x86 hybrid CPUs.  It addresses concerns
raised by reviewers of the first version:
https://lore.kernel.org/lkml/CAKfTPtD1W6vJQBsNKEt_4tn2EeAs=73CeH4LoCwENrh2JUDwnQ@mail.gmail.com/T/

The review comments were greatly appreciated.

Changes from v1:
1. Peter pointed out that the number of CPUs in a cluster can also
change when CPUs are brought online or offline.  Balancing between
sibling clusters should therefore not consider only whether a cluster
has SMT CPUs or pure core CPUs.  In this version, I balance tasks
between the clusters such that running_tasks/num_cores is similar
across the clusters.  This accommodates balancing between SMT clusters
and non-SMT clusters, or between clusters of the same type with
different numbers of cores.

2. Vincent pointed out that the special-case logic in the general path
for detecting fully busy SMT can be simplified.  Fully busy SMT can
be detected during statistics gathering, which makes the code cleaner.
This version of the patch series makes this change.

3. Incorporated suggestions by Chen Yu and Hillf Danton to improve the
commit logs.

4. Added a patch by Peter to dump domains' sched group flags and
included suggestions by Peter to simplify the code.
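
To illustrate the balancing target described in change 1, here is a
minimal user-space sketch.  It is not the code from the patches: the
struct group_stats type and the tasks_to_move() helper are hypothetical
names chosen for illustration.  It assumes the goal is to pick the
number of tasks x to move so that (b - x)/cb is about equal to
(l + x)/cl, where b and l are the running task counts and cb and cl are
the core counts of the busiest and local groups.

```c
/* Hypothetical per-group summary; not the kernel's sg_lb_stats. */
struct group_stats {
	unsigned int nr_running;	/* runnable tasks in the group */
	unsigned int nr_cores;		/* online cores in the group */
};

/*
 * Number of tasks to move from the busiest group to the local group
 * so that nr_running/nr_cores ends up about equal in both groups.
 *
 * Solving (b - x)/cb = (l + x)/cl for x gives
 *	x = (b*cl - l*cb) / (cb + cl)
 * The result is rounded to the nearest integer, with halves rounded up.
 */
unsigned int tasks_to_move(const struct group_stats *busiest,
			   const struct group_stats *local)
{
	unsigned long lhs = (unsigned long)busiest->nr_running * local->nr_cores;
	unsigned long rhs = (unsigned long)local->nr_running * busiest->nr_cores;
	unsigned long total_cores = (unsigned long)busiest->nr_cores + local->nr_cores;

	/* Nothing to move if the busiest group is not busier per core. */
	if (total_cores == 0 || lhs <= rhs)
		return 0;

	return (unsigned int)((lhs - rhs + total_cores / 2) / total_cores);
}
```

For example, with 6 tasks on a 4-core group and an idle 2-core group,
the sketch moves 2 tasks, leaving one task per core on each side.
Comparing cross-multiplied products avoids integer division until the
final step, so small groups do not lose precision.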

The performance of this version is similar to that of the previous
version for benchmarks other than kbuild, which is about a couple of
percent worse.  Experiments were done on Alder Lake with 6 P-cores and
8 E-cores, organized in two clusters of 4 E-cores each.

Single Threaded   6.3-rc5               with cluster          Improvement
Benchmark                               scheduling            in Performance
                  (run-run deviation)   (run-run deviation)
---------------------------------------------------------------------------
tjbench           (+/- 0.08%)           (+/- 0.51%)           -0.34%
PhPbench          (+/- 0.31%)           (+/- 2.48%)           -0.99%
flac              (+/- 0.58%)           (+/- 0.86%)           +0.61%
pybench           (+/- 3.16%)           (+/- 3.36%)           +1.36%


Multi Threaded    6.3-rc5               with cluster          Improvement
Benchmark                               scheduling            in Performance
(-#threads)       (run-run deviation)   (run-run deviation)
---------------------------------------------------------------------------
Kbuild-8          (+/- 2.90%)           (+/- 0.24%)           -1.63%
Kbuild-10         (+/- 3.08%)           (+/- 0.47%)           -2.06%
Kbuild-12         (+/- 3.28%)           (+/- 0.28%)           -1.38%
Tensor Lite-8     (+/- 4.84%)           (+/- 3.21%)           -0.57%
Tensor Lite-10    (+/- 0.87%)           (+/- 1.21%)           -1.00%
Tensor Lite-12    (+/- 1.37%)           (+/- 0.43%)           -0.05%

Tim Chen

Ricardo Neri (1):
  sched/fair: Consider the idle state of the whole core for load balance

Tim C Chen (5):
  sched/fair: Determine active load balance for SMT sched groups
  sched/topology: Record number of cores in sched group
  sched/fair: Implement prefer sibling imbalance calculation between
    asymmetric groups
  sched/x86: Add cluster topology to hybrid CPU
  sched/debug: Dump domains' sched group flags

 arch/x86/kernel/smpboot.c |   3 +
 kernel/sched/debug.c      |   1 +
 kernel/sched/fair.c       | 151 ++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h      |   6 ++
 kernel/sched/topology.c   |  10 ++-
 5 files changed, 165 insertions(+), 6 deletions(-)

-- 
2.32.0