Reorder the fields within struct cfs_rq and struct sched_entity to
improve cache locality. This reduces cache misses in CFS
scheduling-related operations, particularly on servers with hundreds of
cores and ~1000 cgroups. The new order is based on kernel data-type
profiling (https://lwn.net/Articles/955709/), which indicates which
fields are hot and which fields are frequently accessed together. The
reordering aims to optimize cache utilization and improve the
performance of scheduling-related functions, particularly
tg_throttle_down(), tg_unthrottle_up() and __update_load_avg_cfs_rq().
A simplified sketch of the idea is included after the results below.

The reordering mainly considers performance with
CONFIG_FAIR_GROUP_SCHED enabled. When it is disabled, there is no CFS
bandwidth control and only a single cfs_rq exists per CPU, so its
layout would not significantly impact performance.

We use a benchmark with multiple cgroup levels to simulate a real
server load. The benchmark constructs a tree-structured cgroup
hierarchy, with "width" and "depth" parameters controlling the number
of children per node and the depth of the tree. Each leaf cgroup runs a
schbench workload and receives 80% of the total CPU quota divided by
the number of leaf cgroups (in other words, the target CPU load is set
to 80%) to exercise the throttling functions; a sketch of this quota
computation is also included below. The bandwidth control period is set
to 10ms. We ran the benchmark on Intel and AMD machines, each with
hundreds of hardware threads, and measured kernel LLC load misses over
30 seconds.

d3 w10 (the wider tree) means a cgroup hierarchy of 3 levels with 10
children per node, totaling 1000 leaf cgroups; d5 w4 represents a
deeper tree with more levels. Each benchmark was run 10 times; the
table shows the 95% confidence intervals of the kernel LLC misses in
millions.

| Kernel LLC Misses | d3 w10            | d5 w4             |
+-------------------+-------------------+-------------------+
| AMD-orig          | [3025.5, 3344.1]M | [3382.4, 3607.8]M |
| AMD-opt           | [2410.7, 2556.9]M | [2565.4, 2931.2]M |
| Change            | -22.01%           | -21.37%           |
| Intel-orig        | [1157.2, 1249.0]M | [1343.7, 1630.7]M |
| Intel-opt         | [960.2, 1023.0]M  | [1092.7, 1350.7]M |
| Change            | -17.59%           | -17.86%           |

Since the benchmark limits the CPU quota, the RPS results reported by
schbench show no statistically significant improvement, as they do not
reflect the reduction in kernel overhead.
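To make the layout change concrete, here is a minimal, self-contained
userspace sketch. The field names and the before/after layouts are
hypothetical stand-ins, not the actual struct cfs_rq members or the
exact ordering applied by these patches; it only demonstrates how
packing co-accessed fields onto one cache line, and pushing cold fields
past it, changes the offsets that hot paths touch:

/*
 * Illustrative only: group fields that are accessed together (e.g. by
 * throttling and PELT updates) onto the same 64-byte cache line, and
 * move rarely-touched statistics out of that line.
 */
#include <stdio.h>
#include <stddef.h>

#define __cacheline_aligned __attribute__((aligned(64)))

/* Hypothetical "before": hot fields interleaved with cold ones. */
struct cfs_rq_before {
	unsigned long load_weight;	/* hot: every enqueue/dequeue */
	long long cold_stats[8];	/* cold: debug/statistics */
	unsigned int nr_running;	/* hot */
	long long more_cold_stats[8];	/* cold */
	int throttled;			/* hot: throttling paths */
};

/* Hypothetical "after": hot fields packed into the first cache line. */
struct cfs_rq_after {
	unsigned long load_weight;
	unsigned int nr_running;
	int throttled;
	/* Cold fields start on their own cache line. */
	long long cold_stats[8] __cacheline_aligned;
	long long more_cold_stats[8];
};

int main(void)
{
	printf("before: load_weight @%zu, nr_running @%zu, throttled @%zu\n",
	       offsetof(struct cfs_rq_before, load_weight),
	       offsetof(struct cfs_rq_before, nr_running),
	       offsetof(struct cfs_rq_before, throttled));
	printf("after:  load_weight @%zu, nr_running @%zu, throttled @%zu\n",
	       offsetof(struct cfs_rq_after, load_weight),
	       offsetof(struct cfs_rq_after, nr_running),
	       offsetof(struct cfs_rq_after, throttled));
	return 0;
}

In the "before" layout the three hot fields land at offsets 0, 72 and
144, i.e. on three different 64-byte lines; in the "after" layout they
sit at offsets 0, 8 and 12, all within the first line, so the hot paths
touch one line instead of three.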
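The per-leaf quota computation can be sketched as follows; the CPU
count, cgroup path and leaf count are assumptions for illustration, not
the actual benchmark configuration:

/*
 * Rough sketch of the quota assigned to each leaf cgroup: 80% of the
 * machine, split evenly across the leaves, expressed in the cgroup v2
 * cpu.max format "<quota_us> <period_us>".
 */
#include <stdio.h>

int main(void)
{
	const long period_us = 10000;	/* 10ms bandwidth period */
	const int ncpus = 256;		/* hypothetical machine size */
	const int nleaves = 1000;	/* e.g. d3 w10 -> 10^3 leaves */
	const double target = 0.8;	/* 80% target CPU load */

	long quota_us = (long)(target * ncpus * period_us / nleaves);

	/* Emits the write a setup script would do for each leaf. */
	printf("echo \"%ld %ld\" > /sys/fs/cgroup/bench/leaf0/cpu.max\n",
	       quota_us, period_us);
	return 0;
}

With these numbers each leaf receives 2048us of runtime per 10ms
period, i.e. 80% of a 256-CPU machine divided across 1000 leaves, which
keeps every leaf bumping against its quota and exercising
tg_throttle_down()/tg_unthrottle_up().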
Perf data shows the reduction in the kernel's share of LLC misses for
the depth 5, width 4 workload. The symbols are taken from the union of
the top 10 symbols in the original and optimized profiles.

| Symbol                                | Intel-orig | Intel-opt |
+---------------------------------------+------------+-----------+
| worker_thread                         | 75.41%     | 78.95%    |
| tg_unthrottle_up                      | 3.21%      | 1.61%     |
| tg_throttle_down                      | 2.42%      | 1.77%     |
| __update_load_avg_cfs_rq              | 1.95%      | 1.60%     |
| walk_tg_tree_from                     | 1.23%      | 0.91%     |
| sched_balance_update_blocked_averages | 1.09%      | 1.13%     |
| sched_balance_rq                      | 1.03%      | 1.08%     |
| _raw_spin_lock                        | 1.01%      | 1.23%     |
| task_mm_cid_work                      | 0.87%      | 1.09%     |
| __update_load_avg_se                  | 0.78%      | 0.48%     |

| Symbol                                | AMD-orig | AMD-opt |
+---------------------------------------+----------+---------+
| worker_thread                         | 53.97%   | 61.49%  |
| sched_balance_update_blocked_averages | 3.94%    | 2.48%   |
| __update_load_avg_cfs_rq              | 3.52%    | 2.62%   |
| update_load_avg                       | 2.66%    | 2.19%   |
| tg_throttle_down                      | 1.99%    | 1.57%   |
| tg_unthrottle_up                      | 1.98%    | 1.34%   |
| __update_load_avg_se                  | 1.89%    | 1.32%   |
| walk_tg_tree_from                     | 1.79%    | 1.37%   |
| sched_clock_noinstr                   | 1.59%    | 1.01%   |
| sched_balance_rq                      | 1.53%    | 1.26%   |
| _raw_spin_lock                        | 1.47%    | 1.41%   |
| task_mm_cid_work                      | 1.34%    | 1.42%   |

The percentage of the system's LLC misses spent in these
scheduling-related kernel functions is reduced.

Zecheng Li (2):
  sched/fair: Reorder struct cfs_rq
  sched/fair: Reorder struct sched_entity

 include/linux/sched.h | 37 +++++++++++---------
 kernel/sched/core.c   | 81 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h  | 70 +++++++++++++++++++++++--------------
 3 files changed, 144 insertions(+), 44 deletions(-)

base-commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557
--
2.49.0