When running a multi-instance ffmpeg transcoding workload that uses RT
threads on a high-core-count system, cpupri_vec->count contends with
reads of the mask sharing the same cache line in cpupri_find_fitness()
and cpupri_set().

This change places each count and mask on separate cache lines via the
____cacheline_aligned attribute to avoid the false sharing.

Tested on a 2-socket machine with 240 physical cores (480 logical
cores), running 60 ffmpeg transcoding instances. With the change,
kernel cycles% is reduced from ~20% to ~12% and the fps metric improves
by ~11%.

The side effect of this change is that the size of struct cpupri grows
from 26 cache lines to 203 cache lines.
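For illustration, the contending access pattern is roughly the following
(abridged and paraphrased from kernel/sched/cpupri.c, not a verbatim
excerpt):

	/*
	 * Writer side, cpupri_set(): moving a CPU into a priority vector
	 * dirties the cache line that holds vec->count.
	 */
	cpumask_set_cpu(cpu, vec->mask);
	smp_mb__before_atomic();
	atomic_inc(&(vec)->count);

	/*
	 * Reader side, __cpupri_find() via cpupri_find_fitness(): scanning
	 * vec->mask pulls the same line, so every count update on another
	 * CPU invalidates it. With mask marked ____cacheline_aligned, the
	 * scan no longer shares a line with the hot counter.
	 */
	if (cpumask_any_and(&p->cpus_mask, vec->mask) >= nr_cpu_ids)
		return 0;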
Signed-off-by: Pan Deng <pan.deng@intel.com>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
---
kernel/sched/cpupri.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/cpupri.h b/kernel/sched/cpupri.h
index d6cba0020064..245b0fa626be 100644
--- a/kernel/sched/cpupri.h
+++ b/kernel/sched/cpupri.h
@@ -9,7 +9,7 @@
struct cpupri_vec {
atomic_t count;
- cpumask_var_t mask;
+ cpumask_var_t mask ____cacheline_aligned;
};
struct cpupri {
--
2.43.5
Hello,
kernel test robot noticed a 67.7% improvement of stress-ng.mutex.ops_per_sec on:
commit: cd316a87572309a79102940e1856ee877740156e ("[PATCH] sched/rt: optimize cpupri_vec layout")
url: https://github.com/intel-lab-lkp/linux/commits/Pan-Deng/sched-rt-optimize-cpupri_vec-layout/20250612-110857
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git b01f2d9597250e9c4011cb78d8d46287deaa6a69
patch link: https://lore.kernel.org/all/20250612031148.455046-1-pan.deng@intel.com/
patch subject: [PATCH] sched/rt: optimize cpupri_vec layout
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: mutex
cpufreq_governor: performance
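For reference, the job roughly corresponds to an invocation like the
following (an approximation of the LKP job, not the exact command it runs):

	stress-ng --mutex 192 --timeout 60s --metrics-brief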
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250616/202506161643.ab40fa8e-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-srf-2sp2/mutex/stress-ng/60s
commit:
b01f2d9597 ("sched/eevdf: Correct the comment in place_entity")
cd316a8757 ("sched/rt: optimize cpupri_vec layout")
b01f2d9597250e9c cd316a87572309a79102940e185
---------------- ---------------------------
%stddev %change %stddev
\ | \
22409567 +52.5% 34179472 ± 3% cpuidle..usage
21410 ± 30% +26.2% 27010 ± 16% numa-vmstat.node0.nr_slab_reclaimable
0.07 ± 2% +0.0 0.09 ± 2% mpstat.cpu.all.soft%
1.06 +0.6 1.63 ± 3% mpstat.cpu.all.usr%
85656 ± 30% +26.1% 108025 ± 16% numa-meminfo.node0.KReclaimable
85656 ± 30% +26.1% 108025 ± 16% numa-meminfo.node0.SReclaimable
2398650 +60.1% 3839452 ± 2% vmstat.system.cs
1650319 +44.1% 2378651 vmstat.system.in
1821 ± 7% +28.5% 2340 ± 14% perf-c2c.DRAM.local
17138 ± 14% +86.2% 31915 ± 17% perf-c2c.DRAM.remote
91166 ± 16% +134.9% 214147 ± 19% perf-c2c.HITM.local
13399 ± 13% +104.1% 27347 ± 16% perf-c2c.HITM.remote
104565 ± 15% +131.0% 241494 ± 19% perf-c2c.HITM.total
125201 ± 2% -39.4% 75820 ± 2% stress-ng.mutex.nanosecs_per_mutex
85791341 +67.7% 1.438e+08 stress-ng.mutex.ops
1429837 +67.7% 2397156 stress-ng.mutex.ops_per_sec
68606706 +63.5% 1.122e+08 ± 2% stress-ng.time.involuntary_context_switches
9345 -1.3% 9226 stress-ng.time.system_time
99.39 +61.2% 160.24 stress-ng.time.user_time
56563097 +57.6% 89151856 ± 2% stress-ng.time.voluntary_context_switches
7.208e+09 ± 2% +42.9% 1.03e+10 perf-stat.i.branch-instructions
52257508 +47.5% 77078460 ± 2% perf-stat.i.branch-misses
37265287 ± 2% +34.4% 50098262 ± 3% perf-stat.i.cache-misses
2.416e+08 +42.7% 3.449e+08 ± 2% perf-stat.i.cache-references
2500366 +60.9% 4022250 ± 2% perf-stat.i.context-switches
20.66 -29.7% 14.53 perf-stat.i.cpi
490637 +60.9% 789567 ± 2% perf-stat.i.cpu-migrations
15477 ± 4% -25.1% 11585 ± 3% perf-stat.i.cycles-between-cache-misses
3.356e+10 ± 2% +44.2% 4.838e+10 perf-stat.i.instructions
0.06 ± 9% +36.2% 0.08 perf-stat.i.ipc
15.58 +60.8% 25.06 ± 2% perf-stat.i.metric.K/sec
17.01 ± 2% -29.9% 11.93 perf-stat.overall.cpi
15347 ± 3% -24.8% 11539 ± 3% perf-stat.overall.cycles-between-cache-misses
0.06 ± 2% +42.5% 0.08 perf-stat.overall.ipc
7.096e+09 ± 2% +42.6% 1.012e+10 perf-stat.ps.branch-instructions
51310401 +47.6% 75731432 ± 2% perf-stat.ps.branch-misses
36634137 ± 2% +34.4% 49233432 ± 3% perf-stat.ps.cache-misses
2.378e+08 +42.6% 3.392e+08 ± 2% perf-stat.ps.cache-references
2462472 +60.7% 3956471 ± 2% perf-stat.ps.context-switches
483238 +60.7% 776702 ± 2% perf-stat.ps.cpu-migrations
3.304e+10 ± 2% +43.9% 4.756e+10 perf-stat.ps.instructions
2.059e+12 ± 2% +43.2% 2.949e+12 perf-stat.total.instructions
0.61 ± 54% -66.1% 0.21 ± 34% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio
0.57 ± 63% -79.0% 0.12 ±137% perf-sched.sch_delay.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
20.28 ±215% -98.8% 0.25 ± 40% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
5.85 ±133% -96.7% 0.19 ± 48% perf-sched.sch_delay.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
0.62 ± 41% -65.0% 0.22 ± 31% perf-sched.sch_delay.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
0.42 ± 34% -52.4% 0.20 ± 30% perf-sched.sch_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.free_pgtables.exit_mmap
0.46 ± 42% -54.1% 0.21 ± 45% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
0.50 ± 72% -49.6% 0.25 ± 20% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
0.47 ± 34% -62.9% 0.18 ± 29% perf-sched.sch_delay.avg.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
0.23 ± 56% -85.7% 0.03 ±154% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.mm_init.dup_mm
248.83 ± 26% -60.2% 99.05 ± 73% perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
9.67 ±167% -97.1% 0.28 ± 21% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
83.18 ± 21% -68.9% 25.88 ± 26% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
4.32 ± 91% -84.6% 0.67 ± 24% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio
0.90 ± 74% -75.6% 0.22 ±139% perf-sched.sch_delay.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
1.40 ± 48% -74.1% 0.36 ± 85% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.vmstat_start.seq_read_iter.proc_reg_read_iter
358.00 ±219% -99.8% 0.86 ± 51% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
1.18 ± 47% -65.3% 0.41 ± 52% perf-sched.sch_delay.max.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
1.20 ± 53% -70.4% 0.35 ± 58% perf-sched.sch_delay.max.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
1.30 ± 34% -68.0% 0.42 ± 73% perf-sched.sch_delay.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
1.03 ± 40% -55.5% 0.46 ± 21% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.__percpu_counter_init_many.mm_init
0.30 ± 65% -88.9% 0.03 ±154% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.mm_init.dup_mm
281.41 ± 38% +143.8% 686.20 ± 35% perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
0.72 ± 81% -70.7% 0.21 ± 83% perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
888.12 ±173% -99.4% 5.74 ± 97% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
2.66 ± 7% -15.2% 2.25 ± 5% perf-sched.total_wait_and_delay.average.ms
1.68 ± 7% -18.0% 1.38 ± 6% perf-sched.total_wait_time.average.ms
1092 ± 6% -21.5% 857.36 ± 12% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1160 ± 11% -37.2% 728.54 ± 28% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
342.17 ± 8% -16.0% 287.50 perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
7.50 ± 27% -60.0% 3.00 ± 76% perf-sched.wait_and_delay.count.__cond_resched.rcu_gp_cleanup.rcu_gp_kthread.kthread.ret_from_fork
3012 ± 9% +30.1% 3919 ± 9% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
2811 ± 5% +32.8% 3732 ± 7% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
116.17 ± 21% -37.9% 72.17 ± 26% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
699.17 ± 3% -24.6% 527.50 ± 5% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
249.00 ± 2% -38.5% 153.17 ± 8% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
562.82 ± 38% +174.7% 1546 ± 51% perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
0.95 ± 97% -78.2% 0.21 ± 34% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio
0.57 ± 63% -79.0% 0.12 ±137% perf-sched.wait_time.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
5.85 ±133% -96.7% 0.19 ± 48% perf-sched.wait_time.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
0.62 ± 41% -65.0% 0.22 ± 31% perf-sched.wait_time.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
0.42 ± 34% -52.4% 0.20 ± 30% perf-sched.wait_time.avg.ms.__cond_resched.down_write.unlink_anon_vmas.free_pgtables.exit_mmap
0.35 ± 20% -45.7% 0.19 ± 49% perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
0.46 ± 42% -54.1% 0.21 ± 45% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
0.50 ± 72% -49.6% 0.25 ± 20% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
0.47 ± 34% -62.9% 0.18 ± 29% perf-sched.wait_time.avg.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
0.23 ± 56% -85.7% 0.03 ±154% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.mm_init.dup_mm
9.74 ±165% -96.2% 0.37 ± 48% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
1411 ± 33% -65.5% 487.33 ±141% perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
1009 ± 7% -17.6% 831.48 ± 12% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1150 ± 12% -37.7% 717.38 ± 28% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
10.09 ±138% -93.4% 0.67 ± 24% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio
0.90 ± 74% -75.6% 0.22 ±139% perf-sched.wait_time.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
1.40 ± 48% -74.1% 0.36 ± 85% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.vmstat_start.seq_read_iter.proc_reg_read_iter
715.15 ±160% -99.7% 2.21 ±133% perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
1.18 ± 47% -65.3% 0.41 ± 52% perf-sched.wait_time.max.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
1.20 ± 53% -70.4% 0.35 ± 58% perf-sched.wait_time.max.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
1.30 ± 34% -68.0% 0.42 ± 73% perf-sched.wait_time.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
1.03 ± 40% -55.5% 0.46 ± 21% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.__percpu_counter_init_many.mm_init
0.30 ± 65% -88.9% 0.03 ±154% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.mm_init.dup_mm
281.41 ± 38% +205.6% 859.89 ± 67% perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
167.37 ±222% -99.9% 0.21 ± 83% perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
31.88 -1.9 30.02 perf-profile.calltrace.cycles-pp.__schedule.schedule.futex_do_wait.__futex_wait.futex_wait
31.90 -1.9 30.05 perf-profile.calltrace.cycles-pp.schedule.futex_do_wait.__futex_wait.futex_wait.do_futex
31.96 -1.8 30.14 perf-profile.calltrace.cycles-pp.futex_do_wait.__futex_wait.futex_wait.do_futex.__x64_sys_futex
32.28 -1.7 30.62 perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
9.30 ± 3% -1.6 7.66 ± 5% perf-profile.calltrace.cycles-pp.pull_rt_task.balance_callbacks.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler
32.29 -1.6 30.65 perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
26.81 -1.6 25.25 perf-profile.calltrace.cycles-pp.find_lock_lowest_rq.push_rt_task.push_rt_tasks.finish_task_switch.__schedule
10.51 ± 3% -1.5 9.00 ± 5% perf-profile.calltrace.cycles-pp.balance_callbacks.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler
13.57 -1.5 12.11 ± 4% perf-profile.calltrace.cycles-pp.cpupri_find_fitness.find_lowest_rq.find_lock_lowest_rq.push_rt_task.push_rt_tasks
13.55 -1.5 12.09 ± 4% perf-profile.calltrace.cycles-pp.__cpupri_find.cpupri_find_fitness.find_lowest_rq.find_lock_lowest_rq.push_rt_task
13.61 -1.4 12.17 ± 4% perf-profile.calltrace.cycles-pp.find_lowest_rq.find_lock_lowest_rq.push_rt_task.push_rt_tasks.finish_task_switch
5.90 ± 3% -1.2 4.68 ± 4% perf-profile.calltrace.cycles-pp.pull_rt_task.balance_rt.__pick_next_task.__schedule.schedule
5.92 ± 3% -1.2 4.70 ± 4% perf-profile.calltrace.cycles-pp.balance_rt.__pick_next_task.__schedule.schedule.futex_do_wait
32.27 -1.2 31.09 perf-profile.calltrace.cycles-pp.push_rt_task.push_rt_tasks.finish_task_switch.__schedule.schedule
38.40 -0.9 37.47 perf-profile.calltrace.cycles-pp.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler.do_syscall_64
38.47 -0.9 37.57 perf-profile.calltrace.cycles-pp._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler.do_syscall_64.entry_SYSCALL_64_after_hwframe
19.36 -0.9 18.47 perf-profile.calltrace.cycles-pp.push_rt_tasks.finish_task_switch.__schedule.schedule.futex_do_wait
40.08 -0.9 39.21 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
40.10 -0.9 39.23 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
19.82 -0.8 18.98 perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.futex_do_wait.__futex_wait
7.86 ± 2% -0.8 7.05 ± 4% perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.futex_do_wait.__futex_wait
38.68 -0.8 37.89 perf-profile.calltrace.cycles-pp.do_sched_setscheduler.__x64_sys_sched_setscheduler.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_setscheduler
38.68 -0.8 37.89 perf-profile.calltrace.cycles-pp.__x64_sys_sched_setscheduler.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_setscheduler
42.56 -0.8 41.79 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_setscheduler
42.57 -0.8 41.81 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_setscheduler
11.76 -0.8 11.01 ± 2% perf-profile.calltrace.cycles-pp.cpupri_set.enqueue_task_rt.enqueue_task.__sched_setscheduler._sched_setscheduler
41.14 -0.7 40.41 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
41.14 -0.7 40.42 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
42.78 -0.6 42.15 perf-profile.calltrace.cycles-pp.__sched_setscheduler
12.24 -0.5 11.75 ± 2% perf-profile.calltrace.cycles-pp.cpupri_set.dequeue_rt_stack.dequeue_task_rt.__sched_setscheduler._sched_setscheduler
3.53 ± 2% -0.5 3.04 ± 3% perf-profile.calltrace.cycles-pp.cpupri_set.dequeue_rt_stack.dequeue_task_rt.try_to_block_task.__schedule
3.56 ± 2% -0.5 3.08 ± 3% perf-profile.calltrace.cycles-pp.dequeue_rt_stack.dequeue_task_rt.try_to_block_task.__schedule.schedule
12.30 -0.5 11.83 ± 2% perf-profile.calltrace.cycles-pp.dequeue_rt_stack.dequeue_task_rt.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler
3.62 ± 2% -0.4 3.23 ± 3% perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.futex_do_wait.__futex_wait
2.19 ± 3% -0.4 1.80 ± 3% perf-profile.calltrace.cycles-pp.cpupri_set.enqueue_task_rt.enqueue_task.activate_task.push_rt_task
3.62 ± 2% -0.4 3.23 ± 3% perf-profile.calltrace.cycles-pp.dequeue_task_rt.try_to_block_task.__schedule.schedule.futex_do_wait
2.27 ± 3% -0.2 2.04 ± 2% perf-profile.calltrace.cycles-pp.enqueue_task_rt.enqueue_task.activate_task.push_rt_task.push_rt_tasks
4.20 -0.2 4.01 perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.exit_to_user_mode_loop.do_syscall_64
3.90 -0.2 3.72 perf-profile.calltrace.cycles-pp.push_rt_tasks.finish_task_switch.__schedule.schedule.exit_to_user_mode_loop
0.96 +0.0 1.00 perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule_idle.do_idle.cpu_startup_entry
0.96 ± 2% +0.1 1.05 ± 4% perf-profile.calltrace.cycles-pp.enqueue_pushable_task.enqueue_task.activate_task.push_rt_task.push_rt_tasks
1.32 +0.1 1.43 ± 3% perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule_idle.do_idle.cpu_startup_entry
1.56 ± 3% +0.1 1.67 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.__sched_setscheduler
1.19 ± 2% +0.1 1.31 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.finish_task_switch
1.56 ± 3% +0.1 1.68 ± 3% perf-profile.calltrace.cycles-pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.__sched_setscheduler._sched_setscheduler
1.57 ± 3% +0.1 1.69 ± 3% perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler
1.20 ± 2% +0.1 1.32 ± 3% perf-profile.calltrace.cycles-pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.finish_task_switch.__schedule
1.00 +0.1 1.13 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.raw_spin_rq_lock_nested.balance_callbacks.__sched_setscheduler
1.20 ± 2% +0.1 1.33 ± 3% perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.finish_task_switch.__schedule.schedule_idle
1.59 ± 3% +0.1 1.72 ± 3% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler
0.93 ± 4% +0.1 1.06 ± 4% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.raw_spin_rq_lock_nested.task_rq_lock.__sched_setscheduler
1.03 +0.1 1.16 ± 3% perf-profile.calltrace.cycles-pp.raw_spin_rq_lock_nested.balance_callbacks.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler
1.02 +0.1 1.15 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested.balance_callbacks.__sched_setscheduler._sched_setscheduler
1.22 ± 2% +0.1 1.34 ± 3% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.finish_task_switch.__schedule.schedule_idle.do_idle
0.96 ± 4% +0.1 1.10 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested.task_rq_lock.__sched_setscheduler._sched_setscheduler
0.96 ± 4% +0.1 1.10 ± 4% perf-profile.calltrace.cycles-pp.raw_spin_rq_lock_nested.task_rq_lock.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler
1.65 ± 2% +0.1 1.80 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.pv_native_safe_halt
1.66 ± 2% +0.1 1.80 ± 3% perf-profile.calltrace.cycles-pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_halt
1.68 ± 2% +0.2 1.84 ± 3% perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry
1.81 ± 2% +0.2 1.98 ± 3% perf-profile.calltrace.cycles-pp.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
1.11 ± 4% +0.2 1.28 ± 4% perf-profile.calltrace.cycles-pp.task_rq_lock.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler
1.75 ± 2% +0.2 1.92 ± 3% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
1.92 ± 2% +0.2 2.12 ± 3% perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
1.92 ± 2% +0.2 2.13 ± 3% perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
1.92 ± 2% +0.2 2.13 ± 3% perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.96 ± 2% +0.2 2.17 ± 3% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
10.09 +0.2 10.31 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
1.95 ± 2% +0.2 2.17 ± 3% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
10.10 +0.2 10.32 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield
2.02 ± 2% +0.2 2.26 ± 2% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
1.44 +0.3 1.69 perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single
1.44 +0.3 1.69 perf-profile.calltrace.cycles-pp.raw_spin_rq_lock_nested.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single
1.43 +0.3 1.68 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.raw_spin_rq_lock_nested.sched_ttwu_pending.__flush_smp_call_function_queue
0.86 +0.3 1.12 ± 4% perf-profile.calltrace.cycles-pp.dequeue_task_rt.push_rt_task.push_rt_tasks.finish_task_switch.__schedule
4.26 +0.3 4.58 ± 2% perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single
2.68 +0.3 3.01 ± 2% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
2.69 +0.3 3.02 ± 2% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
10.28 +0.4 10.63 perf-profile.calltrace.cycles-pp.__sched_yield
4.90 +0.6 5.53 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
4.90 +0.6 5.53 perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
4.90 +0.6 5.53 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
4.94 +0.6 5.57 perf-profile.calltrace.cycles-pp.common_startup_64
0.00 +0.7 0.72 ± 8% perf-profile.calltrace.cycles-pp.balance_fair.__pick_next_task.__schedule.schedule.futex_do_wait
0.00 +0.7 0.72 ± 8% perf-profile.calltrace.cycles-pp.sched_balance_newidle.balance_fair.__pick_next_task.__schedule.schedule
7.78 ± 3% +0.8 8.55 ± 3% perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.75 ± 8% +5.8 6.56 ± 6% perf-profile.calltrace.cycles-pp._find_first_and_bit.__cpupri_find.cpupri_find_fitness.find_lowest_rq.find_lock_lowest_rq
15.21 ± 3% -2.9 12.34 ± 5% perf-profile.children.cycles-pp.pull_rt_task
32.03 -2.8 29.24 ± 2% perf-profile.children.cycles-pp.cpupri_set
46.02 -1.9 44.16 perf-profile.children.cycles-pp.schedule
31.96 -1.8 30.14 perf-profile.children.cycles-pp.futex_do_wait
32.28 -1.7 30.63 perf-profile.children.cycles-pp.__futex_wait
32.29 -1.6 30.65 perf-profile.children.cycles-pp.futex_wait
28.29 -1.6 26.73 perf-profile.children.cycles-pp.find_lock_lowest_rq
48.72 -1.5 47.18 perf-profile.children.cycles-pp.__schedule
81.25 -1.5 79.73 perf-profile.children.cycles-pp.__sched_setscheduler
15.35 -1.5 13.84 ± 4% perf-profile.children.cycles-pp.cpupri_find_fitness
15.33 -1.5 13.82 ± 5% perf-profile.children.cycles-pp.__cpupri_find
10.53 ± 3% -1.5 9.02 ± 5% perf-profile.children.cycles-pp.balance_callbacks
15.39 -1.5 13.90 ± 4% perf-profile.children.cycles-pp.find_lowest_rq
93.88 -1.3 92.61 perf-profile.children.cycles-pp.do_syscall_64
93.90 -1.2 92.65 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
5.94 ± 3% -1.2 4.72 ± 4% perf-profile.children.cycles-pp.balance_rt
32.36 -1.2 31.21 perf-profile.children.cycles-pp.push_rt_tasks
34.19 -1.1 33.09 perf-profile.children.cycles-pp.push_rt_task
34.74 -1.0 33.76 perf-profile.children.cycles-pp.finish_task_switch
15.91 -0.9 14.98 ± 2% perf-profile.children.cycles-pp.dequeue_rt_stack
38.47 -0.9 37.57 perf-profile.children.cycles-pp._sched_setscheduler
40.08 -0.9 39.21 perf-profile.children.cycles-pp.do_futex
40.10 -0.9 39.23 perf-profile.children.cycles-pp.__x64_sys_futex
38.68 -0.8 37.89 perf-profile.children.cycles-pp.__x64_sys_sched_setscheduler
38.68 -0.8 37.89 perf-profile.children.cycles-pp.do_sched_setscheduler
9.06 ± 2% -0.7 8.40 ± 3% perf-profile.children.cycles-pp.__pick_next_task
3.62 ± 2% -0.4 3.23 ± 3% perf-profile.children.cycles-pp.try_to_block_task
0.22 ± 2% -0.1 0.16 ± 5% perf-profile.children.cycles-pp.sched_tick
0.26 ± 2% -0.1 0.20 ± 3% perf-profile.children.cycles-pp.tick_nohz_handler
0.26 -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.44 ± 2% -0.1 0.38 ± 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.25 ± 2% -0.1 0.19 ± 2% perf-profile.children.cycles-pp.update_process_times
0.30 ± 2% -0.1 0.24 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.42 ± 2% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.29 -0.1 0.23 ± 4% perf-profile.children.cycles-pp.hrtimer_interrupt
0.20 ± 4% -0.0 0.16 ± 12% perf-profile.children.cycles-pp.sched_balance_update_blocked_averages
0.36 -0.0 0.32 perf-profile.children.cycles-pp.irq_work_single
0.09 ± 6% -0.0 0.06 ± 13% perf-profile.children.cycles-pp.sched_balance_rq
0.04 ± 45% +0.0 0.06 ± 7% perf-profile.children.cycles-pp.menu_select
0.06 ± 6% +0.0 0.08 ± 10% perf-profile.children.cycles-pp.plist_add
0.06 ± 8% +0.0 0.08 ± 5% perf-profile.children.cycles-pp.native_irq_return_iret
0.13 ± 3% +0.0 0.16 ± 4% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
0.05 +0.0 0.08 perf-profile.children.cycles-pp.pick_task_rt
0.04 ± 44% +0.0 0.07 ± 9% perf-profile.children.cycles-pp.do_perf_trace_sched_stat_runtime
0.06 ± 8% +0.0 0.09 ± 4% perf-profile.children.cycles-pp._copy_from_user
0.06 ± 7% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.find_task_by_vpid
0.04 ± 44% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.sched_mm_cid_migrate_to
0.10 ± 8% +0.0 0.13 ± 6% perf-profile.children.cycles-pp.prepare_task_switch
0.07 ± 11% +0.0 0.10 ± 8% perf-profile.children.cycles-pp.llist_reverse_order
0.06 ± 9% +0.0 0.10 ± 10% perf-profile.children.cycles-pp.pthread_mutex_lock
0.08 +0.0 0.12 ± 5% perf-profile.children.cycles-pp.sched_clock
0.05 ± 7% +0.0 0.09 perf-profile.children.cycles-pp.__get_user_8
0.06 ± 6% +0.0 0.10 ± 3% perf-profile.children.cycles-pp.rseq_get_rseq_cs
0.07 +0.0 0.11 ± 4% perf-profile.children.cycles-pp.__resched_curr
0.09 ± 4% +0.0 0.13 ± 5% perf-profile.children.cycles-pp.sched_clock_cpu
0.08 ± 6% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.futex_unqueue
0.10 ± 9% +0.0 0.14 ± 6% perf-profile.children.cycles-pp.native_sched_clock
0.08 ± 8% +0.0 0.13 perf-profile.children.cycles-pp.find_get_task
0.09 ± 4% +0.0 0.14 perf-profile.children.cycles-pp.wakeup_preempt
0.00 +0.1 0.05 perf-profile.children.cycles-pp.raw_spin_rq_trylock
0.10 ± 6% +0.1 0.15 ± 4% perf-profile.children.cycles-pp.futex_wake_mark
0.09 ± 5% +0.1 0.14 ± 3% perf-profile.children.cycles-pp.rseq_ip_fixup
0.09 ± 4% +0.1 0.14 ± 4% perf-profile.children.cycles-pp.pthread_setschedparam
0.10 ± 7% +0.1 0.16 ± 6% perf-profile.children.cycles-pp.do_perf_trace_sched_wakeup_template
0.00 +0.1 0.06 ± 9% perf-profile.children.cycles-pp.___perf_sw_event
0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.__x2apic_send_IPI_dest
0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.native_apic_msr_eoi
0.11 ± 6% +0.1 0.17 ± 6% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.09 ± 4% +0.1 0.15 ± 2% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.11 ± 4% +0.1 0.17 ± 2% perf-profile.children.cycles-pp.stress_mutex_exercise
0.00 +0.1 0.06 perf-profile.children.cycles-pp.sysvec_reschedule_ipi
0.13 ± 5% +0.1 0.19 ± 4% perf-profile.children.cycles-pp.update_rq_clock
0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.rseq_update_cpu_node_id
0.13 ± 5% +0.1 0.20 ± 3% perf-profile.children.cycles-pp.update_rq_clock_task
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.__smp_call_single_queue
0.10 ± 4% +0.1 0.17 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.00 +0.1 0.07 ± 7% perf-profile.children.cycles-pp.__radix_tree_lookup
0.16 ± 7% +0.1 0.22 ± 7% perf-profile.children.cycles-pp.update_curr_common
0.00 +0.1 0.07 ± 5% perf-profile.children.cycles-pp.__wrgsbase_inactive
0.10 ± 5% +0.1 0.17 ± 2% perf-profile.children.cycles-pp.rt_mutex_adjust_pi
0.13 ± 3% +0.1 0.20 perf-profile.children.cycles-pp.switch_mm_irqs_off
0.14 ± 4% +0.1 0.22 ± 3% perf-profile.children.cycles-pp.__rseq_handle_notify_resume
0.17 ± 5% +0.1 0.26 ± 5% perf-profile.children.cycles-pp.__futex_hash
0.24 ± 3% +0.1 0.34 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.17 ± 3% +0.1 0.27 ± 3% perf-profile.children.cycles-pp.os_xsave
0.15 ± 4% +0.1 0.25 ± 3% perf-profile.children.cycles-pp.__switch_to
0.24 ± 2% +0.1 0.34 ± 3% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
0.18 ± 2% +0.1 0.29 perf-profile.children.cycles-pp.set_load_weight
0.22 ± 4% +0.1 0.34 ± 5% perf-profile.children.cycles-pp.futex_wait_setup
0.23 ± 2% +0.2 0.38 ± 2% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
1.12 ± 4% +0.2 1.29 ± 4% perf-profile.children.cycles-pp.task_rq_lock
0.28 +0.2 0.46 ± 2% perf-profile.children.cycles-pp.switch_fpu_return
0.36 ± 5% +0.2 0.55 ± 4% perf-profile.children.cycles-pp.futex_hash
1.93 ± 2% +0.2 2.14 ± 3% perf-profile.children.cycles-pp.acpi_idle_do_entry
1.93 ± 2% +0.2 2.14 ± 3% perf-profile.children.cycles-pp.acpi_safe_halt
1.93 ± 2% +0.2 2.14 ± 3% perf-profile.children.cycles-pp.pv_native_safe_halt
1.93 ± 2% +0.2 2.14 ± 3% perf-profile.children.cycles-pp.acpi_idle_enter
1.97 ± 2% +0.2 2.18 ± 3% perf-profile.children.cycles-pp.cpuidle_enter_state
1.97 ± 2% +0.2 2.19 ± 3% perf-profile.children.cycles-pp.cpuidle_enter
2.04 ± 2% +0.2 2.28 ± 3% perf-profile.children.cycles-pp.cpuidle_idle_call
3.62 +0.3 3.88 ± 3% perf-profile.children.cycles-pp.enqueue_pushable_task
0.40 ± 4% +0.3 0.72 ± 8% perf-profile.children.cycles-pp.balance_fair
0.40 ± 4% +0.3 0.72 ± 8% perf-profile.children.cycles-pp.sched_balance_newidle
2.71 +0.3 3.05 ± 2% perf-profile.children.cycles-pp.schedule_idle
5.99 +0.3 6.33 perf-profile.children.cycles-pp.ttwu_do_activate
10.30 +0.4 10.67 perf-profile.children.cycles-pp.__sched_yield
5.26 ± 2% +0.5 5.71 ± 3% perf-profile.children.cycles-pp.sched_ttwu_pending
5.71 ± 2% +0.5 6.18 ± 3% perf-profile.children.cycles-pp.__sysvec_call_function_single
5.82 ± 2% +0.5 6.30 ± 3% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
5.76 ± 2% +0.5 6.25 ± 3% perf-profile.children.cycles-pp.sysvec_call_function_single
5.91 ± 2% +0.6 6.48 ± 3% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
4.93 +0.6 5.56 perf-profile.children.cycles-pp.do_idle
4.90 +0.6 5.53 perf-profile.children.cycles-pp.start_secondary
4.94 +0.6 5.57 perf-profile.children.cycles-pp.common_startup_64
4.94 +0.6 5.57 perf-profile.children.cycles-pp.cpu_startup_entry
7.78 ± 3% +0.8 8.55 ± 3% perf-profile.children.cycles-pp.futex_wake
1.45 ± 8% +6.0 7.48 ± 6% perf-profile.children.cycles-pp._find_first_and_bit
12.44 -7.7 4.74 ± 7% perf-profile.self.cycles-pp.__cpupri_find
14.90 ± 3% -2.9 12.04 ± 5% perf-profile.self.cycles-pp.pull_rt_task
32.02 -2.8 29.23 ± 2% perf-profile.self.cycles-pp.cpupri_set
0.08 -0.0 0.06 ± 7% perf-profile.self.cycles-pp.irq_work_single
0.06 +0.0 0.08 perf-profile.self.cycles-pp.prepare_task_switch
0.06 ± 8% +0.0 0.08 ± 6% perf-profile.self.cycles-pp.update_rq_clock
0.06 ± 9% +0.0 0.08 ± 6% perf-profile.self.cycles-pp.plist_add
0.07 ± 5% +0.0 0.09 ± 5% perf-profile.self.cycles-pp.do_sched_setscheduler
0.06 ± 6% +0.0 0.08 ± 5% perf-profile.self.cycles-pp.futex_do_wait
0.05 +0.0 0.08 ± 4% perf-profile.self.cycles-pp.pick_task_rt
0.06 ± 8% +0.0 0.08 ± 5% perf-profile.self.cycles-pp.native_irq_return_iret
0.05 +0.0 0.08 perf-profile.self.cycles-pp._copy_from_user
0.06 ± 6% +0.0 0.09 ± 6% perf-profile.self.cycles-pp.__pick_next_task
0.06 ± 11% +0.0 0.09 ± 6% perf-profile.self.cycles-pp.sched_ttwu_pending
0.05 ± 7% +0.0 0.08 ± 5% perf-profile.self.cycles-pp.push_rt_task
0.06 ± 7% +0.0 0.10 ± 6% perf-profile.self.cycles-pp.do_perf_trace_sched_wakeup_template
0.13 ± 6% +0.0 0.16 ± 5% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.07 ± 11% +0.0 0.10 ± 8% perf-profile.self.cycles-pp.llist_reverse_order
0.05 ± 7% +0.0 0.09 ± 4% perf-profile.self.cycles-pp.__get_user_8
0.08 ± 6% +0.0 0.11 ± 3% perf-profile.self.cycles-pp.finish_task_switch
0.03 ± 70% +0.0 0.07 ± 5% perf-profile.self.cycles-pp.sched_mm_cid_migrate_to
0.06 ± 6% +0.0 0.10 perf-profile.self.cycles-pp.pthread_setschedparam
0.07 ± 6% +0.0 0.11 ± 4% perf-profile.self.cycles-pp.futex_unqueue
0.06 ± 7% +0.0 0.10 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.11 +0.0 0.15 ± 3% perf-profile.self.cycles-pp.pv_native_safe_halt
0.03 ± 70% +0.0 0.07 ± 6% perf-profile.self.cycles-pp.ttwu_queue_wakelist
0.07 +0.0 0.11 ± 6% perf-profile.self.cycles-pp.__resched_curr
0.09 ± 7% +0.0 0.13 ± 2% perf-profile.self.cycles-pp.futex_wake_mark
0.10 ± 9% +0.0 0.14 ± 7% perf-profile.self.cycles-pp.native_sched_clock
0.09 ± 5% +0.0 0.13 ± 3% perf-profile.self.cycles-pp.stress_mutex_exercise
0.00 +0.1 0.05 perf-profile.self.cycles-pp.__sched_yield
0.00 +0.1 0.05 perf-profile.self.cycles-pp.update_curr_common
0.00 +0.1 0.05 ± 7% perf-profile.self.cycles-pp.exit_to_user_mode_loop
0.10 ± 8% +0.1 0.15 ± 5% perf-profile.self.cycles-pp.futex_wake
0.10 ± 4% +0.1 0.16 ± 7% perf-profile.self.cycles-pp.select_task_rq_rt
0.10 ± 3% +0.1 0.16 ± 3% perf-profile.self.cycles-pp.update_rq_clock_task
0.02 ±141% +0.1 0.07 ± 9% perf-profile.self.cycles-pp.pthread_mutex_lock
0.02 ±141% +0.1 0.07 ± 6% perf-profile.self.cycles-pp.switch_fpu_return
0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.__x2apic_send_IPI_dest
0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.native_apic_msr_eoi
0.09 ± 4% +0.1 0.15 ± 2% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.00 +0.1 0.06 perf-profile.self.cycles-pp.__radix_tree_lookup
0.00 +0.1 0.06 perf-profile.self.cycles-pp.__wrgsbase_inactive
0.09 ± 4% +0.1 0.15 ± 3% perf-profile.self.cycles-pp.do_syscall_64
0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.rseq_update_cpu_node_id
0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.select_task_rq
0.19 ± 3% +0.1 0.26 perf-profile.self.cycles-pp.find_lock_lowest_rq
0.13 ± 5% +0.1 0.20 ± 4% perf-profile.self.cycles-pp.__sched_setscheduler
0.11 ± 3% +0.1 0.18 ± 4% perf-profile.self.cycles-pp.switch_mm_irqs_off
0.44 +0.1 0.51 ± 2% perf-profile.self.cycles-pp._raw_spin_lock
0.17 ± 6% +0.1 0.26 ± 4% perf-profile.self.cycles-pp.__futex_hash
0.14 ± 3% +0.1 0.24 perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.14 ± 4% +0.1 0.24 ± 4% perf-profile.self.cycles-pp.__switch_to
0.18 ± 3% +0.1 0.28 ± 4% perf-profile.self.cycles-pp.futex_hash
0.16 ± 3% +0.1 0.26 ± 2% perf-profile.self.cycles-pp.os_xsave
0.17 ± 2% +0.1 0.28 ± 2% perf-profile.self.cycles-pp.set_load_weight
0.23 ± 3% +0.1 0.36 ± 2% perf-profile.self.cycles-pp.__schedule
0.23 +0.2 0.38 ± 3% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
3.54 ± 2% +0.2 3.78 ± 3% perf-profile.self.cycles-pp.enqueue_pushable_task
0.14 ± 3% +0.4 0.52 ± 9% perf-profile.self.cycles-pp.sched_balance_newidle
1.45 ± 2% +0.6 2.03 ± 4% perf-profile.self.cycles-pp.dequeue_task_rt
0.72 ± 5% +2.1 2.80 ± 11% perf-profile.self.cycles-pp.enqueue_task_rt
1.45 ± 8% +6.0 7.47 ± 6% perf-profile.self.cycles-pp._find_first_and_bit
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
As an alternative, with a slightly more complicated change, we can separate
the counts and masks into two vectors inlined in struct cpupri (counts[] and
masks[]) and add two paddings:

1. Between counts[0] and counts[1], since counts[0] is updated more
   frequently than the others, whenever an RT task is enqueued onto an
   empty runqueue or dequeued from a non-overloaded runqueue.

2. Between the two vectors, since counts[] is read-write while masks[]
   is read-mostly (it only stores pointers).

The alternative approach adds the complexity of a 31+/21- LoC change while
achieving the same performance as the simple one; at the same time, the
struct cpupri size shrinks from 26 cache lines to 21 cache lines.

The alternative patch is also prepared and can be sent out if you are
interested; a rough sketch of the layout follows.
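A minimal sketch of the alternative layout (an illustration only; the field
names and exact padding are assumptions and may differ from the prepared
patch):

	struct cpupri {
		/* counts[0] is the hottest counter: give it its own cache line */
		atomic_t	count0 ____cacheline_aligned;
		/* padding 1 is implied: counts[] starts on the next cache line */
		atomic_t	counts[CPUPRI_NR_PRIORITIES - 1] ____cacheline_aligned;
		/* padding 2: keep the read-mostly pointers away from the RW counters */
		cpumask_var_t	masks[CPUPRI_NR_PRIORITIES] ____cacheline_aligned;
		int		*cpu_to_pri;
	};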
Best Regards
Pan
> -----Original Message-----
> From: Pan Deng <pan.deng@intel.com>
> Sent: Thursday, June 12, 2025 11:12 AM
> To: peterz@infradead.org; mingo@kernel.org
> Cc: linux-kernel@vger.kernel.org; Li, Tianyou <tianyou.li@intel.com>;
> tim.c.chen@linux.intel.com; Deng, Pan <pan.deng@intel.com>
> Subject: [PATCH] sched/rt: optimize cpupri_vec layout
>
> When running a multi-instance ffmpeg transcoding workload which uses rt
> thread in a high core count system, cpupri_vec->count contends with the
> reading of mask in the same cache line in function cpupri_find_fitness and
> cpupri_set.
> This change separates each count and mask into different cache lines by cache
> aligned attribute to avoid the false sharing.
> Tested in a 2 sockets, 240 physical core 480 logical core machine, running
> 60 ffmpeg transcoding instances. With the change, the kernel cycles% is
> reduced from ~20% to ~12%, the fps metric is improved ~11%.
> The side effect of this change is that struct cpupri size is increased from 26
> cache lines to 203 cache lines.
>
> Signed-off-by: Pan Deng <pan.deng@intel.com>
> Signed-off-by: Tianyou Li <tianyou.li@intel.com>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
> kernel/sched/cpupri.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/cpupri.h b/kernel/sched/cpupri.h
> index d6cba0020064..245b0fa626be 100644
> --- a/kernel/sched/cpupri.h
> +++ b/kernel/sched/cpupri.h
> @@ -9,7 +9,7 @@
>
> struct cpupri_vec {
> atomic_t count;
> - cpumask_var_t mask;
> + cpumask_var_t mask ____cacheline_aligned;
> };
>
> struct cpupri {
> --
> 2.43.5