When running a multi-instance ffmpeg transcoding workload that uses RT
threads on a high core count system, updates of cpupri_vec->count contend
with reads of the adjacent mask in the same cache line, in
cpupri_find_fitness() and cpupri_set().

This change places each count and mask in separate cache lines, via the
____cacheline_aligned attribute, to avoid the false sharing.

Tested on a 2-socket machine with 240 physical cores (480 logical CPUs),
running 60 ffmpeg transcoding instances. With the change, kernel cycles%
is reduced from ~20% to ~12%, and the fps metric improves by ~11%.

The side effect of this change is that the size of struct cpupri grows
from 26 cache lines to 203 cache lines.
Signed-off-by: Pan Deng <pan.deng@intel.com>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
---
kernel/sched/cpupri.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/cpupri.h b/kernel/sched/cpupri.h
index d6cba0020064..245b0fa626be 100644
--- a/kernel/sched/cpupri.h
+++ b/kernel/sched/cpupri.h
@@ -9,7 +9,7 @@
struct cpupri_vec {
atomic_t count;
- cpumask_var_t mask;
+ cpumask_var_t mask ____cacheline_aligned;
};
struct cpupri {
--
2.43.5
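
The 26 -> 203 cache line numbers above can be sanity-checked with a small
userspace mock-up of the two layouts. This is only a sketch under assumed
conditions (64-byte cache lines, CONFIG_CPUMASK_OFFSTACK=y so that
cpumask_var_t is a single pointer, and CPUPRI_NR_PRIORITIES ==
MAX_RT_PRIO + 1 == 101); it is not the kernel code itself:

#include <stdio.h>
#include <stdatomic.h>

#define CACHE_LINE		64	/* assumed cache line size on x86 */
#define CPUPRI_NR_PRIORITIES	101	/* assumed: MAX_RT_PRIO + 1 */

/* With CONFIG_CPUMASK_OFFSTACK=y, cpumask_var_t is a single pointer. */
typedef unsigned long *cpumask_var_t;

struct cpupri_vec_before {		/* layout before the patch */
	atomic_int	count;
	cpumask_var_t	mask;
};

struct cpupri_vec_after {		/* layout after the patch */
	atomic_int	count;
	cpumask_var_t	mask __attribute__((aligned(CACHE_LINE)));
};

/* struct cpupri is pri_to_cpu[CPUPRI_NR_PRIORITIES] plus int *cpu_to_pri */
static unsigned long cpupri_cache_lines(unsigned long vec_size)
{
	unsigned long bytes = vec_size * CPUPRI_NR_PRIORITIES + sizeof(int *);

	return (bytes + CACHE_LINE - 1) / CACHE_LINE;
}

int main(void)
{
	/* prints 26 and 203 under the assumptions above */
	printf("before: %lu cache lines\n",
	       cpupri_cache_lines(sizeof(struct cpupri_vec_before)));
	printf("after:  %lu cache lines\n",
	       cpupri_cache_lines(sizeof(struct cpupri_vec_after)));
	return 0;
}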
Hello,

kernel test robot noticed a 67.7% improvement of stress-ng.mutex.ops_per_sec on:

commit: cd316a87572309a79102940e1856ee877740156e ("[PATCH] sched/rt: optimize cpupri_vec layout")
url: https://github.com/intel-lab-lkp/linux/commits/Pan-Deng/sched-rt-optimize-cpupri_vec-layout/20250612-110857
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git b01f2d9597250e9c4011cb78d8d46287deaa6a69
patch link: https://lore.kernel.org/all/20250612031148.455046-1-pan.deng@intel.com/
patch subject: [PATCH] sched/rt: optimize cpupri_vec layout

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: mutex
	cpufreq_governor: performance

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250616/202506161643.ab40fa8e-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-srf-2sp2/mutex/stress-ng/60s

commit:
  b01f2d9597 ("sched/eevdf: Correct the comment in place_entity")
  cd316a8757 ("sched/rt: optimize cpupri_vec layout")

b01f2d9597250e9c cd316a87572309a79102940e185
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      22409567            +52.5%    34179472 ±  3%  cpuidle..usage
         21410 ± 30%      +26.2%       27010 ± 16%  numa-vmstat.node0.nr_slab_reclaimable
          0.07 ±  2%        +0.0        0.09 ±  2%  mpstat.cpu.all.soft%
          1.06              +0.6        1.63 ±  3%  mpstat.cpu.all.usr%
         85656 ± 30%      +26.1%      108025 ± 16%  numa-meminfo.node0.KReclaimable
         85656 ± 30%      +26.1%      108025 ± 16%  numa-meminfo.node0.SReclaimable
       2398650            +60.1%     3839452 ±  2%  vmstat.system.cs
       1650319            +44.1%     2378651        vmstat.system.in
          1821 ±  7%      +28.5%        2340 ± 14%  perf-c2c.DRAM.local
         17138 ± 14%      +86.2%       31915 ± 17%  perf-c2c.DRAM.remote
         91166 ± 16%     +134.9%      214147 ± 19%  perf-c2c.HITM.local
         13399 ± 13%     +104.1%       27347 ± 16%  perf-c2c.HITM.remote
        104565 ± 15%     +131.0%      241494 ± 19%  perf-c2c.HITM.total
        125201 ±  2%      -39.4%       75820 ±  2%  stress-ng.mutex.nanosecs_per_mutex
      85791341            +67.7%   1.438e+08        stress-ng.mutex.ops
       1429837            +67.7%     2397156        stress-ng.mutex.ops_per_sec
      68606706            +63.5%   1.122e+08 ±  2%  stress-ng.time.involuntary_context_switches
          9345              -1.3%       9226        stress-ng.time.system_time
         99.39             +61.2%     160.24        stress-ng.time.user_time
      56563097            +57.6%    89151856 ±  2%  stress-ng.time.voluntary_context_switches
     7.208e+09 ±  2%      +42.9%    1.03e+10        perf-stat.i.branch-instructions
      52257508            +47.5%    77078460 ±  2%  perf-stat.i.branch-misses
      37265287 ±  2%      +34.4%    50098262 ±  3%  perf-stat.i.cache-misses
     2.416e+08            +42.7%   3.449e+08 ±  2%  perf-stat.i.cache-references
       2500366            +60.9%     4022250 ±  2%  perf-stat.i.context-switches
         20.66             -29.7%      14.53        perf-stat.i.cpi
        490637            +60.9%      789567 ±  2%  perf-stat.i.cpu-migrations
         15477 ±  4%      -25.1%       11585 ±  3%  perf-stat.i.cycles-between-cache-misses
     3.356e+10 ±  2%      +44.2%   4.838e+10        perf-stat.i.instructions
          0.06 ±  9%      +36.2%        0.08        perf-stat.i.ipc
         15.58             +60.8%      25.06 ±  2%  perf-stat.i.metric.K/sec
         17.01 ±  2%      -29.9%       11.93        perf-stat.overall.cpi
         15347 ±  3%      -24.8%       11539 ±  3%  perf-stat.overall.cycles-between-cache-misses
          0.06 ±  2%      +42.5%        0.08        perf-stat.overall.ipc
     7.096e+09 ±  2%      +42.6%   1.012e+10        perf-stat.ps.branch-instructions
      51310401            +47.6%    75731432 ±  2%  perf-stat.ps.branch-misses
36634137 ± 2% +34.4% 49233432 ± 3% perf-stat.ps.cache-misses 2.378e+08 +42.6% 3.392e+08 ± 2% perf-stat.ps.cache-references 2462472 +60.7% 3956471 ± 2% perf-stat.ps.context-switches 483238 +60.7% 776702 ± 2% perf-stat.ps.cpu-migrations 3.304e+10 ± 2% +43.9% 4.756e+10 perf-stat.ps.instructions 2.059e+12 ± 2% +43.2% 2.949e+12 perf-stat.total.instructions 0.61 ± 54% -66.1% 0.21 ± 34% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio 0.57 ± 63% -79.0% 0.12 ±137% perf-sched.sch_delay.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault 20.28 ±215% -98.8% 0.25 ± 40% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork 5.85 ±133% -96.7% 0.19 ± 48% perf-sched.sch_delay.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range 0.62 ± 41% -65.0% 0.22 ± 31% perf-sched.sch_delay.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap 0.42 ± 34% -52.4% 0.20 ± 30% perf-sched.sch_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.free_pgtables.exit_mmap 0.46 ± 42% -54.1% 0.21 ± 45% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma 0.50 ± 72% -49.6% 0.25 ± 20% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region 0.47 ± 34% -62.9% 0.18 ± 29% perf-sched.sch_delay.avg.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read 0.23 ± 56% -85.7% 0.03 ±154% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.mm_init.dup_mm 248.83 ± 26% -60.2% 99.05 ± 73% perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read 9.67 ±167% -97.1% 0.28 ± 21% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown] 83.18 ± 21% -68.9% 25.88 ± 26% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 4.32 ± 91% -84.6% 0.67 ± 24% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio 0.90 ± 74% -75.6% 0.22 ±139% perf-sched.sch_delay.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault 1.40 ± 48% -74.1% 0.36 ± 85% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.vmstat_start.seq_read_iter.proc_reg_read_iter 358.00 ±219% -99.8% 0.86 ± 51% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork 1.18 ± 47% -65.3% 0.41 ± 52% perf-sched.sch_delay.max.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap 1.20 ± 53% -70.4% 0.35 ± 58% perf-sched.sch_delay.max.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat 1.30 ± 34% -68.0% 0.42 ± 73% perf-sched.sch_delay.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read 1.03 ± 40% -55.5% 0.46 ± 21% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.__percpu_counter_init_many.mm_init 0.30 ± 65% -88.9% 0.03 ±154% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.mm_init.dup_mm 281.41 ± 38% +143.8% 686.20 ± 35% perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin 0.72 ± 81% -70.7% 0.21 ± 83% perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep 888.12 ±173% -99.4% 5.74 ± 97% 
perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown] 2.66 ± 7% -15.2% 2.25 ± 5% perf-sched.total_wait_and_delay.average.ms 1.68 ± 7% -18.0% 1.38 ± 6% perf-sched.total_wait_time.average.ms 1092 ± 6% -21.5% 857.36 ± 12% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 1160 ± 11% -37.2% 728.54 ± 28% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 342.17 ± 8% -16.0% 287.50 perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity 7.50 ± 27% -60.0% 3.00 ± 76% perf-sched.wait_and_delay.count.__cond_resched.rcu_gp_cleanup.rcu_gp_kthread.kthread.ret_from_fork 3012 ± 9% +30.1% 3919 ± 9% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown] 2811 ± 5% +32.8% 3732 ± 7% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown] 116.17 ± 21% -37.9% 72.17 ± 26% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread 699.17 ± 3% -24.6% 527.50 ± 5% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 249.00 ± 2% -38.5% 153.17 ± 8% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 562.82 ± 38% +174.7% 1546 ± 51% perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin 0.95 ± 97% -78.2% 0.21 ± 34% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio 0.57 ± 63% -79.0% 0.12 ±137% perf-sched.wait_time.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault 5.85 ±133% -96.7% 0.19 ± 48% perf-sched.wait_time.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range 0.62 ± 41% -65.0% 0.22 ± 31% perf-sched.wait_time.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap 0.42 ± 34% -52.4% 0.20 ± 30% perf-sched.wait_time.avg.ms.__cond_resched.down_write.unlink_anon_vmas.free_pgtables.exit_mmap 0.35 ± 20% -45.7% 0.19 ± 49% perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat 0.46 ± 42% -54.1% 0.21 ± 45% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma 0.50 ± 72% -49.6% 0.25 ± 20% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region 0.47 ± 34% -62.9% 0.18 ± 29% perf-sched.wait_time.avg.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read 0.23 ± 56% -85.7% 0.03 ±154% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.mm_init.dup_mm 9.74 ±165% -96.2% 0.37 ± 48% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown] 1411 ± 33% -65.5% 487.33 ±141% perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork 1009 ± 7% -17.6% 831.48 ± 12% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 1150 ± 12% -37.7% 717.38 ± 28% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 10.09 ±138% -93.4% 0.67 ± 24% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio 0.90 ± 74% -75.6% 0.22 ±139% 
perf-sched.wait_time.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault 1.40 ± 48% -74.1% 0.36 ± 85% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.vmstat_start.seq_read_iter.proc_reg_read_iter 715.15 ±160% -99.7% 2.21 ±133% perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork 1.18 ± 47% -65.3% 0.41 ± 52% perf-sched.wait_time.max.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap 1.20 ± 53% -70.4% 0.35 ± 58% perf-sched.wait_time.max.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat 1.30 ± 34% -68.0% 0.42 ± 73% perf-sched.wait_time.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read 1.03 ± 40% -55.5% 0.46 ± 21% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.__percpu_counter_init_many.mm_init 0.30 ± 65% -88.9% 0.03 ±154% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.mm_init.dup_mm 281.41 ± 38% +205.6% 859.89 ± 67% perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin 167.37 ±222% -99.9% 0.21 ± 83% perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep 31.88 -1.9 30.02 perf-profile.calltrace.cycles-pp.__schedule.schedule.futex_do_wait.__futex_wait.futex_wait 31.90 -1.9 30.05 perf-profile.calltrace.cycles-pp.schedule.futex_do_wait.__futex_wait.futex_wait.do_futex 31.96 -1.8 30.14 perf-profile.calltrace.cycles-pp.futex_do_wait.__futex_wait.futex_wait.do_futex.__x64_sys_futex 32.28 -1.7 30.62 perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64 9.30 ± 3% -1.6 7.66 ± 5% perf-profile.calltrace.cycles-pp.pull_rt_task.balance_callbacks.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler 32.29 -1.6 30.65 perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe 26.81 -1.6 25.25 perf-profile.calltrace.cycles-pp.find_lock_lowest_rq.push_rt_task.push_rt_tasks.finish_task_switch.__schedule 10.51 ± 3% -1.5 9.00 ± 5% perf-profile.calltrace.cycles-pp.balance_callbacks.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler 13.57 -1.5 12.11 ± 4% perf-profile.calltrace.cycles-pp.cpupri_find_fitness.find_lowest_rq.find_lock_lowest_rq.push_rt_task.push_rt_tasks 13.55 -1.5 12.09 ± 4% perf-profile.calltrace.cycles-pp.__cpupri_find.cpupri_find_fitness.find_lowest_rq.find_lock_lowest_rq.push_rt_task 13.61 -1.4 12.17 ± 4% perf-profile.calltrace.cycles-pp.find_lowest_rq.find_lock_lowest_rq.push_rt_task.push_rt_tasks.finish_task_switch 5.90 ± 3% -1.2 4.68 ± 4% perf-profile.calltrace.cycles-pp.pull_rt_task.balance_rt.__pick_next_task.__schedule.schedule 5.92 ± 3% -1.2 4.70 ± 4% perf-profile.calltrace.cycles-pp.balance_rt.__pick_next_task.__schedule.schedule.futex_do_wait 32.27 -1.2 31.09 perf-profile.calltrace.cycles-pp.push_rt_task.push_rt_tasks.finish_task_switch.__schedule.schedule 38.40 -0.9 37.47 perf-profile.calltrace.cycles-pp.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler.do_syscall_64 38.47 -0.9 37.57 perf-profile.calltrace.cycles-pp._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler.do_syscall_64.entry_SYSCALL_64_after_hwframe 19.36 -0.9 18.47 perf-profile.calltrace.cycles-pp.push_rt_tasks.finish_task_switch.__schedule.schedule.futex_do_wait 40.08 -0.9 39.21 
perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe 40.10 -0.9 39.23 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe 19.82 -0.8 18.98 perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.futex_do_wait.__futex_wait 7.86 ± 2% -0.8 7.05 ± 4% perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.futex_do_wait.__futex_wait 38.68 -0.8 37.89 perf-profile.calltrace.cycles-pp.do_sched_setscheduler.__x64_sys_sched_setscheduler.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_setscheduler 38.68 -0.8 37.89 perf-profile.calltrace.cycles-pp.__x64_sys_sched_setscheduler.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_setscheduler 42.56 -0.8 41.79 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_setscheduler 42.57 -0.8 41.81 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_setscheduler 11.76 -0.8 11.01 ± 2% perf-profile.calltrace.cycles-pp.cpupri_set.enqueue_task_rt.enqueue_task.__sched_setscheduler._sched_setscheduler 41.14 -0.7 40.41 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe 41.14 -0.7 40.42 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe 42.78 -0.6 42.15 perf-profile.calltrace.cycles-pp.__sched_setscheduler 12.24 -0.5 11.75 ± 2% perf-profile.calltrace.cycles-pp.cpupri_set.dequeue_rt_stack.dequeue_task_rt.__sched_setscheduler._sched_setscheduler 3.53 ± 2% -0.5 3.04 ± 3% perf-profile.calltrace.cycles-pp.cpupri_set.dequeue_rt_stack.dequeue_task_rt.try_to_block_task.__schedule 3.56 ± 2% -0.5 3.08 ± 3% perf-profile.calltrace.cycles-pp.dequeue_rt_stack.dequeue_task_rt.try_to_block_task.__schedule.schedule 12.30 -0.5 11.83 ± 2% perf-profile.calltrace.cycles-pp.dequeue_rt_stack.dequeue_task_rt.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler 3.62 ± 2% -0.4 3.23 ± 3% perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.futex_do_wait.__futex_wait 2.19 ± 3% -0.4 1.80 ± 3% perf-profile.calltrace.cycles-pp.cpupri_set.enqueue_task_rt.enqueue_task.activate_task.push_rt_task 3.62 ± 2% -0.4 3.23 ± 3% perf-profile.calltrace.cycles-pp.dequeue_task_rt.try_to_block_task.__schedule.schedule.futex_do_wait 2.27 ± 3% -0.2 2.04 ± 2% perf-profile.calltrace.cycles-pp.enqueue_task_rt.enqueue_task.activate_task.push_rt_task.push_rt_tasks 4.20 -0.2 4.01 perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.exit_to_user_mode_loop.do_syscall_64 3.90 -0.2 3.72 perf-profile.calltrace.cycles-pp.push_rt_tasks.finish_task_switch.__schedule.schedule.exit_to_user_mode_loop 0.96 +0.0 1.00 perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule_idle.do_idle.cpu_startup_entry 0.96 ± 2% +0.1 1.05 ± 4% perf-profile.calltrace.cycles-pp.enqueue_pushable_task.enqueue_task.activate_task.push_rt_task.push_rt_tasks 1.32 +0.1 1.43 ± 3% perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule_idle.do_idle.cpu_startup_entry 1.56 ± 3% +0.1 1.67 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.__sched_setscheduler 1.19 ± 2% +0.1 1.31 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.finish_task_switch 1.56 ± 3% +0.1 1.68 ± 3% 
perf-profile.calltrace.cycles-pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.__sched_setscheduler._sched_setscheduler 1.57 ± 3% +0.1 1.69 ± 3% perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler 1.20 ± 2% +0.1 1.32 ± 3% perf-profile.calltrace.cycles-pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.finish_task_switch.__schedule 1.00 +0.1 1.13 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.raw_spin_rq_lock_nested.balance_callbacks.__sched_setscheduler 1.20 ± 2% +0.1 1.33 ± 3% perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.finish_task_switch.__schedule.schedule_idle 1.59 ± 3% +0.1 1.72 ± 3% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler 0.93 ± 4% +0.1 1.06 ± 4% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.raw_spin_rq_lock_nested.task_rq_lock.__sched_setscheduler 1.03 +0.1 1.16 ± 3% perf-profile.calltrace.cycles-pp.raw_spin_rq_lock_nested.balance_callbacks.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler 1.02 +0.1 1.15 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested.balance_callbacks.__sched_setscheduler._sched_setscheduler 1.22 ± 2% +0.1 1.34 ± 3% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.finish_task_switch.__schedule.schedule_idle.do_idle 0.96 ± 4% +0.1 1.10 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested.task_rq_lock.__sched_setscheduler._sched_setscheduler 0.96 ± 4% +0.1 1.10 ± 4% perf-profile.calltrace.cycles-pp.raw_spin_rq_lock_nested.task_rq_lock.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler 1.65 ± 2% +0.1 1.80 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.pv_native_safe_halt 1.66 ± 2% +0.1 1.80 ± 3% perf-profile.calltrace.cycles-pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_halt 1.68 ± 2% +0.2 1.84 ± 3% perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry 1.81 ± 2% +0.2 1.98 ± 3% perf-profile.calltrace.cycles-pp.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state 1.11 ± 4% +0.2 1.28 ± 4% perf-profile.calltrace.cycles-pp.task_rq_lock.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler 1.75 ± 2% +0.2 1.92 ± 3% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter 1.92 ± 2% +0.2 2.12 ± 3% perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter 1.92 ± 2% +0.2 2.13 ± 3% perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call 1.92 ± 2% +0.2 2.13 ± 3% perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle 1.96 ± 2% +0.2 2.17 ± 3% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary 10.09 +0.2 10.31 
perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield 1.95 ± 2% +0.2 2.17 ± 3% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry 10.10 +0.2 10.32 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield 2.02 ± 2% +0.2 2.26 ± 2% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64 1.44 +0.3 1.69 perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single 1.44 +0.3 1.69 perf-profile.calltrace.cycles-pp.raw_spin_rq_lock_nested.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single 1.43 +0.3 1.68 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.raw_spin_rq_lock_nested.sched_ttwu_pending.__flush_smp_call_function_queue 0.86 +0.3 1.12 ± 4% perf-profile.calltrace.cycles-pp.dequeue_task_rt.push_rt_task.push_rt_tasks.finish_task_switch.__schedule 4.26 +0.3 4.58 ± 2% perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single 2.68 +0.3 3.01 ± 2% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary 2.69 +0.3 3.02 ± 2% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64 10.28 +0.4 10.63 perf-profile.calltrace.cycles-pp.__sched_yield 4.90 +0.6 5.53 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64 4.90 +0.6 5.53 perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64 4.90 +0.6 5.53 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64 4.94 +0.6 5.57 perf-profile.calltrace.cycles-pp.common_startup_64 0.00 +0.7 0.72 ± 8% perf-profile.calltrace.cycles-pp.balance_fair.__pick_next_task.__schedule.schedule.futex_do_wait 0.00 +0.7 0.72 ± 8% perf-profile.calltrace.cycles-pp.sched_balance_newidle.balance_fair.__pick_next_task.__schedule.schedule 7.78 ± 3% +0.8 8.55 ± 3% perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.75 ± 8% +5.8 6.56 ± 6% perf-profile.calltrace.cycles-pp._find_first_and_bit.__cpupri_find.cpupri_find_fitness.find_lowest_rq.find_lock_lowest_rq 15.21 ± 3% -2.9 12.34 ± 5% perf-profile.children.cycles-pp.pull_rt_task 32.03 -2.8 29.24 ± 2% perf-profile.children.cycles-pp.cpupri_set 46.02 -1.9 44.16 perf-profile.children.cycles-pp.schedule 31.96 -1.8 30.14 perf-profile.children.cycles-pp.futex_do_wait 32.28 -1.7 30.63 perf-profile.children.cycles-pp.__futex_wait 32.29 -1.6 30.65 perf-profile.children.cycles-pp.futex_wait 28.29 -1.6 26.73 perf-profile.children.cycles-pp.find_lock_lowest_rq 48.72 -1.5 47.18 perf-profile.children.cycles-pp.__schedule 81.25 -1.5 79.73 perf-profile.children.cycles-pp.__sched_setscheduler 15.35 -1.5 13.84 ± 4% perf-profile.children.cycles-pp.cpupri_find_fitness 15.33 -1.5 13.82 ± 5% perf-profile.children.cycles-pp.__cpupri_find 10.53 ± 3% -1.5 9.02 ± 5% perf-profile.children.cycles-pp.balance_callbacks 15.39 -1.5 13.90 ± 4% perf-profile.children.cycles-pp.find_lowest_rq 93.88 -1.3 92.61 perf-profile.children.cycles-pp.do_syscall_64 93.90 -1.2 92.65 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 5.94 ± 3% -1.2 4.72 ± 4% 
perf-profile.children.cycles-pp.balance_rt 32.36 -1.2 31.21 perf-profile.children.cycles-pp.push_rt_tasks 34.19 -1.1 33.09 perf-profile.children.cycles-pp.push_rt_task 34.74 -1.0 33.76 perf-profile.children.cycles-pp.finish_task_switch 15.91 -0.9 14.98 ± 2% perf-profile.children.cycles-pp.dequeue_rt_stack 38.47 -0.9 37.57 perf-profile.children.cycles-pp._sched_setscheduler 40.08 -0.9 39.21 perf-profile.children.cycles-pp.do_futex 40.10 -0.9 39.23 perf-profile.children.cycles-pp.__x64_sys_futex 38.68 -0.8 37.89 perf-profile.children.cycles-pp.__x64_sys_sched_setscheduler 38.68 -0.8 37.89 perf-profile.children.cycles-pp.do_sched_setscheduler 9.06 ± 2% -0.7 8.40 ± 3% perf-profile.children.cycles-pp.__pick_next_task 3.62 ± 2% -0.4 3.23 ± 3% perf-profile.children.cycles-pp.try_to_block_task 0.22 ± 2% -0.1 0.16 ± 5% perf-profile.children.cycles-pp.sched_tick 0.26 ± 2% -0.1 0.20 ± 3% perf-profile.children.cycles-pp.tick_nohz_handler 0.26 -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues 0.44 ± 2% -0.1 0.38 ± 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 0.25 ± 2% -0.1 0.19 ± 2% perf-profile.children.cycles-pp.update_process_times 0.30 ± 2% -0.1 0.24 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.42 ± 2% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 0.29 -0.1 0.23 ± 4% perf-profile.children.cycles-pp.hrtimer_interrupt 0.20 ± 4% -0.0 0.16 ± 12% perf-profile.children.cycles-pp.sched_balance_update_blocked_averages 0.36 -0.0 0.32 perf-profile.children.cycles-pp.irq_work_single 0.09 ± 6% -0.0 0.06 ± 13% perf-profile.children.cycles-pp.sched_balance_rq 0.04 ± 45% +0.0 0.06 ± 7% perf-profile.children.cycles-pp.menu_select 0.06 ± 6% +0.0 0.08 ± 10% perf-profile.children.cycles-pp.plist_add 0.06 ± 8% +0.0 0.08 ± 5% perf-profile.children.cycles-pp.native_irq_return_iret 0.13 ± 3% +0.0 0.16 ± 4% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode 0.05 +0.0 0.08 perf-profile.children.cycles-pp.pick_task_rt 0.04 ± 44% +0.0 0.07 ± 9% perf-profile.children.cycles-pp.do_perf_trace_sched_stat_runtime 0.06 ± 8% +0.0 0.09 ± 4% perf-profile.children.cycles-pp._copy_from_user 0.06 ± 7% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.find_task_by_vpid 0.04 ± 44% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.sched_mm_cid_migrate_to 0.10 ± 8% +0.0 0.13 ± 6% perf-profile.children.cycles-pp.prepare_task_switch 0.07 ± 11% +0.0 0.10 ± 8% perf-profile.children.cycles-pp.llist_reverse_order 0.06 ± 9% +0.0 0.10 ± 10% perf-profile.children.cycles-pp.pthread_mutex_lock 0.08 +0.0 0.12 ± 5% perf-profile.children.cycles-pp.sched_clock 0.05 ± 7% +0.0 0.09 perf-profile.children.cycles-pp.__get_user_8 0.06 ± 6% +0.0 0.10 ± 3% perf-profile.children.cycles-pp.rseq_get_rseq_cs 0.07 +0.0 0.11 ± 4% perf-profile.children.cycles-pp.__resched_curr 0.09 ± 4% +0.0 0.13 ± 5% perf-profile.children.cycles-pp.sched_clock_cpu 0.08 ± 6% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.futex_unqueue 0.10 ± 9% +0.0 0.14 ± 6% perf-profile.children.cycles-pp.native_sched_clock 0.08 ± 8% +0.0 0.13 perf-profile.children.cycles-pp.find_get_task 0.09 ± 4% +0.0 0.14 perf-profile.children.cycles-pp.wakeup_preempt 0.00 +0.1 0.05 perf-profile.children.cycles-pp.raw_spin_rq_trylock 0.10 ± 6% +0.1 0.15 ± 4% perf-profile.children.cycles-pp.futex_wake_mark 0.09 ± 5% +0.1 0.14 ± 3% perf-profile.children.cycles-pp.rseq_ip_fixup 0.09 ± 4% +0.1 0.14 ± 4% perf-profile.children.cycles-pp.pthread_setschedparam 0.10 ± 7% +0.1 0.16 ± 6% 
perf-profile.children.cycles-pp.do_perf_trace_sched_wakeup_template 0.00 +0.1 0.06 ± 9% perf-profile.children.cycles-pp.___perf_sw_event 0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.__x2apic_send_IPI_dest 0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.native_apic_msr_eoi 0.11 ± 6% +0.1 0.17 ± 6% perf-profile.children.cycles-pp.ttwu_queue_wakelist 0.09 ± 4% +0.1 0.15 ± 2% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack 0.11 ± 4% +0.1 0.17 ± 2% perf-profile.children.cycles-pp.stress_mutex_exercise 0.00 +0.1 0.06 perf-profile.children.cycles-pp.sysvec_reschedule_ipi 0.13 ± 5% +0.1 0.19 ± 4% perf-profile.children.cycles-pp.update_rq_clock 0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.rseq_update_cpu_node_id 0.13 ± 5% +0.1 0.20 ± 3% perf-profile.children.cycles-pp.update_rq_clock_task 0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.__smp_call_single_queue 0.10 ± 4% +0.1 0.17 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64 0.00 +0.1 0.07 ± 7% perf-profile.children.cycles-pp.__radix_tree_lookup 0.16 ± 7% +0.1 0.22 ± 7% perf-profile.children.cycles-pp.update_curr_common 0.00 +0.1 0.07 ± 5% perf-profile.children.cycles-pp.__wrgsbase_inactive 0.10 ± 5% +0.1 0.17 ± 2% perf-profile.children.cycles-pp.rt_mutex_adjust_pi 0.13 ± 3% +0.1 0.20 perf-profile.children.cycles-pp.switch_mm_irqs_off 0.14 ± 4% +0.1 0.22 ± 3% perf-profile.children.cycles-pp.__rseq_handle_notify_resume 0.17 ± 5% +0.1 0.26 ± 5% perf-profile.children.cycles-pp.__futex_hash 0.24 ± 3% +0.1 0.34 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave 0.17 ± 3% +0.1 0.27 ± 3% perf-profile.children.cycles-pp.os_xsave 0.15 ± 4% +0.1 0.25 ± 3% perf-profile.children.cycles-pp.__switch_to 0.24 ± 2% +0.1 0.34 ± 3% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi 0.18 ± 2% +0.1 0.29 perf-profile.children.cycles-pp.set_load_weight 0.22 ± 4% +0.1 0.34 ± 5% perf-profile.children.cycles-pp.futex_wait_setup 0.23 ± 2% +0.2 0.38 ± 2% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate 1.12 ± 4% +0.2 1.29 ± 4% perf-profile.children.cycles-pp.task_rq_lock 0.28 +0.2 0.46 ± 2% perf-profile.children.cycles-pp.switch_fpu_return 0.36 ± 5% +0.2 0.55 ± 4% perf-profile.children.cycles-pp.futex_hash 1.93 ± 2% +0.2 2.14 ± 3% perf-profile.children.cycles-pp.acpi_idle_do_entry 1.93 ± 2% +0.2 2.14 ± 3% perf-profile.children.cycles-pp.acpi_safe_halt 1.93 ± 2% +0.2 2.14 ± 3% perf-profile.children.cycles-pp.pv_native_safe_halt 1.93 ± 2% +0.2 2.14 ± 3% perf-profile.children.cycles-pp.acpi_idle_enter 1.97 ± 2% +0.2 2.18 ± 3% perf-profile.children.cycles-pp.cpuidle_enter_state 1.97 ± 2% +0.2 2.19 ± 3% perf-profile.children.cycles-pp.cpuidle_enter 2.04 ± 2% +0.2 2.28 ± 3% perf-profile.children.cycles-pp.cpuidle_idle_call 3.62 +0.3 3.88 ± 3% perf-profile.children.cycles-pp.enqueue_pushable_task 0.40 ± 4% +0.3 0.72 ± 8% perf-profile.children.cycles-pp.balance_fair 0.40 ± 4% +0.3 0.72 ± 8% perf-profile.children.cycles-pp.sched_balance_newidle 2.71 +0.3 3.05 ± 2% perf-profile.children.cycles-pp.schedule_idle 5.99 +0.3 6.33 perf-profile.children.cycles-pp.ttwu_do_activate 10.30 +0.4 10.67 perf-profile.children.cycles-pp.__sched_yield 5.26 ± 2% +0.5 5.71 ± 3% perf-profile.children.cycles-pp.sched_ttwu_pending 5.71 ± 2% +0.5 6.18 ± 3% perf-profile.children.cycles-pp.__sysvec_call_function_single 5.82 ± 2% +0.5 6.30 ± 3% perf-profile.children.cycles-pp.__flush_smp_call_function_queue 5.76 ± 2% +0.5 6.25 ± 3% perf-profile.children.cycles-pp.sysvec_call_function_single 5.91 ± 2% +0.6 6.48 ± 3% 
perf-profile.children.cycles-pp.asm_sysvec_call_function_single 4.93 +0.6 5.56 perf-profile.children.cycles-pp.do_idle 4.90 +0.6 5.53 perf-profile.children.cycles-pp.start_secondary 4.94 +0.6 5.57 perf-profile.children.cycles-pp.common_startup_64 4.94 +0.6 5.57 perf-profile.children.cycles-pp.cpu_startup_entry 7.78 ± 3% +0.8 8.55 ± 3% perf-profile.children.cycles-pp.futex_wake 1.45 ± 8% +6.0 7.48 ± 6% perf-profile.children.cycles-pp._find_first_and_bit 12.44 -7.7 4.74 ± 7% perf-profile.self.cycles-pp.__cpupri_find 14.90 ± 3% -2.9 12.04 ± 5% perf-profile.self.cycles-pp.pull_rt_task 32.02 -2.8 29.23 ± 2% perf-profile.self.cycles-pp.cpupri_set 0.08 -0.0 0.06 ± 7% perf-profile.self.cycles-pp.irq_work_single 0.06 +0.0 0.08 perf-profile.self.cycles-pp.prepare_task_switch 0.06 ± 8% +0.0 0.08 ± 6% perf-profile.self.cycles-pp.update_rq_clock 0.06 ± 9% +0.0 0.08 ± 6% perf-profile.self.cycles-pp.plist_add 0.07 ± 5% +0.0 0.09 ± 5% perf-profile.self.cycles-pp.do_sched_setscheduler 0.06 ± 6% +0.0 0.08 ± 5% perf-profile.self.cycles-pp.futex_do_wait 0.05 +0.0 0.08 ± 4% perf-profile.self.cycles-pp.pick_task_rt 0.06 ± 8% +0.0 0.08 ± 5% perf-profile.self.cycles-pp.native_irq_return_iret 0.05 +0.0 0.08 perf-profile.self.cycles-pp._copy_from_user 0.06 ± 6% +0.0 0.09 ± 6% perf-profile.self.cycles-pp.__pick_next_task 0.06 ± 11% +0.0 0.09 ± 6% perf-profile.self.cycles-pp.sched_ttwu_pending 0.05 ± 7% +0.0 0.08 ± 5% perf-profile.self.cycles-pp.push_rt_task 0.06 ± 7% +0.0 0.10 ± 6% perf-profile.self.cycles-pp.do_perf_trace_sched_wakeup_template 0.13 ± 6% +0.0 0.16 ± 5% perf-profile.self.cycles-pp.__flush_smp_call_function_queue 0.07 ± 11% +0.0 0.10 ± 8% perf-profile.self.cycles-pp.llist_reverse_order 0.05 ± 7% +0.0 0.09 ± 4% perf-profile.self.cycles-pp.__get_user_8 0.08 ± 6% +0.0 0.11 ± 3% perf-profile.self.cycles-pp.finish_task_switch 0.03 ± 70% +0.0 0.07 ± 5% perf-profile.self.cycles-pp.sched_mm_cid_migrate_to 0.06 ± 6% +0.0 0.10 perf-profile.self.cycles-pp.pthread_setschedparam 0.07 ± 6% +0.0 0.11 ± 4% perf-profile.self.cycles-pp.futex_unqueue 0.06 ± 7% +0.0 0.10 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64 0.11 +0.0 0.15 ± 3% perf-profile.self.cycles-pp.pv_native_safe_halt 0.03 ± 70% +0.0 0.07 ± 6% perf-profile.self.cycles-pp.ttwu_queue_wakelist 0.07 +0.0 0.11 ± 6% perf-profile.self.cycles-pp.__resched_curr 0.09 ± 7% +0.0 0.13 ± 2% perf-profile.self.cycles-pp.futex_wake_mark 0.10 ± 9% +0.0 0.14 ± 7% perf-profile.self.cycles-pp.native_sched_clock 0.09 ± 5% +0.0 0.13 ± 3% perf-profile.self.cycles-pp.stress_mutex_exercise 0.00 +0.1 0.05 perf-profile.self.cycles-pp.__sched_yield 0.00 +0.1 0.05 perf-profile.self.cycles-pp.update_curr_common 0.00 +0.1 0.05 ± 7% perf-profile.self.cycles-pp.exit_to_user_mode_loop 0.10 ± 8% +0.1 0.15 ± 5% perf-profile.self.cycles-pp.futex_wake 0.10 ± 4% +0.1 0.16 ± 7% perf-profile.self.cycles-pp.select_task_rq_rt 0.10 ± 3% +0.1 0.16 ± 3% perf-profile.self.cycles-pp.update_rq_clock_task 0.02 ±141% +0.1 0.07 ± 9% perf-profile.self.cycles-pp.pthread_mutex_lock 0.02 ±141% +0.1 0.07 ± 6% perf-profile.self.cycles-pp.switch_fpu_return 0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.__x2apic_send_IPI_dest 0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.native_apic_msr_eoi 0.09 ± 4% +0.1 0.15 ± 2% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack 0.00 +0.1 0.06 perf-profile.self.cycles-pp.__radix_tree_lookup 0.00 +0.1 0.06 perf-profile.self.cycles-pp.__wrgsbase_inactive 0.09 ± 4% +0.1 0.15 ± 3% perf-profile.self.cycles-pp.do_syscall_64 0.00 +0.1 0.06 ± 6% 
perf-profile.self.cycles-pp.rseq_update_cpu_node_id 0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.select_task_rq 0.19 ± 3% +0.1 0.26 perf-profile.self.cycles-pp.find_lock_lowest_rq 0.13 ± 5% +0.1 0.20 ± 4% perf-profile.self.cycles-pp.__sched_setscheduler 0.11 ± 3% +0.1 0.18 ± 4% perf-profile.self.cycles-pp.switch_mm_irqs_off 0.44 +0.1 0.51 ± 2% perf-profile.self.cycles-pp._raw_spin_lock 0.17 ± 6% +0.1 0.26 ± 4% perf-profile.self.cycles-pp.__futex_hash 0.14 ± 3% +0.1 0.24 perf-profile.self.cycles-pp._raw_spin_lock_irqsave 0.14 ± 4% +0.1 0.24 ± 4% perf-profile.self.cycles-pp.__switch_to 0.18 ± 3% +0.1 0.28 ± 4% perf-profile.self.cycles-pp.futex_hash 0.16 ± 3% +0.1 0.26 ± 2% perf-profile.self.cycles-pp.os_xsave 0.17 ± 2% +0.1 0.28 ± 2% perf-profile.self.cycles-pp.set_load_weight 0.23 ± 3% +0.1 0.36 ± 2% perf-profile.self.cycles-pp.__schedule 0.23 +0.2 0.38 ± 3% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate 3.54 ± 2% +0.2 3.78 ± 3% perf-profile.self.cycles-pp.enqueue_pushable_task 0.14 ± 3% +0.4 0.52 ± 9% perf-profile.self.cycles-pp.sched_balance_newidle 1.45 ± 2% +0.6 2.03 ± 4% perf-profile.self.cycles-pp.dequeue_task_rt 0.72 ± 5% +2.1 2.80 ± 11% perf-profile.self.cycles-pp.enqueue_task_rt 1.45 ± 8% +6.0 7.47 ± 6% perf-profile.self.cycles-pp._find_first_and_bit Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
As an alternative, with a slightly more complicated change, we can separate
the counts and masks into two vectors inlined in struct cpupri (counts[] and
masks[]), and add two paddings:

1. Between counts[0] and counts[1], since counts[0] is updated more
   frequently than the others, whenever an RT task enqueues to an empty runq
   or dequeues from a non-overloaded runq.

2. Between the two vectors, since counts[] is read-write while masks[] is
   only read once it stores the pointers.

The alternative approach introduces the complexity of a 31+/21- LoC change,
while it achieves the same performance as the simple one; at the same time,
the struct cpupri size is reduced from 26 cache lines to 21 cache lines.
(A rough sketch of the layout appears after the quoted patch below.)

The alternative approach is also prepared and can be sent out if there is
any interest.

Best Regards
Pan

> -----Original Message-----
> From: Pan Deng <pan.deng@intel.com>
> Sent: Thursday, June 12, 2025 11:12 AM
> To: peterz@infradead.org; mingo@kernel.org
> Cc: linux-kernel@vger.kernel.org; Li, Tianyou <tianyou.li@intel.com>;
> tim.c.chen@linux.intel.com; Deng, Pan <pan.deng@intel.com>
> Subject: [PATCH] sched/rt: optimize cpupri_vec layout
>
> When running a multi-instance ffmpeg transcoding workload that uses RT
> threads on a high core count system, updates of cpupri_vec->count contend
> with reads of the adjacent mask in the same cache line, in
> cpupri_find_fitness() and cpupri_set().
>
> This change places each count and mask in separate cache lines, via the
> ____cacheline_aligned attribute, to avoid the false sharing.
>
> Tested on a 2-socket machine with 240 physical cores (480 logical CPUs),
> running 60 ffmpeg transcoding instances. With the change, kernel cycles%
> is reduced from ~20% to ~12%, and the fps metric improves by ~11%.
>
> The side effect of this change is that the size of struct cpupri grows
> from 26 cache lines to 203 cache lines.
>
> Signed-off-by: Pan Deng <pan.deng@intel.com>
> Signed-off-by: Tianyou Li <tianyou.li@intel.com>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  kernel/sched/cpupri.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/cpupri.h b/kernel/sched/cpupri.h
> index d6cba0020064..245b0fa626be 100644
> --- a/kernel/sched/cpupri.h
> +++ b/kernel/sched/cpupri.h
> @@ -9,7 +9,7 @@
>
>  struct cpupri_vec {
>  	atomic_t count;
> -	cpumask_var_t mask;
> +	cpumask_var_t mask ____cacheline_aligned;
>  };
>
>  struct cpupri {
> --
> 2.43.5
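
As mentioned above, a rough sketch of the alternative layout could look as
follows. It is written against the existing types in kernel/sched/cpupri.h,
but the member names and the use of ____cacheline_aligned for the two
paddings are illustrative assumptions, not the actual 31+/21- patch. With
64-byte cache lines and CPUPRI_NR_PRIORITIES == 101 it works out to roughly
21 cache lines, in line with the figure quoted above:

/*
 * Illustrative sketch only: counts[] and masks[] become two arrays inlined
 * in struct cpupri, with the hottest counter isolated on its own cache line
 * and a cache-line boundary between the read-write counts[] and the
 * read-mostly masks[].
 */
struct cpupri {
	/*
	 * counts[0] equivalent: bumped whenever an RT task enqueues to an
	 * empty runq or dequeues from a non-overloaded runq, so keep it on
	 * its own cache line.
	 */
	atomic_t	count_hot;

	/* the remaining counters start on a new cache line (RW data) */
	atomic_t	counts[CPUPRI_NR_PRIORITIES - 1] ____cacheline_aligned;

	/* mask pointers are set up once and then only read */
	cpumask_var_t	masks[CPUPRI_NR_PRIORITIES] ____cacheline_aligned;

	int		*cpu_to_pri;
};

The cache-line gap between counts[] and masks[] keeps the frequently written
counters from invalidating the lines that only hold mask pointers.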