[PATCH] sched/fair: prefer available idle cpu in select_idle_core

zhangwei123171@gmail.com posted 1 patch 1 year, 8 months ago
Posted by zhangwei123171@gmail.com 1 year, 8 months ago
From: zhangwei123171 <zhangwei123171@jd.com>

When no idle core can be found, the first sched-idle CPU or the
first available idle CPU is used, if one exists.

Let an available idle CPU detected later take its place, to ensure
that it is used if one exists.

Signed-off-by: zhangwei123171 <zhangwei123171@jd.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 41b58387023d..653ca3ea09b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7341,7 +7341,7 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
 			}
 			break;
 		}
-		if (*idle_cpu == -1 && cpumask_test_cpu(cpu, cpus))
+		if (cpumask_test_cpu(cpu, cpus))
 			*idle_cpu = cpu;
 	}
 
-- 
2.33.0
Re: [PATCH] sched/fair: prefer available idle cpu in select_idle_core
Posted by kernel test robot 1 year, 7 months ago

Hello,

kernel test robot noticed a 2.9% improvement of stress-ng.vm-rw.ops_per_sec on:


commit: 9f2e02ee19cda318b3889a27c13aee04fdbeb179 ("[PATCH] sched/fair: prefer available idle cpu in select_idle_core")
url: https://github.com/intel-lab-lkp/linux/commits/zhangwei123171-gmail-com/sched-fair-prefer-available-idle-cpu-in-select_idle_core/20240612-195645
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git c793a62823d1ce8f70d9cfc7803e3ea436277cda
patch link: https://lore.kernel.org/all/20240612115410.1659149-1-zhangwei123171@jd.com/
patch subject: [PATCH] sched/fair: prefer available idle cpu in select_idle_core

testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: vm-rw
	cpufreq_governor: performance

Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240620/202406201547.f5077fa1-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/vm-rw/stress-ng/60s

commit: 
  c793a62823 ("sched/core: Drop spinlocks on contention iff kernel is preemptible")
  9f2e02ee19 ("sched/fair: prefer available idle cpu in select_idle_core")

c793a62823d1ce8f 9f2e02ee19cda318b3889a27c13 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    295657 ±  6%     +22.8%     362935 ±  8%  meminfo.Active
    295610 ±  6%     +22.8%     362887 ±  8%  meminfo.Active(anon)
    150724 ± 21%     +67.4%     252378 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.stddev
    150724 ± 21%     +67.4%     252378 ±  3%  sched_debug.cfs_rq:/.min_vruntime.stddev
  10941857            +3.9%   11367018        vmstat.system.cs
   1076455            -1.9%    1055781        vmstat.system.in
     74324 ±  6%     +21.9%      90584 ±  7%  proc-vmstat.nr_active_anon
 3.512e+09            +3.3%  3.626e+09        proc-vmstat.nr_foll_pin_acquired
 3.512e+09            +3.3%  3.626e+09        proc-vmstat.nr_foll_pin_released
     74324 ±  6%     +21.9%      90584 ±  7%  proc-vmstat.nr_zone_active_anon
    760031           -70.4%     224972        stress-ng.time.involuntary_context_switches
 3.948e+08            +2.9%  4.062e+08        stress-ng.time.voluntary_context_switches
 1.975e+08            +2.9%  2.032e+08        stress-ng.vm-rw.ops
   3291726            +2.9%    3387191        stress-ng.vm-rw.ops_per_sec
 4.035e+10            +2.2%  4.123e+10        perf-stat.i.branch-instructions
      0.67            -0.0        0.65        perf-stat.i.branch-miss-rate%
 6.491e+09            +1.1%  6.564e+09        perf-stat.i.cache-references
  11493579            +3.5%   11900089        perf-stat.i.context-switches
      2.41            -1.9%       2.37        perf-stat.i.cpi
   4817773            +1.7%    4901418        perf-stat.i.cpu-migrations
  2.16e+11            +2.2%  2.208e+11        perf-stat.i.instructions
      0.43            +1.7%       0.43        perf-stat.i.ipc
     71.91            +3.6%      74.50        perf-stat.i.metric.K/sec
      0.62            -0.0        0.61        perf-stat.overall.branch-miss-rate%
      2.38            -1.6%       2.34        perf-stat.overall.cpi
      0.42            +1.6%       0.43        perf-stat.overall.ipc
 3.903e+10            +2.5%  3.999e+10        perf-stat.ps.branch-instructions
 6.286e+09            +1.7%  6.395e+09        perf-stat.ps.cache-references
  11123522            +4.1%   11584380        perf-stat.ps.context-switches
   4664338            +2.4%    4775066        perf-stat.ps.cpu-migrations
  2.09e+11            +2.5%  2.143e+11        perf-stat.ps.instructions
 1.266e+13            +2.8%  1.301e+13        perf-stat.total.instructions
     16.29            -0.3       15.97        perf-profile.calltrace.cycles-pp.pipe_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.70            -0.3       16.39        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     14.46            -0.3       14.16        perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_sync_key.pipe_write.vfs_write.ksys_write
     14.53            -0.3       14.23        perf-profile.calltrace.cycles-pp.__wake_up_sync_key.pipe_write.vfs_write.ksys_write.do_syscall_64
     13.71            -0.3       13.41        perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.pipe_write
     13.73            -0.3       13.44        perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.pipe_write.vfs_write
     18.54            -0.2       18.31        perf-profile.calltrace.cycles-pp.__clone
      8.93            -0.2        8.76        perf-profile.calltrace.cycles-pp.write.stress_vm_rw
      8.77            -0.2        8.60        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.stress_vm_rw
      8.71            -0.2        8.54        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.stress_vm_rw
      8.78            -0.2        8.62        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write.stress_vm_rw
     62.61            -0.2       62.46        perf-profile.calltrace.cycles-pp.stress_vm_rw
      8.39            -0.1        8.25        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.__clone
      8.40            -0.1        8.26        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write.__clone
      8.32            -0.1        8.18        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.__clone
      8.52            -0.1        8.38        perf-profile.calltrace.cycles-pp.write.__clone
     13.62            -0.1       13.48        perf-profile.calltrace.cycles-pp.pipe_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     14.30            -0.1       14.17        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      9.69            -0.1        9.56        perf-profile.calltrace.cycles-pp.schedule.pipe_read.vfs_read.ksys_read.do_syscall_64
      9.59            -0.1        9.47        perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_read.vfs_read.ksys_read
     23.18            -0.1       23.07        perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.process_vm_rw_single_vec.process_vm_rw_core.process_vm_rw
     23.42            -0.1       23.31        perf-profile.calltrace.cycles-pp.copy_page_to_iter.process_vm_rw_single_vec.process_vm_rw_core.process_vm_rw.__x64_sys_process_vm_readv
      9.04            -0.1        8.95        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read.stress_vm_rw
      4.71            -0.1        4.62        perf-profile.calltrace.cycles-pp.available_idle_cpu.select_idle_core.select_idle_cpu.select_idle_sibling.select_task_rq_fair
      9.05            -0.1        8.96        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read.stress_vm_rw
      7.79            -0.1        7.71        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read.stress_vm_rw
      0.63            -0.1        0.56        perf-profile.calltrace.cycles-pp.read
      0.70            -0.0        0.67        perf-profile.calltrace.cycles-pp.prepare_task_switch.__schedule.schedule.pipe_read.vfs_read
      0.58            -0.0        0.55        perf-profile.calltrace.cycles-pp._raw_spin_lock.__schedule.schedule_idle.do_idle.cpu_startup_entry
      1.42            -0.0        1.40        perf-profile.calltrace.cycles-pp.switch_fpu_return.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.84            -0.0        0.83        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.read.__clone
      0.71            +0.0        0.74        perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.ttwu_do_activate.sched_ttwu_pending
      0.87            +0.0        0.90        perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule_idle.do_idle.cpu_startup_entry
      6.19            +0.0        6.22        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      6.24            +0.0        6.28        perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      5.16            +0.0        5.21        perf-profile.calltrace.cycles-pp._copy_from_iter.copy_page_from_iter.process_vm_rw_single_vec.process_vm_rw_core.process_vm_rw
      2.76            +0.0        2.81        perf-profile.calltrace.cycles-pp.pin_user_pages_remote.process_vm_rw_single_vec.process_vm_rw_core.process_vm_rw.__x64_sys_process_vm_writev
      3.50            +0.0        3.54        perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      3.36            +0.0        3.41        perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
      5.35            +0.0        5.40        perf-profile.calltrace.cycles-pp.copy_page_from_iter.process_vm_rw_single_vec.process_vm_rw_core.process_vm_rw.__x64_sys_process_vm_writev
      0.66            +0.1        0.73        perf-profile.calltrace.cycles-pp.sched_mm_cid_migrate_to.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue
      6.76            +0.1        6.82        perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      2.15            +0.1        2.22        perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue
      7.38            +0.1        7.45        perf-profile.calltrace.cycles-pp.select_idle_cpu.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up
      9.00            +0.1        9.08        perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common
      9.37            +0.1        9.45        perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
      2.58            +0.1        2.68        perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue
      9.08            +0.1        9.19        perf-profile.calltrace.cycles-pp.process_vm_rw_single_vec.process_vm_rw_core.process_vm_rw.__x64_sys_process_vm_writev.do_syscall_64
      9.71            +0.1        9.82        perf-profile.calltrace.cycles-pp.process_vm_rw_core.process_vm_rw.__x64_sys_process_vm_writev.do_syscall_64.entry_SYSCALL_64_after_hwframe
      8.48            +0.1        8.60        perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function
     10.27            +0.1       10.39        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.process_vm_writev.stress_vm_rw
     10.12            +0.1       10.23        perf-profile.calltrace.cycles-pp.__x64_sys_process_vm_writev.do_syscall_64.entry_SYSCALL_64_after_hwframe.process_vm_writev.stress_vm_rw
     10.24            +0.1       10.36        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.process_vm_writev.stress_vm_rw
     10.10            +0.1       10.22        perf-profile.calltrace.cycles-pp.process_vm_rw.__x64_sys_process_vm_writev.do_syscall_64.entry_SYSCALL_64_after_hwframe.process_vm_writev
     10.64            +0.1       10.77        perf-profile.calltrace.cycles-pp.process_vm_writev.stress_vm_rw
      4.31            +0.2        4.46        perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
      5.36            +0.2        5.52        perf-profile.calltrace.cycles-pp.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      3.40            +0.2        3.57        perf-profile.calltrace.cycles-pp.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle
      4.81            +0.2        4.99        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary
     16.66            +0.3       16.96        perf-profile.calltrace.cycles-pp.common_startup_64
     16.57            +0.3       16.88        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
     16.54            +0.3       16.84        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
     16.58            +0.3       16.89        perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
     76.34            -0.4       75.95        perf-profile.children.cycles-pp.do_syscall_64
     76.51            -0.4       76.12        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     16.31            -0.3       16.00        perf-profile.children.cycles-pp.pipe_write
     14.46            -0.3       14.16        perf-profile.children.cycles-pp.__wake_up_common
     14.54            -0.3       14.24        perf-profile.children.cycles-pp.__wake_up_sync_key
     13.74            -0.3       13.44        perf-profile.children.cycles-pp.autoremove_wake_function
     13.72            -0.3       13.43        perf-profile.children.cycles-pp.try_to_wake_up
     16.83            -0.3       16.57        perf-profile.children.cycles-pp.vfs_write
     17.16            -0.2       16.91        perf-profile.children.cycles-pp.ksys_write
     17.70            -0.2       17.45        perf-profile.children.cycles-pp.write
     18.54            -0.2       18.31        perf-profile.children.cycles-pp.__clone
     18.78            -0.2       18.58        perf-profile.children.cycles-pp.read
      2.91            -0.2        2.72        perf-profile.children.cycles-pp._raw_spin_lock
     13.88            -0.2       13.71        perf-profile.children.cycles-pp.pipe_read
     14.54            -0.2       14.37        perf-profile.children.cycles-pp.vfs_read
      9.93            -0.2        9.76        perf-profile.children.cycles-pp.schedule
     14.70            -0.2       14.54        perf-profile.children.cycles-pp.ksys_read
      0.24            -0.2        0.08        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     62.61            -0.2       62.46        perf-profile.children.cycles-pp.stress_vm_rw
     23.98            -0.1       23.87        perf-profile.children.cycles-pp._copy_to_iter
     24.25            -0.1       24.14        perf-profile.children.cycles-pp.copy_page_to_iter
      3.16            -0.1        3.10        perf-profile.children.cycles-pp.enqueue_task_fair
      2.61            -0.0        2.57        perf-profile.children.cycles-pp.update_load_avg
      1.78            -0.0        1.74        perf-profile.children.cycles-pp.prepare_task_switch
      0.28            -0.0        0.24        perf-profile.children.cycles-pp.update_rq_clock_task
      2.31            -0.0        2.28        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.38            -0.0        0.35        perf-profile.children.cycles-pp.wake_affine
      0.82            -0.0        0.79        perf-profile.children.cycles-pp.update_rq_clock
      0.89            -0.0        0.86        perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      1.43            -0.0        1.40        perf-profile.children.cycles-pp.switch_fpu_return
      0.10 ±  4%      -0.0        0.08 ±  3%  perf-profile.children.cycles-pp.cpuacct_charge
      0.21 ±  2%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.task_h_load
      0.13 ±  3%      -0.0        0.11        perf-profile.children.cycles-pp.wakeup_preempt
      0.95            -0.0        0.93        perf-profile.children.cycles-pp.prepare_to_wait_event
      0.16            +0.0        0.17        perf-profile.children.cycles-pp.tick_nohz_idle_enter
      0.62            +0.0        0.63        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.29            +0.0        0.30        perf-profile.children.cycles-pp.nohz_run_idle_balance
      0.92            +0.0        0.94        perf-profile.children.cycles-pp.mod_node_page_state
      0.25            +0.0        0.27        perf-profile.children.cycles-pp.update_min_vruntime
      0.77            +0.0        0.79        perf-profile.children.cycles-pp.__switch_to_asm
      0.46            +0.0        0.48        perf-profile.children.cycles-pp.llist_reverse_order
      6.27            +0.0        6.30        perf-profile.children.cycles-pp.cpuidle_enter
      0.48 ±  2%      +0.0        0.52 ±  2%  perf-profile.children.cycles-pp.pick_next_task_idle
      0.47 ±  2%      +0.0        0.51 ±  2%  perf-profile.children.cycles-pp.__update_idle_core
      0.81            +0.0        0.85        perf-profile.children.cycles-pp.sched_mm_cid_migrate_to
      0.09 ±  6%      +0.0        0.13 ±  5%  perf-profile.children.cycles-pp.generic_perform_write
      3.52            +0.0        3.57        perf-profile.children.cycles-pp.schedule_idle
      5.40            +0.0        5.45        perf-profile.children.cycles-pp._copy_from_iter
      0.00            +0.1        0.05 ±  5%  perf-profile.children.cycles-pp.copy_page_from_iter_atomic
      0.09 ±  6%      +0.1        0.14 ±  7%  perf-profile.children.cycles-pp.shmem_file_write_iter
      5.64            +0.1        5.69        perf-profile.children.cycles-pp.copy_page_from_iter
      0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.ring_buffer_read_head
      0.12 ±  6%      +0.1        0.18 ±  6%  perf-profile.children.cycles-pp.record__pushfn
      6.80            +0.1        6.86        perf-profile.children.cycles-pp.cpuidle_idle_call
      0.12 ±  6%      +0.1        0.18 ±  6%  perf-profile.children.cycles-pp.writen
      0.00            +0.1        0.07 ± 10%  perf-profile.children.cycles-pp.perf_mmap__read_head
      7.43            +0.1        7.51        perf-profile.children.cycles-pp.select_idle_cpu
      9.00            +0.1        9.09        perf-profile.children.cycles-pp.select_task_rq_fair
      9.37            +0.1        9.46        perf-profile.children.cycles-pp.select_task_rq
      0.18 ±  5%      +0.1        0.29 ±  6%  perf-profile.children.cycles-pp.perf_mmap__push
      0.19 ±  5%      +0.1        0.30 ±  6%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.19 ±  5%      +0.1        0.30 ±  6%  perf-profile.children.cycles-pp.cmd_record
      0.19 ±  6%      +0.1        0.31 ±  6%  perf-profile.children.cycles-pp.main
      0.19 ±  6%      +0.1        0.31 ±  6%  perf-profile.children.cycles-pp.run_builtin
      8.50            +0.1        8.62        perf-profile.children.cycles-pp.select_idle_sibling
     10.12            +0.1       10.24        perf-profile.children.cycles-pp.__x64_sys_process_vm_writev
     10.80            +0.1       10.92        perf-profile.children.cycles-pp.process_vm_writev
      4.92            +0.1        5.07        perf-profile.children.cycles-pp.sched_ttwu_pending
      5.43            +0.1        5.58        perf-profile.children.cycles-pp.flush_smp_call_function_queue
      5.54            +0.2        5.70        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
     16.66            +0.3       16.96        perf-profile.children.cycles-pp.common_startup_64
     16.66            +0.3       16.96        perf-profile.children.cycles-pp.cpu_startup_entry
     16.63            +0.3       16.93        perf-profile.children.cycles-pp.do_idle
     16.58            +0.3       16.89        perf-profile.children.cycles-pp.start_secondary
      0.24            -0.2        0.08        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
     23.76            -0.1       23.65        perf-profile.self.cycles-pp._copy_to_iter
      1.59            -0.0        1.54        perf-profile.self.cycles-pp.prepare_task_switch
      1.31            -0.0        1.28        perf-profile.self.cycles-pp.update_load_avg
      0.36            -0.0        0.33        perf-profile.self.cycles-pp.switch_fpu_return
      0.24 ±  2%      -0.0        0.21 ±  2%  perf-profile.self.cycles-pp.update_rq_clock_task
      0.70            -0.0        0.67        perf-profile.self.cycles-pp.update_rq_clock
      5.43            -0.0        5.40        perf-profile.self.cycles-pp.intel_idle
      0.21 ±  2%      -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.task_h_load
      0.10            -0.0        0.08        perf-profile.self.cycles-pp.cpuacct_charge
      0.17            -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.ttwu_queue_wakelist
      0.27            -0.0        0.26        perf-profile.self.cycles-pp.try_to_wake_up
      0.13            -0.0        0.12        perf-profile.self.cycles-pp.pick_next_task_fair
      0.22            -0.0        0.21        perf-profile.self.cycles-pp.set_task_cpu
      0.07            -0.0        0.06        perf-profile.self.cycles-pp.wakeup_preempt
      0.16 ±  2%      +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.menu_select
      0.78            +0.0        0.80        perf-profile.self.cycles-pp.__switch_to
      0.49            +0.0        0.50        perf-profile.self.cycles-pp.call_function_single_prep_ipi
      0.84            +0.0        0.85        perf-profile.self.cycles-pp.mod_node_page_state
      0.24 ±  2%      +0.0        0.26        perf-profile.self.cycles-pp.remove_entity_load_avg
      0.24            +0.0        0.26 ±  2%  perf-profile.self.cycles-pp.update_min_vruntime
      0.46            +0.0        0.48        perf-profile.self.cycles-pp.llist_reverse_order
      0.55            +0.0        0.57        perf-profile.self.cycles-pp.enqueue_entity
      0.76            +0.0        0.79        perf-profile.self.cycles-pp.__switch_to_asm
      0.37 ±  3%      +0.0        0.40 ±  2%  perf-profile.self.cycles-pp.__update_idle_core
      0.81            +0.0        0.85        perf-profile.self.cycles-pp.sched_mm_cid_migrate_to
      1.12            +0.0        1.16        perf-profile.self.cycles-pp.select_idle_core
      5.34            +0.0        5.39        perf-profile.self.cycles-pp._copy_from_iter
      0.48            +0.1        0.53        perf-profile.self.cycles-pp.select_idle_cpu
      0.00            +0.1        0.05 ±  5%  perf-profile.self.cycles-pp.copy_page_from_iter_atomic
      0.00            +0.1        0.06 ± 10%  perf-profile.self.cycles-pp.ring_buffer_read_head




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] sched/fair: prefer available idle cpu in select_idle_core
Posted by K Prateek Nayak 1 year, 8 months ago
Hello there,

On 6/12/2024 5:24 PM, zhangwei123171@gmail.com wrote:
> From: zhangwei123171 <zhangwei123171@jd.com>
> 
> When no idle core can be found, the first sched-idle CPU or the
> first available idle CPU is used, if one exists.
> 
> Let an available idle CPU detected later take its place, to ensure
> that it is used if one exists.

Is there any particular advantage to doing so? As I understand it, the
check exists to avoid unnecessary calls to cpumask_test_cpu() once an
idle CPU has already been found. On a large core count system with many
cores in the LLC domain, dropping it may result in a lot more calls to
cpumask_test_cpu() if only one core is in fact idle and there is a storm
of wakeups.
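
To put a rough number on the overhead argument, here is a minimal
userspace sketch, not kernel code. It models only the fallback tracking
in the select_idle_core() loop and counts how often a stand-in for
cpumask_test_cpu() is called. The names mask_test(), scan_core() and
count_mask_tests() are hypothetical, SCHED_IDLE handling and the removal
of scanned cores from the mask are left out, and the topology (SMT-2
cores whose first sibling is idle and second sibling is busy) is an
assumption chosen so that no fully idle core is ever found.

```c
#include <assert.h>
#include <stdbool.h>

/* Counter for the hypothetical stand-in for cpumask_test_cpu(). */
static int mask_tests;

static bool mask_test(const bool *allowed, int cpu)
{
	mask_tests++;
	return allowed[cpu];
}

/*
 * Simplified model of the loop in select_idle_core() for one core whose
 * SMT siblings are CPUs [first, first + nr). Returns true if the whole
 * core is idle. SCHED_IDLE handling is intentionally omitted.
 */
static bool scan_core(const bool *idle, const bool *allowed, int first,
		      int nr, int *idle_cpu, bool patched)
{
	bool core_idle = true;

	for (int cpu = first; cpu < first + nr; cpu++) {
		if (!idle[cpu]) {
			core_idle = false;
			if (*idle_cpu == -1)
				continue;	/* no fallback yet, keep scanning */
			break;			/* fallback recorded, bail out */
		}
		/* Original: short-circuit skips the mask test once a fallback exists. */
		if ((patched || *idle_cpu == -1) && mask_test(allowed, cpu))
			*idle_cpu = cpu;
	}
	return core_idle;
}

/* Scan nr_cores SMT-2 cores and return the number of mask tests performed. */
static int count_mask_tests(int nr_cores, bool patched)
{
	enum { MAX_CPUS = 256 };
	bool idle[MAX_CPUS], allowed[MAX_CPUS];
	int idle_cpu = -1;

	assert(nr_cores > 0 && 2 * nr_cores <= MAX_CPUS);
	for (int cpu = 0; cpu < 2 * nr_cores; cpu++) {
		idle[cpu] = (cpu % 2 == 0);	/* first sibling idle, second busy */
		allowed[cpu] = true;
	}

	mask_tests = 0;
	for (int core = 0; core < nr_cores; core++)
		scan_core(idle, allowed, 2 * core, 2, &idle_cpu, patched);
	return mask_tests;
}
```

Under these assumptions, count_mask_tests(64, false) performs a single
mask test for the whole scan, while count_mask_tests(64, true) performs
one per core, i.e. 64. The real cost depends on how select_idle_cpu()
walks the LLC, which this model does not capture.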

On an SMT-2 system, I believe any idle thread on a busy core is as good
as any other (assuming all tasks behave the same). On a larger SMT
system, working out which core is the most idle costs more overhead.
Consider the following case:

o CPUs of core: 0-7; Only CPU1 is busy (i is idle, b is busy)

   +---+---+---+---+---+---+---+---+
   | i | b | i | i | i | i | i | i |
   +---+---+---+---+---+---+---+---+
         ^
   select_idle_core() bails out at the first busy CPU, which is CPU1;
   however, this core is only 1/8th busy.

o CPUs of core: 8-15; CPU10 to CPU15 are busy (i is idle, b is busy)

   +---+---+---+---+---+---+---+---+
   | i | i | b | b | b | b | b | b |
   +---+---+---+---+---+---+---+---+
             ^
   select_idle_core() bails out at the first busy CPU, which is CPU10;
   however, this core is in fact 6/8th busy.

Technically, the core with CPU0 is better, but with your change we'll
select the core of CPU8. Bottom line: there does not seem to be a good
case where selecting the last idle thread is better than selecting the
first one. The best the scheduler can do is to cut down the calls to
cpumask_test_cpu() once an idle CPU is found, unless it decides to scan
all the CPUs of every core to find the idlest core, and in a large, busy
system that is a big hammer.
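
The two-core SMT-8 case above can be replayed with a small userspace
sketch. This is a simplified model, not the kernel implementation:
pick_cpu() and busy_threads() are hypothetical names, SCHED_IDLE
handling and the affinity mask test are left out, and cores are assumed
to be contiguous blocks of 8 CPUs.

```c
#include <assert.h>
#include <stdbool.h>

#define SMT_WIDTH 8	/* threads per core, as in the example above */

/*
 * Simplified model of the fallback tracking done while scanning cores.
 * idle[cpu] is true when the CPU is idle. Returns the first CPU of a
 * fully idle core if there is one, otherwise the recorded fallback CPU
 * (or -1 if no idle CPU was seen).
 */
static int pick_cpu(const bool *idle, int nr_cpus, bool patched)
{
	int idle_cpu = -1;

	assert(nr_cpus > 0 && nr_cpus % SMT_WIDTH == 0);
	for (int core = 0; core < nr_cpus; core += SMT_WIDTH) {
		bool core_idle = true;

		for (int cpu = core; cpu < core + SMT_WIDTH; cpu++) {
			if (!idle[cpu]) {
				core_idle = false;
				if (idle_cpu == -1)
					continue;	/* no fallback yet, keep scanning */
				break;			/* fallback recorded, bail out */
			}
			/* Original keeps the first idle CPU, patched keeps the last. */
			if (patched || idle_cpu == -1)
				idle_cpu = cpu;
		}
		if (core_idle)
			return core;
	}
	return idle_cpu;
}

/* Number of busy threads on the core that starts at CPU 'core'. */
static int busy_threads(const bool *idle, int core)
{
	int busy = 0;

	for (int cpu = core; cpu < core + SMT_WIDTH; cpu++)
		busy += !idle[cpu];
	return busy;
}
```

With CPU1 busy on the first core and CPU10 to CPU15 busy on the second,
this model records CPU0 with the original check and CPU9 with the check
removed, while busy_threads() reports 1 busy thread on the first core
and 6 on the second.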

Thoughts?

> 
> Signed-off-by: zhangwei123171 <zhangwei123171@jd.com>
> ---
>   kernel/sched/fair.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 41b58387023d..653ca3ea09b6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7341,7 +7341,7 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
>   			}
>   			break;
>   		}
> -		if (*idle_cpu == -1 && cpumask_test_cpu(cpu, cpus))
> +		if (cpumask_test_cpu(cpu, cpus))
>   			*idle_cpu = cpu;
>   	}
>   

--
Thanks and Regards,
Prateek