[PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

Posted by Chunxin Zang 1 year, 8 months ago
I found that some tasks have been running long enough to become
ineligible, yet they still do not release the CPU. This increases the
scheduling delay of other processes. Therefore, I tried checking the
currently running entity in wakeup_preempt and entity_tick, and
rescheduling that cfs_rq if it is ineligible.
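
For context, "ineligible" here means failing EEVDF's entity_eligible()
test: an entity stays eligible only while its vruntime has not run ahead
of the load-weighted average vruntime of the queue, i.e. while its lag is
still non-negative. A simplified illustration of that check (not the
exact fair.c code, which keeps the comparison relative to min_vruntime
and scaled by the queue load to stay overflow-safe):

/*
 * Simplified illustration only; the real entity_eligible() /
 * vruntime_eligible() in kernel/sched/fair.c computes the same
 * comparison in an overflow-safe, fixed-point form.
 */
static bool entity_eligible_sketch(struct cfs_rq *cfs_rq,
				   struct sched_entity *se)
{
	u64 V = avg_vruntime(cfs_rq);	/* load-weighted average vruntime */

	/* lag_i = V - v_i; the entity is eligible while its lag >= 0 */
	return (s64)(V - se->vruntime) >= 0;
}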

The modification can reduce the scheduling delay by about 30% when
RUN_TO_PARITY is enabled.
So far, it has been running well in my test environment, and I have
pasted some test results below.

I isolated four cores for testing, ran hackbench in the background, and
measured the latency reported by cyclictest.

hackbench -g 4 -l 100000000 &
cyclictest --mlockall -D 5m -q

                                  EEVDF      PATCH      EEVDF-NO_PARITY  PATCH-NO_PARITY

                # Min Latencies:  00006      00006      00006            00006
  LNICE(-19)    # Avg Latencies:  00191      00122      00089            00066
                # Max Latencies:  15442      07648      14133            07713

                # Min Latencies:  00006      00010      00006            00006
  LNICE(0)      # Avg Latencies:  00466      00277      00289            00257
                # Max Latencies:  38917      32391      32665            17710

                # Min Latencies:  00019      00053      00010            00013
  LNICE(19)     # Avg Latencies:  37151      31045      18293            23035
                # Max Latencies:  2688299    7031295    426196           425708

I'm actually a bit hesitant about placing this modification under the
NO_PARITY feature, because it conflicts with the semantics of
RUN_TO_PARITY. So I captured and compared the number of resched
occurrences in wakeup_preempt to see whether it introduces any
additional overhead.

Similarly, hackbench is used to push the utilization of the four cores
to 100%, and the method for capturing the number of PREEMPT occurrences
is taken from [1].

schedstats                          EEVDF            PATCH            EEVDF-NO_PARITY  PATCH-NO_PARITY  CFS(6.5)
.stats.check_preempt_count          5053054          5057286          5003806          5018589          5031908
.stats.patch_cause_preempt_count    -------          858044           -------          765726           -------
.stats.need_preempt_count           570520           858684           3380513          3426977          1140821

From the above results, there is a slight increase in the number of
resched occurrences in wakeup_preempt. The results vary from run to
run, and sometimes the difference is not significant, but overall the
resched count remains lower than that of CFS and much lower than that
of NO_PARITY.

[1]: https://lore.kernel.org/all/20230816134059.GC982867@hirez.programming.kicks-ass.net/T/#m52057282ceb6203318be1ce9f835363de3bef5cb
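
The counting itself just bumps per-event counters at the decision points
in check_preempt_wakeup_fair(). A rough sketch of the instrumentation is
below; the counter names are invented here for illustration and are not
necessarily the fields used in [1]:

/* Rough sketch only: hypothetical schedstat-style counters. */
static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
{
	struct task_struct *curr = rq->curr;
	struct sched_entity *se = &curr->se, *pse = &p->se;
	struct cfs_rq *cfs_rq = task_cfs_rq(curr);

	schedstat_inc(rq->check_preempt_count);		/* every invocation */

	/* ... existing early-return checks ... */

	if (!entity_eligible(cfs_rq, se)) {
		schedstat_inc(rq->patch_cause_preempt_count);	/* resched caused by this patch */
		goto preempt;
	}

	/* ... unchanged wakeup-preemption logic ... */
	return;

preempt:
	schedstat_inc(rq->need_preempt_count);		/* every resched requested here */
	resched_curr(rq);
}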

Signed-off-by: Chunxin Zang <zangchunxin@lixiang.com>
Reviewed-by: Chen Yang <yangchen11@lixiang.com>
---
 kernel/sched/fair.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03be0d1330a6..a0005d240db5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
 			hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
 		return;
 #endif
+
+	if (!entity_eligible(cfs_rq, curr))
+		resched_curr(rq_of(cfs_rq));
 }
 
 
@@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
 		return;
 
+	if (!entity_eligible(cfs_rq, se))
+		goto preempt;
+
 	find_matching_se(&se, &pse);
 	WARN_ON_ONCE(!pse);
 
-- 
2.34.1
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by kernel test robot 1 year, 8 months ago

Hello,

kernel test robot noticed a -11.8% regression of netperf.Throughput_Mbps on:


commit: e2bbd1c498980c5cb68f9973f418ae09f353258d ("[PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible")
url: https://github.com/intel-lab-lkp/linux/commits/Chunxin-Zang/sched-fair-Reschedule-the-cfs_rq-when-current-is-ineligible/20240524-214314
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 97450eb909658573dcacc1063b06d3d08642c0c1
patch link: https://lore.kernel.org/all/20240524134011.270861-1-spring.cxz@gmail.com/
patch subject: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

testcase: netperf
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:

	ip: ipv4
	runtime: 300s
	nr_threads: 200%
	cluster: cs-localhost
	test: UDP_STREAM
	cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.fstat.ops_per_sec -3.9% regression                                            |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory          |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | disk=1HDD                                                                                          |
|                  | fs=ext4                                                                                            |
|                  | nr_threads=100%                                                                                    |
|                  | test=fstat                                                                                         |
|                  | testtime=60s                                                                                       |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | aim7: aim7.jobs-per-min 9.6% improvement                                                           |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | disk=4BRD_12G                                                                                      |
|                  | fs=xfs                                                                                             |
|                  | load=300                                                                                           |
|                  | md=RAID1                                                                                           |
|                  | test=sync_disk_rw                                                                                  |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | kbuild: kbuild.user_time_per_iteration 2.3% regression                                             |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory |
| test parameters  | build_kconfig=defconfig                                                                            |
|                  | cpufreq_governor=performance                                                                       |
|                  | nr_task=200%                                                                                       |
|                  | runtime=300s                                                                                       |
|                  | target=vmlinux                                                                                     |
+------------------+----------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405291359.3f662525-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240529/202405291359.3f662525-oliver.sang@intel.com

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
  cs-localhost/gcc-13/performance/ipv4/x86_64-rhel-8.3/200%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp2/UDP_STREAM/netperf

commit: 
  97450eb909 ("sched/pelt: Remove shift of thermal clock")
  e2bbd1c498 ("sched/fair: Reschedule the cfs_rq when current is ineligible")

97450eb909658573 e2bbd1c498980c5cb68f9973f41 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.46 ±  5%      +0.1        0.56 ±  4%  mpstat.cpu.all.irq%
   1867628 ±  2%     -12.6%    1632289 ±  8%  meminfo.Active
   1867580 ±  2%     -12.6%    1632257 ±  8%  meminfo.Active(anon)
   1865825 ±  2%     -12.7%    1629647 ±  9%  numa-meminfo.node1.Active
   1865809 ±  2%     -12.7%    1629633 ±  9%  numa-meminfo.node1.Active(anon)
     68.00 ±  8%    +136.3%     160.67 ± 18%  perf-c2c.DRAM.local
      2951 ±  9%     +98.5%       5858        perf-c2c.DRAM.remote
   7054758            -5.6%    6656686        vmstat.system.cs
    192398            -9.7%     173722        vmstat.system.in
 1.632e+09           -10.7%  1.458e+09        numa-numastat.node0.local_node
 1.633e+09           -10.7%  1.458e+09        numa-numastat.node0.numa_hit
 1.632e+09           -11.4%  1.446e+09        numa-numastat.node1.local_node
 1.633e+09           -11.4%  1.447e+09        numa-numastat.node1.numa_hit
 1.633e+09           -10.7%  1.458e+09        numa-vmstat.node0.numa_hit
 1.632e+09           -10.7%  1.458e+09        numa-vmstat.node0.numa_local
    466378 ±  2%     -12.6%     407484 ±  8%  numa-vmstat.node1.nr_active_anon
    466377 ±  2%     -12.6%     407484 ±  8%  numa-vmstat.node1.nr_zone_active_anon
 1.633e+09           -11.4%  1.447e+09        numa-vmstat.node1.numa_hit
 1.632e+09           -11.4%  1.446e+09        numa-vmstat.node1.numa_local
    467142 ±  3%     -12.7%     407846 ±  9%  proc-vmstat.nr_active_anon
     31481            +2.0%      32110        proc-vmstat.nr_kernel_stack
    467142 ±  3%     -12.7%     407846 ±  9%  proc-vmstat.nr_zone_active_anon
 3.266e+09           -11.0%  2.905e+09        proc-vmstat.numa_hit
 3.264e+09           -11.0%  2.904e+09        proc-vmstat.numa_local
 2.608e+10           -11.0%   2.32e+10        proc-vmstat.pgalloc_normal
 2.608e+10           -11.0%   2.32e+10        proc-vmstat.pgfree
     29563           -10.1%      26584        netperf.ThroughputBoth_Mbps
   7563274           -10.3%    6783505        netperf.ThroughputBoth_total_Mbps
      7788            -5.1%       7388        netperf.ThroughputRecv_Mbps
   1992482            -5.4%    1885347        netperf.ThroughputRecv_total_Mbps
     21775           -11.8%      19196        netperf.Throughput_Mbps
   5570791           -12.1%    4898158        netperf.Throughput_total_Mbps
 1.083e+09            -5.3%  1.025e+09        netperf.time.involuntary_context_switches
      8403            -3.4%       8116        netperf.time.percent_of_cpu_this_job_got
     24883            -3.5%      24000        netperf.time.system_time
    789.48            +1.2%     799.09        netperf.time.user_time
  4.33e+09           -10.3%  3.883e+09        netperf.workload
      4.31 ±  4%     +11.6%       4.81 ±  4%  sched_debug.cfs_rq:/.h_nr_running.max
      0.68 ±  3%     +12.6%       0.77        sched_debug.cfs_rq:/.h_nr_running.stddev
     16.51 ± 12%     +19.3%      19.70 ±  6%  sched_debug.cfs_rq:/.load_avg.avg
      5.04 ± 34%     +60.0%       8.07 ± 17%  sched_debug.cfs_rq:/.removed.load_avg.avg
     28.02 ± 21%     +27.6%      35.75 ±  7%  sched_debug.cfs_rq:/.removed.load_avg.stddev
      2.48 ± 35%     +48.3%       3.68 ±  9%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
      2.48 ± 35%     +48.3%       3.68 ±  9%  sched_debug.cfs_rq:/.removed.util_avg.avg
    114.64 ±  8%     -10.3%     102.79 ±  4%  sched_debug.cfs_rq:/.util_avg.stddev
     36.81 ± 10%     +50.7%      55.47 ± 11%  sched_debug.cpu.clock.stddev
      0.00 ±  6%     +43.3%       0.00 ±  9%  sched_debug.cpu.next_balance.stddev
      4.31 ±  4%     +10.3%       4.75 ±  3%  sched_debug.cpu.nr_running.max
      0.68 ±  3%     +12.4%       0.76 ±  2%  sched_debug.cpu.nr_running.stddev
   7177076 ±  2%      -9.9%    6466454 ±  4%  sched_debug.cpu.nr_switches.min
      0.23 ± 88%    +290.5%       0.92 ± 24%  sched_debug.rt_rq:.rt_time.avg
     30.05 ± 88%    +290.5%     117.32 ± 24%  sched_debug.rt_rq:.rt_time.max
      2.65 ± 88%    +290.5%      10.33 ± 24%  sched_debug.rt_rq:.rt_time.stddev
      1.39 ±  3%    +232.7%       4.63        perf-stat.i.MPKI
 2.345e+10            -9.9%  2.113e+10        perf-stat.i.branch-instructions
      1.05            +0.0        1.09        perf-stat.i.branch-miss-rate%
 2.419e+08            -6.1%   2.27e+08        perf-stat.i.branch-misses
      4.04 ±  3%      +5.9        9.96        perf-stat.i.cache-miss-rate%
 1.769e+08 ±  3%    +200.3%  5.312e+08        perf-stat.i.cache-misses
  4.43e+09           +20.9%  5.355e+09        perf-stat.i.cache-references
   7118377            -5.9%    6699288        perf-stat.i.context-switches
      2.32           +10.4%       2.56        perf-stat.i.cpi
      1759 ±  4%     -63.3%     644.95        perf-stat.i.cycles-between-cache-misses
 1.271e+11            -9.9%  1.145e+11        perf-stat.i.instructions
      0.44            -9.1%       0.40        perf-stat.i.ipc
     55.61            -6.0%      52.29        perf-stat.i.metric.K/sec
      1.39 ±  3%    +233.2%       4.64        perf-stat.overall.MPKI
      1.03            +0.0        1.07        perf-stat.overall.branch-miss-rate%
      3.99 ±  3%      +5.9        9.91        perf-stat.overall.cache-miss-rate%
      2.31           +10.3%       2.55        perf-stat.overall.cpi
      1658 ±  3%     -66.9%     548.75        perf-stat.overall.cycles-between-cache-misses
      0.43            -9.4%       0.39        perf-stat.overall.ipc
 2.337e+10           -10.0%  2.103e+10        perf-stat.ps.branch-instructions
  2.41e+08            -6.2%   2.26e+08        perf-stat.ps.branch-misses
 1.764e+08 ±  3%    +199.8%   5.29e+08        perf-stat.ps.cache-misses
 4.417e+09           +20.8%  5.336e+09        perf-stat.ps.cache-references
   7096909            -6.0%    6672410        perf-stat.ps.context-switches
 1.266e+11           -10.0%   1.14e+11        perf-stat.ps.instructions
 3.879e+13           -10.0%  3.492e+13        perf-stat.total.instructions
     67.36            -3.1       64.24        perf-profile.calltrace.cycles-pp.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto
     67.58            -3.1       64.47        perf-profile.calltrace.cycles-pp.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner
     71.02            -3.0       68.03        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner.send_udp_stream
     64.63            -2.9       61.72        perf-profile.calltrace.cycles-pp.udp_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
     72.79            -2.9       69.94        perf-profile.calltrace.cycles-pp.send_omni_inner.send_udp_stream.main
     72.82            -2.8       69.97        perf-profile.calltrace.cycles-pp.send_udp_stream.main
     71.47            -2.8       68.64        perf-profile.calltrace.cycles-pp.sendto.send_omni_inner.send_udp_stream.main
     70.66            -2.8       67.87        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner.send_udp_stream.main
     45.90            -1.2       44.69        perf-profile.calltrace.cycles-pp.ip_make_skb.udp_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64
     15.20            -1.2       14.03        perf-profile.calltrace.cycles-pp.udp_send_skb.udp_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64
     14.76            -1.1       13.62        perf-profile.calltrace.cycles-pp.ip_send_skb.udp_send_skb.udp_sendmsg.__sys_sendto.__x64_sys_sendto
     13.76            -1.1       12.70        perf-profile.calltrace.cycles-pp.ip_finish_output2.ip_send_skb.udp_send_skb.udp_sendmsg.__sys_sendto
     13.10            -1.0       12.09        perf-profile.calltrace.cycles-pp.__dev_queue_xmit.ip_finish_output2.ip_send_skb.udp_send_skb.udp_sendmsg
     10.87            -0.7       10.16        perf-profile.calltrace.cycles-pp.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.ip_send_skb.udp_send_skb
     10.78            -0.7       10.07        perf-profile.calltrace.cycles-pp.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.ip_send_skb
     10.64            -0.7        9.94        perf-profile.calltrace.cycles-pp.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2
     34.84            -0.6       34.20        perf-profile.calltrace.cycles-pp.ip_generic_getfrag.__ip_append_data.ip_make_skb.udp_sendmsg.__sys_sendto
      4.83 ±  4%      -0.6        4.19        perf-profile.calltrace.cycles-pp.__ip_make_skb.ip_make_skb.udp_sendmsg.__sys_sendto.__x64_sys_sendto
      9.72            -0.6        9.10        perf-profile.calltrace.cycles-pp.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit
      4.34 ±  4%      -0.6        3.73        perf-profile.calltrace.cycles-pp.__ip_select_ident.__ip_make_skb.ip_make_skb.udp_sendmsg.__sys_sendto
      9.41            -0.6        8.82        perf-profile.calltrace.cycles-pp.__napi_poll.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip
      9.33            -0.6        8.74        perf-profile.calltrace.cycles-pp.process_backlog.__napi_poll.net_rx_action.__do_softirq.do_softirq
     33.97            -0.6       33.42        perf-profile.calltrace.cycles-pp._copy_from_iter.ip_generic_getfrag.__ip_append_data.ip_make_skb.udp_sendmsg
      8.70            -0.5        8.17        perf-profile.calltrace.cycles-pp.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action.__do_softirq
      7.36            -0.5        6.84 ±  2%  perf-profile.calltrace.cycles-pp.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action
      7.29            -0.5        6.78 ±  2%  perf-profile.calltrace.cycles-pp.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog.__napi_poll
      7.02            -0.5        6.54 ±  2%  perf-profile.calltrace.cycles-pp.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog
      5.91            -0.4        5.56 ±  2%  perf-profile.calltrace.cycles-pp.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core
      5.81            -0.3        5.47 ±  2%  perf-profile.calltrace.cycles-pp.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
      0.55            -0.3        0.25 ±100%  perf-profile.calltrace.cycles-pp.irqtime_account_irq.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit
      1.42            -0.2        1.22        perf-profile.calltrace.cycles-pp.dev_hard_start_xmit.__dev_queue_xmit.ip_finish_output2.ip_send_skb.udp_send_skb
      1.26            -0.2        1.08        perf-profile.calltrace.cycles-pp.loopback_xmit.dev_hard_start_xmit.__dev_queue_xmit.ip_finish_output2.ip_send_skb
      1.49            -0.2        1.31        perf-profile.calltrace.cycles-pp.kfree_skb_reason.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu
      1.45            -0.2        1.27        perf-profile.calltrace.cycles-pp.skb_release_data.kfree_skb_reason.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv
      1.94 ±  2%      -0.2        1.77        perf-profile.calltrace.cycles-pp.ip_route_output_flow.udp_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64
      1.54            -0.2        1.38        perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
      1.38 ±  3%      -0.1        1.26        perf-profile.calltrace.cycles-pp.ip_route_output_key_hash_rcu.ip_route_output_flow.udp_sendmsg.__sys_sendto.__x64_sys_sendto
      1.22 ±  3%      -0.1        1.12        perf-profile.calltrace.cycles-pp.fib_table_lookup.ip_route_output_key_hash_rcu.ip_route_output_flow.udp_sendmsg.__sys_sendto
      1.72            -0.1        1.63        perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.__ip_append_data.ip_make_skb.udp_sendmsg.__sys_sendto
      0.71            -0.1        0.62        perf-profile.calltrace.cycles-pp.__udp4_lib_lookup.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core
      1.35            -0.1        1.26        perf-profile.calltrace.cycles-pp.alloc_skb_with_frags.sock_alloc_send_pskb.__ip_append_data.ip_make_skb.udp_sendmsg
      0.66            -0.1        0.59        perf-profile.calltrace.cycles-pp.__check_object_size.ip_generic_getfrag.__ip_append_data.ip_make_skb.udp_sendmsg
      0.79            -0.1        0.71        perf-profile.calltrace.cycles-pp.free_unref_page.skb_release_data.kfree_skb_reason.udp_queue_rcv_one_skb.udp_unicast_rcv_skb
      1.26            -0.1        1.19        perf-profile.calltrace.cycles-pp.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.__ip_append_data.ip_make_skb
      0.62            -0.1        0.57        perf-profile.calltrace.cycles-pp.move_addr_to_kernel.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.14            -0.0        1.11        perf-profile.calltrace.cycles-pp.recvfrom
      0.54            -0.0        0.52        perf-profile.calltrace.cycles-pp.sockfd_lookup_light.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.05            +0.1        3.13        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner
      0.91 ± 17%      +0.2        1.07        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni
      2.12 ±  2%      +0.3        2.41        perf-profile.calltrace.cycles-pp.skb_page_frag_refill.sk_page_frag_refill.__ip_append_data.ip_make_skb.udp_sendmsg
      2.17 ±  2%      +0.3        2.46        perf-profile.calltrace.cycles-pp.sk_page_frag_refill.__ip_append_data.ip_make_skb.udp_sendmsg.__sys_sendto
      1.65 ±  3%      +0.3        1.95        perf-profile.calltrace.cycles-pp.alloc_pages_mpol.skb_page_frag_refill.sk_page_frag_refill.__ip_append_data.ip_make_skb
      0.59            +0.3        0.89        perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.alloc_pages_mpol.skb_page_frag_refill
      1.50 ±  3%      +0.3        1.81        perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_pages_mpol.skb_page_frag_refill.sk_page_frag_refill.__ip_append_data
      1.19 ±  3%      +0.3        1.52        perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.alloc_pages_mpol.skb_page_frag_refill.sk_page_frag_refill
      0.09 ±223%      +0.4        0.53 ±  4%  perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
      0.00            +0.6        0.55        perf-profile.calltrace.cycles-pp.free_unref_page_commit.free_unref_page.skb_release_data.__consume_stateless_skb.udp_recvmsg
      0.82 ± 19%      +0.6        1.40        perf-profile.calltrace.cycles-pp.skb_release_data.__consume_stateless_skb.udp_recvmsg.inet_recvmsg.sock_recvmsg
      0.84 ± 19%      +0.6        1.40        perf-profile.calltrace.cycles-pp.__consume_stateless_skb.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
      0.39 ± 70%      +0.6        1.01        perf-profile.calltrace.cycles-pp.free_unref_page.skb_release_data.__consume_stateless_skb.udp_recvmsg.inet_recvmsg
     14.08            +2.3       16.40        perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
     14.82            +2.4       17.18        perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
     14.78            +2.4       17.16        perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg.sock_recvmsg
     21.41            +2.8       24.22        perf-profile.calltrace.cycles-pp.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
     21.60            +2.8       24.42        perf-profile.calltrace.cycles-pp.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
     21.33            +2.8       24.14        perf-profile.calltrace.cycles-pp.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
     22.52            +2.8       25.36        perf-profile.calltrace.cycles-pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom
     21.96            +2.9       24.81        perf-profile.calltrace.cycles-pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni
     23.14            +2.9       26.02        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni.process_requests.spawn_child
     23.11            +2.9       25.99        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni.process_requests
     23.45            +2.9       26.33        perf-profile.calltrace.cycles-pp.recvfrom.recv_omni.process_requests.spawn_child.accept_connection
     23.94            +2.9       26.87        perf-profile.calltrace.cycles-pp.recv_omni.process_requests.spawn_child.accept_connection.accept_connections
     21.17 ± 18%      +5.7       26.88        perf-profile.calltrace.cycles-pp.accept_connection.accept_connections.main
     21.17 ± 18%      +5.7       26.88        perf-profile.calltrace.cycles-pp.accept_connections.main
     21.17 ± 18%      +5.7       26.88        perf-profile.calltrace.cycles-pp.process_requests.spawn_child.accept_connection.accept_connections.main
     21.17 ± 18%      +5.7       26.88        perf-profile.calltrace.cycles-pp.spawn_child.accept_connection.accept_connections.main
     73.35            -3.0       70.38        perf-profile.children.cycles-pp.send_udp_stream
     73.55            -2.9       70.64        perf-profile.children.cycles-pp.sendto
     73.42            -2.9       70.53        perf-profile.children.cycles-pp.send_omni_inner
     67.53            -2.9       64.66        perf-profile.children.cycles-pp.__sys_sendto
     67.74            -2.9       64.88        perf-profile.children.cycles-pp.__x64_sys_sendto
     64.82            -2.7       62.13        perf-profile.children.cycles-pp.udp_sendmsg
     46.18            -1.2       44.97        perf-profile.children.cycles-pp.ip_make_skb
     15.36            -1.2       14.17        perf-profile.children.cycles-pp.udp_send_skb
     14.91            -1.2       13.76        perf-profile.children.cycles-pp.ip_send_skb
     13.90            -1.1       12.82        perf-profile.children.cycles-pp.ip_finish_output2
     13.25            -1.0       12.23        perf-profile.children.cycles-pp.__dev_queue_xmit
     11.02            -0.7       10.29        perf-profile.children.cycles-pp.__local_bh_enable_ip
     10.90            -0.7       10.18        perf-profile.children.cycles-pp.do_softirq
     10.77            -0.7       10.08        perf-profile.children.cycles-pp.__do_softirq
     35.12            -0.7       34.47        perf-profile.children.cycles-pp.ip_generic_getfrag
      4.90 ±  4%      -0.6        4.25        perf-profile.children.cycles-pp.__ip_make_skb
      9.82            -0.6        9.19        perf-profile.children.cycles-pp.net_rx_action
      4.39 ±  4%      -0.6        3.77        perf-profile.children.cycles-pp.__ip_select_ident
      9.51            -0.6        8.90        perf-profile.children.cycles-pp.__napi_poll
      9.43            -0.6        8.84        perf-profile.children.cycles-pp.process_backlog
     34.24            -0.6       33.68        perf-profile.children.cycles-pp._copy_from_iter
     40.92            -0.5       40.38        perf-profile.children.cycles-pp.__ip_append_data
      8.79            -0.5        8.26        perf-profile.children.cycles-pp.__netif_receive_skb_one_core
      7.44            -0.5        6.91 ±  2%  perf-profile.children.cycles-pp.ip_local_deliver_finish
      7.37            -0.5        6.86 ±  2%  perf-profile.children.cycles-pp.ip_protocol_deliver_rcu
      7.11            -0.5        6.62 ±  2%  perf-profile.children.cycles-pp.__udp4_lib_rcv
      5.97            -0.4        5.61 ±  2%  perf-profile.children.cycles-pp.udp_unicast_rcv_skb
      5.91            -0.4        5.55 ±  2%  perf-profile.children.cycles-pp.udp_queue_rcv_one_skb
      1.45            -0.2        1.25        perf-profile.children.cycles-pp.dev_hard_start_xmit
      1.32            -0.2        1.13        perf-profile.children.cycles-pp.loopback_xmit
      1.52            -0.2        1.33        perf-profile.children.cycles-pp.kfree_skb_reason
      1.97 ±  2%      -0.2        1.80        perf-profile.children.cycles-pp.ip_route_output_flow
      1.56            -0.2        1.40        perf-profile.children.cycles-pp.ttwu_do_activate
      0.27            -0.1        0.15        perf-profile.children.cycles-pp.wakeup_preempt
      1.40 ±  3%      -0.1        1.28        perf-profile.children.cycles-pp.ip_route_output_key_hash_rcu
      0.18            -0.1        0.06 ±  6%  perf-profile.children.cycles-pp.check_preempt_wakeup_fair
      1.24 ±  3%      -0.1        1.14        perf-profile.children.cycles-pp.fib_table_lookup
      0.74            -0.1        0.64        perf-profile.children.cycles-pp.__udp4_lib_lookup
      1.75            -0.1        1.66        perf-profile.children.cycles-pp.sock_alloc_send_pskb
      1.38            -0.1        1.29        perf-profile.children.cycles-pp.alloc_skb_with_frags
      1.30            -0.1        1.22        perf-profile.children.cycles-pp.__alloc_skb
      0.36            -0.1        0.29        perf-profile.children.cycles-pp.sock_wfree
      0.53 ±  2%      -0.1        0.46        perf-profile.children.cycles-pp.__netif_rx
      1.22            -0.1        1.14        perf-profile.children.cycles-pp.__check_object_size
      0.50            -0.1        0.43        perf-profile.children.cycles-pp.netif_rx_internal
      0.47 ±  2%      -0.1        0.40        perf-profile.children.cycles-pp.enqueue_to_backlog
      0.51            -0.1        0.44 ±  2%  perf-profile.children.cycles-pp.udp4_lib_lookup2
      0.32 ±  2%      -0.1        0.26        perf-profile.children.cycles-pp.pick_eevdf
      0.65            -0.1        0.59        perf-profile.children.cycles-pp.move_addr_to_kernel
      0.59            -0.1        0.53        perf-profile.children.cycles-pp.irqtime_account_irq
      0.83            -0.1        0.78        perf-profile.children.cycles-pp.kmem_cache_alloc_node
      0.56            -0.1        0.50        perf-profile.children.cycles-pp.sched_clock_cpu
      0.35            -0.0        0.30        perf-profile.children.cycles-pp.validate_xmit_skb
      0.48            -0.0        0.44        perf-profile.children.cycles-pp.sched_clock
      0.40            -0.0        0.36        perf-profile.children.cycles-pp._raw_spin_trylock
      0.40            -0.0        0.35        perf-profile.children.cycles-pp.reweight_entity
      0.46            -0.0        0.42        perf-profile.children.cycles-pp._copy_from_user
      0.94            -0.0        0.90        perf-profile.children.cycles-pp.update_load_avg
      1.11            -0.0        1.07        perf-profile.children.cycles-pp.switch_mm_irqs_off
      0.36            -0.0        0.32        perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.40 ±  2%      -0.0        0.36        perf-profile.children.cycles-pp.kmalloc_reserve
      0.43            -0.0        0.39        perf-profile.children.cycles-pp.native_sched_clock
      0.65            -0.0        0.61        perf-profile.children.cycles-pp.kmem_cache_free
      1.27            -0.0        1.23 ±  2%  perf-profile.children.cycles-pp.activate_task
      1.23            -0.0        1.20 ±  2%  perf-profile.children.cycles-pp.enqueue_task_fair
      0.48            -0.0        0.45        perf-profile.children.cycles-pp.__virt_addr_valid
      0.78            -0.0        0.75        perf-profile.children.cycles-pp.check_heap_object
      0.17 ±  2%      -0.0        0.14 ±  2%  perf-profile.children.cycles-pp.destroy_large_folio
      0.38            -0.0        0.35        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.33            -0.0        0.30        perf-profile.children.cycles-pp.__cond_resched
      0.12            -0.0        0.09 ±  4%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge
      0.36            -0.0        0.33        perf-profile.children.cycles-pp.__mkroute_output
      0.30            -0.0        0.27        perf-profile.children.cycles-pp.ip_output
      0.56            -0.0        0.53        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.24            -0.0        0.21 ±  2%  perf-profile.children.cycles-pp.ip_setup_cork
      0.18            -0.0        0.16 ±  3%  perf-profile.children.cycles-pp.ipv4_pktinfo_prepare
      0.27            -0.0        0.24        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.18 ±  2%      -0.0        0.16        perf-profile.children.cycles-pp.netif_skb_features
      0.27            -0.0        0.25 ±  2%  perf-profile.children.cycles-pp.get_pfnblock_flags_mask
      0.51            -0.0        0.49        perf-profile.children.cycles-pp.__netif_receive_skb_core
      0.26            -0.0        0.24 ±  2%  perf-profile.children.cycles-pp.__ip_local_out
      0.17 ±  2%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.dst_release
      0.15            -0.0        0.13        perf-profile.children.cycles-pp.update_curr_se
      0.12            -0.0        0.10        perf-profile.children.cycles-pp.rcu_all_qs
      0.10 ±  5%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.vruntime_eligible
      0.13 ±  4%      -0.0        0.11 ±  6%  perf-profile.children.cycles-pp.security_sock_rcv_skb
      0.29 ±  3%      -0.0        0.27        perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.15            -0.0        0.13 ±  2%  perf-profile.children.cycles-pp.ip_send_check
      0.19 ±  2%      -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.siphash_3u32
      0.21 ±  3%      -0.0        0.19        perf-profile.children.cycles-pp.sk_filter_trim_cap
      0.14 ±  3%      -0.0        0.13 ±  3%  perf-profile.children.cycles-pp.__folio_put
      0.20 ±  2%      -0.0        0.18        perf-profile.children.cycles-pp.udp4_csum_init
      0.21 ±  2%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.ipv4_mtu
      0.25 ±  2%      -0.0        0.23        perf-profile.children.cycles-pp.skb_set_owner_w
      0.26            -0.0        0.24        perf-profile.children.cycles-pp.rseq_update_cpu_node_id
      0.14 ±  2%      -0.0        0.13 ±  3%  perf-profile.children.cycles-pp.__ip_finish_output
      0.30            -0.0        0.29        perf-profile.children.cycles-pp.__update_load_avg_se
      0.15 ±  2%      -0.0        0.13 ±  3%  perf-profile.children.cycles-pp.avg_vruntime
      0.08            -0.0        0.07 ±  7%  perf-profile.children.cycles-pp.skb_network_protocol
      0.14            -0.0        0.13 ±  2%  perf-profile.children.cycles-pp.check_stack_object
      0.13 ±  2%      -0.0        0.12        perf-profile.children.cycles-pp.xfrm_lookup_route
      0.14            -0.0        0.13        perf-profile.children.cycles-pp.__put_user_8
      0.09            -0.0        0.08        perf-profile.children.cycles-pp.nf_hook_slow
      0.09            -0.0        0.08        perf-profile.children.cycles-pp.raw_v4_input
      0.06            -0.0        0.05        perf-profile.children.cycles-pp.validate_xmit_xfrm
      0.11            -0.0        0.10        perf-profile.children.cycles-pp.xfrm_lookup_with_ifid
      0.11            +0.0        0.12        perf-profile.children.cycles-pp.security_socket_recvmsg
      0.06            +0.0        0.08 ±  6%  perf-profile.children.cycles-pp.demo_interval_tick
      0.08 ±  4%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.__build_skb_around
      0.07 ±  5%      +0.0        0.09        perf-profile.children.cycles-pp.should_failslab
      0.06            +0.0        0.08 ±  4%  perf-profile.children.cycles-pp.skb_clone_tx_timestamp
      0.37            +0.0        0.39        perf-profile.children.cycles-pp.simple_copy_to_iter
      0.06 ±  7%      +0.0        0.09 ±  4%  perf-profile.children.cycles-pp.task_work_run
      0.06 ±  6%      +0.0        0.09        perf-profile.children.cycles-pp.task_mm_cid_work
      0.21 ±  2%      +0.0        0.24 ±  5%  perf-profile.children.cycles-pp.recv_data
      0.69            +0.0        0.73 ±  2%  perf-profile.children.cycles-pp.ip_rcv
      0.25            +0.0        0.30 ±  2%  perf-profile.children.cycles-pp.ip_rcv_core
      0.06 ± 13%      +0.1        0.11 ±  3%  perf-profile.children.cycles-pp.__free_one_page
      0.85            +0.1        0.91        perf-profile.children.cycles-pp.switch_fpu_return
      0.71            +0.1        0.78 ±  2%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
      0.28 ±  5%      +0.1        0.36 ±  2%  perf-profile.children.cycles-pp.update_process_times
      0.31 ±  5%      +0.1        0.40 ±  2%  perf-profile.children.cycles-pp.tick_nohz_handler
      4.78            +0.1        4.88        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.34 ±  4%      +0.1        0.44 ±  2%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.43 ±  6%      +0.1        0.56 ±  5%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.46 ±  6%      +0.1        0.59 ±  5%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.49 ±  5%      +0.2        0.64 ±  5%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.58 ±  5%      +0.2        0.75 ±  4%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      2.21 ±  2%      +0.3        2.50        perf-profile.children.cycles-pp.sk_page_frag_refill
      2.41            +0.3        2.69        perf-profile.children.cycles-pp.skb_release_data
      2.15 ±  2%      +0.3        2.44        perf-profile.children.cycles-pp.skb_page_frag_refill
      1.70 ±  2%      +0.3        1.99        perf-profile.children.cycles-pp.alloc_pages_mpol
      0.62 ±  2%      +0.3        0.92 ±  2%  perf-profile.children.cycles-pp.rmqueue
      1.54 ±  2%      +0.3        1.85        perf-profile.children.cycles-pp.__alloc_pages
      1.23 ±  3%      +0.3        1.56        perf-profile.children.cycles-pp.get_page_from_freelist
      0.08 ± 12%      +0.3        0.41 ±  3%  perf-profile.children.cycles-pp.rmqueue_bulk
      0.18 ±  4%      +0.3        0.52 ±  2%  perf-profile.children.cycles-pp.__rmqueue_pcplist
      1.40 ±  2%      +0.4        1.75        perf-profile.children.cycles-pp.free_unref_page
      0.09 ± 11%      +0.4        0.47 ±  2%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.27 ±  7%      +0.4        0.65        perf-profile.children.cycles-pp.free_unref_page_commit
      0.94 ±  2%      +0.5        1.41        perf-profile.children.cycles-pp.__consume_stateless_skb
      0.52            +0.6        1.10        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.06 ±  9%      +0.6        0.66 ±  3%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     14.10            +2.3       16.43        perf-profile.children.cycles-pp._copy_to_iter
     14.80            +2.4       17.17        perf-profile.children.cycles-pp.__skb_datagram_iter
     14.82            +2.4       17.19        perf-profile.children.cycles-pp.skb_copy_datagram_iter
     21.42            +2.8       24.24        perf-profile.children.cycles-pp.inet_recvmsg
     21.61            +2.8       24.43        perf-profile.children.cycles-pp.sock_recvmsg
     21.34            +2.8       24.16        perf-profile.children.cycles-pp.udp_recvmsg
     22.59            +2.8       25.43        perf-profile.children.cycles-pp.__x64_sys_recvfrom
     22.53            +2.8       25.37        perf-profile.children.cycles-pp.__sys_recvfrom
     24.70            +2.9       27.55        perf-profile.children.cycles-pp.recvfrom
     23.94            +2.9       26.88        perf-profile.children.cycles-pp.accept_connection
     23.94            +2.9       26.88        perf-profile.children.cycles-pp.accept_connections
     23.94            +2.9       26.88        perf-profile.children.cycles-pp.process_requests
     23.94            +2.9       26.88        perf-profile.children.cycles-pp.spawn_child
     23.94            +2.9       26.88        perf-profile.children.cycles-pp.recv_omni
     34.03            -0.6       33.41        perf-profile.self.cycles-pp._copy_from_iter
      4.17 ±  5%      -0.6        3.56 ±  2%  perf-profile.self.cycles-pp.__ip_select_ident
      0.87            -0.1        0.77        perf-profile.self.cycles-pp.__sys_sendto
      1.14            -0.1        1.04        perf-profile.self.cycles-pp.udp_sendmsg
      0.92 ±  3%      -0.1        0.84 ±  2%  perf-profile.self.cycles-pp.fib_table_lookup
      0.36            -0.1        0.28        perf-profile.self.cycles-pp.sock_wfree
      1.80            -0.1        1.72        perf-profile.self.cycles-pp.__ip_append_data
      0.31            -0.1        0.26        perf-profile.self.cycles-pp.loopback_xmit
      0.63            -0.1        0.58 ±  2%  perf-profile.self.cycles-pp.kmem_cache_alloc_node
      0.30            -0.0        0.26        perf-profile.self.cycles-pp.udp4_lib_lookup2
      0.46            -0.0        0.41        perf-profile.self.cycles-pp.do_syscall_64
      0.58            -0.0        0.54        perf-profile.self.cycles-pp.ip_finish_output2
      0.38            -0.0        0.34        perf-profile.self.cycles-pp._raw_spin_trylock
      0.45            -0.0        0.41        perf-profile.self.cycles-pp._copy_from_user
      0.37            -0.0        0.33        perf-profile.self.cycles-pp.udp_send_skb
      0.34            -0.0        0.30 ±  2%  perf-profile.self.cycles-pp.__alloc_skb
      0.39 ±  3%      -0.0        0.36        perf-profile.self.cycles-pp.__dev_queue_xmit
      0.35            -0.0        0.31        perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.42            -0.0        0.38        perf-profile.self.cycles-pp.native_sched_clock
      1.09            -0.0        1.06        perf-profile.self.cycles-pp.switch_mm_irqs_off
      0.46 ±  2%      -0.0        0.43        perf-profile.self.cycles-pp.__virt_addr_valid
      0.32            -0.0        0.29        perf-profile.self.cycles-pp.__mkroute_output
      0.23 ±  4%      -0.0        0.20 ±  2%  perf-profile.self.cycles-pp.pick_eevdf
      0.23 ±  2%      -0.0        0.20 ±  2%  perf-profile.self.cycles-pp.__udp4_lib_lookup
      0.46            -0.0        0.43        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.21            -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.ip_route_output_flow
      0.06            -0.0        0.03 ± 70%  perf-profile.self.cycles-pp.check_preempt_wakeup_fair
      0.26            -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.29            -0.0        0.26        perf-profile.self.cycles-pp.net_rx_action
      0.11 ±  4%      -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.__mem_cgroup_uncharge
      0.29 ±  2%      -0.0        0.26        perf-profile.self.cycles-pp.__check_object_size
      0.50            -0.0        0.48        perf-profile.self.cycles-pp.__netif_receive_skb_core
      0.28            -0.0        0.26 ±  2%  perf-profile.self.cycles-pp.process_backlog
      0.22 ±  2%      -0.0        0.19        perf-profile.self.cycles-pp.ip_output
      0.35            -0.0        0.33        perf-profile.self.cycles-pp.__ip_make_skb
      0.38            -0.0        0.36        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.26            -0.0        0.24 ±  2%  perf-profile.self.cycles-pp.get_pfnblock_flags_mask
      0.47            -0.0        0.45        perf-profile.self.cycles-pp.kmem_cache_free
      0.17 ±  2%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.reweight_entity
      0.22            -0.0        0.20        perf-profile.self.cycles-pp.__udp4_lib_rcv
      0.14            -0.0        0.12        perf-profile.self.cycles-pp.validate_xmit_skb
      0.15            -0.0        0.13        perf-profile.self.cycles-pp.dst_release
      0.36            -0.0        0.34        perf-profile.self.cycles-pp.__do_softirq
      0.10 ±  3%      -0.0        0.08        perf-profile.self.cycles-pp.rcu_all_qs
      0.08            -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.vruntime_eligible
      0.24 ±  2%      -0.0        0.23 ±  4%  perf-profile.self.cycles-pp.__udp_enqueue_schedule_skb
      0.16 ±  4%      -0.0        0.14 ±  3%  perf-profile.self.cycles-pp.enqueue_to_backlog
      0.19            -0.0        0.17        perf-profile.self.cycles-pp.siphash_3u32
      0.25            -0.0        0.24        perf-profile.self.cycles-pp.rseq_update_cpu_node_id
      0.14            -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.ip_send_check
      0.14            -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.ip_setup_cork
      0.28            -0.0        0.26        perf-profile.self.cycles-pp.__update_load_avg_se
      0.16            -0.0        0.14 ±  3%  perf-profile.self.cycles-pp.ip_route_output_key_hash_rcu
      0.19            -0.0        0.17 ±  2%  perf-profile.self.cycles-pp.irqtime_account_irq
      0.19            -0.0        0.17 ±  2%  perf-profile.self.cycles-pp.udp4_csum_init
      0.17            -0.0        0.16 ±  3%  perf-profile.self.cycles-pp.ip_generic_getfrag
      0.23 ±  2%      -0.0        0.21 ±  3%  perf-profile.self.cycles-pp.ip_send_skb
      0.13            -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.update_curr_se
      0.12            -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.__netif_receive_skb_one_core
      0.08 ±  8%      -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.__sk_mem_raise_allocated
      0.11 ±  3%      -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.security_sock_rcv_skb
      0.06 ±  7%      -0.0        0.05        perf-profile.self.cycles-pp.ip_local_deliver_finish
      0.20            -0.0        0.19 ±  2%  perf-profile.self.cycles-pp.__cond_resched
      0.20 ±  3%      -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.ipv4_mtu
      0.30            -0.0        0.28 ±  2%  perf-profile.self.cycles-pp.__alloc_pages
      0.14 ±  2%      -0.0        0.13 ±  2%  perf-profile.self.cycles-pp.do_softirq
      0.07 ±  5%      -0.0        0.06        perf-profile.self.cycles-pp.skb_network_protocol
      0.13            -0.0        0.12        perf-profile.self.cycles-pp.__wrgsbase_inactive
      0.13            -0.0        0.12        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.14            -0.0        0.13        perf-profile.self.cycles-pp.ip_make_skb
      0.13            -0.0        0.12        perf-profile.self.cycles-pp.move_addr_to_kernel
      0.06            -0.0        0.05        perf-profile.self.cycles-pp.__ip_finish_output
      0.14 ±  2%      +0.0        0.15 ±  2%  perf-profile.self.cycles-pp.enqueue_task_fair
      0.12 ±  3%      +0.0        0.14 ±  5%  perf-profile.self.cycles-pp.recvfrom
      0.08 ±  6%      +0.0        0.09        perf-profile.self.cycles-pp.__build_skb_around
      0.28            +0.0        0.30        perf-profile.self.cycles-pp.udp_recvmsg
      0.06 ±  6%      +0.0        0.08        perf-profile.self.cycles-pp.should_failslab
      0.05            +0.0        0.07 ±  8%  perf-profile.self.cycles-pp.demo_interval_tick
      0.20            +0.0        0.22 ±  2%  perf-profile.self.cycles-pp.__skb_datagram_iter
      0.26            +0.0        0.28        perf-profile.self.cycles-pp.prepare_task_switch
      0.06            +0.0        0.08 ±  5%  perf-profile.self.cycles-pp.task_mm_cid_work
      0.21 ±  3%      +0.0        0.24 ±  4%  perf-profile.self.cycles-pp.recv_omni
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.update_rq_clock
      0.25            +0.1        0.30 ±  2%  perf-profile.self.cycles-pp.ip_rcv_core
      0.00            +0.1        0.06 ±  6%  perf-profile.self.cycles-pp.skb_clone_tx_timestamp
      0.04 ± 71%      +0.1        0.10        perf-profile.self.cycles-pp.__free_one_page
      0.01 ±223%      +0.1        0.08 ±  6%  perf-profile.self.cycles-pp.rmqueue_bulk
      0.71            +0.1        0.78        perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
      0.10 ±  3%      +0.1        0.18 ±  3%  perf-profile.self.cycles-pp.select_task_rq_fair
      0.06 ±  9%      +0.6        0.66 ±  3%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
     14.03            +2.3       16.35        perf-profile.self.cycles-pp._copy_to_iter


***************************************************************************************************
lkp-icl-2sp8: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/1HDD/ext4/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/fstat/stress-ng/60s

commit: 
  97450eb909 ("sched/pelt: Remove shift of thermal clock")
  e2bbd1c498 ("sched/fair: Reschedule the cfs_rq when current is ineligible")

97450eb909658573 e2bbd1c498980c5cb68f9973f41 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      5.96 ±  2%     +22.6%       7.30 ±  3%  iostat.cpu.user
      6.10 ±  2%      +1.4        7.50 ±  3%  mpstat.cpu.all.usr%
    728616           +22.2%     890173        vmstat.system.cs
   1448236 ±  4%     -81.4%     268871 ± 17%  meminfo.Active
   1441121 ±  4%     -81.8%     261809 ± 18%  meminfo.Active(anon)
   6834186           +13.2%    7738594        meminfo.Inactive
   6821947           +13.3%    7726478        meminfo.Inactive(anon)
    152537 ± 31%     -94.5%       8342 ± 61%  numa-meminfo.node0.Active
    150754 ± 32%     -95.4%       6931 ± 78%  numa-meminfo.node0.Active(anon)
   1297193 ±  4%     -79.9%     261111 ± 16%  numa-meminfo.node1.Active
   1291857 ±  4%     -80.2%     255458 ± 16%  numa-meminfo.node1.Active(anon)
     37730 ± 32%     -95.4%       1725 ± 77%  numa-vmstat.node0.nr_active_anon
     37730 ± 32%     -95.4%       1725 ± 77%  numa-vmstat.node0.nr_zone_active_anon
    323570 ±  4%     -80.5%      63052 ± 16%  numa-vmstat.node1.nr_active_anon
    323570 ±  4%     -80.5%      63052 ± 16%  numa-vmstat.node1.nr_zone_active_anon
   4980068            -3.9%    4786411        stress-ng.fstat.ops
     83000            -3.9%      79773        stress-ng.fstat.ops_per_sec
  12565616           +88.8%   23722089        stress-ng.time.involuntary_context_switches
      4457            +8.9%       4855        stress-ng.time.percent_of_cpu_this_job_got
      2494            +7.3%       2678        stress-ng.time.system_time
    183.10 ±  2%     +30.7%     239.27 ±  2%  stress-ng.time.user_time
   7738050            -1.3%    7637067        stress-ng.time.voluntary_context_switches
   1011067           +10.6%    1118668        sched_debug.cfs_rq:/.avg_vruntime.min
     63124 ±  2%     +24.2%      78376 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.stddev
   1011067           +10.6%    1118668        sched_debug.cfs_rq:/.min_vruntime.min
     63124 ±  2%     +24.2%      78376 ±  3%  sched_debug.cfs_rq:/.min_vruntime.stddev
    779551 ±  5%     -16.6%     649990 ± 14%  sched_debug.cpu.curr->pid.avg
   1406043 ±  2%     -27.2%    1023836        sched_debug.cpu.curr->pid.max
    695149 ±  3%     -30.9%     480606 ±  6%  sched_debug.cpu.curr->pid.stddev
    356539           +22.0%     435017        sched_debug.cpu.nr_switches.avg
    375244           +21.5%     455819        sched_debug.cpu.nr_switches.max
    239983           +21.3%     290992        sched_debug.cpu.nr_switches.min
     17526 ±  2%     +25.0%      21908 ±  3%  sched_debug.cpu.nr_switches.stddev
    360657 ±  4%     -81.8%      65651 ± 18%  proc-vmstat.nr_active_anon
   2656885            -2.5%    2589142        proc-vmstat.nr_file_pages
   1706585           +13.3%    1933352        proc-vmstat.nr_inactive_anon
     20023            +1.3%      20285        proc-vmstat.nr_kernel_stack
   1863195            -3.6%    1795497        proc-vmstat.nr_shmem
    360657 ±  4%     -81.8%      65651 ± 18%  proc-vmstat.nr_zone_active_anon
   1706585           +13.3%    1933352        proc-vmstat.nr_zone_inactive_anon
  56763914            -6.2%   53230553        proc-vmstat.numa_hit
  56704746            -6.2%   53170292        proc-vmstat.numa_local
     57428 ±  2%     -50.0%      28742 ±  4%  proc-vmstat.pgactivate
  76049389            -7.0%   70719008        proc-vmstat.pgalloc_normal
  73054423            -7.1%   67838944        proc-vmstat.pgfree
      1.88           -18.2%       1.54        perf-stat.i.MPKI
 2.208e+10           +17.1%  2.586e+10        perf-stat.i.branch-instructions
      0.30            -0.0        0.27        perf-stat.i.branch-miss-rate%
  61030073            +4.7%   63912456        perf-stat.i.branch-misses
     25.57            -0.3       25.30        perf-stat.i.cache-miss-rate%
 2.211e+08            -2.8%  2.149e+08        perf-stat.i.cache-misses
 8.649e+08            -1.7%  8.499e+08        perf-stat.i.cache-references
    765417           +21.4%     928892        perf-stat.i.context-switches
      1.90           -16.2%       1.59        perf-stat.i.cpi
    168415           -15.6%     142157        perf-stat.i.cpu-migrations
      1008            +2.6%       1034        perf-stat.i.cycles-between-cache-misses
 1.177e+11           +18.5%  1.394e+11        perf-stat.i.instructions
      0.53           +18.7%       0.63        perf-stat.i.ipc
     14.67           +14.4%      16.78        perf-stat.i.metric.K/sec
      1.88           -18.0%       1.54        perf-stat.overall.MPKI
      0.28            -0.0        0.25        perf-stat.overall.branch-miss-rate%
     25.57            -0.3       25.30        perf-stat.overall.cache-miss-rate%
      1.89           -15.8%       1.59        perf-stat.overall.cpi
      1007            +2.6%       1033        perf-stat.overall.cycles-between-cache-misses
      0.53           +18.8%       0.63        perf-stat.overall.ipc
    724089 ±  6%     +17.5%     850512 ±  7%  perf-stat.ps.context-switches
    159312 ±  6%     -18.3%     130155 ±  7%  perf-stat.ps.cpu-migrations
 4.962e+12 ±  2%     +21.1%  6.007e+12 ±  2%  perf-stat.total.instructions
     60.50           -11.7       48.80        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     60.48           -11.7       48.78        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     33.86            -7.5       26.35        perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
     33.85            -7.5       26.34        perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
     29.48            -7.4       22.06        perf-profile.calltrace.cycles-pp.exit_notify.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
     25.34            -4.2       21.15        perf-profile.calltrace.cycles-pp.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
     21.38            -4.2       17.18        perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
     25.29            -4.2       21.10        perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
     14.14            -4.1       10.07        perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.copy_process.kernel_clone.__do_sys_clone3.do_syscall_64
     13.70            -4.1        9.65        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.copy_process.kernel_clone.__do_sys_clone3
     13.86            -3.7       10.11        perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.exit_notify.do_exit.__x64_sys_exit.do_syscall_64
     13.42            -3.7        9.68        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.exit_notify.do_exit.__x64_sys_exit
     15.23            -3.6       11.59        perf-profile.calltrace.cycles-pp.release_task.exit_notify.do_exit.__x64_sys_exit.do_syscall_64
     13.21            -3.6        9.59        perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.release_task.exit_notify.do_exit.__x64_sys_exit
     12.73            -3.6        9.12        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.release_task.exit_notify.do_exit
      0.53            -0.3        0.25 ±100%  perf-profile.calltrace.cycles-pp.remove_vm_area.vfree.delayed_vfree_work.process_one_work.worker_thread
      1.20            -0.1        1.09        perf-profile.calltrace.cycles-pp.__schedule.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64
      2.66            -0.1        2.54        perf-profile.calltrace.cycles-pp.ret_from_fork_asm
      2.44            -0.1        2.32        perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      1.22            -0.1        1.10        perf-profile.calltrace.cycles-pp.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.29            -0.1        2.19        perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
      0.72            -0.1        0.66        perf-profile.calltrace.cycles-pp.__schedule.schedule.futex_wait_queue.__futex_wait.futex_wait
      0.85            -0.1        0.78        perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.84            -0.1        0.78        perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.83            -0.1        0.76        perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
      0.74            -0.1        0.67        perf-profile.calltrace.cycles-pp.futex_wait_queue.__futex_wait.futex_wait.do_futex.__x64_sys_futex
      0.73            -0.1        0.66        perf-profile.calltrace.cycles-pp.schedule.futex_wait_queue.__futex_wait.futex_wait.do_futex
      0.85            -0.1        0.79        perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.05            -0.1        0.99        perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.50            -0.0        1.45        perf-profile.calltrace.cycles-pp.alloc_pid.copy_process.kernel_clone.__do_sys_clone3.do_syscall_64
      1.17            -0.0        1.12        perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1.24            -0.0        1.19        perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.85            -0.0        0.81        perf-profile.calltrace.cycles-pp.delayed_vfree_work.process_one_work.worker_thread.kthread.ret_from_fork
      0.80            -0.0        0.76        perf-profile.calltrace.cycles-pp.vfree.delayed_vfree_work.process_one_work.worker_thread.kthread
      0.94            -0.0        0.90        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.alloc_pid.copy_process.kernel_clone
      1.16            -0.0        1.12        perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
      0.94            -0.0        0.90        perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.12            -0.0        1.08        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.alloc_pid.copy_process.kernel_clone.__do_sys_clone3
      1.15            -0.0        1.11        perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
      1.14            -0.0        1.10        perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
      0.90            -0.0        0.87        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
      0.89            -0.0        0.86        perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      0.88            -0.0        0.86        perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      0.82            -0.0        0.79        perf-profile.calltrace.cycles-pp.__vmalloc_area_node.__vmalloc_node_range.alloc_thread_stack_node.dup_task_struct.copy_process
      0.89            -0.0        0.87        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      0.67            -0.0        0.65        perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.66            -0.0        0.64        perf-profile.calltrace.cycles-pp.__alloc_pages_bulk.__vmalloc_area_node.__vmalloc_node_range.alloc_thread_stack_node.dup_task_struct
      0.92            -0.0        0.89        perf-profile.calltrace.cycles-pp.__madvise
      2.11            +0.0        2.13        perf-profile.calltrace.cycles-pp.update_sg_wakeup_stats.sched_balance_find_dst_group.sched_balance_find_dst_cpu.select_task_rq_fair.wake_up_new_task
      0.61            +0.0        0.64        perf-profile.calltrace.cycles-pp.do_futex.mm_release.exit_mm.do_exit.__x64_sys_exit
      0.60            +0.0        0.64        perf-profile.calltrace.cycles-pp.futex_wake.do_futex.mm_release.exit_mm.do_exit
      0.64            +0.0        0.67        perf-profile.calltrace.cycles-pp.mm_release.exit_mm.do_exit.__x64_sys_exit.do_syscall_64
      0.78            +0.0        0.82        perf-profile.calltrace.cycles-pp.exit_mm.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.52            +0.2        0.70        perf-profile.calltrace.cycles-pp.kmem_cache_free.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
      0.53            +0.2        0.72        perf-profile.calltrace.cycles-pp.cp_statx.do_statx.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.53            +0.2        0.72        perf-profile.calltrace.cycles-pp.security_inode_getattr.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
      0.55            +0.2        0.75        perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.strncpy_from_user.getname_flags.vfs_fstatat
      0.51            +0.2        0.72        perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.64 ±  3%      +0.2        0.86        perf-profile.calltrace.cycles-pp.shim_statx
      0.64            +0.3        0.89        perf-profile.calltrace.cycles-pp.complete_walk.path_lookupat.filename_lookup.vfs_statx.do_statx
      0.70            +0.3        0.95        perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.strncpy_from_user.getname_flags.__x64_sys_statx
      0.72            +0.3        0.99        perf-profile.calltrace.cycles-pp.kmem_cache_alloc.getname_flags.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
      0.63            +0.3        0.90        perf-profile.calltrace.cycles-pp.dput.path_put.vfs_statx.vfs_fstatat.__do_sys_newfstatat
      0.43 ± 44%      +0.3        0.72        perf-profile.calltrace.cycles-pp.lockref_put_return.dput.path_put.vfs_statx.vfs_fstatat
      0.67            +0.3        0.96        perf-profile.calltrace.cycles-pp.path_put.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
      0.67            +0.3        0.96        perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.42 ± 44%      +0.3        0.71        perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.83            +0.3        1.13        perf-profile.calltrace.cycles-pp.link_path_walk.path_lookupat.filename_lookup.vfs_statx.do_statx
      0.75            +0.3        1.08        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.75            +0.3        1.08        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.93            +0.3        1.28        perf-profile.calltrace.cycles-pp.kmem_cache_alloc.getname_flags.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.04            +0.4        1.40        perf-profile.calltrace.cycles-pp.cp_new_stat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
      0.87            +0.4        1.23        perf-profile.calltrace.cycles-pp.__sched_yield
      1.09            +0.4        1.48        perf-profile.calltrace.cycles-pp.__check_object_size.strncpy_from_user.getname_flags.vfs_fstatat.__do_sys_newfstatat
      1.42            +0.5        1.89        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.statx
      1.42            +0.5        1.90        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.fstatat64
      1.38            +0.5        1.88        perf-profile.calltrace.cycles-pp.__check_object_size.strncpy_from_user.getname_flags.__x64_sys_statx.do_syscall_64
      0.00            +0.5        0.54        perf-profile.calltrace.cycles-pp._copy_to_user.cp_new_stat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.5        0.54 ±  2%  perf-profile.calltrace.cycles-pp.lookup_fast.walk_component.path_lookupat.filename_lookup.vfs_statx
      1.38            +0.5        1.93        perf-profile.calltrace.cycles-pp.complete_walk.path_lookupat.filename_lookup.vfs_statx.vfs_fstatat
      0.00            +0.6        0.55        perf-profile.calltrace.cycles-pp.path_init.path_lookupat.filename_lookup.vfs_statx.do_statx
      0.00            +0.6        0.59        perf-profile.calltrace.cycles-pp.common_perm_cond.security_inode_getattr.vfs_statx.vfs_fstatat.__do_sys_newfstatat
      0.00            +0.6        0.60 ±  2%  perf-profile.calltrace.cycles-pp.path_init.path_lookupat.filename_lookup.vfs_statx.vfs_fstatat
      0.00            +0.6        0.65 ±  2%  perf-profile.calltrace.cycles-pp.walk_component.path_lookupat.filename_lookup.vfs_statx.vfs_fstatat
      1.84            +0.7        2.50        perf-profile.calltrace.cycles-pp.link_path_walk.path_lookupat.filename_lookup.vfs_statx.vfs_fstatat
      0.00            +0.7        0.67 ±  2%  perf-profile.calltrace.cycles-pp.inode_permission.link_path_walk.path_lookupat.filename_lookup.vfs_statx
      1.69            +0.7        2.37        perf-profile.calltrace.cycles-pp.__legitimize_path.try_to_unlazy.complete_walk.path_lookupat.filename_lookup
      2.09            +0.7        2.81        perf-profile.calltrace.cycles-pp.strncpy_from_user.getname_flags.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
      1.90            +0.8        2.65        perf-profile.calltrace.cycles-pp.try_to_unlazy.complete_walk.path_lookupat.filename_lookup.vfs_statx
      2.30            +0.8        3.14        perf-profile.calltrace.cycles-pp.path_lookupat.filename_lookup.vfs_statx.do_statx.__x64_sys_statx
      2.63            +0.9        3.54        perf-profile.calltrace.cycles-pp.strncpy_from_user.getname_flags.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.95            +1.0        1.91        perf-profile.calltrace.cycles-pp.lockref_get_not_dead.__legitimize_path.try_to_unlazy.complete_walk.path_lookupat
      2.87            +1.0        3.92        perf-profile.calltrace.cycles-pp.filename_lookup.vfs_statx.do_statx.__x64_sys_statx.do_syscall_64
      3.50            +1.2        4.72        perf-profile.calltrace.cycles-pp.getname_flags.vfs_fstatat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.10            +1.4        5.54        perf-profile.calltrace.cycles-pp.getname_flags.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
      4.03            +1.5        5.52        perf-profile.calltrace.cycles-pp.vfs_statx.do_statx.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.67            +1.7        6.40        perf-profile.calltrace.cycles-pp.path_lookupat.filename_lookup.vfs_statx.vfs_fstatat.__do_sys_newfstatat
      5.24            +1.9        7.15        perf-profile.calltrace.cycles-pp.do_statx.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
      5.44            +2.0        7.44        perf-profile.calltrace.cycles-pp.filename_lookup.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
      7.67            +2.8       10.49        perf-profile.calltrace.cycles-pp.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     10.57            +3.8       14.36        perf-profile.calltrace.cycles-pp.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
     11.30            +4.1       15.41        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
     11.56            +4.2       15.77        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.statx
     12.09            +4.4       16.46        perf-profile.calltrace.cycles-pp.vfs_fstatat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
     13.87            +5.0       18.84        perf-profile.calltrace.cycles-pp.statx
     13.97            +5.0       18.98        perf-profile.calltrace.cycles-pp.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
     14.70            +5.3       20.04        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
     14.99            +5.5       20.46        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fstatat64
     17.21            +6.2       23.42        perf-profile.calltrace.cycles-pp.fstatat64
     43.74           -11.4       32.30        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     41.21           -11.4       29.78        perf-profile.children.cycles-pp.queued_write_lock_slowpath
     33.86            -7.5       26.35        perf-profile.children.cycles-pp.__x64_sys_exit
     33.86            -7.5       26.36        perf-profile.children.cycles-pp.do_exit
     29.50            -7.4       22.07        perf-profile.children.cycles-pp.exit_notify
     25.30            -4.2       21.10        perf-profile.children.cycles-pp.kernel_clone
     25.34            -4.2       21.15        perf-profile.children.cycles-pp.__do_sys_clone3
     21.40            -4.2       17.21        perf-profile.children.cycles-pp.copy_process
     15.24            -3.6       11.60        perf-profile.children.cycles-pp.release_task
     89.78            -2.0       87.81        perf-profile.children.cycles-pp.do_syscall_64
     90.27            -1.8       88.48        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.73            -0.2        0.58        perf-profile.children.cycles-pp.sched_balance_newidle
      0.64            -0.1        0.50 ±  2%  perf-profile.children.cycles-pp.sched_balance_rq
      1.22            -0.1        1.10        perf-profile.children.cycles-pp.do_task_dead
      2.44            -0.1        2.32        perf-profile.children.cycles-pp.ret_from_fork
      2.66            -0.1        2.54        perf-profile.children.cycles-pp.ret_from_fork_asm
      1.70            -0.1        1.60        perf-profile.children.cycles-pp.__do_softirq
      2.29            -0.1        2.19        perf-profile.children.cycles-pp.kthread
      1.59            -0.1        1.49        perf-profile.children.cycles-pp.rcu_core
      1.56            -0.1        1.47        perf-profile.children.cycles-pp.rcu_do_batch
      0.91            -0.1        0.83        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.74            -0.1        0.67        perf-profile.children.cycles-pp.futex_wait_queue
      0.84            -0.1        0.78        perf-profile.children.cycles-pp.futex_wait
      0.85            -0.1        0.79        perf-profile.children.cycles-pp.__x64_sys_futex
      0.54            -0.1        0.48        perf-profile.children.cycles-pp.irq_exit_rcu
      0.83            -0.1        0.77        perf-profile.children.cycles-pp.__futex_wait
      1.05            -0.1        0.99        perf-profile.children.cycles-pp.worker_thread
      0.21            -0.1        0.16 ±  2%  perf-profile.children.cycles-pp.detach_tasks
      0.87            -0.0        0.83        perf-profile.children.cycles-pp.activate_task
      1.50            -0.0        1.46        perf-profile.children.cycles-pp.alloc_pid
      1.17            -0.0        1.12        perf-profile.children.cycles-pp.run_ksoftirqd
      1.24            -0.0        1.19        perf-profile.children.cycles-pp.smpboot_thread_fn
      0.55            -0.0        0.51 ±  2%  perf-profile.children.cycles-pp.perf_session__process_user_event
      0.85            -0.0        0.81        perf-profile.children.cycles-pp.delayed_vfree_work
      0.54            -0.0        0.50 ±  2%  perf-profile.children.cycles-pp.perf_session__deliver_event
      0.94            -0.0        0.90        perf-profile.children.cycles-pp.process_one_work
      0.55            -0.0        0.51 ±  2%  perf-profile.children.cycles-pp.__ordered_events__flush
      0.80            -0.0        0.76        perf-profile.children.cycles-pp.vfree
      0.19 ±  3%      -0.0        0.15        perf-profile.children.cycles-pp.free_unref_page_commit
      0.24 ±  2%      -0.0        0.20        perf-profile.children.cycles-pp.free_unref_page
      0.50            -0.0        0.46 ±  2%  perf-profile.children.cycles-pp.read
      0.49            -0.0        0.45        perf-profile.children.cycles-pp.ksys_read
      0.89            -0.0        0.85        perf-profile.children.cycles-pp.enqueue_task_fair
      0.48            -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.seq_read
      0.48            -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.seq_read_iter
      0.44            -0.0        0.41 ±  2%  perf-profile.children.cycles-pp.proc_pid_status
      0.17 ±  4%      -0.0        0.14 ±  3%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.48            -0.0        0.44 ±  3%  perf-profile.children.cycles-pp.machine__process_fork_event
      0.48            -0.0        0.44        perf-profile.children.cycles-pp.vfs_read
      0.95            -0.0        0.91        perf-profile.children.cycles-pp.dequeue_task_fair
      1.46            -0.0        1.42        perf-profile.children.cycles-pp.do_futex
      0.19 ±  2%      -0.0        0.16 ±  2%  perf-profile.children.cycles-pp.update_sd_lb_stats
      0.47            -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.____machine__findnew_thread
      0.45            -0.0        0.41        perf-profile.children.cycles-pp.proc_single_show
      0.78            -0.0        0.74        perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.20            -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.sched_balance_find_src_group
      0.56            -0.0        0.53        perf-profile.children.cycles-pp.dequeue_entity
      0.19 ±  4%      -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.__put_partials
      0.51            -0.0        0.48        perf-profile.children.cycles-pp.enqueue_entity
      0.53            -0.0        0.50        perf-profile.children.cycles-pp.remove_vm_area
      0.17 ±  2%      -0.0        0.14 ±  2%  perf-profile.children.cycles-pp.update_sg_lb_stats
      0.92            -0.0        0.90        perf-profile.children.cycles-pp.__madvise
      0.89            -0.0        0.86        perf-profile.children.cycles-pp.__x64_sys_madvise
      0.82            -0.0        0.79        perf-profile.children.cycles-pp.__vmalloc_area_node
      0.89            -0.0        0.86        perf-profile.children.cycles-pp.do_madvise
      0.68            -0.0        0.65        perf-profile.children.cycles-pp.madvise_vma_behavior
      0.28            -0.0        0.25        perf-profile.children.cycles-pp.__slab_free
      0.66            -0.0        0.64        perf-profile.children.cycles-pp.__alloc_pages_bulk
      0.09            -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.flush_tlb_mm_range
      0.63            -0.0        0.61        perf-profile.children.cycles-pp.zap_page_range_single
      0.52            -0.0        0.50        perf-profile.children.cycles-pp.perf_event_task_output
      0.09 ±  4%      -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.on_each_cpu_cond_mask
      0.09 ±  4%      -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.smp_call_function_many_cond
      0.11 ±  4%      -0.0        0.09 ±  6%  perf-profile.children.cycles-pp.tlb_finish_mmu
      0.80            -0.0        0.78        perf-profile.children.cycles-pp.__exit_signal
      0.09            -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.schedule_tail
      0.57            -0.0        0.55        perf-profile.children.cycles-pp.perf_iterate_sb
      0.14 ±  3%      -0.0        0.13        perf-profile.children.cycles-pp.__task_pid_nr_ns
      0.48            -0.0        0.46        perf-profile.children.cycles-pp.clear_page_erms
      0.07            -0.0        0.06        perf-profile.children.cycles-pp.__free_pages
      0.07            -0.0        0.06        perf-profile.children.cycles-pp.bitmap_string
      0.05            +0.0        0.06        perf-profile.children.cycles-pp.mem_cgroup_handle_over_high
      0.07            +0.0        0.08        perf-profile.children.cycles-pp.rseq_get_rseq_cs
      0.06            +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.cpuacct_charge
      0.06            +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.os_xsave
      0.18 ±  2%      +0.0        0.19        perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
      0.05            +0.0        0.06 ±  7%  perf-profile.children.cycles-pp.select_idle_sibling
      0.07 ±  5%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.shmem_is_huge
      0.09 ±  5%      +0.0        0.11 ±  3%  perf-profile.children.cycles-pp.rseq_ip_fixup
      0.11 ±  3%      +0.0        0.13 ±  2%  perf-profile.children.cycles-pp.select_task_rq
      0.09            +0.0        0.11 ±  4%  perf-profile.children.cycles-pp.__switch_to
      0.10 ±  3%      +0.0        0.12        perf-profile.children.cycles-pp.___perf_sw_event
      0.05            +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.make_vfsgid
      0.07 ±  5%      +0.0        0.09        perf-profile.children.cycles-pp.__enqueue_entity
      0.13            +0.0        0.15 ±  4%  perf-profile.children.cycles-pp.stress_fstat_thread
      0.21            +0.0        0.23        perf-profile.children.cycles-pp.set_next_entity
      0.05            +0.0        0.07        perf-profile.children.cycles-pp.proc_pid_get_link
      0.07            +0.0        0.09        perf-profile.children.cycles-pp.pick_eevdf
      0.49            +0.0        0.51        perf-profile.children.cycles-pp.try_to_wake_up
      0.41            +0.0        0.43        perf-profile.children.cycles-pp.wake_up_q
      0.22 ±  2%      +0.0        0.25        perf-profile.children.cycles-pp.switch_fpu_return
      0.12 ±  4%      +0.0        0.15        perf-profile.children.cycles-pp.__rseq_handle_notify_resume
      0.18 ±  2%      +0.0        0.21 ±  3%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.06 ±  6%      +0.0        0.09 ±  5%  perf-profile.children.cycles-pp.__x64_sys_newfstatat
      0.17 ±  2%      +0.0        0.20 ±  2%  perf-profile.children.cycles-pp.update_process_times
      0.03 ± 70%      +0.0        0.06        perf-profile.children.cycles-pp.statx@plt
      0.19 ±  2%      +0.0        0.22 ±  3%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.33            +0.0        0.36 ±  3%  perf-profile.children.cycles-pp.pick_link
      0.09            +0.0        0.12        perf-profile.children.cycles-pp.should_failslab
      0.05            +0.0        0.08        perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
      0.07            +0.0        0.10        perf-profile.children.cycles-pp.mntput
      0.60            +0.0        0.64        perf-profile.children.cycles-pp.futex_wake
      0.24 ±  2%      +0.0        0.27 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.23 ±  2%      +0.0        0.26 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.10 ±  4%      +0.0        0.13 ±  2%  perf-profile.children.cycles-pp.legitimize_links
      0.64            +0.0        0.67        perf-profile.children.cycles-pp.mm_release
      0.16            +0.0        0.19 ±  2%  perf-profile.children.cycles-pp.prepare_task_switch
      0.07            +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.yield_task_fair
      0.78            +0.0        0.82        perf-profile.children.cycles-pp.exit_mm
      0.12 ±  3%      +0.0        0.16 ±  2%  perf-profile.children.cycles-pp.mntput_no_expire
      0.15 ±  3%      +0.0        0.19        perf-profile.children.cycles-pp.__get_user_1
      0.18            +0.0        0.22 ±  2%  perf-profile.children.cycles-pp.switch_mm_irqs_off
      0.13 ±  3%      +0.0        0.18 ±  3%  perf-profile.children.cycles-pp.vfs_fstat
      0.20            +0.0        0.25 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.02 ±141%      +0.0        0.06 ± 17%  perf-profile.children.cycles-pp.try_to_unlazy_next
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.apparmor_inode_getattr
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.tid_fd_revalidate
      0.16 ±  3%      +0.1        0.21        perf-profile.children.cycles-pp.security_inode_permission
      0.17 ±  2%      +0.1        0.22 ±  2%  perf-profile.children.cycles-pp.from_kgid_munged
      0.15            +0.1        0.21 ±  2%  perf-profile.children.cycles-pp.is_vmalloc_addr
      0.15 ±  2%      +0.1        0.21        perf-profile.children.cycles-pp.amd_clear_divider
      0.16            +0.1        0.22        perf-profile.children.cycles-pp.from_kuid_munged
      0.67            +0.1        0.73        perf-profile.children.cycles-pp.update_curr
      0.11 ±  4%      +0.1        0.17 ±  2%  perf-profile.children.cycles-pp.put_prev_entity
      0.24 ±  4%      +0.1        0.31 ±  5%  perf-profile.children.cycles-pp.__fdget_raw
      0.23 ±  2%      +0.1        0.30        perf-profile.children.cycles-pp.rcu_all_qs
      0.19 ± 16%      +0.1        0.26 ± 10%  perf-profile.children.cycles-pp.__lookup_mnt
      0.16 ±  2%      +0.1        0.23        perf-profile.children.cycles-pp.do_sched_yield
      0.26            +0.1        0.34        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.23 ±  2%      +0.1        0.32 ±  5%  perf-profile.children.cycles-pp.terminate_walk
      0.24 ±  2%      +0.1        0.33        perf-profile.children.cycles-pp.make_vfsuid
      0.28            +0.1        0.38        perf-profile.children.cycles-pp.map_id_up
      1.42            +0.1        1.52 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock
      0.28            +0.1        0.39        perf-profile.children.cycles-pp.check_stack_object
      0.32            +0.1        0.44        perf-profile.children.cycles-pp.generic_fillattr
      0.34            +0.1        0.46 ±  2%  perf-profile.children.cycles-pp.__legitimize_mnt
      0.31 ±  4%      +0.1        0.46 ±  6%  perf-profile.children.cycles-pp.set_root
      0.44            +0.1        0.59        perf-profile.children.cycles-pp.vfs_getattr_nosec
      0.49            +0.2        0.64        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.53            +0.2        0.70 ±  2%  perf-profile.children.cycles-pp.generic_permission
      0.58            +0.2        0.76        perf-profile.children.cycles-pp.shmem_getattr
      0.55            +0.2        0.74        perf-profile.children.cycles-pp.cp_statx
      0.45 ±  2%      +0.2        0.64 ±  4%  perf-profile.children.cycles-pp.nd_jump_root
      0.62            +0.2        0.82        perf-profile.children.cycles-pp.__check_heap_object
      0.69            +0.2        0.91        perf-profile.children.cycles-pp._copy_to_user
      0.66 ±  3%      +0.2        0.88        perf-profile.children.cycles-pp.shim_statx
      0.60            +0.2        0.84        perf-profile.children.cycles-pp.__cond_resched
      0.45 ± 34%      +0.2        0.70 ± 25%  perf-profile.children.cycles-pp.stress_fstat_helper
      0.67            +0.2        0.91        perf-profile.children.cycles-pp.common_perm_cond
      0.66 ±  2%      +0.2        0.90 ±  2%  perf-profile.children.cycles-pp.__virt_addr_valid
      0.66 ±  4%      +0.2        0.91 ±  4%  perf-profile.children.cycles-pp.__d_lookup_rcu
      0.77            +0.3        1.04        perf-profile.children.cycles-pp.inode_permission
      2.72            +0.3        3.01        perf-profile.children.cycles-pp.__schedule
      0.82            +0.3        1.11        perf-profile.children.cycles-pp.putname
      0.67            +0.3        0.96        perf-profile.children.cycles-pp.__x64_sys_sched_yield
      1.06 ±  2%      +0.3        1.36 ±  2%  perf-profile.children.cycles-pp.step_into
      0.81            +0.3        1.11        perf-profile.children.cycles-pp.security_inode_getattr
      1.94            +0.3        2.27        perf-profile.children.cycles-pp.kmem_cache_free
      0.88            +0.3        1.21 ±  2%  perf-profile.children.cycles-pp.path_init
      0.82            +0.3        1.16 ±  2%  perf-profile.children.cycles-pp.lockref_put_return
      1.44            +0.3        1.78        perf-profile.children.cycles-pp.schedule
      1.00 ±  2%      +0.3        1.34 ±  2%  perf-profile.children.cycles-pp.lookup_fast
      0.88            +0.4        1.24        perf-profile.children.cycles-pp.__sched_yield
      1.08            +0.4        1.45        perf-profile.children.cycles-pp.cp_new_stat
      1.19 ±  2%      +0.4        1.60 ±  2%  perf-profile.children.cycles-pp.walk_component
      1.07            +0.4        1.50        perf-profile.children.cycles-pp.path_put
      1.51            +0.5        1.96        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.13            +0.5        1.58        perf-profile.children.cycles-pp.dput
      1.37            +0.5        1.86        perf-profile.children.cycles-pp.check_heap_object
      1.27            +0.5        1.79        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      1.43            +0.6        2.00        perf-profile.children.cycles-pp.lockref_get_not_dead
      1.87            +0.6        2.46        perf-profile.children.cycles-pp.entry_SYSCALL_64
      1.84            +0.7        2.50        perf-profile.children.cycles-pp.kmem_cache_alloc
      1.80            +0.7        2.52        perf-profile.children.cycles-pp.__legitimize_path
      1.96            +0.8        2.73        perf-profile.children.cycles-pp.try_to_unlazy
      2.04            +0.8        2.85        perf-profile.children.cycles-pp.complete_walk
      2.84            +1.0        3.81        perf-profile.children.cycles-pp.link_path_walk
      2.78            +1.0        3.75        perf-profile.children.cycles-pp.__check_object_size
      4.85            +1.7        6.52        perf-profile.children.cycles-pp.strncpy_from_user
      5.28            +1.9        7.21        perf-profile.children.cycles-pp.do_statx
      7.37            +2.6        9.97        perf-profile.children.cycles-pp.path_lookupat
      8.01            +2.8       10.82        perf-profile.children.cycles-pp.getname_flags
      8.65            +3.0       11.70        perf-profile.children.cycles-pp.filename_lookup
     10.65            +3.8       14.48        perf-profile.children.cycles-pp.__x64_sys_statx
     12.16            +4.3       16.49        perf-profile.children.cycles-pp.vfs_statx
     12.54            +4.4       16.91        perf-profile.children.cycles-pp.vfs_fstatat
     14.41            +5.0       19.42        perf-profile.children.cycles-pp.__do_sys_newfstatat
     13.92            +5.0       18.92        perf-profile.children.cycles-pp.statx
     17.67            +6.2       23.87        perf-profile.children.cycles-pp.fstatat64
     43.74           -11.4       32.29        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      1.36            -0.0        1.33        perf-profile.self.cycles-pp.queued_write_lock_slowpath
      0.14 ±  3%      -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.update_sg_lb_stats
      0.07 ±  5%      -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.smp_call_function_many_cond
      0.60            -0.0        0.58        perf-profile.self.cycles-pp.__memcpy
      0.47            -0.0        0.46        perf-profile.self.cycles-pp.clear_page_erms
      0.07            -0.0        0.06        perf-profile.self.cycles-pp.__free_pages
      0.13            -0.0        0.12        perf-profile.self.cycles-pp.sched_balance_find_dst_cpu
      0.07            +0.0        0.08        perf-profile.self.cycles-pp.available_idle_cpu
      0.06            +0.0        0.07        perf-profile.self.cycles-pp.__dequeue_entity
      0.09            +0.0        0.10        perf-profile.self.cycles-pp.__switch_to
      0.06            +0.0        0.07        perf-profile.self.cycles-pp.shmem_is_huge
      0.06            +0.0        0.07 ±  5%  perf-profile.self.cycles-pp.os_xsave
      0.18 ±  2%      +0.0        0.19        perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
      0.10 ±  4%      +0.0        0.11        perf-profile.self.cycles-pp.___perf_sw_event
      0.12            +0.0        0.14 ±  3%  perf-profile.self.cycles-pp.stress_fstat_thread
      0.06 ±  6%      +0.0        0.08        perf-profile.self.cycles-pp.pick_eevdf
      0.05            +0.0        0.07        perf-profile.self.cycles-pp.mntput
      0.05            +0.0        0.07        perf-profile.self.cycles-pp.should_failslab
      0.07            +0.0        0.09        perf-profile.self.cycles-pp.__enqueue_entity
      0.07            +0.0        0.09 ±  5%  perf-profile.self.cycles-pp.legitimize_links
      0.08 ±  4%      +0.0        0.11        perf-profile.self.cycles-pp.amd_clear_divider
      0.09            +0.0        0.12        perf-profile.self.cycles-pp.__legitimize_path
      0.08            +0.0        0.11        perf-profile.self.cycles-pp.pick_next_task_fair
      0.10            +0.0        0.13 ±  2%  perf-profile.self.cycles-pp.complete_walk
      0.11 ±  4%      +0.0        0.14        perf-profile.self.cycles-pp.dput
      0.11 ±  4%      +0.0        0.14 ±  2%  perf-profile.self.cycles-pp.security_inode_getattr
      0.11            +0.0        0.15        perf-profile.self.cycles-pp.mntput_no_expire
      0.12 ±  3%      +0.0        0.16 ±  2%  perf-profile.self.cycles-pp.try_to_unlazy
      0.86            +0.0        0.90        perf-profile.self.cycles-pp._raw_spin_lock
      0.11            +0.0        0.15 ±  2%  perf-profile.self.cycles-pp.is_vmalloc_addr
      0.18 ±  2%      +0.0        0.22        perf-profile.self.cycles-pp.switch_mm_irqs_off
      0.14 ±  3%      +0.0        0.18        perf-profile.self.cycles-pp.__get_user_1
      0.13            +0.0        0.17 ±  2%  perf-profile.self.cycles-pp.security_inode_permission
      0.13            +0.0        0.17 ±  2%  perf-profile.self.cycles-pp.terminate_walk
      0.32            +0.0        0.36 ±  2%  perf-profile.self.cycles-pp.update_curr
      0.14            +0.0        0.19 ±  2%  perf-profile.self.cycles-pp.nd_jump_root
      0.19 ±  2%      +0.0        0.24        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.18            +0.0        0.23        perf-profile.self.cycles-pp.rcu_all_qs
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.from_kgid_munged
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.make_vfsgid
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.path_put
      0.00            +0.1        0.06 ±  9%  perf-profile.self.cycles-pp.__x64_sys_newfstatat
      0.18 ±  2%      +0.1        0.24 ±  2%  perf-profile.self.cycles-pp.walk_component
      0.22            +0.1        0.28        perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      0.16 ± 16%      +0.1        0.23 ± 12%  perf-profile.self.cycles-pp.__lookup_mnt
      0.23 ±  4%      +0.1        0.29 ±  5%  perf-profile.self.cycles-pp.__fdget_raw
      0.20 ±  2%      +0.1        0.27        perf-profile.self.cycles-pp.lookup_fast
      0.20 ±  2%      +0.1        0.27        perf-profile.self.cycles-pp.shmem_getattr
      0.18            +0.1        0.25        perf-profile.self.cycles-pp.make_vfsuid
      0.21            +0.1        0.28        perf-profile.self.cycles-pp.vfs_fstatat
      0.22            +0.1        0.30        perf-profile.self.cycles-pp.cp_statx
      0.24 ±  2%      +0.1        0.32        perf-profile.self.cycles-pp.generic_fillattr
      0.23            +0.1        0.31        perf-profile.self.cycles-pp.check_stack_object
      0.24 ±  3%      +0.1        0.33        perf-profile.self.cycles-pp.map_id_up
      0.24            +0.1        0.33        perf-profile.self.cycles-pp.__x64_sys_statx
      0.31 ±  2%      +0.1        0.40        perf-profile.self.cycles-pp.__cond_resched
      0.23 ±  2%      +0.1        0.32 ±  3%  perf-profile.self.cycles-pp.inode_permission
      0.30 ±  2%      +0.1        0.40        perf-profile.self.cycles-pp.path_init
      0.34            +0.1        0.44        perf-profile.self.cycles-pp.__schedule
      0.32            +0.1        0.42        perf-profile.self.cycles-pp.path_lookupat
      0.31            +0.1        0.42 ±  2%  perf-profile.self.cycles-pp.__legitimize_mnt
      0.30 ±  3%      +0.1        0.43 ±  6%  perf-profile.self.cycles-pp.set_root
      0.42            +0.1        0.56        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.42            +0.1        0.56 ±  3%  perf-profile.self.cycles-pp.generic_permission
      0.41            +0.1        0.56        perf-profile.self.cycles-pp.vfs_getattr_nosec
      0.48            +0.1        0.63        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.42            +0.2        0.57        perf-profile.self.cycles-pp.cp_new_stat
      0.48            +0.2        0.64        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.59            +0.2        0.77        perf-profile.self.cycles-pp.__check_heap_object
      0.54            +0.2        0.72        perf-profile.self.cycles-pp.vfs_statx
      0.54            +0.2        0.73 ±  2%  perf-profile.self.cycles-pp.step_into
      0.52 ±  8%      +0.2        0.72 ±  5%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.58            +0.2        0.77        perf-profile.self.cycles-pp.check_heap_object
      0.56            +0.2        0.77        perf-profile.self.cycles-pp.__check_object_size
      0.67            +0.2        0.88        perf-profile.self.cycles-pp._copy_to_user
      0.62 ±  3%      +0.2        0.83        perf-profile.self.cycles-pp.shim_statx
      0.62            +0.2        0.84        perf-profile.self.cycles-pp.common_perm_cond
      0.39 ± 38%      +0.2        0.62 ± 27%  perf-profile.self.cycles-pp.stress_fstat_helper
      0.61 ±  4%      +0.2        0.84 ±  4%  perf-profile.self.cycles-pp.__d_lookup_rcu
      0.60 ±  2%      +0.2        0.84 ±  2%  perf-profile.self.cycles-pp.__virt_addr_valid
      0.65            +0.2        0.88        perf-profile.self.cycles-pp.do_statx
      0.67            +0.2        0.90        perf-profile.self.cycles-pp.__do_sys_newfstatat
      0.78            +0.3        1.04        perf-profile.self.cycles-pp.do_syscall_64
      0.76            +0.3        1.02        perf-profile.self.cycles-pp.putname
      0.77            +0.3        1.05        perf-profile.self.cycles-pp.getname_flags
      0.84            +0.3        1.12 ±  3%  perf-profile.self.cycles-pp.link_path_walk
      0.87            +0.3        1.16        perf-profile.self.cycles-pp.fstatat64
      0.94            +0.3        1.26        perf-profile.self.cycles-pp.statx
      0.80            +0.3        1.12 ±  2%  perf-profile.self.cycles-pp.lockref_put_return
      1.21            +0.4        1.60        perf-profile.self.cycles-pp.kmem_cache_free
      1.46            +0.4        1.90        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.26            +0.4        1.70        perf-profile.self.cycles-pp.filename_lookup
      1.39            +0.5        1.89        perf-profile.self.cycles-pp.kmem_cache_alloc
      1.39            +0.6        1.96        perf-profile.self.cycles-pp.lockref_get_not_dead
      2.12            +0.7        2.83        perf-profile.self.cycles-pp.strncpy_from_user



***************************************************************************************************
lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
  gcc-13/performance/4BRD_12G/xfs/x86_64-rhel-8.3/300/RAID1/debian-12-x86_64-20240206.cgz/lkp-csl-2sp3/sync_disk_rw/aim7

commit: 
  97450eb909 ("sched/pelt: Remove shift of thermal clock")
  e2bbd1c498 ("sched/fair: Reschedule the cfs_rq when current is ineligible")

97450eb909658573 e2bbd1c498980c5cb68f9973f41 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  24356729 ±  2%     +15.8%   28212175        cpuidle..usage
     30.69 ±  2%     +13.6%      34.87 ±  2%  iostat.cpu.idle
     68.53            -6.3%      64.25        iostat.cpu.system
    753687           +10.4%     831869 ±  4%  meminfo.Inactive(anon)
    164050 ±  6%     +45.9%     239358 ± 11%  meminfo.Mapped
      6182            +9.6%       6775 ±  2%  perf-c2c.DRAM.remote
      3724 ±  3%     +11.5%       4151 ±  2%  perf-c2c.HITM.remote
    648354            +7.8%     698666 ±  2%  vmstat.io.bo
     95.98 ±  6%     -12.9%      83.64 ±  4%  vmstat.procs.r
    801317           +16.2%     930980 ±  2%  vmstat.system.cs
     29.52 ±  2%      +4.2       33.67 ±  3%  mpstat.cpu.all.idle%
      0.04 ±  5%      -0.0        0.04 ±  5%  mpstat.cpu.all.iowait%
      0.74            +0.1        0.85 ±  3%  mpstat.cpu.all.usr%
     85.46           -10.7%      76.32        mpstat.max_utilization_pct
     16199            +9.6%      17753        aim7.jobs-per-min
    111.17            -8.7%     101.44        aim7.time.elapsed_time
    111.17            -8.7%     101.44        aim7.time.elapsed_time.max
   4379281           +75.1%    7669146        aim7.time.involuntary_context_switches
      6400            -6.5%       5983        aim7.time.percent_of_cpu_this_job_got
      7089           -14.8%       6042        aim7.time.system_time
  52793765            -4.9%   50219682        aim7.time.voluntary_context_switches
    336033            +3.7%     348553 ±  2%  proc-vmstat.nr_active_anon
   1179748            +2.6%    1210206        proc-vmstat.nr_file_pages
    188573           +10.4%     208211 ±  4%  proc-vmstat.nr_inactive_anon
     41560 ±  6%     +45.9%      60654 ± 11%  proc-vmstat.nr_mapped
    360884            +8.6%     391938        proc-vmstat.nr_shmem
    336033            +3.7%     348553 ±  2%  proc-vmstat.nr_zone_active_anon
    188573           +10.4%     208211 ±  4%  proc-vmstat.nr_zone_inactive_anon
   1474174           -21.8%    1153102        sched_debug.cfs_rq:/.avg_vruntime.avg
   1644121           -20.9%    1300217 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.max
   1394133           -21.2%    1098002 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.min
     38297 ±  7%     -36.5%      24315 ± 12%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.78 ± 18%     -27.4%       0.57 ±  7%  sched_debug.cfs_rq:/.h_nr_running.avg
   1474174           -21.8%    1153102        sched_debug.cfs_rq:/.min_vruntime.avg
   1644121           -20.9%    1300217 ±  3%  sched_debug.cfs_rq:/.min_vruntime.max
   1394133           -21.2%    1098002 ±  2%  sched_debug.cfs_rq:/.min_vruntime.min
     38297 ±  7%     -36.5%      24315 ± 12%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.47 ±  2%      -9.8%       0.42 ±  4%  sched_debug.cfs_rq:/.nr_running.avg
    853.31 ± 10%     -26.5%     627.20        sched_debug.cfs_rq:/.runnable_avg.avg
      3612 ± 27%     -45.1%       1983 ± 17%  sched_debug.cfs_rq:/.runnable_avg.max
    588.74 ± 21%     -37.5%     367.74 ±  5%  sched_debug.cfs_rq:/.runnable_avg.stddev
      2305 ± 22%     -30.1%       1612 ± 16%  sched_debug.cfs_rq:/.util_avg.max
    394.30 ± 16%     -21.5%     309.35 ±  5%  sched_debug.cfs_rq:/.util_avg.stddev
    272.57 ± 17%     -41.5%     159.47 ±  7%  sched_debug.cfs_rq:/.util_est.avg
    377.83 ± 35%     -43.3%     214.15 ± 16%  sched_debug.cfs_rq:/.util_est.stddev
      0.79 ± 18%     -30.8%       0.55 ±  7%  sched_debug.cpu.nr_running.avg
    221477           +20.1%     265968        sched_debug.cpu.nr_switches.avg
    238906 ±  3%     +18.7%     283476 ±  2%  sched_debug.cpu.nr_switches.max
      6940 ±  8%     +31.5%       9124 ±  9%  sched_debug.cpu.nr_switches.stddev
      1.47 ±  2%     +10.1%       1.62 ±  3%  perf-stat.i.MPKI
      1.21            +0.2        1.36 ±  3%  perf-stat.i.branch-miss-rate%
  86242084 ±  4%     +16.3%  1.003e+08 ±  4%  perf-stat.i.branch-misses
  80906704 ±  2%     +12.5%   90980799 ±  2%  perf-stat.i.cache-misses
    817901           +16.5%     952574 ±  2%  perf-stat.i.context-switches
      3.85            -7.4%       3.57        perf-stat.i.cpi
 2.134e+11            -5.3%   2.02e+11        perf-stat.i.cpu-cycles
      2671 ±  2%     -14.9%       2272 ±  3%  perf-stat.i.cycles-between-cache-misses
      0.32            +7.6%       0.34        perf-stat.i.ipc
     10.77           +13.5%      12.23 ±  2%  perf-stat.i.metric.K/sec
      6523            +8.0%       7047 ±  3%  perf-stat.i.minor-faults
      6524            +8.0%       7048 ±  3%  perf-stat.i.page-faults
      1.54 ±  2%     +10.5%       1.70 ±  2%  perf-stat.overall.MPKI
      0.80 ±  3%      +0.1        0.92 ±  4%  perf-stat.overall.branch-miss-rate%
      4.06            -7.0%       3.78        perf-stat.overall.cpi
      2640 ±  2%     -15.8%       2222 ±  2%  perf-stat.overall.cycles-between-cache-misses
      0.25            +7.6%       0.26        perf-stat.overall.ipc
  85266080 ±  4%     +16.3%   99200665 ±  4%  perf-stat.ps.branch-misses
  80071004 ±  2%     +12.5%   90070323 ±  2%  perf-stat.ps.cache-misses
    809411           +16.5%     943274 ±  2%  perf-stat.ps.context-switches
 2.113e+11            -5.3%      2e+11        perf-stat.ps.cpu-cycles
 5.852e+12            -6.5%   5.47e+12 ±  2%  perf-stat.total.instructions
     12.13 ±  4%      -4.2        7.88 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.md_flush_request.raid1_make_request.md_handle_request
     12.19 ±  4%      -4.2        7.96 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.md_flush_request.raid1_make_request.md_handle_request.__submit_bio
     13.44 ±  4%      -4.2        9.25 ±  3%  perf-profile.calltrace.cycles-pp.md_flush_request.raid1_make_request.md_handle_request.__submit_bio.__submit_bio_noacct
     13.46 ±  4%      -4.2        9.27 ±  3%  perf-profile.calltrace.cycles-pp.md_handle_request.__submit_bio.__submit_bio_noacct.submit_bio_wait.blkdev_issue_flush
     13.48 ±  4%      -4.2        9.30 ±  3%  perf-profile.calltrace.cycles-pp.__submit_bio_noacct.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_write
     13.44 ±  4%      -4.2        9.26 ±  3%  perf-profile.calltrace.cycles-pp.raid1_make_request.md_handle_request.__submit_bio.__submit_bio_noacct.submit_bio_wait
     13.48 ±  4%      -4.2        9.30 ±  3%  perf-profile.calltrace.cycles-pp.__submit_bio.__submit_bio_noacct.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync
     13.57 ±  4%      -4.2        9.41 ±  3%  perf-profile.calltrace.cycles-pp.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_write.vfs_write
     13.59 ±  4%      -4.2        9.43 ±  3%  perf-profile.calltrace.cycles-pp.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_write.vfs_write.ksys_write
     84.77            -3.7       81.04        perf-profile.calltrace.cycles-pp.xfs_file_fsync.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64
     86.25            -3.7       82.54        perf-profile.calltrace.cycles-pp.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     86.38            -3.7       82.68        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     86.40            -3.7       82.69        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     86.57            -3.7       82.88        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
     86.56            -3.7       82.88        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     86.80            -3.7       83.13        perf-profile.calltrace.cycles-pp.write
      3.35            -1.3        2.10 ±  3%  perf-profile.calltrace.cycles-pp.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write.vfs_write
      3.06            -1.2        1.86 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.remove_wait_queue.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync
      3.06            -1.2        1.86 ±  2%  perf-profile.calltrace.cycles-pp.remove_wait_queue.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
      3.05            -1.2        1.85 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.remove_wait_queue.xlog_wait_on_iclog.xfs_log_force_seq
      3.86 ±  2%      -1.1        2.78 ±  3%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.remove_wait_queue.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync
      3.84 ±  2%      -1.1        2.76 ±  3%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.remove_wait_queue.xlog_cil_force_seq.xfs_log_force_seq
      3.86 ±  2%      -1.1        2.78 ±  3%  perf-profile.calltrace.cycles-pp.remove_wait_queue.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
      0.84 ±  2%      -0.0        0.80        perf-profile.calltrace.cycles-pp.kiocb_modified.xfs_file_write_checks.xfs_file_buffered_write.vfs_write.ksys_write
      0.82 ±  2%      -0.0        0.77        perf-profile.calltrace.cycles-pp.xfs_vn_update_time.kiocb_modified.xfs_file_write_checks.xfs_file_buffered_write.vfs_write
      0.58 ±  2%      -0.0        0.55 ±  2%  perf-profile.calltrace.cycles-pp.__xfs_trans_commit.xfs_vn_update_time.kiocb_modified.xfs_file_write_checks.xfs_file_buffered_write
      0.51            +0.0        0.56        perf-profile.calltrace.cycles-pp.xfs_end_ioend.xfs_end_io.process_one_work.worker_thread.kthread
      0.52            +0.0        0.57        perf-profile.calltrace.cycles-pp.xfs_end_io.process_one_work.worker_thread.kthread.ret_from_fork
      0.58 ±  2%      +0.1        0.62 ±  3%  perf-profile.calltrace.cycles-pp.iomap_file_buffered_write.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64
      0.90 ±  2%      +0.1        0.99 ±  3%  perf-profile.calltrace.cycles-pp.copy_to_brd.brd_submit_bio.__submit_bio.__submit_bio_noacct.iomap_submit_ioend
      2.58 ±  2%      +0.1        2.68 ±  2%  perf-profile.calltrace.cycles-pp.__submit_bio.__submit_bio_noacct.iomap_submit_ioend.xfs_vm_writepages.do_writepages
      2.59 ±  2%      +0.1        2.70 ±  2%  perf-profile.calltrace.cycles-pp.__submit_bio_noacct.iomap_submit_ioend.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc
      2.12            +0.1        2.22        perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.62 ±  2%      +0.1        2.73 ±  2%  perf-profile.calltrace.cycles-pp.iomap_submit_ioend.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range
      1.17 ±  2%      +0.1        1.29 ±  3%  perf-profile.calltrace.cycles-pp.brd_submit_bio.__submit_bio.__submit_bio_noacct.iomap_submit_ioend.xfs_vm_writepages
      2.24            +0.1        2.38        perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
      2.24            +0.1        2.38        perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      2.24            +0.1        2.38        perf-profile.calltrace.cycles-pp.ret_from_fork_asm
      2.23            +0.1        2.37        perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.91 ±  2%      +0.7        1.58        perf-profile.calltrace.cycles-pp.mutex_spin_on_owner.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq
     54.72            +1.0       55.70        perf-profile.calltrace.cycles-pp.osq_lock.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq
      1.71 ±  5%      +1.1        2.86 ± 10%  perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     64.02            +1.6       65.62        perf-profile.calltrace.cycles-pp.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write.vfs_write
      7.56 ±  2%      +1.9        9.42        perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     55.82            +2.0       57.86        perf-profile.calltrace.cycles-pp.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force_seq
     58.14            +2.6       60.76        perf-profile.calltrace.cycles-pp.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync
     59.46            +2.7       62.18        perf-profile.calltrace.cycles-pp.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
      9.68 ±  2%      +3.2       12.86 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      9.69 ±  2%      +3.2       12.86 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      9.86 ±  2%      +3.2       13.08 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
     10.46 ±  2%      +3.5       13.94 ±  2%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
     10.47 ±  2%      +3.5       13.94 ±  2%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
     10.47 ±  2%      +3.5       13.94 ±  2%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
     10.60 ±  2%      +3.5       14.08 ±  2%  perf-profile.calltrace.cycles-pp.common_startup_64
     22.14 ±  2%      -6.5       15.63 ±  2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     12.70 ±  3%      -4.2        8.46 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irq
     13.54 ±  4%      -4.2        9.34 ±  3%  perf-profile.children.cycles-pp.md_flush_request
     14.22 ±  3%      -4.2       10.06 ±  3%  perf-profile.children.cycles-pp.md_handle_request
     14.20 ±  3%      -4.2       10.04 ±  3%  perf-profile.children.cycles-pp.raid1_make_request
     13.57 ±  4%      -4.2        9.41 ±  3%  perf-profile.children.cycles-pp.submit_bio_wait
     13.59 ±  4%      -4.2        9.43 ±  3%  perf-profile.children.cycles-pp.blkdev_issue_flush
     16.32 ±  3%      -4.1       12.26 ±  2%  perf-profile.children.cycles-pp.__submit_bio
     16.34 ±  3%      -4.1       12.28 ±  2%  perf-profile.children.cycles-pp.__submit_bio_noacct
     84.77            -3.7       81.04        perf-profile.children.cycles-pp.xfs_file_fsync
     86.25            -3.7       82.54        perf-profile.children.cycles-pp.xfs_file_buffered_write
     86.40            -3.7       82.70        perf-profile.children.cycles-pp.vfs_write
     86.41            -3.7       82.71        perf-profile.children.cycles-pp.ksys_write
     86.71            -3.7       83.03        perf-profile.children.cycles-pp.do_syscall_64
     86.71            -3.7       83.04        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     86.85            -3.7       83.18        perf-profile.children.cycles-pp.write
      8.05            -2.3        5.71 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      7.14            -2.3        4.82 ±  2%  perf-profile.children.cycles-pp.remove_wait_queue
      3.48            -1.3        2.23 ±  3%  perf-profile.children.cycles-pp.xlog_wait_on_iclog
      0.28 ±  3%      -0.1        0.21 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.26 ±  3%      -0.1        0.19 ±  4%  perf-profile.children.cycles-pp.sysvec_call_function_single
      0.24 ±  3%      -0.1        0.18 ±  5%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.16 ± 14%      -0.1        0.10 ±  7%  perf-profile.children.cycles-pp.sb_clear_inode_writeback
      0.15 ± 10%      -0.1        0.10 ±  8%  perf-profile.children.cycles-pp.sb_mark_inode_writeback
      0.34 ±  6%      -0.0        0.29 ±  4%  perf-profile.children.cycles-pp.__folio_end_writeback
      0.21 ±  8%      -0.0        0.17 ±  4%  perf-profile.children.cycles-pp.__folio_start_writeback
      0.84 ±  2%      -0.0        0.80        perf-profile.children.cycles-pp.kiocb_modified
      0.82 ±  2%      -0.0        0.78        perf-profile.children.cycles-pp.xfs_vn_update_time
      0.33 ±  4%      -0.0        0.30 ±  4%  perf-profile.children.cycles-pp.iomap_writepage_map
      0.43 ±  3%      -0.0        0.40 ±  3%  perf-profile.children.cycles-pp.iomap_writepages
      0.10 ±  6%      -0.0        0.08 ±  7%  perf-profile.children.cycles-pp.xfs_log_ticket_ungrant
      0.12            -0.0        0.11 ±  3%  perf-profile.children.cycles-pp.xlog_state_clean_iclog
      0.05            +0.0        0.06        perf-profile.children.cycles-pp.__update_blocked_fair
      0.05            +0.0        0.06        perf-profile.children.cycles-pp.kmem_cache_free
      0.11            +0.0        0.12        perf-profile.children.cycles-pp.llseek
      0.07            +0.0        0.08        perf-profile.children.cycles-pp.switch_fpu_return
      0.06            +0.0        0.07        perf-profile.children.cycles-pp.sched_clock
      0.06            +0.0        0.07        perf-profile.children.cycles-pp.wake_page_function
      0.10            +0.0        0.11 ±  4%  perf-profile.children.cycles-pp.xfs_buffered_write_iomap_begin
      0.10 ±  3%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.__switch_to_asm
      0.05            +0.0        0.06 ±  7%  perf-profile.children.cycles-pp.ktime_get
      0.05 ±  8%      +0.0        0.07 ±  7%  perf-profile.children.cycles-pp.xfs_btree_lookup_get_block
      0.07 ±  7%      +0.0        0.08        perf-profile.children.cycles-pp.__switch_to
      0.05 ±  7%      +0.0        0.06 ±  7%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
      0.05 ±  7%      +0.0        0.07 ±  7%  perf-profile.children.cycles-pp.mutex_lock
      0.07 ±  5%      +0.0        0.09 ±  5%  perf-profile.children.cycles-pp.select_idle_core
      0.05 ±  8%      +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.llist_add_batch
      0.06 ±  8%      +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.mutex_unlock
      0.06 ±  6%      +0.0        0.08 ±  6%  perf-profile.children.cycles-pp.perf_tp_event
      0.14 ±  3%      +0.0        0.15 ±  2%  perf-profile.children.cycles-pp.xlog_cil_committed
      0.12 ±  3%      +0.0        0.14 ±  4%  perf-profile.children.cycles-pp.iomap_iter
      0.08 ±  4%      +0.0        0.10 ±  3%  perf-profile.children.cycles-pp.xfs_btree_lookup
      0.14 ±  3%      +0.0        0.15 ±  3%  perf-profile.children.cycles-pp.xlog_cil_process_committed
      0.13 ±  5%      +0.0        0.15        perf-profile.children.cycles-pp.xlog_cil_write_commit_record
      0.07 ±  7%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.submit_flushes
      0.13 ±  4%      +0.0        0.15 ±  2%  perf-profile.children.cycles-pp.xlog_cil_set_ctx_write_state
      0.09 ±  5%      +0.0        0.11 ±  6%  perf-profile.children.cycles-pp.kick_pool
      0.14 ±  4%      +0.0        0.16 ±  2%  perf-profile.children.cycles-pp.xfs_bmap_add_extent_unwritten_real
      0.11 ±  3%      +0.0        0.13 ±  4%  perf-profile.children.cycles-pp.__queue_work
      0.07 ±  6%      +0.0        0.10 ±  8%  perf-profile.children.cycles-pp.__smp_call_single_queue
      0.13 ±  3%      +0.0        0.15 ±  2%  perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
      0.14 ±  3%      +0.0        0.17 ±  2%  perf-profile.children.cycles-pp.prepare_task_switch
      0.11            +0.0        0.13 ±  5%  perf-profile.children.cycles-pp.sched_balance_update_blocked_averages
      0.15 ±  3%      +0.0        0.18 ±  4%  perf-profile.children.cycles-pp.bio_alloc_bioset
      0.18 ±  4%      +0.0        0.20 ±  3%  perf-profile.children.cycles-pp.xfs_bmapi_write
      0.08            +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.__cond_resched
      0.15 ±  5%      +0.0        0.17 ±  2%  perf-profile.children.cycles-pp.xfs_bmapi_convert_unwritten
      0.28 ±  3%      +0.0        0.30        perf-profile.children.cycles-pp.select_idle_sibling
      0.18 ±  5%      +0.0        0.20 ±  4%  perf-profile.children.cycles-pp.available_idle_cpu
      0.11 ±  4%      +0.0        0.14 ±  4%  perf-profile.children.cycles-pp.queue_work_on
      0.12            +0.0        0.15 ±  4%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
      0.58            +0.0        0.60 ±  2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.22 ±  4%      +0.0        0.25 ±  2%  perf-profile.children.cycles-pp.select_idle_cpu
      0.14 ±  3%      +0.0        0.18 ±  3%  perf-profile.children.cycles-pp.menu_select
      0.36 ±  2%      +0.0        0.41        perf-profile.children.cycles-pp.xfs_iomap_write_unwritten
      0.51            +0.0        0.56        perf-profile.children.cycles-pp.xfs_end_ioend
      0.52            +0.0        0.57        perf-profile.children.cycles-pp.xfs_end_io
      0.58 ±  2%      +0.1        0.63 ±  3%  perf-profile.children.cycles-pp.iomap_file_buffered_write
      0.48            +0.1        0.53        perf-profile.children.cycles-pp.sched_ttwu_pending
      0.00            +0.1        0.06 ± 16%  perf-profile.children.cycles-pp.poll_idle
      0.57            +0.1        0.63        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.24 ±  3%      +0.1        0.32 ±  2%  perf-profile.children.cycles-pp.flush_workqueue_prep_pwqs
      0.20 ±  3%      +0.1        0.28 ±  2%  perf-profile.children.cycles-pp.schedule_idle
      2.12            +0.1        2.22        perf-profile.children.cycles-pp.process_one_work
      2.62 ±  2%      +0.1        2.73 ±  2%  perf-profile.children.cycles-pp.iomap_submit_ioend
      1.04 ±  2%      +0.1        1.15 ±  2%  perf-profile.children.cycles-pp.copy_to_brd
      2.24            +0.1        2.38        perf-profile.children.cycles-pp.kthread
      2.24            +0.1        2.38        perf-profile.children.cycles-pp.ret_from_fork
      2.24            +0.1        2.38        perf-profile.children.cycles-pp.ret_from_fork_asm
      2.23            +0.1        2.37        perf-profile.children.cycles-pp.worker_thread
      1.32 ±  2%      +0.1        1.46 ±  2%  perf-profile.children.cycles-pp.brd_submit_bio
      0.36 ±  2%      +0.1        0.50 ±  2%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      0.89            +0.2        1.05 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.09 ±  4%      +0.2        0.27 ±  2%  perf-profile.children.cycles-pp.wake_up_q
      1.25            +0.2        1.46        perf-profile.children.cycles-pp.try_to_wake_up
      0.10 ±  8%      +0.2        0.31 ± 18%  perf-profile.children.cycles-pp.schedule_preempt_disabled
      0.12 ±  3%      +0.3        0.40        perf-profile.children.cycles-pp.__mutex_unlock_slowpath
      0.91 ±  2%      +0.7        1.58        perf-profile.children.cycles-pp.mutex_spin_on_owner
     54.74            +1.0       55.72        perf-profile.children.cycles-pp.osq_lock
      2.02 ±  5%      +1.3        3.30 ± 10%  perf-profile.children.cycles-pp.intel_idle_irq
     64.02            +1.6       65.62        perf-profile.children.cycles-pp.xlog_cil_force_seq
      7.66 ±  2%      +1.9        9.52        perf-profile.children.cycles-pp.intel_idle
     55.82            +2.0       57.86        perf-profile.children.cycles-pp.__mutex_lock
     58.14            +2.6       60.76        perf-profile.children.cycles-pp.__flush_workqueue
     59.46            +2.7       62.18        perf-profile.children.cycles-pp.xlog_cil_push_now
      9.81 ±  2%      +3.2       12.99 ±  3%  perf-profile.children.cycles-pp.cpuidle_enter_state
      9.81 ±  2%      +3.2       12.99 ±  3%  perf-profile.children.cycles-pp.cpuidle_enter
      9.98 ±  2%      +3.2       13.22 ±  3%  perf-profile.children.cycles-pp.cpuidle_idle_call
     10.47 ±  2%      +3.5       13.94 ±  2%  perf-profile.children.cycles-pp.start_secondary
     10.60 ±  2%      +3.5       14.08 ±  2%  perf-profile.children.cycles-pp.do_idle
     10.60 ±  2%      +3.5       14.08 ±  2%  perf-profile.children.cycles-pp.common_startup_64
     10.60 ±  2%      +3.5       14.08 ±  2%  perf-profile.children.cycles-pp.cpu_startup_entry
     22.12 ±  2%      -6.5       15.60 ±  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.08 ±  5%      -0.0        0.07 ±  7%  perf-profile.self.cycles-pp.xfs_log_ticket_ungrant
      0.05            +0.0        0.06 ±  6%  perf-profile.self.cycles-pp.finish_task_switch
      0.05 ±  8%      +0.0        0.07 ±  5%  perf-profile.self.cycles-pp.llist_add_batch
      0.06 ±  8%      +0.0        0.07 ±  5%  perf-profile.self.cycles-pp.mutex_unlock
      0.06 ±  6%      +0.0        0.08        perf-profile.self.cycles-pp.__switch_to
      0.08 ±  5%      +0.0        0.10 ±  4%  perf-profile.self.cycles-pp.prepare_task_switch
      0.06 ±  6%      +0.0        0.08 ±  4%  perf-profile.self.cycles-pp.try_to_wake_up
      0.14 ±  2%      +0.0        0.16 ±  5%  perf-profile.self.cycles-pp.__schedule
      0.18 ±  4%      +0.0        0.20 ±  3%  perf-profile.self.cycles-pp.available_idle_cpu
      0.07 ± 10%      +0.0        0.10 ±  5%  perf-profile.self.cycles-pp.menu_select
      0.37            +0.0        0.41        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.12 ±  6%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.flush_workqueue_prep_pwqs
      0.23 ±  2%      +0.0        0.28 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.vfs_write
      0.08 ±  4%      +0.1        0.14 ±  6%  perf-profile.self.cycles-pp.__mutex_lock
      0.00            +0.1        0.06 ± 13%  perf-profile.self.cycles-pp.poll_idle
      0.36 ±  2%      +0.1        0.45 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock
      1.01 ±  2%      +0.1        1.12 ±  2%  perf-profile.self.cycles-pp.copy_to_brd
      0.90 ±  2%      +0.7        1.57        perf-profile.self.cycles-pp.mutex_spin_on_owner
     54.26            +1.0       55.30        perf-profile.self.cycles-pp.osq_lock
      1.96 ±  5%      +1.3        3.22 ± 10%  perf-profile.self.cycles-pp.intel_idle_irq
      7.66 ±  2%      +1.9        9.52        perf-profile.self.cycles-pp.intel_idle



***************************************************************************************************
lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
=========================================================================================
build_kconfig/compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/target/tbox_group/testcase:
  defconfig/gcc-13/performance/x86_64-rhel-8.3/200%/debian-12-x86_64-20240206.cgz/300s/vmlinux/lkp-csl-2sp3/kbuild

commit: 
  97450eb909 ("sched/pelt: Remove shift of thermal clock")
  e2bbd1c498 ("sched/fair: Reschedule the cfs_rq when current is ineligible")

97450eb909658573 e2bbd1c498980c5cb68f9973f41 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1944235           -11.3%    1724001        cpuidle..usage
      0.07            -0.0        0.07        mpstat.cpu.all.soft%
     18920          +173.4%      51728        vmstat.system.cs
     66019            +2.3%      67537        vmstat.system.in
    238928 ±  4%     +18.7%     283496 ±  6%  numa-meminfo.node1.Active
    238928 ±  4%     +18.7%     283496 ±  6%  numa-meminfo.node1.Active(anon)
    252057 ±  4%     +19.9%     302260 ±  6%  numa-meminfo.node1.Shmem
     59680 ±  4%     +18.6%      70769 ±  6%  numa-vmstat.node1.nr_active_anon
     62962 ±  4%     +20.0%      75559 ±  6%  numa-vmstat.node1.nr_shmem
     59680 ±  4%     +18.6%      70769 ±  6%  numa-vmstat.node1.nr_zone_active_anon
     40.33 ± 16%     +46.7%      59.17 ± 17%  perf-c2c.DRAM.remote
     99.50 ±  7%     +36.3%     135.67 ± 10%  perf-c2c.HITM.local
     21.67 ± 22%     +85.4%      40.17 ± 18%  perf-c2c.HITM.remote
    266436           +22.5%     326455        meminfo.Active
    266436           +22.5%     326455        meminfo.Active(anon)
     71309           +19.2%      85005        meminfo.Mapped
    284474           +23.8%     352101        meminfo.Shmem
     50.65            +1.4%      51.35        kbuild.buildtime_per_iteration
     50.65            +1.4%      51.35        kbuild.real_time_per_iteration
    162.16            +1.3%     164.26        kbuild.sys_time_per_iteration
   4290772          +259.3%   15415002        kbuild.time.involuntary_context_switches
      5389            +1.0%       5444        kbuild.time.percent_of_cpu_this_job_got
    983.50            +1.3%     996.08        kbuild.time.system_time
     17417            +2.3%      17819        kbuild.time.user_time
      2898            +2.3%       2965        kbuild.user_time_per_iteration
     66625           +22.7%      81776        proc-vmstat.nr_active_anon
   1491011            +2.1%    1522533        proc-vmstat.nr_anon_pages
   1496099            +2.2%    1529304        proc-vmstat.nr_inactive_anon
     18140           +18.3%      21461        proc-vmstat.nr_mapped
      9585            +1.5%       9732        proc-vmstat.nr_page_table_pages
     71184           +23.7%      88073        proc-vmstat.nr_shmem
     66625           +22.7%      81776        proc-vmstat.nr_zone_active_anon
   1496099            +2.2%    1529304        proc-vmstat.nr_zone_inactive_anon
     64689            +7.4%      69486        proc-vmstat.numa_huge_pte_updates
  33355254            +7.4%   35829277        proc-vmstat.numa_pte_updates
    377789            +4.4%     394459        proc-vmstat.pgactivate
      2.67            +0.1        2.76        perf-stat.i.branch-miss-rate%
 7.979e+08            +3.1%  8.224e+08        perf-stat.i.branch-misses
     27.01            -1.2       25.84        perf-stat.i.cache-miss-rate%
 2.753e+08            +3.3%  2.843e+08        perf-stat.i.cache-misses
 8.884e+08           +11.4%  9.895e+08        perf-stat.i.cache-references
     18965          +174.5%      52066        perf-stat.i.context-switches
      1.08            +1.5%       1.09        perf-stat.i.cpi
 1.538e+11            +1.2%  1.557e+11        perf-stat.i.cpu-cycles
    702.89            -5.5%     664.26        perf-stat.i.cpu-migrations
      1.05            -1.0%       1.04        perf-stat.i.ipc
     12.50           -10.1%      11.23 ±  2%  perf-stat.i.major-faults
    329354            -1.1%     325687        perf-stat.i.minor-faults
    329366            -1.1%     325698        perf-stat.i.page-faults
      2.30            +4.3%       2.40        perf-stat.overall.MPKI
      3.07            +0.1        3.20        perf-stat.overall.branch-miss-rate%
     30.99            -2.2       28.74        perf-stat.overall.cache-miss-rate%
      1.28            +2.2%       1.31        perf-stat.overall.cpi
    558.80            -2.0%     547.74        perf-stat.overall.cycles-between-cache-misses
      0.78            -2.2%       0.76        perf-stat.overall.ipc
 7.966e+08            +2.9%  8.199e+08        perf-stat.ps.branch-misses
 2.748e+08            +3.2%  2.835e+08        perf-stat.ps.cache-misses
 8.869e+08           +11.2%  9.865e+08        perf-stat.ps.cache-references
     18918          +174.4%      51908        perf-stat.ps.context-switches
 1.536e+11            +1.1%  1.553e+11        perf-stat.ps.cpu-cycles
    701.06            -5.6%     662.09        perf-stat.ps.cpu-migrations
 1.197e+11            -1.1%  1.183e+11        perf-stat.ps.instructions
     12.47           -10.2%      11.20 ±  2%  perf-stat.ps.major-faults
    328614            -1.2%     324786        perf-stat.ps.minor-faults
    328627            -1.2%     324797        perf-stat.ps.page-faults
      0.73           -45.9%       0.40        sched_debug.cfs_rq:/.h_nr_running.avg
      2.06 ±  6%     -23.0%       1.58 ±  8%  sched_debug.cfs_rq:/.h_nr_running.max
      0.33           -50.0%       0.17        sched_debug.cfs_rq:/.h_nr_running.min
      0.38 ±  6%     -18.2%       0.31 ±  3%  sched_debug.cfs_rq:/.h_nr_running.stddev
      1689           -48.8%     864.97        sched_debug.cfs_rq:/.load.min
      1.61 ±  4%     -48.3%       0.83        sched_debug.cfs_rq:/.load_avg.min
      0.38 ±  2%     -41.7%       0.22        sched_debug.cfs_rq:/.nr_running.avg
      0.33           -50.0%       0.17        sched_debug.cfs_rq:/.nr_running.min
      0.16 ±  8%     +19.4%       0.20 ±  3%  sched_debug.cfs_rq:/.nr_running.stddev
    741.02 ±  2%     -43.4%     419.70        sched_debug.cfs_rq:/.runnable_avg.avg
      1841 ±  4%     -25.7%       1368 ±  2%  sched_debug.cfs_rq:/.runnable_avg.max
    310.50 ±  7%     -49.2%     157.89 ±  7%  sched_debug.cfs_rq:/.runnable_avg.min
    328.61 ±  6%     -21.0%     259.55 ±  3%  sched_debug.cfs_rq:/.runnable_avg.stddev
    404.14 ±  3%     -38.5%     248.50 ±  2%  sched_debug.cfs_rq:/.util_avg.avg
    181.06 ± 13%     -51.2%      88.44 ± 14%  sched_debug.cfs_rq:/.util_avg.min
     64.30 ±  8%     -18.5%      52.39 ±  2%  sched_debug.cfs_rq:/.util_est.avg
     92.86 ± 13%     -24.6%      70.04 ±  4%  sched_debug.cfs_rq:/.util_est.stddev
    720896           +17.0%     843275 ±  3%  sched_debug.cpu.avg_idle.avg
     11.27 ± 12%     -39.6%       6.81 ±  5%  sched_debug.cpu.clock.stddev
     52931           -42.2%      30600 ±  2%  sched_debug.cpu.curr->pid.avg
      7508 ± 22%     +67.6%      12586 ± 11%  sched_debug.cpu.curr->pid.stddev
      0.73           -45.8%       0.40 ±  2%  sched_debug.cpu.nr_running.avg
      0.33           -50.0%       0.17        sched_debug.cpu.nr_running.min
      0.38 ±  9%     -18.5%       0.31 ±  5%  sched_debug.cpu.nr_running.stddev
     30375          +163.9%      80145        sched_debug.cpu.nr_switches.avg
     42800 ±  4%    +120.8%      94514 ±  2%  sched_debug.cpu.nr_switches.max
     25282          +190.1%      73354        sched_debug.cpu.nr_switches.min
   -107.64           -18.1%     -88.19        sched_debug.cpu.nr_uninterruptible.min
     36.76 ±  4%     -17.1%      30.49 ±  5%  sched_debug.cpu.nr_uninterruptible.stddev
      4.63 ±  6%      -1.9        2.69 ±  8%  perf-profile.calltrace.cycles-pp.common_startup_64
      4.57 ±  7%      -1.9        2.66 ±  9%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
      4.57 ±  7%      -1.9        2.66 ±  9%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
      4.57 ±  7%      -1.9        2.66 ±  9%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      4.55 ±  7%      -1.9        2.65 ±  9%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      4.46 ±  7%      -1.8        2.62 ±  9%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      4.42 ±  7%      -1.8        2.61 ±  9%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      3.97 ±  7%      -1.5        2.44 ±  9%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      0.64 ±  2%      -0.1        0.59 ±  3%  perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      1.40 ±  2%      +0.1        1.46        perf-profile.calltrace.cycles-pp.open64
      1.00 ±  2%      +0.1        1.06 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.99 ±  2%      +0.1        1.06 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.81            +0.1        1.92        perf-profile.calltrace.cycles-pp.malloc
      0.72 ±  4%      +0.1        0.87 ±  2%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.35 ± 70%      +0.2        0.56 ±  2%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      4.63 ±  6%      -1.9        2.69 ±  8%  perf-profile.children.cycles-pp.common_startup_64
      4.63 ±  6%      -1.9        2.69 ±  8%  perf-profile.children.cycles-pp.cpu_startup_entry
      4.63 ±  6%      -1.9        2.69 ±  8%  perf-profile.children.cycles-pp.do_idle
      4.61 ±  6%      -1.9        2.69 ±  8%  perf-profile.children.cycles-pp.cpuidle_idle_call
      4.57 ±  7%      -1.9        2.66 ±  9%  perf-profile.children.cycles-pp.start_secondary
      4.51 ±  6%      -1.9        2.65 ±  8%  perf-profile.children.cycles-pp.cpuidle_enter
      4.51 ±  6%      -1.9        2.65 ±  8%  perf-profile.children.cycles-pp.cpuidle_enter_state
      4.01 ±  6%      -1.5        2.47 ±  8%  perf-profile.children.cycles-pp.intel_idle
      1.92 ±  3%      -0.2        1.76 ±  2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      1.56 ±  3%      -0.1        1.48        perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.27 ±  6%      -0.1        0.21 ±  5%  perf-profile.children.cycles-pp.irq_exit_rcu
      0.07 ± 10%      -0.0        0.05 ±  8%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.11 ±  4%      -0.0        0.09 ±  4%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
      0.07 ±  6%      -0.0        0.06        perf-profile.children.cycles-pp.perf_rotate_context
      0.25            +0.0        0.26        perf-profile.children.cycles-pp.ggc_free(void*)
      0.08            +0.0        0.09 ±  4%  perf-profile.children.cycles-pp.dput
      0.25            +0.0        0.27        perf-profile.children.cycles-pp._cpp_pop_context
      0.27 ±  3%      +0.0        0.29 ±  2%  perf-profile.children.cycles-pp.mmap_region
      0.08 ± 10%      +0.0        0.10 ±  3%  perf-profile.children.cycles-pp.__split_vma
      0.13 ±  5%      +0.0        0.15 ±  3%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      0.12 ±  4%      +0.0        0.14 ±  3%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
      0.32 ±  3%      +0.0        0.34        perf-profile.children.cycles-pp.mark_exp_read(tree_node*)
      0.12 ±  5%      +0.0        0.14        perf-profile.children.cycles-pp.do_vmi_munmap
      0.11 ±  4%      +0.0        0.14 ±  3%  perf-profile.children.cycles-pp.update_load_avg
      0.11 ±  6%      +0.0        0.13 ±  5%  perf-profile.children.cycles-pp.next_uptodate_folio
      0.20 ±  4%      +0.0        0.23 ±  3%  perf-profile.children.cycles-pp.filemap_map_pages
      0.35            +0.0        0.37 ±  2%  perf-profile.children.cycles-pp.walk_component
      0.32 ±  3%      +0.0        0.35 ±  2%  perf-profile.children.cycles-pp.do_mmap
      0.34 ±  3%      +0.0        0.37 ±  2%  perf-profile.children.cycles-pp.vm_mmap_pgoff
      0.29 ±  3%      +0.0        0.32 ±  3%  perf-profile.children.cycles-pp.lookup_name(tree_node*)
      0.22 ±  4%      +0.0        0.25 ±  3%  perf-profile.children.cycles-pp.do_read_fault
      0.05            +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.smpboot_thread_fn
      0.27 ±  4%      +0.0        0.30 ±  3%  perf-profile.children.cycles-pp.do_fault
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.__rseq_handle_notify_resume
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.__update_load_avg_se
      1.20 ±  2%      +0.1        1.26        perf-profile.children.cycles-pp.path_openat
      1.41 ±  2%      +0.1        1.47        perf-profile.children.cycles-pp.open64
      0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.run_ksoftirqd
      1.21 ±  2%      +0.1        1.28        perf-profile.children.cycles-pp.do_filp_open
      1.42 ±  2%      +0.1        1.49        perf-profile.children.cycles-pp.__x64_sys_openat
      1.41 ±  2%      +0.1        1.48        perf-profile.children.cycles-pp.do_sys_openat2
      1.84            +0.1        1.95        perf-profile.children.cycles-pp.malloc
      0.00            +0.1        0.11 ±  4%  perf-profile.children.cycles-pp.pick_next_task_fair
      4.32            +0.1        4.43        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      4.30            +0.1        4.42        perf-profile.children.cycles-pp.do_syscall_64
      0.13 ±  5%      +0.2        0.30 ±  3%  perf-profile.children.cycles-pp.__schedule
      0.12 ±  7%      +0.2        0.30 ±  4%  perf-profile.children.cycles-pp.schedule
      0.20 ±  4%      +0.2        0.43        perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
      4.01 ±  6%      -1.5        2.47 ±  8%  perf-profile.self.cycles-pp.intel_idle
      0.09 ±  9%      +0.0        0.11 ±  4%  perf-profile.self.cycles-pp.next_uptodate_folio
      0.28 ±  3%      +0.0        0.31 ±  3%  perf-profile.self.cycles-pp.lookup_name(tree_node*)
      1.71            +0.1        1.80        perf-profile.self.cycles-pp.malloc





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Honglei Wang 1 year, 8 months ago

On 2024/5/24 21:40, Chunxin Zang wrote:
> I found that some tasks have been running for a long enough time and
> have become illegal, but they are still not releasing the CPU. This
> will increase the scheduling delay of other processes. Therefore, I
> tried checking the current process in wakeup_preempt and entity_tick,
> and if it is illegal, reschedule that cfs queue.
> 
> The modification can reduce the scheduling delay by about 30% when
> RUN_TO_PARITY is enabled.
> So far, it has been running well in my test environment, and I have
> pasted some test results below.
> 
> I isolated four cores for testing. I ran Hackbench in the background
> and observed the test results of cyclictest.
> 
> hackbench -g 4 -l 100000000 &
> cyclictest --mlockall -D 5m -q
> 
>                                   EEVDF      PATCH  EEVDF-NO_PARITY  PATCH-NO_PARITY
> 
>                  # Min Latencies: 00006      00006      00006      00006
>    LNICE(-19)    # Avg Latencies: 00191      00122      00089      00066
>                  # Max Latencies: 15442      07648      14133      07713
> 
>                  # Min Latencies: 00006      00010      00006      00006
>    LNICE(0)      # Avg Latencies: 00466      00277      00289      00257
>                  # Max Latencies: 38917      32391      32665      17710
> 
>                  # Min Latencies: 00019      00053      00010      00013
>    LNICE(19)     # Avg Latencies: 37151      31045      18293      23035
>                  # Max Latencies: 2688299    7031295    426196     425708
> 
> I'm actually a bit hesitant about placing this modification under the
> NO_PARITY feature. This is because the modification conflicts with the
> semantics of RUN_TO_PARITY. So, I captured and compared the number of
> resched occurrences in wakeup_preempt to see if it introduced any
> additional overhead.
> 
> Similarly, hackbench is used to stress the utilization of four cores to
> 100%, and the method for capturing the number of PREEMPT occurrences is
> referenced from [1].
> 
> schedstats                          EEVDF       PATCH   EEVDF-NO_PARITY  PATCH-NO_PARITY  CFS(6.5)
> stats.check_preempt_count          5053054     5057286    5003806    5018589    5031908
> stats.patch_cause_preempt_count    -------     858044     -------    765726     -------
> stats.need_preempt_count           570520      858684     3380513    3426977    1140821
> 
>  From the above test results, there is a slight increase in the number of
> resched occurrences in wakeup_preempt. However, the results vary with each
> test, and sometimes the difference is not that significant. But overall,
> the count of reschedules remains lower than that of CFS and is much less
> than that of NO_PARITY.
> 
> [1]: https://lore.kernel.org/all/20230816134059.GC982867@hirez.programming.kicks-ass.net/T/#m52057282ceb6203318be1ce9f835363de3bef5cb
> 
> Signed-off-by: Chunxin Zang <zangchunxin@lixiang.com>
> Reviewed-by: Chen Yang <yangchen11@lixiang.com>
> ---
>   kernel/sched/fair.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 03be0d1330a6..a0005d240db5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>   			hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>   		return;
>   #endif
> +
> +	if (!entity_eligible(cfs_rq, curr))
> +		resched_curr(rq_of(cfs_rq));
>   }
>   
>   
> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>   	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>   		return;
>   
> +	if (!entity_eligible(cfs_rq, se))
> +		goto preempt;
> +
>   	find_matching_se(&se, &pse);
>   	WARN_ON_ONCE(!pse);
>   
Hi Chunxin,

Did you run a comparative test to see which modification is more helpful 
in improving the latency? The modification at the tick point makes more sense 
to me. But it seems that rescheduling arbitrarily at wakeup might introduce too 
much preemption (and maybe more context switches?) in a complex environment 
such as a cgroup hierarchy.

Thanks,
Honglei
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Mike Galbraith 1 year, 8 months ago
On Fri, 2024-05-24 at 21:40 +0800, Chunxin Zang wrote:
> I found that some tasks have been running for a long enough time and
> have become illegal, but they are still not releasing the CPU. This
> will increase the scheduling delay of other processes. Therefore, I
> tried checking the current process in wakeup_preempt and entity_tick,
> and if it is illegal, reschedule that cfs queue.

My box gave making the XXX below reality a two thumbs up when fiddling
with the original unfettered and a bit harsh RUN_TO_PARITY.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a5b1ae0aa55..922834f172b0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8413,12 +8413,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	update_curr(cfs_rq);

 	/*
-	 * XXX pick_eevdf(cfs_rq) != se ?
+	 * Run @curr until it is no longer our best option.  Basing the preempt
+	 * decision on @curr reselection puts any previous decisions back on the
+	 * table in context "now", including granularity preservation decisions
+	 * by RUN_TO_PARITY.
 	 */
-	if (pick_eevdf(cfs_rq) == pse)
-		goto preempt;
-
-	return;
+	if (pick_eevdf(cfs_rq) == se)
+		return;

 preempt:
 	resched_curr(rq);
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Peter Zijlstra 1 year, 8 months ago
On Sat, May 25, 2024 at 08:41:28AM +0200, Mike Galbraith wrote:

> -	if (pick_eevdf(cfs_rq) == pse)
> -		goto preempt;
> -
> -	return;
> +	if (pick_eevdf(cfs_rq) == se)
> +		return;

Right, this will preempt more.

This is probably going to make Prateek's case worse though. Then again,
I was already leaning towards not making his stronger slice
protection default, because it simply hurts too much elsewhere.

Still, his observation that placing tasks can move V left, which in turn
can make the just scheduled-in current non-eligible and cause
over-scheduling, is valid -- just not sure what to do about it yet.
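
A toy illustration of that effect (made-up numbers, equal weights):

	two runnable tasks at v = {100, 104}  ->  V = 102, curr at v = 100 is eligible
	a wakee is placed at v = 90           ->  V = (100 + 104 + 90) / 3 = 98
	                                          curr lag = 98 - 100 = -2, ineligible

so an entity picked only a moment ago can immediately need rescheduling, which
is the over-scheduling referred to above.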
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Mike Galbraith 1 year, 8 months ago
On Mon, 2024-05-27 at 10:05 +0200, Peter Zijlstra wrote:
> On Sat, May 25, 2024 at 08:41:28AM +0200, Mike Galbraith wrote:
> 
> > -       if (pick_eevdf(cfs_rq) == pse)
> > -               goto preempt;
> > -
> > -       return;
> > +       if (pick_eevdf(cfs_rq) == se)
> > +               return;
> 
> Right, this will preempt more.

Yeah, and for no tangible benefit that I can see.  Repeating the mixed
load GUI vs compute testing a bunch of times, there's enough variance
to swamp any signal.

	-Mike
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Chen Yu 1 year, 8 months ago
On 2024-05-25 at 08:41:28 +0200, Mike Galbraith wrote:
> On Fri, 2024-05-24 at 21:40 +0800, Chunxin Zang wrote:
> > I found that some tasks have been running for a long enough time and
> > have become illegal, but they are still not releasing the CPU. This
> > will increase the scheduling delay of other processes. Therefore, I
> > tried checking the current process in wakeup_preempt and entity_tick,
> > and if it is illegal, reschedule that cfs queue.
> 
> My box gave making the XXX below reality a two thumbs up when fiddling
> with the original unfettered and a bit harsh RUN_TO_PARITY.
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a5b1ae0aa55..922834f172b0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8413,12 +8413,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>  	update_curr(cfs_rq);
> 
>  	/*
> -	 * XXX pick_eevdf(cfs_rq) != se ?
> +	 * Run @curr until it is no longer our best option.  Basing the preempt
> +	 * decision on @curr reselection puts any previous decisions back on the
> +	 * table in context "now", including granularity preservation decisions
> +	 * by RUN_TO_PARITY.
>  	 */
> -	if (pick_eevdf(cfs_rq) == pse)
> -		goto preempt;
> -
> -	return;
> +	if (pick_eevdf(cfs_rq) == se)
> +		return;
>

I suppose this change benefits the overloaded scenario:
neither current nor the wakee is the best one.

before: current continues to run.
after: best se in the tree preempts current.

hackbench -g 12 -l 1000000000 & (480 tasks, 2x of the CPUs)

cyclictest --mlockall -D 1m -q
before:
T: 0 (15983) P: 0 I:1000 C:  43054 Min:     11 Act:  144 Avg:  627 Max:   11446

after:
T: 0 (16473) P: 0 I:1000 C:  49822 Min:      7 Act:  160 Avg:  388 Max:   10190

Min, Avg, Max latency all decreased.

thanks,
Chenyu
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Mike Galbraith 1 year, 8 months ago
On Sat, 2024-05-25 at 19:57 +0800, Chen Yu wrote:
>
> I suppose this change benefits the overloaded scenario:
> neither current nor the wakee is the best one.

Depends on your definition of benefit. It'll increase ctx switches a
bit, but I recall it not being much.

I dug up the script I was using at the time, numbers below for the
bored.  Bottom line: yeah, it's not much of a delta, especially when
comparing allegedly current EEVDF to CFS in an otherwise identical..
absolutely everything.

load: 5m chrome playing 1080p clip vs massive_intr (1 88% hog/cpu)

6.1.91-cfs
 ----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 ----------------------------------------------------------------------------------------------------------
  massive_intr:(9)      |1897454.685 ms | 11581161 | avg:   0.026 ms | max:  43.008 ms | sum:296184.363 ms |
  dav1d-worker:(8)      |  94252.513 ms |   396284 | avg:   0.089 ms | max:  12.513 ms | sum:35275.546 ms |
  Compositor:5824       |  36851.590 ms |    61771 | avg:   0.080 ms | max:   9.310 ms | sum: 4965.456 ms |
  X:2362                |  32306.450 ms |   102571 | avg:   0.021 ms | max:  14.967 ms | sum: 2148.121 ms |
  VizCompositorTh:5913  |  25116.956 ms |    56602 | avg:   0.053 ms | max:   8.441 ms | sum: 2986.101 ms |
  chrome:(8)            |  23134.386 ms |    85335 | avg:   0.052 ms | max:  34.540 ms | sum: 4459.871 ms |
  ThreadPoolForeg:(43)  |  16742.353 ms |    71410 | avg:   0.083 ms | max:  23.059 ms | sum: 5943.056 ms |
  kwin_x11:2776         |  11383.572 ms |    95643 | avg:   0.017 ms | max:   8.358 ms | sum: 1589.414 ms |
  VideoFrameCompo:5919  |   9589.949 ms |    37838 | avg:   0.029 ms | max:   6.842 ms | sum: 1098.123 ms |
  kworker/5:1+eve:4508  |   8743.004 ms |  1647598 | avg:   0.003 ms | max:  12.002 ms | sum: 4956.587 ms |
  kworker/6:2-mm_:5407  |   8686.689 ms |  1636766 | avg:   0.003 ms | max:  10.407 ms | sum: 4779.475 ms |
  kworker/2:0-mm_:5707  |   8536.257 ms |  1607213 | avg:   0.003 ms | max:   9.473 ms | sum: 4776.918 ms |
  kworker/4:1-mm_:379   |   8532.410 ms |  1603438 | avg:   0.003 ms | max:  10.328 ms | sum: 4824.572 ms |
  kworker/1:0-eve:5409  |   8508.321 ms |  1598742 | avg:   0.003 ms | max:  13.124 ms | sum: 4742.128 ms |
  perf:(2)              |   5386.613 ms |      713 | avg:   0.020 ms | max:   2.268 ms | sum:   13.985 ms |
 ----------------------------------------------------------------------------------------------------------
  TOTAL:                |2242804.984 ms | 26015202 |                 |       43.008 ms |    416326.240 ms |
 ----------------------------------------------------------------------------------------------------------

6.1.91-eevdf
 ----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 ----------------------------------------------------------------------------------------------------------
  massive_intr:(9)      |1971557.115 ms |  6127207 | avg:   0.034 ms | max:  16.351 ms | sum:208289.732 ms |
  dav1d-worker:(8)      |  85561.180 ms |   499175 | avg:   0.262 ms | max:  15.656 ms | sum:130584.659 ms |
  Compositor:4346       |  37730.564 ms |   200925 | avg:   0.112 ms | max:  10.922 ms | sum:22406.729 ms |
  X:2379                |  31761.636 ms |   229381 | avg:   0.081 ms | max:   9.740 ms | sum:18645.752 ms |
  VizCompositorTh:4423  |  24650.743 ms |   155138 | avg:   0.170 ms | max:  11.227 ms | sum:26426.655 ms |
  chrome:(8)            |  19551.099 ms |   156680 | avg:   0.201 ms | max:  18.183 ms | sum:31449.401 ms |
  ThreadPoolForeg:(43)  |  15547.777 ms |    89292 | avg:   0.223 ms | max:  20.007 ms | sum:19916.046 ms |
  kwin_x11:2776         |  11052.045 ms |   119945 | avg:   0.122 ms | max:  12.757 ms | sum:14687.478 ms |
  VideoFrameCompo:4429  |   8794.874 ms |    76728 | avg:   0.142 ms | max:  10.183 ms | sum:10895.692 ms |
  Chrome_ChildIOT:(7)   |   4917.764 ms |   165906 | avg:   0.190 ms | max:  10.212 ms | sum:31461.521 ms |
  Media:4428            |   3787.952 ms |    65288 | avg:   0.194 ms | max:  12.048 ms | sum:12662.386 ms |
  kworker/6:1-eve:135   |   3359.276 ms |   616547 | avg:   0.009 ms | max:   7.999 ms | sum: 5762.212 ms |
  kworker/4:1-eve:365   |   3144.292 ms |   578287 | avg:   0.009 ms | max:   7.619 ms | sum: 5322.637 ms |
  kworker/3:2-eve:297   |   3104.034 ms |   557150 | avg:   0.013 ms | max:   8.006 ms | sum: 7050.461 ms |
  perf:(2)              |   3098.480 ms |     1585 | avg:   0.102 ms | max:   5.470 ms | sum:  160.995 ms |
 ----------------------------------------------------------------------------------------------------------
  TOTAL:                |2271694.585 ms | 16259483 |                 |       32.144 ms |    669428.151 ms |
 ----------------------------------------------------------------------------------------------------------

+tweak
----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 ----------------------------------------------------------------------------------------------------------
  massive_intr:(9)      |1965603.161 ms |  6284089 | avg:   0.034 ms | max:  16.005 ms | sum:214120.602 ms |
  dav1d-worker:(8)      |  89853.413 ms |   599733 | avg:   0.240 ms | max:  48.387 ms | sum:144117.080 ms |
  Compositor:4342       |  36473.771 ms |   171986 | avg:   0.129 ms | max:  11.366 ms | sum:22135.405 ms |
  X:2365                |  32167.915 ms |   218157 | avg:   0.088 ms | max:   9.841 ms | sum:19105.816 ms |
  VizCompositorTh:4425  |  24338.749 ms |   151884 | avg:   0.181 ms | max:  11.755 ms | sum:27553.783 ms |
  chrome:(8)            |  20154.023 ms |   158554 | avg:   0.207 ms | max:  15.979 ms | sum:32742.291 ms |
  ThreadPoolForeg:(45)  |  15672.931 ms |    94051 | avg:   0.215 ms | max:  17.452 ms | sum:20185.561 ms |
  kwin_x11:2773         |  11424.789 ms |   121491 | avg:   0.140 ms | max:  11.116 ms | sum:16958.020 ms |
  VideoFrameCompo:4431  |   8869.431 ms |    82385 | avg:   0.139 ms | max:  10.906 ms | sum:11471.193 ms |
  Chrome_ChildIOT:(7)   |   5148.973 ms |   167824 | avg:   0.189 ms | max:  13.755 ms | sum:31640.759 ms |
  kworker/7:1-eve:86    |   4258.124 ms |   784269 | avg:   0.009 ms | max:   8.228 ms | sum: 6780.999 ms |
  Media:4430            |   3897.705 ms |    62985 | avg:   0.205 ms | max:  10.797 ms | sum:12904.412 ms |
  kworker/6:1-eve:189   |   3608.493 ms |   663349 | avg:   0.009 ms | max:   7.902 ms | sum: 6034.231 ms |
  kworker/5:2-eve:827   |   3309.865 ms |   611424 | avg:   0.009 ms | max:   7.112 ms | sum: 5552.591 ms |
  perf:(2)              |   3241.897 ms |     1847 | avg:   0.087 ms | max:   5.464 ms | sum:  160.383 ms |
 ----------------------------------------------------------------------------------------------------------
  TOTAL:                |2272683.607 ms | 16721925 |                 |       57.181 ms |    692810.431 ms |
 ----------------------------------------------------------------------------------------------------------
                                          hohum
+peterz queue w. RUN_TO_PARITY
 ----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 ----------------------------------------------------------------------------------------------------------
  massive_intr:(9)      |1972481.970 ms |  4989513 | avg:   0.042 ms | max:  20.019 ms | sum:208651.087 ms |
  dav1d-worker:(8)      |  85235.372 ms |   528422 | avg:   0.254 ms | max:  15.253 ms | sum:134274.493 ms |
  Compositor:4343       |  36977.626 ms |   154214 | avg:   0.122 ms | max:   9.868 ms | sum:18854.543 ms |
  X:2359                |  31873.877 ms |   187392 | avg:   0.094 ms | max:  10.100 ms | sum:17644.947 ms |
  VizCompositorTh:4427  |  24881.223 ms |   120412 | avg:   0.176 ms | max:  14.813 ms | sum:21210.898 ms |
  chrome:(8)            |  21579.151 ms |   133086 | avg:   0.200 ms | max:  12.952 ms | sum:26600.419 ms |
  ThreadPoolForeg:(45)  |  15327.978 ms |    94395 | avg:   0.196 ms | max:  35.000 ms | sum:18547.639 ms |
  kwin_x11:2776         |  11232.090 ms |   121392 | avg:   0.135 ms | max:  10.313 ms | sum:16426.213 ms |
  VideoFrameCompo:4433  |   8858.806 ms |    65658 | avg:   0.144 ms | max:  11.409 ms | sum: 9485.191 ms |
  Chrome_ChildIOT:(7)   |   4970.611 ms |   172570 | avg:   0.142 ms | max:  11.008 ms | sum:24467.160 ms |
  Media:4432            |   3781.277 ms |    63640 | avg:   0.162 ms | max:  10.096 ms | sum:10283.264 ms |
  kworker/7:1-eve:91    |   2930.823 ms |   534857 | avg:   0.009 ms | max:   8.234 ms | sum: 4723.577 ms |
  kworker/6:2-eve:356   |   2579.393 ms |   472864 | avg:   0.009 ms | max:   8.046 ms | sum: 4148.828 ms |
  perf:(2)              |   2569.531 ms |     1609 | avg:   0.101 ms | max:   5.966 ms | sum:  163.224 ms |
  kworker/4:0-eve:40    |   2432.133 ms |   442300 | avg:   0.009 ms | max:   9.475 ms | sum: 3992.979 ms |
 ----------------------------------------------------------------------------------------------------------
  TOTAL:                |2263072.188 ms | 12993836 |                 |       35.000 ms |    601609.374 ms |
 ----------------------------------------------------------------------------------------------------------
                                          marko?
+NO_DELAY_DEQUEUE
 ----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 ----------------------------------------------------------------------------------------------------------
  massive_intr:(9)      |1968212.427 ms |  6050894 | avg:   0.035 ms | max:  20.032 ms | sum:213163.997 ms |
  dav1d-worker:(8)      |  86929.255 ms |   583692 | avg:   0.246 ms | max:  14.986 ms | sum:143561.571 ms |
  Compositor:4933       |  36733.711 ms |   219265 | avg:   0.100 ms | max:  14.986 ms | sum:21888.378 ms |
  X:2359                |  31624.338 ms |   233581 | avg:   0.074 ms | max:   8.629 ms | sum:17324.762 ms |
  VizCompositorTh:5018  |  24597.941 ms |   179049 | avg:   0.147 ms | max:  11.717 ms | sum:26333.576 ms |
  chrome:(8)            |  20430.046 ms |   179393 | avg:   0.173 ms | max:  20.903 ms | sum:30976.208 ms |
  ThreadPoolForeg:(39)  |  15423.142 ms |   109837 | avg:   0.183 ms | max:  24.525 ms | sum:20115.906 ms |
  kwin_x11:2776         |  11413.866 ms |   129426 | avg:   0.121 ms | max:  10.718 ms | sum:15719.900 ms |
  VideoFrameCompo:5023  |   8817.956 ms |    78028 | avg:   0.130 ms | max:  18.471 ms | sum:10162.602 ms |
  Chrome_ChildIOT:(7)   |   5356.461 ms |   187001 | avg:   0.160 ms | max:  11.565 ms | sum:29969.033 ms |
  Media:5022            |   3793.341 ms |    64887 | avg:   0.186 ms | max:  13.229 ms | sum:12096.948 ms |
  kworker/6:0-eve:5052  |   3509.228 ms |   643562 | avg:   0.010 ms | max:   8.005 ms | sum: 6305.605 ms |
  kworker/3:0-eve:34    |   3363.538 ms |   598417 | avg:   0.012 ms | max:   8.892 ms | sum: 6910.297 ms |
  perf:(2)              |   3167.463 ms |     1835 | avg:   0.090 ms | max:   5.039 ms | sum:  164.352 ms |
  kworker/4:2+eve:4808  |   3002.682 ms |   549210 | avg:   0.010 ms | max:   8.622 ms | sum: 5400.444 ms |
 ----------------------------------------------------------------------------------------------------------
  TOTAL:                |2270484.307 ms | 16315986 |                 |       24.525 ms |    677870.230 ms |
 ----------------------------------------------------------------------------------------------------------
                                          polo
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Chen Yu 1 year, 8 months ago
On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
> I found that some tasks have been running for a long enough time and
> have become illegal, but they are still not releasing the CPU. This
> will increase the scheduling delay of other processes. Therefore, I
> tried checking the current process in wakeup_preempt and entity_tick,
> and if it is illegal, reschedule that cfs queue.
>
> The modification can reduce the scheduling delay by about 30% when
> RUN_TO_PARITY is enabled.
> So far, it has been running well in my test environment, and I have
> pasted some test results below.
> 

Interesting, besides hackbench, I assume that you have workloads in a
real production environment that are sensitive to wakeup latency?

>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 03be0d1330a6..a0005d240db5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>  			hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>  		return;
>  #endif
> +
> +	if (!entity_eligible(cfs_rq, curr))
> +		resched_curr(rq_of(cfs_rq));
>  }
>

entity_tick() -> update_curr() -> update_deadline():
se->vruntime >= se->deadline ? resched_curr()
only when current has expired its slice will it be scheduled out.

So here you want to schedule current out if its lag becomes 0.
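
For readers following along, the two conditions above differ roughly as follows
(a sketch in EEVDF terms, not a verbatim copy of the in-tree helpers):

	/* Tick path today, update_deadline(): reschedule only once the whole
	 * request has been consumed, i.e. v_i has reached the deadline vd_i. */
	if ((s64)(se->vruntime - se->deadline) >= 0)
		resched_curr(rq_of(cfs_rq));

	/* Proposed check, entity_eligible(): current is "eligible" only while
	 * its lag is non-negative, lag_i = V - v_i >= 0, where V is the
	 * load-weighted average vruntime of the queue.  curr typically loses
	 * eligibility well before its deadline, so this reschedules earlier. */
	if (!entity_eligible(cfs_rq, curr))
		resched_curr(rq_of(cfs_rq));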

In the latest sched/eevdf branch, it is controlled by two sched features:
RESPECT_SLICE: Inhibit preemption until the current task has exhausted its slice.
RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c

Maybe something like this can achieve your goal:
	if (sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, curr))
		resched_curr(rq_of(cfs_rq));

>  
> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>  	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>  		return;
>  
> +	if (!entity_eligible(cfs_rq, se))
> +		goto preempt;
> +

Not sure if this is applicable: later in this function, pick_eevdf() checks
whether the current is eligible via !entity_eligible(cfs_rq, curr); if not, curr will
be evicted. And this change does not consider the cgroup hierarchy.

Besides, the check of current's eligibility can get a false negative result,
if the enqueued entity has a positive lag. Prateek proposed to
remove the check of current's eligibility in pick_eevdf():
https://lore.kernel.org/lkml/20240325060226.1540-2-kprateek.nayak@amd.com/

If I understand your requirement correctly, you want to reduce the wakeup
latency. There is some code under development by Peter, which could
customize a task's wakeup latency by setting its slice:
https://lore.kernel.org/lkml/20240405110010.934104715@infradead.org/
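
A hypothetical usage sketch of that interface, assuming the series exposes the
custom slice through sched_attr.sched_runtime as described in the posting above
(field usage and units here are assumptions, not a merged ABI):

	#define _GNU_SOURCE
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/sched.h>		/* SCHED_NORMAL */
	#include <linux/sched/types.h>	/* struct sched_attr */

	int main(void)
	{
		struct sched_attr attr = {
			.size		= sizeof(attr),
			.sched_policy	= SCHED_NORMAL,
			.sched_runtime	= 500 * 1000,	/* ~0.5ms slice request, in ns */
		};

		/* pid 0 == calling thread: a latency-sensitive task asking for a
		 * shorter slice, and with it earlier virtual deadlines. */
		return syscall(SYS_sched_setattr, 0, &attr, 0);
	}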

thanks,
Chenyu
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Chunxin Zang 1 year, 8 months ago
> On May 24, 2024, at 23:30, Chen Yu <yu.c.chen@intel.com> wrote:
> 
> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
>> I found that some tasks have been running for a long enough time and
>> have become illegal, but they are still not releasing the CPU. This
>> will increase the scheduling delay of other processes. Therefore, I
>> tried checking the current process in wakeup_preempt and entity_tick,
>> and if it is illegal, reschedule that cfs queue.
>> 
>> The modification can reduce the scheduling delay by about 30% when
>> RUN_TO_PARITY is enabled.
>> So far, it has been running well in my test environment, and I have
>> pasted some test results below.
>> 
> 
> Interesting, besides hackbench, I assume that you have workload in
> real production environment that is sensitive to wakeup latency?

Hi Chen

Yes, my workloads are quite sensitive to wakeup latency.
> 
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 03be0d1330a6..a0005d240db5 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>> 			hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>> 		return;
>> #endif
>> +
>> +	if (!entity_eligible(cfs_rq, curr))
>> +		resched_curr(rq_of(cfs_rq));
>> }
>> 
> 
> entity_tick() -> update_curr() -> update_deadline():
> se->vruntime >= se->deadline ? resched_curr()
> only current has expired its slice will it be scheduled out.
> 
> So here you want to schedule current out if its lag becomes 0.
> 
> In lastest sched/eevdf branch, it is controlled by two sched features:
> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
> 
> Maybe something like this can achieve your goal
> 	if (sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, curr))
> 		resched_curr(rq_of(cfs_rq));
> 
>> 
>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>> 	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>> 		return;
>> 
>> +	if (!entity_eligible(cfs_rq, se))
>> +		goto preempt;
>> +
> 
> Not sure if this is applicable, later in this function, pick_eevdf() checks
> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
> be evicted. And this change does not consider the cgroup hierarchy.
> 
> Besides, the check of current eligiblity can get false negative result,
> if the enqueued entity has a positive lag. Prateek proposed to
> remove the check of current's eligibility in pick_eevdf():
> https://lore.kernel.org/lkml/20240325060226.1540-2-kprateek.nayak@amd.com/

Thank you for letting me know about Peter's latest updates and thoughts.
Actually, the original intention of my modification was to minimize the
traversal of the rb-tree as much as possible. For example, in the following
scenario, if 'curr' is ineligible, the system would still traverse the rb-tree in
'pick_eevdf' to return an optimal 'se', and then trigger  'resched_curr'. After
resched, the scheduler will call 'pick_eevdf' again, traversing the
rb-tree once more. This ultimately results in the rb-tree being traversed
twice. If it's possible to determine that 'curr' is ineligible within 'wakeup_preempt'
and directly trigger a resched, it would save one traversal of the rb-tree.


wakeup_preempt -> pick_eevdf -----------------------------------> resched_curr
                      |-> 'traverse the rb-tree'
schedule -> pick_eevdf
                |-> 'traverse the rb-tree'


Of course, this would break the semantics of RESPECT_SLICE as well as
RUN_TO_PARITY. So, this might be considered a performance enhancement
for NO_RESPECT_SLICE/NO_RUN_TO_PARITY scenarios, i.e. when those protections
are switched off.

thanks 
Chunxin


> If I understand your requirement correctly, you want to reduce the wakeup
> latency. There are some codes under developed by Peter, which could
> customized task's wakeup latency via setting its slice:
> https://lore.kernel.org/lkml/20240405110010.934104715@infradead.org/
> 
> thanks,
> Chenyu


Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Chunxin Zang 1 year, 8 months ago

> On May 28, 2024, at 10:42, Chunxin Zang <spring.cxz@gmail.com> wrote:
> 
>> 
>> On May 24, 2024, at 23:30, Chen Yu <yu.c.chen@intel.com> wrote:
>> 
>> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
>>> I found that some tasks have been running for a long enough time and
>>> have become illegal, but they are still not releasing the CPU. This
>>> will increase the scheduling delay of other processes. Therefore, I
>>> tried checking the current process in wakeup_preempt and entity_tick,
>>> and if it is illegal, reschedule that cfs queue.
>>> 
>>> The modification can reduce the scheduling delay by about 30% when
>>> RUN_TO_PARITY is enabled.
>>> So far, it has been running well in my test environment, and I have
>>> pasted some test results below.
>>> 
>> 
>> Interesting, besides hackbench, I assume that you have workload in
>> real production environment that is sensitive to wakeup latency?
> 
> Hi Chen
> 
> Yes, my workload  are quite sensitive to wakeup latency .
>> 
>>> 
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 03be0d1330a6..a0005d240db5 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>> return;
>>> #endif
>>> +
>>> + if (!entity_eligible(cfs_rq, curr))
>>> + resched_curr(rq_of(cfs_rq));
>>> }
>>> 
>> 
>> entity_tick() -> update_curr() -> update_deadline():
>> se->vruntime >= se->deadline ? resched_curr()
>> only current has expired its slice will it be scheduled out.
>> 
>> So here you want to schedule current out if its lag becomes 0.
>> 
>> In lastest sched/eevdf branch, it is controlled by two sched features:
>> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
>> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
>> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
>> 
>> Maybe something like this can achieve your goal
>> if (sched_feat(RUN_TOPARITY) && !entity_eligible(cfs_rq, curr))
>> resched_curr
>> 
>>> 
>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>> return;
>>> 
>>> + if (!entity_eligible(cfs_rq, se))
>>> + goto preempt;
>>> +
>> 
>> Not sure if this is applicable, later in this function, pick_eevdf() checks
>> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
>> be evicted. And this change does not consider the cgroup hierarchy.
>> 
>> Besides, the check of current eligiblity can get false negative result,
>> if the enqueued entity has a positive lag. Prateek proposed to
>> remove the check of current's eligibility in pick_eevdf():
>> https://lore.kernel.org/lkml/20240325060226.1540-2-kprateek.nayak@amd.com/
> 
> Thank you for letting me know about Peter's latest updates and thoughts.
> Actually, the original intention of my modification was to minimize the
> traversal of the rb-tree as much as possible. For example, in the following
> scenario, if 'curr' is ineligible, the system would still traverse the rb-tree in
> 'pick_eevdf' to return an optimal 'se', and then trigger  'resched_curr'. After
> resched, the scheduler will call 'pick_eevdf' again, traversing the
> rb-tree once more. This ultimately results in the rb-tree being traversed
> twice. If it's possible to determine that 'curr' is ineligible within 'wakeup_preempt'
> and directly trigger a 'resched', it would reduce the traversal of the rb-tree
> by one time.
> 
> 
> wakeup_preempt-> pick_eevdf                                      -> resched_curr
>                                                 |->'traverse the rb-tree'  |
> schedule->pick_eevdf
>                                   |->'traverse the rb-tree'
> 
> 
> Of course, this would break the semantics of RESPECT_SLICE as well as
> RUN_TO_PARITY. So, this might be considered a performance enhancement
> for scenarios without NO_RESPECT_SLICE/NO_RUN_TO_PARITY.
> 
Sorry for the mistake. I meant it should be a performance enhancement for scenarios
with NO_RESPECT_SLICE/NO_RUN_TO_PARITY, i.e. when RESPECT_SLICE and RUN_TO_PARITY are
both disabled.

Maybe it should be like this

@@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
 		return;
 
+	if (!sched_feat(RESPECT_SLICE) && !sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, se))
+		goto preempt;
+

> thanks 
> Chunxin
> 
> 
>> If I understand your requirement correctly, you want to reduce the wakeup
>> latency. There are some codes under developed by Peter, which could
>> customized task's wakeup latency via setting its slice:
>> https://lore.kernel.org/lkml/20240405110010.934104715@infradead.org/
>> 
>> thanks,
>> Chenyu
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by K Prateek Nayak 1 year, 8 months ago
Hello Chunxin,

On 5/28/2024 8:12 AM, Chunxin Zang wrote:
> 
>> On May 24, 2024, at 23:30, Chen Yu <yu.c.chen@intel.com> wrote:
>>
>> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
>>> I found that some tasks have been running for a long enough time and
>>> have become illegal, but they are still not releasing the CPU. This
>>> will increase the scheduling delay of other processes. Therefore, I
>>> tried checking the current process in wakeup_preempt and entity_tick,
>>> and if it is illegal, reschedule that cfs queue.
>>>
>>> The modification can reduce the scheduling delay by about 30% when
>>> RUN_TO_PARITY is enabled.
>>> So far, it has been running well in my test environment, and I have
>>> pasted some test results below.
>>>
>>
>> Interesting, besides hackbench, I assume that you have workload in
>> real production environment that is sensitive to wakeup latency?
> 
> Hi Chen
> 
> Yes, my workload  are quite sensitive to wakeup latency .
>>
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 03be0d1330a6..a0005d240db5 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>> 			hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>> 		return;
>>> #endif
>>> +
>>> +	if (!entity_eligible(cfs_rq, curr))
>>> +		resched_curr(rq_of(cfs_rq));
>>> }
>>>
>>
>> entity_tick() -> update_curr() -> update_deadline():
>> se->vruntime >= se->deadline ? resched_curr()
>> only current has expired its slice will it be scheduled out.
>>
>> So here you want to schedule current out if its lag becomes 0.
>>
>> In lastest sched/eevdf branch, it is controlled by two sched features:
>> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
>> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
>> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
>>
>> Maybe something like this can achieve your goal
>> 	if (sched_feat(RUN_TOPARITY) && !entity_eligible(cfs_rq, curr))
>> 		resched_curr
>>
>>>
>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>> 	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>> 		return;
>>>
>>> +	if (!entity_eligible(cfs_rq, se))
>>> +		goto preempt;
>>> +
>>
>> Not sure if this is applicable, later in this function, pick_eevdf() checks
>> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
>> be evicted. And this change does not consider the cgroup hierarchy.

The above line will be referred to as [1] below.

>>
>> Besides, the check of current eligiblity can get false negative result,
>> if the enqueued entity has a positive lag. Prateek proposed to
>> remove the check of current's eligibility in pick_eevdf():
>> https://lore.kernel.org/lkml/20240325060226.1540-2-kprateek.nayak@amd.com/
> 
> Thank you for letting me know about Peter's latest updates and thoughts.
> Actually, the original intention of my modification was to minimize the
> traversal of the rb-tree as much as possible. For example, in the following
> scenario, if 'curr' is ineligible, the system would still traverse the rb-tree in
> 'pick_eevdf' to return an optimal 'se', and then trigger  'resched_curr'. After
> resched, the scheduler will call 'pick_eevdf' again, traversing the
> rb-tree once more. This ultimately results in the rb-tree being traversed
> twice. If it's possible to determine that 'curr' is ineligible within 'wakeup_preempt'
> and directly trigger a 'resched', it would reduce the traversal of the rb-tree
> by one time.
> 
> 
> wakeup_preempt-> pick_eevdf                                      -> resched_curr
>                                                  |->'traverse the rb-tree'  |
> schedule->pick_eevdf
>                                    |->'traverse the rb-tree'

I see what you mean but a couple of things:

(I'm adding the check_preempt_wakeup_fair() hunk from the original patch
below for ease of interpretation)

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 03be0d1330a6..a0005d240db5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>  	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>  		return;
>  
> +	if (!entity_eligible(cfs_rq, se))
> +		goto preempt;
> +

This check uses the root cfs_rq since "task_cfs_rq()" returns the
"rq->cfs" of the runqueue the task is on. In the presence of cgroups or
CONFIG_SCHED_AUTOGROUP, there is a good chance that the task is queued
on a higher-order cfs_rq, and this entity_eligible() calculation might
not be valid since the vruntime calculation for the "se" is relative to
the "cfs_rq" it is queued on. Please correct me if I'm wrong, but
I believe that is what Chenyu was referring to in [1].
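
To make that dependence concrete: eligibility is always judged relative to the
cfs_rq that is passed in. Roughly (paraphrasing the mainline helpers from memory,
not the patch under discussion):

int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	/*
	 * Eligible means non-negative lag on *this* cfs_rq, i.e. the
	 * entity's vruntime has not run ahead of this queue's
	 * (load-weighted) average vruntime. The same se compared
	 * against a different cfs_rq can give a different answer.
	 */
	return vruntime_eligible(cfs_rq, se->vruntime);
}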

>  	find_matching_se(&se, &pse);
>  	WARN_ON_ONCE(!pse);
>  
> -- 

In addition to that, there is an update_curr() call below for the first
cfs_rq where both entities' hierarchies are queued, which is found by
find_matching_se(). I believe that is also required to update the
vruntime and deadline of the entity where preemption can happen.

If you want to circumvent a second call to pick_eevdf(), could you
perhaps do:

(Only build tested)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9eb63573110c..653b1bee1e62 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8407,9 +8407,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	update_curr(cfs_rq);
 
 	/*
-	 * XXX pick_eevdf(cfs_rq) != se ?
+	 * If the hierarchy of current task is ineligible at the common
+	 * point on the newly woken entity, there is a good chance of
+	 * wakeup preemption by the newly woken entity. Mark for resched
+	 * and allow pick_eevdf() in schedule() to judge which task to
+	 * run next.
 	 */
-	if (pick_eevdf(cfs_rq) == pse)
+	if (!entity_eligible(cfs_rq, se))
 		goto preempt;
 
 	return;

--

There are other implications here, which are specifically highlighted by
the "XXX pick_eevdf(cfs_rq) != se ?" comment: even if the newly woken
entity is not the entity with the earliest eligible virtual deadline,
the current task is still preempted if any other entity has the EEVD.

Mike's box gave switching to the above two thumbs up; I have to check what
my box says :)

Following are DeathStarBench results with your original patch compared
to v6.9-rc5 based tip:sched/core:

==================================================================
Test          : DeathStarBench
Why?	      : Some tasks here do not like aggressive preemption
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
Pinning      scaling     tip            eager_preempt (pct imp)
 1CCD           1       1.00            0.99 (%diff: -1.13%)
 2CCD           2       1.00            0.97 (%diff: -3.21%)
 4CCD           3       1.00            0.97 (%diff: -3.41%)
 8CCD           6       1.00            0.97 (%diff: -3.20%)
--

I'll give the variants mentioned in the thread a try too, to see if
some of my assumptions around heavy preemption hold. I was also
able to dig up an old patch by Balakumaran Kannan which skipped
pick_eevdf() altogether if "pse" is ineligible. That also seems like
a good optimization based on the current check in
check_preempt_wakeup_fair(), but it perhaps doesn't help the
wakeup-latency sensitivity you are optimizing for; it only reduces the
rb-tree traversal when there is no chance of pick_eevdf() returning "pse":
https://lore.kernel.org/lkml/20240301130100.267727-1-kumaran.4353@gmail.com/
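
For illustration, that idea might look roughly like the below; this is only a
sketch of the concept described above, not Balakumaran's actual patch, using the
variables as they stand after find_matching_se() in check_preempt_wakeup_fair():

	update_curr(cfs_rq);

	/*
	 * As noted above, an ineligible "pse" leaves no chance of
	 * pick_eevdf() returning "pse", so the "== pse" comparison
	 * below cannot trigger a preemption; bail out early and skip
	 * the rb-tree walk.
	 */
	if (!entity_eligible(cfs_rq, pse))
		return;

	/*
	 * XXX pick_eevdf(cfs_rq) != se ?
	 */
	if (pick_eevdf(cfs_rq) == pse)
		goto preempt;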

--
Thanks and Regards,
Prateek

> 
> 
> Of course, this would break the semantics of RESPECT_SLICE as well as
> RUN_TO_PARITY. So, this might be considered a performance enhancement
> for scenarios without NO_RESPECT_SLICE/NO_RUN_TO_PARITY.
> 
> thanks 
> Chunxin
> 
> 
>> If I understand your requirement correctly, you want to reduce the wakeup
>> latency. There are some codes under developed by Peter, which could
>> customized task's wakeup latency via setting its slice:
>> https://lore.kernel.org/lkml/20240405110010.934104715@infradead.org/
>>
>> thanks,
>> Chenyu
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Chen Yu 1 year, 8 months ago
Hi Prateek, Chunxin,

On 2024-05-28 at 10:32:23 +0530, K Prateek Nayak wrote:
> Hello Chunxin,
> 
> On 5/28/2024 8:12 AM, Chunxin Zang wrote:
> > 
> >> On May 24, 2024, at 23:30, Chen Yu <yu.c.chen@intel.com> wrote:
> >>
> >> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
> >>> I found that some tasks have been running for a long enough time and
> >>> have become illegal, but they are still not releasing the CPU. This
> >>> will increase the scheduling delay of other processes. Therefore, I
> >>> tried checking the current process in wakeup_preempt and entity_tick,
> >>> and if it is illegal, reschedule that cfs queue.
> >>>
> >>> The modification can reduce the scheduling delay by about 30% when
> >>> RUN_TO_PARITY is enabled.
> >>> So far, it has been running well in my test environment, and I have
> >>> pasted some test results below.
> >>>
> >>
> >> Interesting, besides hackbench, I assume that you have workload in
> >> real production environment that is sensitive to wakeup latency?
> > 
> > Hi Chen
> > 
> > Yes, my workload  are quite sensitive to wakeup latency .
> >>
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index 03be0d1330a6..a0005d240db5 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
> >>> 			hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
> >>> 		return;
> >>> #endif
> >>> +
> >>> +	if (!entity_eligible(cfs_rq, curr))
> >>> +		resched_curr(rq_of(cfs_rq));
> >>> }
> >>>
> >>
> >> entity_tick() -> update_curr() -> update_deadline():
> >> se->vruntime >= se->deadline ? resched_curr()
> >> only current has expired its slice will it be scheduled out.
> >>
> >> So here you want to schedule current out if its lag becomes 0.
> >>
> >> In lastest sched/eevdf branch, it is controlled by two sched features:
> >> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
> >> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
> >> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
> >>
> >> Maybe something like this can achieve your goal
> >> 	if (sched_feat(RUN_TOPARITY) && !entity_eligible(cfs_rq, curr))
> >> 		resched_curr
> >>
> >>>
> >>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> >>> 	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
> >>> 		return;
> >>>
> >>> +	if (!entity_eligible(cfs_rq, se))
> >>> +		goto preempt;
> >>> +
> >>
> >> Not sure if this is applicable, later in this function, pick_eevdf() checks
> >> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
> >> be evicted. And this change does not consider the cgroup hierarchy.
> 
> The above line will be referred to as [1] below.
> 
> >>
> >> Besides, the check of current eligiblity can get false negative result,
> >> if the enqueued entity has a positive lag. Prateek proposed to
> >> remove the check of current's eligibility in pick_eevdf():
> >> https://lore.kernel.org/lkml/20240325060226.1540-2-kprateek.nayak@amd.com/
> > 
> > Thank you for letting me know about Peter's latest updates and thoughts.
> > Actually, the original intention of my modification was to minimize the
> > traversal of the rb-tree as much as possible. For example, in the following
> > scenario, if 'curr' is ineligible, the system would still traverse the rb-tree in
> > 'pick_eevdf' to return an optimal 'se', and then trigger  'resched_curr'. After
> > resched, the scheduler will call 'pick_eevdf' again, traversing the
> > rb-tree once more. This ultimately results in the rb-tree being traversed
> > twice. If it's possible to determine that 'curr' is ineligible within 'wakeup_preempt'
> > and directly trigger a 'resched', it would reduce the traversal of the rb-tree
> > by one time.
> > 
> > 
> > wakeup_preempt-> pick_eevdf                                      -> resched_curr
> >                                                  |->'traverse the rb-tree'  |
> > schedule->pick_eevdf
> >                                    |->'traverse the rb-tree'
> 
> I see what you mean but a couple of things:
> 
> (I'm adding the check_preempt_wakeup_fair() hunk from the original patch
> below for ease of interpretation)
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 03be0d1330a6..a0005d240db5 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> >  	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
> >  		return;
> >  
> > +	if (!entity_eligible(cfs_rq, se))
> > +		goto preempt;
> > +
> 
> This check uses the root cfs_rq since "task_cfs_rq()" returns the
> "rq->cfs" of the runqueue the task is on. In presence of cgroups or
> CONFIG_SCHED_AUTOGROUP, there is a good chance this the task is queued
> on a higher order cfs_rq and this entity_eligible() calculation might
> not be valid since the vruntime calculation for the "se" is relative to
> the "cfs_rq" where it is queued on. Please correct me if I'm wrong but
> I believe that is what Chenyu was referring to in [1].
>

Sorry for the late reply and thanks for helping to clarify this. Yes, this is
what my previous concern was:
1. It does not consider the cgroup and does not check preemption at the same
   level, which is covered by find_matching_se().
2. The if (!entity_eligible(cfs_rq, se)) for current is redundant because
   later pick_eevdf() will check the eligibility of current anyway. But
   as pointed out by Chunxin, his concern is the double traversal of the rb-tree,
   so I wonder if we could leverage cfs_rq->next to store the next
   candidate, so it can be picked directly in the 2nd pick as a fast path?
   Something like below, untested:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a5b1ae0aa55..f716646d595e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8349,7 +8349,7 @@ static void set_next_buddy(struct sched_entity *se)
 static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
 {
        struct task_struct *curr = rq->curr;
-       struct sched_entity *se = &curr->se, *pse = &p->se;
+       struct sched_entity *se = &curr->se, *pse = &p->se, *next;
        struct cfs_rq *cfs_rq = task_cfs_rq(curr);
        int cse_is_idle, pse_is_idle;
 
@@ -8415,7 +8415,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
        /*
         * XXX pick_eevdf(cfs_rq) != se ?
         */
-       if (pick_eevdf(cfs_rq) == pse)
+       next = pick_eevdf(cfs_rq);
+       if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && next)
+               set_next_buddy(next);
+
+       if (next == pse)
                goto preempt;
 
        return;
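
As a side note on why storing the candidate in cfs_rq->next helps the 2nd pick:
with NEXT_BUDDY enabled, pick_next_entity() can return the buddy directly while it
is still eligible, instead of walking the rb-tree again. A from-memory sketch of
that mainline fast path (not part of the diff above; details may differ between
kernel versions):

static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{
	/*
	 * If a next buddy was nominated and it is still eligible,
	 * return it directly and skip the rb-tree walk inside
	 * pick_eevdf().
	 */
	if (sched_feat(NEXT_BUDDY) &&
	    cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next))
		return cfs_rq->next;

	return pick_eevdf(cfs_rq);
}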


thanks,
Chenyu

> >  	find_matching_se(&se, &pse);
> >  	WARN_ON_ONCE(!pse);
> >  
> > -- 
> 
> In addition to that, There is an update_curr() call below for the first
> cfs_rq where both the entities' hierarchy is queued which is found by
> find_matching_se(). I believe that is required too to update the
> vruntime and deadline of the entity where preemption can happen.
> 
> If you want to circumvent a second call to pick_eevdf(), could you
> perhaps do:
> 
> (Only build tested)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9eb63573110c..653b1bee1e62 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8407,9 +8407,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>  	update_curr(cfs_rq);
>  
>  	/*
> -	 * XXX pick_eevdf(cfs_rq) != se ?
> +	 * If the hierarchy of current task is ineligible at the common
> +	 * point on the newly woken entity, there is a good chance of
> +	 * wakeup preemption by the newly woken entity. Mark for resched
> +	 * and allow pick_eevdf() in schedule() to judge which task to
> +	 * run next.
>  	 */
> -	if (pick_eevdf(cfs_rq) == pse)
> +	if (!entity_eligible(cfs_rq, se))
>  		goto preempt;
>  
>  	return;
> 
> --
> 
> There are other implications here which is specifically highlighted by
> the "XXX pick_eevdf(cfs_rq) != se ?" comment. If the current waking
> entity is not the entity with the earliest eligible virtual deadline,
> the current task is still preempted if any other entity has the EEVD.
> 
> Mike's box gave switching to above two thumbs up; I have to check what
> my box says :)
> 
> Following are DeathStarBench results with your original patch compared
> to v6.9-rc5 based tip:sched/core:
> 
> ==================================================================
> Test          : DeathStarBench
> Why?	      : Some tasks here do no like aggressive preemption
> Units         : Normalized throughput
> Interpretation: Higher is better
> Statistic     : Mean
> ==================================================================
> Pinning      scaling     tip            eager_preempt (pct imp)
>  1CCD           1       1.00            0.99 (%diff: -1.13%)
>  2CCD           2       1.00            0.97 (%diff: -3.21%)
>  4CCD           3       1.00            0.97 (%diff: -3.41%)
>  8CCD           6       1.00            0.97 (%diff: -3.20%)
> --
> 
> I'll give the variants mentioned in the thread a try too to see if
> some of my assumptions around heavy preemption hold good. I was also
> able to dig up an old patch by Balakumaran Kannan which skipped
> pick_eevdf() altogether if "pse" is ineligible which also seems like
> a good optimization based on current check in
> check_preempt_wakeup_fair() but it perhaps doesn't help the case of 
> wakeup-latency sensitivity you are optimizing for; only reduces
> rb-tree traversal if there is no chance of pick_eevdf() returning "pse" 
> https://lore.kernel.org/lkml/20240301130100.267727-1-kumaran.4353@gmail.com/ 
> 
> --
> Thanks and Regards,
> Prateek
> 
> > 
> > 
> > Of course, this would break the semantics of RESPECT_SLICE as well as
> > RUN_TO_PARITY. So, this might be considered a performance enhancement
> > for scenarios without NO_RESPECT_SLICE/NO_RUN_TO_PARITY.
> > 
> > thanks 
> > Chunxin
> > 
> > 
> >> If I understand your requirement correctly, you want to reduce the wakeup
> >> latency. There are some codes under developed by Peter, which could
> >> customized task's wakeup latency via setting its slice:
> >> https://lore.kernel.org/lkml/20240405110010.934104715@infradead.org/
> >>
> >> thanks,
> >> Chenyu
>
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Chunxin Zang 1 year, 8 months ago

> On Jun 6, 2024, at 01:19, Chen Yu <yu.c.chen@intel.com> wrote:
> 
> Hi Prateek, Chunxin,
> 
> On 2024-05-28 at 10:32:23 +0530, K Prateek Nayak wrote:
>> Hello Chunxin,
>> 
>> On 5/28/2024 8:12 AM, Chunxin Zang wrote:
>>> 
>>>> On May 24, 2024, at 23:30, Chen Yu <yu.c.chen@intel.com> wrote:
>>>> 
>>>> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
>>>>> I found that some tasks have been running for a long enough time and
>>>>> have become illegal, but they are still not releasing the CPU. This
>>>>> will increase the scheduling delay of other processes. Therefore, I
>>>>> tried checking the current process in wakeup_preempt and entity_tick,
>>>>> and if it is illegal, reschedule that cfs queue.
>>>>> 
>>>>> The modification can reduce the scheduling delay by about 30% when
>>>>> RUN_TO_PARITY is enabled.
>>>>> So far, it has been running well in my test environment, and I have
>>>>> pasted some test results below.
>>>>> 
>>>> 
>>>> Interesting, besides hackbench, I assume that you have workload in
>>>> real production environment that is sensitive to wakeup latency?
>>> 
>>> Hi Chen
>>> 
>>> Yes, my workload  are quite sensitive to wakeup latency .
>>>> 
>>>>> 
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index 03be0d1330a6..a0005d240db5 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>>>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>>>> return;
>>>>> #endif
>>>>> +
>>>>> + if (!entity_eligible(cfs_rq, curr))
>>>>> + resched_curr(rq_of(cfs_rq));
>>>>> }
>>>>> 
>>>> 
>>>> entity_tick() -> update_curr() -> update_deadline():
>>>> se->vruntime >= se->deadline ? resched_curr()
>>>> only current has expired its slice will it be scheduled out.
>>>> 
>>>> So here you want to schedule current out if its lag becomes 0.
>>>> 
>>>> In lastest sched/eevdf branch, it is controlled by two sched features:
>>>> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
>>>> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
>>>> 
>>>> Maybe something like this can achieve your goal
>>>> if (sched_feat(RUN_TOPARITY) && !entity_eligible(cfs_rq, curr))
>>>> resched_curr
>>>> 
>>>>> 
>>>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>>>> return;
>>>>> 
>>>>> + if (!entity_eligible(cfs_rq, se))
>>>>> + goto preempt;
>>>>> +
>>>> 
>>>> Not sure if this is applicable, later in this function, pick_eevdf() checks
>>>> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
>>>> be evicted. And this change does not consider the cgroup hierarchy.
>> 
>> The above line will be referred to as [1] below.
>> 
>>>> 
>>>> Besides, the check of current eligiblity can get false negative result,
>>>> if the enqueued entity has a positive lag. Prateek proposed to
>>>> remove the check of current's eligibility in pick_eevdf():
>>>> https://lore.kernel.org/lkml/20240325060226.1540-2-kprateek.nayak@amd.com/
>>> 
>>> Thank you for letting me know about Peter's latest updates and thoughts.
>>> Actually, the original intention of my modification was to minimize the
>>> traversal of the rb-tree as much as possible. For example, in the following
>>> scenario, if 'curr' is ineligible, the system would still traverse the rb-tree in
>>> 'pick_eevdf' to return an optimal 'se', and then trigger  'resched_curr'. After
>>> resched, the scheduler will call 'pick_eevdf' again, traversing the
>>> rb-tree once more. This ultimately results in the rb-tree being traversed
>>> twice. If it's possible to determine that 'curr' is ineligible within 'wakeup_preempt'
>>> and directly trigger a 'resched', it would reduce the traversal of the rb-tree
>>> by one time.
>>> 
>>> 
>>> wakeup_preempt-> pick_eevdf                                      -> resched_curr
>>>                                                 |->'traverse the rb-tree'  |
>>> schedule->pick_eevdf
>>>                                   |->'traverse the rb-tree'
>> 
>> I see what you mean but a couple of things:
>> 
>> (I'm adding the check_preempt_wakeup_fair() hunk from the original patch
>> below for ease of interpretation)
>> 
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 03be0d1330a6..a0005d240db5 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>> return;
>>> 
>>> + if (!entity_eligible(cfs_rq, se))
>>> + goto preempt;
>>> +
>> 
>> This check uses the root cfs_rq since "task_cfs_rq()" returns the
>> "rq->cfs" of the runqueue the task is on. In presence of cgroups or
>> CONFIG_SCHED_AUTOGROUP, there is a good chance this the task is queued
>> on a higher order cfs_rq and this entity_eligible() calculation might
>> not be valid since the vruntime calculation for the "se" is relative to
>> the "cfs_rq" where it is queued on. Please correct me if I'm wrong but
>> I believe that is what Chenyu was referring to in [1].
>> 
> 
> Sorry for the late reply and thanks for help clarify this. Yes, this is
> what my previous concern was:
> 1. It does not consider the cgroup and does not check preemption in the same
>   level which is covered by find_matching_se().
> 2. The if (!entity_eligible(cfs_rq, se)) for current is redundant because
>   later pick_eevdf() will check the eligible of current anyway. But
>   as pointed out by Chunxi, his concern is the double-traverse of the rb-tree,
>   I just wonder if we could leverage the cfs_rq->next to store the next
>   candidate, so it can be picked directly in the 2nd pick as a fast path?
>   Something like below untested:
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a5b1ae0aa55..f716646d595e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8349,7 +8349,7 @@ static void set_next_buddy(struct sched_entity *se)
> static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
> {
>        struct task_struct *curr = rq->curr;
> -       struct sched_entity *se = &curr->se, *pse = &p->se;
> +       struct sched_entity *se = &curr->se, *pse = &p->se, *next;
>        struct cfs_rq *cfs_rq = task_cfs_rq(curr);
>        int cse_is_idle, pse_is_idle;
> 
> @@ -8415,7 +8415,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>        /*
>         * XXX pick_eevdf(cfs_rq) != se ?
>         */
> -       if (pick_eevdf(cfs_rq) == pse)
> +       next = pick_eevdf(cfs_rq);
> +       if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && next)
> +               set_next_buddy(next);
> +
> +       if (next == pse)
>                goto preempt;
> 
>        return;
> 
> 
> thanks,
> Chenyu

Hi Chen

First of all, thank you for your patient response. Regarding the issue of avoiding traversing
the RB-tree twice, I initially had two methods in mind. 
1. Cache the optimal result so that it can be used directly during the second pick_eevdf operation.
  This idea is similar to the one you proposed this time. 
2. Avoid the pick_eevdf operation as much as possible within 'check_preempt_wakeup_fair',
  because I believe that 'checking whether preemption is necessary' and 'finding the optimal
  process to schedule' are two different things. 'check_preempt_wakeup_fair' is not just there
  to check if the newly awakened process should preempt the current process; it can also serve
  as an opportunity to check whether any other process should preempt the current one,
  thereby improving the real-time behaviour of the scheduler. Although pick_eevdf now also
  evaluates the eligibility of 'curr', if the entity it returns is not the awakened process,
  the current process will still not be preempted. Therefore, I posted the v2 PATCH;
  its implementation might express this point more clearly.
https://lore.kernel.org/lkml/20240529141806.16029-1-spring.cxz@gmail.com/T/

I previously implemented and tested both of these methods, and the test results showed that
method 2 had somewhat more obvious benefits. Therefore, I submitted method 2. Now that I
think about it, perhaps method 1 could also be viable at the same time. :)

thanks 
Chunxin

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03be0d1330a6..f67894d8fbc8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -563,6 +563,8 @@ static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
        return (s64)(se->vruntime - cfs_rq->min_vruntime);
 }

+static void unset_pick_cached(struct cfs_rq *cfs_rq);
+
 #define __node_2_se(node) \
        rb_entry((node), struct sched_entity, run_node)

@@ -632,6 +634,8 @@ avg_vruntime_add(struct cfs_rq *cfs_rq, struct sched_entity *se)

        cfs_rq->avg_vruntime += key * weight;
        cfs_rq->avg_load += weight;
+
+       unset_pick_cached(cfs_rq);
 }

 static void
@@ -642,6 +646,8 @@ avg_vruntime_sub(struct cfs_rq *cfs_rq, struct sched_entity *se)

        cfs_rq->avg_vruntime -= key * weight;
        cfs_rq->avg_load -= weight;
+
+       unset_pick_cached(cfs_rq);
 }

 static inline
@@ -651,6 +657,8 @@ void avg_vruntime_update(struct cfs_rq *cfs_rq, s64 delta)
         * v' = v + d ==> avg_vruntime' = avg_runtime - d*avg_load
         */
        cfs_rq->avg_vruntime -= cfs_rq->avg_load * delta;
+
+       unset_pick_cached(cfs_rq);
 }

 /*
@@ -745,6 +753,36 @@ int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
        return vruntime_eligible(cfs_rq, se->vruntime);
 }

+static struct sched_entity *try_to_get_pick_cached(struct cfs_rq* cfs_rq)
+{
+       struct sched_entity *se;
+
+       se = cfs_rq->pick_cached;
+
+       return se == NULL ? NULL : (se->on_rq ? se : NULL);
+}
+
+static void unset_pick_cached(struct cfs_rq *cfs_rq)
+{
+       cfs_rq->pick_cached = NULL;
+}
+
+static void set_pick_cached(struct sched_entity *se)
+{
+       if (!se || !se->on_rq)
+               return;
+
+       cfs_rq_of(se)->pick_cached = se;
+}
+
 static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
 {
        u64 min_vruntime = cfs_rq->min_vruntime;
@@ -856,6 +894,51 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
        return __node_2_se(left);
 }

+static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq)
+{
+       struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
+       struct sched_entity *se = __pick_first_entity(cfs_rq);
+       struct sched_entity *best = NULL;
+
+       /* Pick the leftmost entity if it's eligible */
+       if (se && entity_eligible(cfs_rq, se))
+               return se;
+
+       /* Heap search for the EEVD entity */
+       while (node) {
+               struct rb_node *left = node->rb_left;
+
+               /*
+               * Eligible entities in left subtree are always better
+               * choices, since they have earlier deadlines.
+               */
+               if (left && vruntime_eligible(cfs_rq,
+                               __node_2_se(left)->min_vruntime)) {
+                       node = left;
+                       continue;
+               }
+
+               se = __node_2_se(node);
+
+               /*
+               * The left subtree either is empty or has no eligible
+               * entity, so check the current node since it is the one
+               * with earliest deadline that might be eligible.
+               */
+               if (entity_eligible(cfs_rq, se)) {
+                       best = se;
+                       break;
+               }
+
+               node = node->rb_right;
+       }
+
+       if (best)
+               set_pick_cached(best);
+
+       return best;
+}
+
 /*
  * Earliest Eligible Virtual Deadline First
  *
@@ -877,7 +960,6 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
  */
 static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 {
-       struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
        struct sched_entity *se = __pick_first_entity(cfs_rq);
        struct sched_entity *curr = cfs_rq->curr;
        struct sched_entity *best = NULL;
@@ -899,41 +981,13 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
        if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
                return curr;

-       /* Pick the leftmost entity if it's eligible */
-       if (se && entity_eligible(cfs_rq, se)) {
-               best = se;
-               goto found;
-       }
+       best = try_to_get_pick_cached(cfs_rq);
+       if (best && !entity_eligible(cfs_rq, best))
+               best = NULL;

-       /* Heap search for the EEVD entity */
-       while (node) {
-               struct rb_node *left = node->rb_left;
-
-               /*
-                * Eligible entities in left subtree are always better
-                * choices, since they have earlier deadlines.
-                */
-               if (left && vruntime_eligible(cfs_rq,
-                                       __node_2_se(left)->min_vruntime)) {
-                       node = left;
-                       continue;
-               }
-
-               se = __node_2_se(node);
+       if (!best)
+               best = __pick_eevdf(cfs_rq);

-               /*
-                * The left subtree either is empty or has no eligible
-                * entity, so check the current node since it is the one
-                * with earliest deadline that might be eligible.
-                */
-               if (entity_eligible(cfs_rq, se)) {
-                       best = se;
-                       break;
-               }
-
-               node = node->rb_right;
-       }
-found:
        if (!best || (curr && entity_before(curr, best)))
                best = curr;

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d2242679239e..373241075449 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -597,6 +597,7 @@ struct cfs_rq {
         */
        struct sched_entity     *curr;
        struct sched_entity     *next;
+       struct sched_entity     *pick_cached;

 #ifdef CONFIG_SCHED_DEBUG
        unsigned int            nr_spread_over;
--
2.34.1


> 
>>> find_matching_se(&se, &pse);
>>> WARN_ON_ONCE(!pse);
>>> 
>>> -- 
>> 
>> In addition to that, There is an update_curr() call below for the first
>> cfs_rq where both the entities' hierarchy is queued which is found by
>> find_matching_se(). I believe that is required too to update the
>> vruntime and deadline of the entity where preemption can happen.
>> 
>> If you want to circumvent a second call to pick_eevdf(), could you
>> perhaps do:
>> 
>> (Only build tested)
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 9eb63573110c..653b1bee1e62 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8407,9 +8407,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>> update_curr(cfs_rq);
>> 
>> /*
>> -  * XXX pick_eevdf(cfs_rq) != se ?
>> +  * If the hierarchy of current task is ineligible at the common
>> +  * point on the newly woken entity, there is a good chance of
>> +  * wakeup preemption by the newly woken entity. Mark for resched
>> +  * and allow pick_eevdf() in schedule() to judge which task to
>> +  * run next.
>>  */
>> - if (pick_eevdf(cfs_rq) == pse)
>> + if (!entity_eligible(cfs_rq, se))
>> goto preempt;
>> 
>> return;
>> 
>> --
>> 
>> There are other implications here which is specifically highlighted by
>> the "XXX pick_eevdf(cfs_rq) != se ?" comment. If the current waking
>> entity is not the entity with the earliest eligible virtual deadline,
>> the current task is still preempted if any other entity has the EEVD.
>> 
>> Mike's box gave switching to above two thumbs up; I have to check what
>> my box says :)
>> 
>> Following are DeathStarBench results with your original patch compared
>> to v6.9-rc5 based tip:sched/core:
>> 
>> ==================================================================
>> Test          : DeathStarBench
>> Why?       : Some tasks here do no like aggressive preemption
>> Units         : Normalized throughput
>> Interpretation: Higher is better
>> Statistic     : Mean
>> ==================================================================
>> Pinning      scaling     tip            eager_preempt (pct imp)
>> 1CCD           1       1.00            0.99 (%diff: -1.13%)
>> 2CCD           2       1.00            0.97 (%diff: -3.21%)
>> 4CCD           3       1.00            0.97 (%diff: -3.41%)
>> 8CCD           6       1.00            0.97 (%diff: -3.20%)
>> --
>> 
>> I'll give the variants mentioned in the thread a try too to see if
>> some of my assumptions around heavy preemption hold good. I was also
>> able to dig up an old patch by Balakumaran Kannan which skipped
>> pick_eevdf() altogether if "pse" is ineligible which also seems like
>> a good optimization based on current check in
>> check_preempt_wakeup_fair() but it perhaps doesn't help the case of 
>> wakeup-latency sensitivity you are optimizing for; only reduces
>> rb-tree traversal if there is no chance of pick_eevdf() returning "pse" 
>> https://lore.kernel.org/lkml/20240301130100.267727-1-kumaran.4353@gmail.com/ 
>> 
>> --
>> Thanks and Regards,
>> Prateek
>> 
>>> 
>>> 
>>> Of course, this would break the semantics of RESPECT_SLICE as well as
>>> RUN_TO_PARITY. So, this might be considered a performance enhancement
>>> for scenarios without NO_RESPECT_SLICE/NO_RUN_TO_PARITY.
>>> 
>>> thanks 
>>> Chunxin
>>> 
>>> 
>>>> If I understand your requirement correctly, you want to reduce the wakeup
>>>> latency. There are some codes under developed by Peter, which could
>>>> customized task's wakeup latency via setting its slice:
>>>> https://lore.kernel.org/lkml/20240405110010.934104715@infradead.org/
>>>> 
>>>> thanks,
>>>> Chenyu
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Chen Yu 1 year, 8 months ago
On 2024-06-06 at 09:46:53 +0800, Chunxin Zang wrote:
> 
> 
> > On Jun 6, 2024, at 01:19, Chen Yu <yu.c.chen@intel.com> wrote:
> > 
> > 
> > Sorry for the late reply and thanks for help clarify this. Yes, this is
> > what my previous concern was:
> > 1. It does not consider the cgroup and does not check preemption in the same
> >   level which is covered by find_matching_se().
> > 2. The if (!entity_eligible(cfs_rq, se)) for current is redundant because
> >   later pick_eevdf() will check the eligible of current anyway. But
> >   as pointed out by Chunxi, his concern is the double-traverse of the rb-tree,
> >   I just wonder if we could leverage the cfs_rq->next to store the next
> >   candidate, so it can be picked directly in the 2nd pick as a fast path?
> >   Something like below untested:
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8a5b1ae0aa55..f716646d595e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8349,7 +8349,7 @@ static void set_next_buddy(struct sched_entity *se)
> > static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
> > {
> >        struct task_struct *curr = rq->curr;
> > -       struct sched_entity *se = &curr->se, *pse = &p->se;
> > +       struct sched_entity *se = &curr->se, *pse = &p->se, *next;
> >        struct cfs_rq *cfs_rq = task_cfs_rq(curr);
> >        int cse_is_idle, pse_is_idle;
> > 
> > @@ -8415,7 +8415,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> >        /*
> >         * XXX pick_eevdf(cfs_rq) != se ?
> >         */
> > -       if (pick_eevdf(cfs_rq) == pse)
> > +       next = pick_eevdf(cfs_rq);
> > +       if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && next)
> > +               set_next_buddy(next);
> > +
> > +       if (next == pse)
> >                goto preempt;
> > 
> >        return;
> > 
> > 
> > thanks,
> > Chenyu
> 
> Hi Chen
> 
> First of all, thank you for your patient response. Regarding the issue of avoiding traversing
> the RB-tree twice, I initially had two methods in mind. 
> 1. Cache the optimal result so that it can be used directly during the second pick_eevdf operation.
>   This idea is similar to the one you proposed this time. 
> 2. Avoid the pick_eevdf operation as much as possible within 'check_preempt_wakeup_fair.' 
>   Because I believe that 'checking whether preemption is necessary' and 'finding the optimal
>   process to schedule' are two different things.

I agree, and it seems that in the current EEVDF implementation the former relies on the latter.

> 'check_preempt_wakeup_fair' is not just to
>   check if the newly awakened process should preempt the current process; it can also serve
>   as an opportunity to check whether any other processes should preempt the current one,
>   thereby improving the real-time performance of the scheduler. Although now in pick_eevdf,
>   the legitimacy of 'curr' is also evaluated, if the result returned is not the awakened process,
>   then the current process will still not be preempted.

I think Mike has proposed a patch to deal with the scenario you mentioned above:
https://lore.kernel.org/lkml/e17d3d90440997b970067fe9eaf088903c65f41d.camel@gmx.de/

And I suppose you are referring to increasing the preemption chance on current, rather than
reducing the invocations of pick_eevdf() in check_preempt_wakeup_fair().

> Therefore, I posted the v2 PATCH. 
>   The implementation of v2 PATCH might express this point more clearly. 
> https://lore.kernel.org/lkml/20240529141806.16029-1-spring.cxz@gmail.com/T/
>

Let me take a look at it and do some tests.
 
> I previously implemented and tested both of these methods, and the test results showed that
> method 2 had somewhat more obvious benefits. Therefore, I submitted method 2. Now that I
> think about it, perhaps method 1 could also be viable at the same time. :)
>

Actually I found that, even without any changes, if we enable the sched feature NEXT_BUDDY,
both the wakeup latency and the request latency are reduced. The following is the schbench
result on a 240-CPU system:

NO_NEXT_BUDDY
Wakeup Latencies percentiles (usec) runtime 100 (s) (1698990 total samples)
        50.0th: 6          (429125 samples)
        90.0th: 14         (682355 samples)
      * 99.0th: 29         (126695 samples)
        99.9th: 529        (14603 samples)
        min=1, max=4741
Request Latencies percentiles (usec) runtime 100 (s) (1702523 total samples)
        50.0th: 14992      (550939 samples)
        90.0th: 15376      (668687 samples)
      * 99.0th: 15600      (128111 samples)
        99.9th: 15888      (11238 samples)
        min=3528, max=31677
RPS percentiles (requests) runtime 100 (s) (101 total samples)
        20.0th: 16864      (31 samples)
      * 50.0th: 16928      (26 samples)
        90.0th: 17248      (36 samples)
        min=16615, max=20041
average rps: 17025.23

NEXT_BUDDY
Wakeup Latencies percentiles (usec) runtime 100 (s) (1653564 total samples)
        50.0th: 5          (376845 samples)
        90.0th: 12         (632075 samples)
      * 99.0th: 24         (114398 samples)
        99.9th: 105        (13737 samples)
        min=1, max=7428
Request Latencies percentiles (usec) runtime 100 (s) (1657268 total samples)
        50.0th: 14480      (524763 samples)
        90.0th: 15216      (647982 samples)
      * 99.0th: 15472      (130730 samples)
        99.9th: 15728      (13980 samples)
        min=3542, max=34805
RPS percentiles (requests) runtime 100 (s) (101 total samples)
        20.0th: 16544      (62 samples)
      * 50.0th: 16544      (0 samples)
        90.0th: 16608      (37 samples)
        min=16470, max=16648
average rps: 16572.68

So I think NEXT_BUDDY has more or less reduced the rb-tree scan.
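
For completeness, the mechanism behind this: when NEXT_BUDDY is enabled,
check_preempt_wakeup_fair() nominates the woken entity via set_next_buddy(), which
records it on each cfs_rq in the hierarchy, and pick_next_entity() then prefers
that buddy while it stays eligible, so the rb-tree walk is often skipped. A
from-memory sketch of the mainline helper (not part of any patch in this thread;
details may vary by version):

static void set_next_buddy(struct sched_entity *se)
{
	for_each_sched_entity(se) {
		if (SCHED_WARN_ON(!se->on_rq))
			return;
		if (se_is_idle(se))
			return;
		/* remember this entity as the preferred next pick at this level */
		cfs_rq_of(se)->next = se;
	}
}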

thanks,
Chenyu
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Chunxin Zang 1 year, 8 months ago

> On Jun 7, 2024, at 10:38, Chen Yu <yu.c.chen@intel.com> wrote:
> 
> On 2024-06-06 at 09:46:53 +0800, Chunxin Zang wrote:
>> 
>> 
>>> On Jun 6, 2024, at 01:19, Chen Yu <yu.c.chen@intel.com> wrote:
>>> 
>>> 
>>> Sorry for the late reply and thanks for help clarify this. Yes, this is
>>> what my previous concern was:
>>> 1. It does not consider the cgroup and does not check preemption in the same
>>>  level which is covered by find_matching_se().
>>> 2. The if (!entity_eligible(cfs_rq, se)) for current is redundant because
>>>  later pick_eevdf() will check the eligible of current anyway. But
>>>  as pointed out by Chunxi, his concern is the double-traverse of the rb-tree,
>>>  I just wonder if we could leverage the cfs_rq->next to store the next
>>>  candidate, so it can be picked directly in the 2nd pick as a fast path?
>>>  Something like below untested:
>>> 
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 8a5b1ae0aa55..f716646d595e 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -8349,7 +8349,7 @@ static void set_next_buddy(struct sched_entity *se)
>>> static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
>>> {
>>>       struct task_struct *curr = rq->curr;
>>> -       struct sched_entity *se = &curr->se, *pse = &p->se;
>>> +       struct sched_entity *se = &curr->se, *pse = &p->se, *next;
>>>       struct cfs_rq *cfs_rq = task_cfs_rq(curr);
>>>       int cse_is_idle, pse_is_idle;
>>> 
>>> @@ -8415,7 +8415,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>>       /*
>>>        * XXX pick_eevdf(cfs_rq) != se ?
>>>        */
>>> -       if (pick_eevdf(cfs_rq) == pse)
>>> +       next = pick_eevdf(cfs_rq);
>>> +       if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && next)
>>> +               set_next_buddy(next);
>>> +
>>> +       if (next == pse)
>>>               goto preempt;
>>> 
>>>       return;
>>> 
>>> 
>>> thanks,
>>> Chenyu
>> 
>> Hi Chen
>> 
>> First of all, thank you for your patient response. Regarding the issue of avoiding traversing
>> the RB-tree twice, I initially had two methods in mind. 
>> 1. Cache the optimal result so that it can be used directly during the second pick_eevdf operation.
>>  This idea is similar to the one you proposed this time. 
>> 2. Avoid the pick_eevdf operation as much as possible within 'check_preempt_wakeup_fair.' 
>>  Because I believe that 'checking whether preemption is necessary' and 'finding the optimal
>>  process to schedule' are two different things.
> 
> I agree, and it seems that in current eevdf implementation the former relies on the latter.
> 
>> 'check_preempt_wakeup_fair' is not just to
>>  check if the newly awakened process should preempt the current process; it can also serve
>>  as an opportunity to check whether any other processes should preempt the current one,
>>  thereby improving the real-time performance of the scheduler. Although now in pick_eevdf,
>>  the legitimacy of 'curr' is also evaluated, if the result returned is not the awakened process,
>>  then the current process will still not be preempted.
> 
> I thought Mike has proposed a patch to deal with this scenario you mentioned above:
> https://lore.kernel.org/lkml/e17d3d90440997b970067fe9eaf088903c65f41d.camel@gmx.de/
> 
> And I suppose you are refering to increase the preemption chance on current rather than reducing
> the invoke of pick_eevdf() in check_preempt_wakeup_fair().

Hi Chen

Happy holidays. I believe the modifications here will indeed provide more opportunities for preemption,
thereby leading to lower scheduling latencies, while also genuinely reducing calls to pick_eevdf. It's a win-win situation. :)

I conducted a test: I applied my modifications on top of Mike's patch, and added some
statistical counters following your earlier method, in order to assess the potential
benefits of my changes.


diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03be0d1330a6..c5453866899f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8283,6 +8286,10 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
        struct sched_entity *se = &curr->se, *pse = &p->se;
        struct cfs_rq *cfs_rq = task_cfs_rq(curr);
        int cse_is_idle, pse_is_idle;
+       bool patch_preempt = false;
+       bool pick_preempt = false;
+
+       schedstat_inc(rq->check_preempt_count);

        if (unlikely(se == pse))
                return;
@@ -8343,15 +8350,31 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
        cfs_rq = cfs_rq_of(se);
        update_curr(cfs_rq);

+       if ((sched_feat(RUN_TO_PARITY) && se->vlag != se->deadline && !entity_eligible(cfs_rq, se))
+                       || (!sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, se))) {
+               schedstat_inc(rq->patch_preempt_count);
+               patch_preempt = true;
+       }
+
        /*
         * XXX pick_eevdf(cfs_rq) != se ?
         */
-       if (pick_eevdf(cfs_rq) == pse)
+       if (pick_eevdf(cfs_rq) != se) {
+               schedstat_inc(rq->pick_preempt_count);
+               pick_preempt = true;
                goto preempt;
+       }

        return;

 preempt:
+       if (patch_preempt && !pick_preempt)
+               schedstat_inc(rq->patch_preempt_only_count);
+       if (!patch_preempt && pick_preempt)
+               schedstat_inc(rq->pick_preempt_only_count);
+
+       schedstat_inc(rq->need_preempt_count);
+
        resched_curr(rq);
 }

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d2242679239e..002c6b0f966a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1141,6 +1141,12 @@ struct rq {
        /* try_to_wake_up() stats */
        unsigned int            ttwu_count;
        unsigned int            ttwu_local;
+       unsigned int            check_preempt_count;
+       unsigned int            need_preempt_count;
+       unsigned int            patch_preempt_count;
+       unsigned int            patch_preempt_only_count;
+       unsigned int            pick_preempt_count;
+       unsigned int            pick_preempt_only_count;
 #endif

 #ifdef CONFIG_CPU_IDLE
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 857f837f52cb..fe5487572409 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -133,12 +133,21 @@ static int show_schedstat(struct seq_file *seq, void *v)

                /* runqueue-specific stats */
                seq_printf(seq,
-                   "cpu%d %u 0 %u %u %u %u %llu %llu %lu",
+                   "cpu%d %u 0 %u %u %u %u %llu %llu %lu *** %u %u * %u %u * %u %u",
                    cpu, rq->yld_count,
                    rq->sched_count, rq->sched_goidle,
                    rq->ttwu_count, rq->ttwu_local,
                    rq->rq_cpu_time,
-                   rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount);
+                   rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount,
+                   rq->check_preempt_count,
+                   rq->need_preempt_count,
+                   rq->patch_preempt_count,
+                   rq->patch_preempt_only_count,
+                   rq->pick_preempt_count,
+                   rq->pick_preempt_only_count);
+

                seq_printf(seq, "\n");

The test results are as follows:

  RUN_TO_PARITY:
                                   EEVDF        PATCH
 .stat.check_preempt_count         5053054      5029546
 .stat.need_preempt_count          0570520      1282780
 .stat.patch_preempt_count         -------      0038602
 .stat.patch_preempt_only_count    -------      0000000
 .stat.pick_preempt_count          -------      1282780
 .stat.pick_preempt_only_count     -------      1244178

  NO_RUN_TO_PARITY:
                                   EEVDF        PATCH
 .stat.check_preempt_count         5018589      5005812
 .stat.need_preempt_count          3380513      2994773
 .stat.patch_preempt_count         -------      0907927
 .stat.patch_preempt_only_count    -------      0000000
 .stat.pick_preempt_count          -------      2994773
 .stat.pick_preempt_only_count     -------      2086846

Looking at the results, adding an ineligibility check for the se within check_preempt_wakeup_fair
can avoid about 3% of the pick_eevdf calls under the RUN_TO_PARITY feature, and about 30% of them
under NO_RUN_TO_PARITY. It was also found that patch_preempt_only_count stays at 0, meaning the
ineligibility check never fires in a case where pick_eevdf would not also have preempted, i.e. it
produces no spurious preemptions.

It's worth mentioning that under the RUN_TO_PARITY feature, the number of preemptions triggered
by the 'pick_eevdf(cfs_rq) != se' check is about 2.25 times that of the original
'pick_eevdf(cfs_rq) == pse' check, which could lead to a series of other performance issues.
Logically speaking, though, the stricter check is indeed reasonable. :(
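
For reference, the counters above only record statistics; the actual saving would come from an
early out before pick_eevdf(). A minimal sketch of that early out (untested; the condition is just
the instrumented one above, simplified):

	cfs_rq = cfs_rq_of(se);
	update_curr(cfs_rq);

	/*
	 * If current is no longer eligible and its slice is not (or no longer)
	 * protected by RUN_TO_PARITY, resched right away instead of walking
	 * the rb-tree in pick_eevdf() first.
	 */
	if (!entity_eligible(cfs_rq, se) &&
	    (!sched_feat(RUN_TO_PARITY) || se->vlag != se->deadline))
		goto preempt;

	if (pick_eevdf(cfs_rq) != se)
		goto preempt;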


> 
>> Therefore, I posted the v2 PATCH. 
>>  The implementation of v2 PATCH might express this point more clearly. 
>> https://lore.kernel.org/lkml/20240529141806.16029-1-spring.cxz@gmail.com/T/
>> 
> 
> Let me take a look at it and do some tests.

Thank you for doing this :)

> 
>> I previously implemented and tested both of these methods, and the test results showed that
>> method 2 had somewhat more obvious benefits. Therefore, I submitted method 2. Now that I
>> think about it, perhaps method 1 could also be viable at the same time. :)
>> 
> 
> Actually I found that, even without any changes, if we enable the sched feature NEXT_BUDDY, the
> wakeup latency and request latency are both reduced. The following is the schbench result on a
> 240-CPU system:
> 
> NO_NEXT_BUDDY
> Wakeup Latencies percentiles (usec) runtime 100 (s) (1698990 total samples)
>         50.0th: 6          (429125 samples)
>         90.0th: 14         (682355 samples)
>       * 99.0th: 29         (126695 samples)
>         99.9th: 529        (14603 samples)
>         min=1, max=4741
> Request Latencies percentiles (usec) runtime 100 (s) (1702523 total samples)
>         50.0th: 14992      (550939 samples)
>         90.0th: 15376      (668687 samples)
>       * 99.0th: 15600      (128111 samples)
>         99.9th: 15888      (11238 samples)
>         min=3528, max=31677
> RPS percentiles (requests) runtime 100 (s) (101 total samples)
>         20.0th: 16864      (31 samples)
>       * 50.0th: 16928      (26 samples)
>         90.0th: 17248      (36 samples)
>         min=16615, max=20041
> average rps: 17025.23
> 
> NEXT_BUDDY
> Wakeup Latencies percentiles (usec) runtime 100 (s) (1653564 total samples)
>         50.0th: 5          (376845 samples)
>         90.0th: 12         (632075 samples)
>       * 99.0th: 24         (114398 samples)
>         99.9th: 105        (13737 samples)
>         min=1, max=7428
> Request Latencies percentiles (usec) runtime 100 (s) (1657268 total samples)
>         50.0th: 14480      (524763 samples)
>         90.0th: 15216      (647982 samples)
>       * 99.0th: 15472      (130730 samples)
>         99.9th: 15728      (13980 samples)
>         min=3542, max=34805
> RPS percentiles (requests) runtime 100 (s) (101 total samples)
>         20.0th: 16544      (62 samples)
>       * 50.0th: 16544      (0 samples)
>         90.0th: 16608      (37 samples)
>         min=16470, max=16648
> average rps: 16572.68
> 
> So I think NEXT_BUDDY has more or less reduced the rb-tree scan.
> 
> thanks,
> Chenyu

I'm not completely sure my understanding is correct, but NEXT_BUDDY can only cache the process
that has just been woken up; that is not necessarily the entity pick_eevdf would return. Furthermore,
even if it did happen to cache pick_eevdf's choice, by the time the next scheduling decision is made,
other processes may have been enqueued or dequeued, so it may no longer be what pick_eevdf would pick
at that moment. Hence it is a 'best effort' approach, and its impact on scheduling latency may vary
depending on the use case.
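
For reference, below is roughly how the buddy hint is consumed at pick time (a paraphrase of
pick_next_entity() as I read it on a recent tree; details may differ between versions):

	static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
	{
		/*
		 * Honor the cached buddy only while it is still eligible;
		 * otherwise fall back to the full EEVDF pick.
		 */
		if (sched_feat(NEXT_BUDDY) &&
		    cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next))
			return cfs_rq->next;

		return pick_eevdf(cfs_rq);
	}

So even when the buddy is used, it is only an eligible hint, not necessarily the entity that
pick_eevdf() would have chosen.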

thanks
Chunxin
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Chen Yu 1 year, 8 months ago
On 2024-06-11 at 21:10:50 +0800, Chunxin Zang wrote:
> 
> 
> > On Jun 7, 2024, at 10:38, Chen Yu <yu.c.chen@intel.com> wrote:
> > 
> > On 2024-06-06 at 09:46:53 +0800, Chunxin Zang wrote:
> >> 
> >> 
> >>> On Jun 6, 2024, at 01:19, Chen Yu <yu.c.chen@intel.com> wrote:
> >>> 
> >>> 
> >>> Sorry for the late reply and thanks for help clarify this. Yes, this is
> >>> what my previous concern was:
> >>> 1. It does not consider the cgroup and does not check preemption in the same
> >>>  level which is covered by find_matching_se().
> >>> 2. The if (!entity_eligible(cfs_rq, se)) for current is redundant because
> >>>  later pick_eevdf() will check the eligibility of current anyway. But
> >>>  as pointed out by Chunxi, his concern is the double-traverse of the rb-tree,
> >>>  I just wonder if we could leverage the cfs_rq->next to store the next
> >>>  candidate, so it can be picked directly in the 2nd pick as a fast path?
> >>>  Something like below untested:
> >>> 
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index 8a5b1ae0aa55..f716646d595e 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -8349,7 +8349,7 @@ static void set_next_buddy(struct sched_entity *se)
> >>> static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
> >>> {
> >>>       struct task_struct *curr = rq->curr;
> >>> -       struct sched_entity *se = &curr->se, *pse = &p->se;
> >>> +       struct sched_entity *se = &curr->se, *pse = &p->se, *next;
> >>>       struct cfs_rq *cfs_rq = task_cfs_rq(curr);
> >>>       int cse_is_idle, pse_is_idle;
> >>> 
> >>> @@ -8415,7 +8415,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> >>>       /*
> >>>        * XXX pick_eevdf(cfs_rq) != se ?
> >>>        */
> >>> -       if (pick_eevdf(cfs_rq) == pse)
> >>> +       next = pick_eevdf(cfs_rq);
> >>> +       if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && next)
> >>> +               set_next_buddy(next);
> >>> +
> >>> +       if (next == pse)
> >>>               goto preempt;
> >>> 
> >>>       return;
> >>> 
> >>> 
> >>> thanks,
> >>> Chenyu
> >> 
> >> Hi Chen
> >> 
> >> First of all, thank you for your patient response. Regarding the issue of avoiding traversing
> >> the RB-tree twice, I initially had two methods in mind. 
> >> 1. Cache the optimal result so that it can be used directly during the second pick_eevdf operation.
> >>  This idea is similar to the one you proposed this time. 
> >> 2. Avoid the pick_eevdf operation as much as possible within 'check_preempt_wakeup_fair.' 
> >>  Because I believe that 'checking whether preemption is necessary' and 'finding the optimal
> >>  process to schedule' are two different things.
> > 
> > I agree, and it seems that in current eevdf implementation the former relies on the latter.
> > 
> >> 'check_preempt_wakeup_fair' is not just to
> >>  check if the newly awakened process should preempt the current process; it can also serve
> >>  as an opportunity to check whether any other processes should preempt the current one,
> >>  thereby improving the real-time performance of the scheduler. Although pick_eevdf now also
> >>  evaluates the eligibility of 'curr', if the result returned is not the awakened process,
> >>  then the current process will still not be preempted.
> > 
> > I thought Mike has proposed a patch to deal with this scenario you mentioned above:
> > https://lore.kernel.org/lkml/e17d3d90440997b970067fe9eaf088903c65f41d.camel@gmx.de/
> > 
> > And I suppose you are refering to increase the preemption chance on current rather than reducing
> > the invoke of pick_eevdf() in check_preempt_wakeup_fair().
> 
> Hi chen
> 
> Happy holidays. I believe the modifications here will indeed provide more opportunities for preemption,
> thereby leading to lower scheduling latencies, while also truly reducing calls to pick_eevdf.  It's a win-win situation. :)
> 
> I conducted a test. It involved applying my modifications on top of MIKE PATCH, along with
> adding some statistical counts following your previous method, in order to assess the potential
> benefits of my changes.
>

[snip]
 
> Looking at the results, adding an ineligibility check for the se within check_preempt_wakeup_fair
> can avoid about 3% of the pick_eevdf calls under the RUN_TO_PARITY feature, and about 30% of them
> under NO_RUN_TO_PARITY. It was also found that patch_preempt_only_count stays at 0, meaning the
> ineligibility check never fires in a case where pick_eevdf would not also have preempted.
> 
> It's worth mentioning that under the RUN_TO_PARITY feature, the number of preemptions triggered
> by the 'pick_eevdf(cfs_rq) != se' check is about 2.25 times that of the original
> 'pick_eevdf(cfs_rq) == pse' check, which could lead to a series of other performance issues.
> Logically speaking, though, the stricter check is indeed reasonable. :(
> 
>

I wonder if we can do this only for NO_RUN_TO_PARITY? That is to say, if RUN_TO_PARITY is enabled,
we do not preempt the current task based on its eligibility in check_preempt_wakeup_fair()
or entity_tick(). Personally I have no objection to increasing preemption a little bit; however,
over-scheduling is what we ran into before, which is why RUN_TO_PARITY was introduced, and
RUN_TO_PARITY means "respect the slice" per my understanding.
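
A minimal sketch of that idea (untested), gating the new checks so that RUN_TO_PARITY keeps
protecting the slice:

	/* in entity_tick() */
	if (!sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, curr))
		resched_curr(rq_of(cfs_rq));

	/* in check_preempt_wakeup_fair(), before the pick_eevdf() check */
	if (!sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, se))
		goto preempt;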

> > So I think NEXT_BUDDY has more or less reduced the rb-tree scan.
> > 
> > thanks,
> > Chenyu
> 
> I'm not completely sure my understanding is correct, but NEXT_BUDDY can only cache the process
> that has just been woken up; that is not necessarily the entity pick_eevdf would return. Furthermore,
> even if it did happen to cache pick_eevdf's choice, by the time the next scheduling decision is made,
> other processes may have been enqueued or dequeued, so it may no longer be what pick_eevdf would pick
> at that moment. Hence it is a 'best effort' approach, and its impact on scheduling latency may vary
> depending on the use case.
>

That is true. Currently NEXT_BUDDY is set to the wakee and only honored if it is still eligible,
which does not mean it is the best candidate in the tree. I think it is a 'best effort' way to
reduce wakeup latency rather than a fairness mechanism.
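
Roughly, on the wakeup side the hint is simply recorded for non-fork wakeups (a paraphrase of
check_preempt_wakeup_fair() as I read it; the eligibility filter is only applied later, when the
buddy is consumed at pick time):

	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK))
		set_next_buddy(pse);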

thanks,
Chenyu
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by Chunxin Zang 1 year, 8 months ago
Hi Prateek

> On May 28, 2024, at 13:02, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> 
> Hello Chunxin,
> 
> On 5/28/2024 8:12 AM, Chunxin Zang wrote:
>> 
>>> On May 24, 2024, at 23:30, Chen Yu <yu.c.chen@intel.com> wrote:
>>> 
>>> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
>>>> I found that some tasks have been running for a long enough time and
>>>> have become illegal, but they are still not releasing the CPU. This
>>>> will increase the scheduling delay of other processes. Therefore, I
>>>> tried checking the current process in wakeup_preempt and entity_tick,
>>>> and if it is illegal, reschedule that cfs queue.
>>>> 
>>>> The modification can reduce the scheduling delay by about 30% when
>>>> RUN_TO_PARITY is enabled.
>>>> So far, it has been running well in my test environment, and I have
>>>> pasted some test results below.
>>>> 
>>> 
>>> Interesting, besides hackbench, I assume that you have workload in
>>> real production environment that is sensitive to wakeup latency?
>> 
>> Hi Chen
>> 
>> Yes, my workload  are quite sensitive to wakeup latency .
>>> 
>>>> 
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 03be0d1330a6..a0005d240db5 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>>> return;
>>>> #endif
>>>> +
>>>> + if (!entity_eligible(cfs_rq, curr))
>>>> + resched_curr(rq_of(cfs_rq));
>>>> }
>>>> 
>>> 
>>> entity_tick() -> update_curr() -> update_deadline():
>>> se->vruntime >= se->deadline ? resched_curr()
> >>> only when current has expired its slice will it be scheduled out.
>>> 
>>> So here you want to schedule current out if its lag becomes 0.
>>> 
>>> In lastest sched/eevdf branch, it is controlled by two sched features:
>>> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
>>> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
>>> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
>>> 
>>> Maybe something like this can achieve your goal
> >>> if (sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, curr))
>>> resched_curr
>>> 
>>>> 
>>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>>> return;
>>>> 
>>>> + if (!entity_eligible(cfs_rq, se))
>>>> + goto preempt;
>>>> +
>>> 
>>> Not sure if this is applicable, later in this function, pick_eevdf() checks
>>> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
>>> be evicted. And this change does not consider the cgroup hierarchy.
> 
> The above line will be referred to as [1] below.
> 
>>> 
> >>> Besides, the check of current's eligibility can get a false-negative result,
>>> if the enqueued entity has a positive lag. Prateek proposed to
>>> remove the check of current's eligibility in pick_eevdf():
>>> https://lore.kernel.org/lkml/20240325060226.1540-2-kprateek.nayak@amd.com/
>> 
>> Thank you for letting me know about Peter's latest updates and thoughts.
>> Actually, the original intention of my modification was to minimize the
>> traversal of the rb-tree as much as possible. For example, in the following
>> scenario, if 'curr' is ineligible, the system would still traverse the rb-tree in
>> 'pick_eevdf' to return an optimal 'se', and then trigger  'resched_curr'. After
>> resched, the scheduler will call 'pick_eevdf' again, traversing the
>> rb-tree once more. This ultimately results in the rb-tree being traversed
>> twice. If it's possible to determine that 'curr' is ineligible within 'wakeup_preempt'
>> and directly trigger a 'resched', it would reduce the traversal of the rb-tree
>> by one time.
>> 
>> 
>> wakeup_preempt -> pick_eevdf -> resched_curr
>>                     |-> traverse the rb-tree
>> schedule       -> pick_eevdf
>>                     |-> traverse the rb-tree
> 
> I see what you mean but a couple of things:
> 
> (I'm adding the check_preempt_wakeup_fair() hunk from the original patch
> below for ease of interpretation)
> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 03be0d1330a6..a0005d240db5 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>> return;
>> 
>> + if (!entity_eligible(cfs_rq, se))
>> + goto preempt;
>> +
> 
> This check uses the root cfs_rq since "task_cfs_rq()" returns the
> "rq->cfs" of the runqueue the task is on. In the presence of cgroups or
> CONFIG_SCHED_AUTOGROUP, there is a good chance that the task is queued
> on a higher-order cfs_rq, and this entity_eligible() calculation might
> not be valid since the vruntime calculation for the "se" is relative to
> the "cfs_rq" where it is queued. Please correct me if I'm wrong, but
> I believe that is what Chenyu was referring to in [1].

 
Thank you for explaining so much to me; I am trying to understand all of this. :)

> 
>> find_matching_se(&se, &pse);
>> WARN_ON_ONCE(!pse);
>> 
>> -- 
> 
> In addition to that, There is an update_curr() call below for the first
> cfs_rq where both the entities' hierarchy is queued which is found by
> find_matching_se(). I believe that is required too to update the
> vruntime and deadline of the entity where preemption can happen.
> 
> If you want to circumvent a second call to pick_eevdf(), could you
> perhaps do:
> 
> (Only build tested)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9eb63573110c..653b1bee1e62 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8407,9 +8407,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> update_curr(cfs_rq);
> 
> /*
> -  * XXX pick_eevdf(cfs_rq) != se ?
> +  * If the hierarchy of current task is ineligible at the common
> +  * point on the newly woken entity, there is a good chance of
> +  * wakeup preemption by the newly woken entity. Mark for resched
> +  * and allow pick_eevdf() in schedule() to judge which task to
> +  * run next.
>  */
> - if (pick_eevdf(cfs_rq) == pse)
> + if (!entity_eligible(cfs_rq, se))
> goto preempt;
> 
> return;
> 
> --
> 
> There are other implications here which is specifically highlighted by
> the "XXX pick_eevdf(cfs_rq) != se ?" comment. If the current waking
> entity is not the entity with the earliest eligible virtual deadline,
> the current task is still preempted if any other entity has the EEVD.
> 
> Mike's box gave switching to above two thumbs up; I have to check what
> my box says :)
> 
> Following are DeathStarBench results with your original patch compared
> to v6.9-rc5 based tip:sched/core:
> 
> ==================================================================
> Test          : DeathStarBench
> Why?       : Some tasks here do not like aggressive preemption
> Units         : Normalized throughput
> Interpretation: Higher is better
> Statistic     : Mean
> ==================================================================
> Pinning      scaling     tip            eager_preempt (pct imp)
> 1CCD           1       1.00            0.99 (%diff: -1.13%)
> 2CCD           2       1.00            0.97 (%diff: -3.21%)
> 4CCD           3       1.00            0.97 (%diff: -3.41%)
> 8CCD           6       1.00            0.97 (%diff: -3.20%)
> --

Please forgive me, as I have not used the DeathStarBench suite before. Does
this test result indicate that my modifications have resulted in tasks that do not
like aggressive preemption being even less likely to be preempted?

thanks
Chunxin

> I'll give the variants mentioned in the thread a try too to see if
> some of my assumptions around heavy preemption hold good. I was also
> able to dig up an old patch by Balakumaran Kannan which skipped
> pick_eevdf() altogether if "pse" is ineligible which also seems like
> a good optimization based on current check in
> check_preempt_wakeup_fair() but it perhaps doesn't help the case of 
> wakeup-latency sensitivity you are optimizing for; only reduces
> rb-tree traversal if there is no chance of pick_eevdf() returning "pse" 
> https://lore.kernel.org/lkml/20240301130100.267727-1-kumaran.4353@gmail.com/ 
> 
> --
> Thanks and Regards,
> Prateek
> 
>> 
>> 
>> Of course, this would break the semantics of RESPECT_SLICE as well as
>> RUN_TO_PARITY. So, this might be considered a performance enhancement
>> for NO_RESPECT_SLICE/NO_RUN_TO_PARITY scenarios.
>> 
>> thanks 
>> Chunxin
>> 
>> 
>>> If I understand your requirement correctly, you want to reduce the wakeup
>>> latency. There is some code under development by Peter, which can
>>> customize a task's wakeup latency by setting its slice:
>>> https://lore.kernel.org/lkml/20240405110010.934104715@infradead.org/
>>> 
>>> thanks,
>>> Chenyu
Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
Posted by K Prateek Nayak 1 year, 8 months ago
Hello Chunxin,

On 5/28/2024 12:48 PM, Chunxin Zang wrote:
> [..snip..]
>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 03be0d1330a6..a0005d240db5 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>> return;
>>>
>>> + if (!entity_eligible(cfs_rq, se))
>>> + goto preempt;
>>> +
>>
>> This check uses the root cfs_rq since "task_cfs_rq()" returns the
>> "rq->cfs" of the runqueue the task is on. In the presence of cgroups or
>> CONFIG_SCHED_AUTOGROUP, there is a good chance that the task is queued
>> on a higher-order cfs_rq, and this entity_eligible() calculation might
>> not be valid since the vruntime calculation for the "se" is relative to
>> the "cfs_rq" where it is queued. Please correct me if I'm wrong, but
>> I believe that is what Chenyu was referring to in [1].
> 
>  
> Thank you for explaining so much to me; I am trying to understand all of this. :)
> 
>>
>>> find_matching_se(&se, &pse);
>>> WARN_ON_ONCE(!pse);
>>>
>>> -- 
>>
>> In addition to that, There is an update_curr() call below for the first
>> cfs_rq where both the entities' hierarchy is queued which is found by
>> find_matching_se(). I believe that is required too to update the
>> vruntime and deadline of the entity where preemption can happen.
>>
>> If you want to circumvent a second call to pick_eevdf(), could you
>> perhaps do:
>>
>> (Only build tested)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 9eb63573110c..653b1bee1e62 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8407,9 +8407,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>> update_curr(cfs_rq);
>>
>> /*
>> -  * XXX pick_eevdf(cfs_rq) != se ?
>> +  * If the hierarchy of current task is ineligible at the common
>> +  * point on the newly woken entity, there is a good chance of
>> +  * wakeup preemption by the newly woken entity. Mark for resched
>> +  * and allow pick_eevdf() in schedule() to judge which task to
>> +  * run next.
>>  */
>> - if (pick_eevdf(cfs_rq) == pse)
>> + if (!entity_eligible(cfs_rq, se))
>> goto preempt;
>>
>> return;
>>
>> --
>>
>> There are other implications here which is specifically highlighted by
>> the "XXX pick_eevdf(cfs_rq) != se ?" comment. If the current waking
>> entity is not the entity with the earliest eligible virtual deadline,
>> the current task is still preempted if any other entity has the EEVD.
>>
>> Mike's box gave switching to above two thumbs up; I have to check what
>> my box says :)
>>
>> Following are DeathStarBench results with your original patch compared
>> to v6.9-rc5 based tip:sched/core:
>>
>> ==================================================================
>> Test          : DeathStarBench
>> Why?       : Some tasks here do not like aggressive preemption
>> Units         : Normalized throughput
>> Interpretation: Higher is better
>> Statistic     : Mean
>> ==================================================================
>> Pinning      scaling     tip            eager_preempt (pct imp)
>> 1CCD           1       1.00            0.99 (%diff: -1.13%)
>> 2CCD           2       1.00            0.97 (%diff: -3.21%)
>> 4CCD           3       1.00            0.97 (%diff: -3.41%)
>> 8CCD           6       1.00            0.97 (%diff: -3.20%)
>> --
> 
> Please forgive me, as I have not used the DeathStarBench suite before. Does
> this test result indicate that my modifications have resulted in tasks that do not
> like aggressive preemption being even less likely to be preempted?

It is actually the opposite. In the case of DeathStarBench, the nginx server
tasks that act as the entrypoint into the microservice chain do not like to
be preempted. A regression generally indicates that these tasks have very
likely been preempted, as a result of which the throughput drops. More
information on DeathStarBench and the problem is given in
https://lore.kernel.org/lkml/20240325060226.1540-1-kprateek.nayak@amd.com/

I'll test with more workloads later today and update the thread. Please
forgive any delay; I'm slowly crawling through a backlog of testing.

--
Thanks and Regards,
Prateek

> 
> thanks
> Chunxin
> 
>> I'll give the variants mentioned in the thread a try too to see if
>> some of my assumptions around heavy preemption hold good. I was also
>> able to dig up an old patch by Balakumaran Kannan which skipped
>> pick_eevdf() altogether if "pse" is ineligible which also seems like
>> a good optimization based on current check in
>> check_preempt_wakeup_fair() but it perhaps doesn't help the case of 
>> wakeup-latency sensitivity you are optimizing for; only reduces
>> rb-tree traversal if there is no chance of pick_eevdf() returning "pse" 
>> https://lore.kernel.org/lkml/20240301130100.267727-1-kumaran.4353@gmail.com/ 
>>
>> [..snip..]
>>