If a task yields, the scheduler may decide to pick it again. The task in
turn may decide to yield immediately or shortly after, leading to a tight
loop of yields.

If there's another runnable task at this point, the deadline will be
increased by one slice on each iteration of the loop. This can make the
deadline run away quickly, leading to elevated run delays later on once the
task stops getting picked. The reason the scheduler can pick the same task
again and again despite its growing deadline is that it may be the only
eligible task at that point.

Fix this by making the task forfeit its remaining vruntime and pushing the
deadline one slice ahead. This implements yield behavior more authentically.
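
For illustration only (my sketch of the scenario, not part of the patch): the
yield loop in question is simply a task calling sched_yield() back to back
while at least one other runnable task sits on the same runqueue, e.g.:

#include <sched.h>
#include <time.h>

/*
 * Tight yield loop: if the scheduler keeps picking this task again, each
 * pass through yield_task_fair() pushes its deadline one slice further out.
 */
int main(void)
{
	time_t start = time(NULL);

	while (time(NULL) - start < 10)
		sched_yield();

	return 0;
}
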
Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling policy")
Link: https://lore.kernel.org/r/20250401123622.584018-1-sieberf@amazon.com
Link: https://lore.kernel.org/r/20250911095113.203439-1-sieberf@amazon.com
Signed-off-by: Fernand Sieber <sieberf@amazon.com>
Changes in v2:
- Implement vruntime forfeiting approach suggested by Peter Zijlstra
- Updated commit name
- Previous Reviewed-by tags removed due to algorithm change
---
kernel/sched/fair.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a14da5396fb..cc4ef7213d43 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9036,6 +9036,7 @@ static void yield_task_fair(struct rq *rq)
 	 */
 	rq_clock_skip_update(rq);
 
+	se->vruntime = se->deadline;
 	se->deadline += calc_delta_fair(se->slice, se);
 }
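
A side note on the size of that deadline bump (my summary, not part of the
patch): calc_delta_fair() scales the slice by NICE_0_LOAD relative to the
entity's weight, so a nice-0 task pushes its deadline out by exactly one
slice per yield and a heavier task by less. A rough model of the arithmetic,
ignoring the kernel's fixed-point __calc_delta() helper:

#include <stdint.h>

/* Rough model: deadline advance per yield, in virtual time. */
static uint64_t model_calc_delta_fair(uint64_t slice_ns, unsigned long weight,
				      unsigned long nice_0_load)
{
	return slice_ns * nice_0_load / weight;
}
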
--
2.34.1
On Tue, Sep 16, 2025 at 10:33 PM Fernand Sieber <sieberf@amazon.com> wrote:
>
> If a task yields, the scheduler may decide to pick it again. The task in
> turn may decide to yield immediately or shortly after, leading to a tight
> loop of yields.
>
> If there's another runnable task at this point, the deadline will be
> increased by one slice on each iteration of the loop. This can make the
> deadline run away quickly, leading to elevated run delays later on once the
> task stops getting picked. The reason the scheduler can pick the same task
> again and again despite its growing deadline is that it may be the only
> eligible task at that point.
>
> Fix this by making the task forfeit its remaining vruntime and pushing the
> deadline one slice ahead. This implements yield behavior more authentically.
>
> Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling policy")
> Link: https://lore.kernel.org/r/20250401123622.584018-1-sieberf@amazon.com
> Link: https://lore.kernel.org/r/20250911095113.203439-1-sieberf@amazon.com
> Signed-off-by: Fernand Sieber <sieberf@amazon.com>
>
> Changes in v2:
> - Implement vruntime forfeiting approach suggested by Peter Zijlstra
> - Updated commit name
> - Previous Reviewed-by tags removed due to algorithm change
> ---
> kernel/sched/fair.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7a14da5396fb..cc4ef7213d43 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9036,6 +9036,7 @@ static void yield_task_fair(struct rq *rq)
>  	 */
>  	rq_clock_skip_update(rq);
>
> +	se->vruntime = se->deadline;
>  	se->deadline += calc_delta_fair(se->slice, se);

Need we update_min_vruntime here?

>  }
>
> --
> 2.34.1
Hi Peter,

I noticed you have pulled the change into sched/urgent:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=78f8764d34c0a1912ce209bb2a428a94d062707f

However, I'd appreciate it if you could weigh in on my concern that this
iteration doesn't work well with core scheduling. Since the scheduler prefers
to run the yielding task again regardless of its eligibility rather than
putting the task in force idle, the yielding task's vruntime can run away
quickly. This scenario causes severe run delays later on. Please see my
previous reply with data supporting this concern.

I think the best approach to address it would be to clamp vruntime. I'm not
sure how exactly; a simple approach would be to increment the vruntime by one
slice at a time until the task becomes ineligible. If you have any
suggestions, let me know. I'll run some testing soon when I get a chance.

Thanks,
Fernand
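
A toy model of that clamping idea (my illustration of the suggestion above,
untested; it treats "eligible" as "vruntime not yet past the queue's weighted
average vruntime V" and ignores that V itself moves as vruntime changes):

#include <stdint.h>

/*
 * Advance the yielding task's vruntime one slice at a time, but stop once
 * it is no longer eligible, so repeated yields cannot push it arbitrarily
 * far ahead.
 */
static uint64_t model_clamped_yield(uint64_t vruntime, uint64_t slice,
				    uint64_t queue_avg_vruntime_v)
{
	while (vruntime <= queue_avg_vruntime_v)	/* still eligible */
		vruntime += slice;

	return vruntime;
}
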
After further testing I think we should stick with the original approach or
iterate on the vruntime forfeiting.

The vruntime forfeiting doesn't work well with core scheduling. The core
scheduler picks the best task across the SMT mask, and then the siblings run
a matching task no matter what. This means the core scheduler can keep
picking the yielding task on the sibling even after it becomes ineligible
(because that is preferable to force idling). In this scenario the vruntime
of the yielding task runs away rapidly, which causes problematic imbalances
later on.

Perhaps an alternative is to forfeit the vruntime (set it to the deadline),
but only once. I.e. don't do it if the task is already ineligible? If the
task is ineligible then we simply increment the deadline as in my original
patch?

Peter, let me know your thoughts on this.

Testing data below shows that the vruntime forfeit yields bad max run delays.

vruntime forfeit:
  * yield_loop: 4.37s runtime, max delay 272.99ms
  * busy_loop:  13.54s runtime, max delay 552.01ms

deadline clamp:
  * busy_loop:  9.26s runtime, max delay 4.11ms
  * yield_loop: 9.25s runtime, max delay 7.77ms

Test program:

#ifndef PR_SCHED_CORE_SCOPE_THREAD
#define PR_SCHED_CORE_SCOPE_THREAD 0
#define PR_SCHED_CORE_SCOPE_THREAD_GROUP 1
#endif

#include <sched.h>
#include <time.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	int should_yield = (argc > 1) ? atoi(argv[1]) : 1;
	time_t program_start = time(NULL);

	// Create core cookie for current process
	prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0,
	      PR_SCHED_CORE_SCOPE_THREAD, 0);

	pid_t pid = fork();
	if (pid == 0) {
		// Child: yield for 5s then busy loop (if should_yield is 1)
		if (should_yield) {
			time_t start = time(NULL);
			while (time(NULL) - start < 5 &&
			       time(NULL) - program_start < 30) {
				sched_yield();
			}
		}
		while (time(NULL) - program_start < 30) {
			// busy loop
		}
	} else {
		// Parent: share cookie with child, then busy loop
		prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, pid,
		      PR_SCHED_CORE_SCOPE_THREAD, 0);
		while (time(NULL) - program_start < 30) {
			// busy loop
		}
	}

	return 0;
}

Repro:

taskset -c 0,1 core_yield_loop 1 &  # arg 1 = do yield
taskset -c 0,1 core_yield_loop 0 &  # arg 0 = don't yield
On Tue, Sep 16, 2025 at 06:00:35PM +0200, Fernand Sieber wrote:
> After further testing I think we should stick with the original approach or
> iterate on the vruntime forfeiting.
>
> The vruntime forfeiting doesn't work well with core scheduling. The core
> scheduler picks the best task across the SMT mask, and then the siblings run
> a matching task no matter what. This means the core scheduler can keep
> picking the yielding task on the sibling even after it becomes ineligible
> (because that is preferable to force idling). In this scenario the vruntime
> of the yielding task runs away rapidly, which causes problematic imbalances
> later on.
>
> Perhaps an alternative is to forfeit the vruntime (set it to the deadline),
> but only once. I.e. don't do it if the task is already ineligible? If the
> task is ineligible then we simply increment the deadline as in my original
> patch?
>
> Peter, let me know your thoughts on this.

Sorry, I missed this email earlier. I'll go ponder it a bit -- my brain is
esp. slow today due to a cold :/
On Thu, Sep 18, 2025 at 08:43:00AM +0200, Peter Zijlstra wrote:
> On Tue, Sep 16, 2025 at 06:00:35PM +0200, Fernand Sieber wrote:
> > After further testing I think we should stick with the original approach or
> > iterate on the vruntime forfeiting.
> >
> > The vruntime forfeiting doesn't work well with core scheduling. The core
> > scheduler picks the best task across the SMT mask, and then the siblings run
> > a matching task no matter what. This means the core scheduler can keep
> > picking the yielding task on the sibling even after it becomes ineligible
> > (because that is preferable to force idling). In this scenario the vruntime
> > of the yielding task runs away rapidly, which causes problematic imbalances
> > later on.
> >
> > Perhaps an alternative is to forfeit the vruntime (set it to the deadline),
> > but only once. I.e. don't do it if the task is already ineligible? If the
> > task is ineligible then we simply increment the deadline as in my original
> > patch?
> >
> > Peter, let me know your thoughts on this.
>
> Sorry, I missed this email earlier. I'll go ponder it a bit -- my brain
> is esp. slow today due to a cold :/

Right; so you're saying something like the below, right?

Yeah, I suppose we can do that; please write a coherent comment on it though,
so we can remember why, later on.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5c94caa93085..e75abf3c256d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9012,8 +9012,13 @@ static void yield_task_fair(struct rq *rq)
 	 */
 	rq_clock_skip_update(rq);
 
-	se->vruntime = se->deadline;
-	se->deadline += calc_delta_fair(se->slice, se);
+	/*
+	 * comment...
+	 */
+	if (entity_eligible(cfs_rq, se)) {
+		se->vruntime = se->deadline;
+		se->deadline += calc_delta_fair(se->slice, se);
+	}
 }
 
 static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
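
For context on the entity_eligible() test in the hunk above (my reading of
the EEVDF code, not something stated in the mail): an entity counts as
eligible roughly while its vruntime has not moved past the queue's
load-weighted average vruntime V, i.e. while its lag is non-negative. The
kernel computes this with min_vruntime-relative keys and weight scaling, but
the gist is:

#include <stdint.h>
#include <stdbool.h>

/* Simplified eligibility model: lag = V - vruntime must be >= 0. */
static bool model_entity_eligible(int64_t queue_avg_vruntime_v,
				  int64_t se_vruntime)
{
	return se_vruntime <= queue_avg_vruntime_v;
}
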
If a task yields, the scheduler may decide to pick it again. The task in
turn may decide to yield immediately or shortly after, leading to a tight
loop of yields.

If there's another runnable task at this point, the deadline will be
increased by one slice on each iteration of the loop. This can make the
deadline run away quickly, leading to elevated run delays later on once the
task stops getting picked. The reason the scheduler can pick the same task
again and again despite its growing deadline is that it may be the only
eligible task at that point.

Fix this by making the task forfeit its remaining vruntime and pushing the
deadline one slice ahead. This implements yield behavior more authentically.

We limit the forfeiting to eligible tasks. This is because core scheduling
prefers running ineligible tasks over force idling. As such, without this
condition, we could end up in a yield loop that makes the vruntime increase
rapidly, leading to anomalous run delays further down the line.
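
To spell out the core-scheduling interaction referenced above (a rough model
based on the description in this thread, not the actual pick_task() code): an
SMT sibling may only run a task whose core cookie matches the core-wide pick,
and a cookie-matched task is preferred over forcing the sibling idle even
when EEVDF would consider it ineligible on its own runqueue:

#include <stddef.h>
#include <stdbool.h>

struct model_task {
	unsigned long	cookie;		/* core scheduling cookie */
	bool		eligible;	/* EEVDF eligibility on its own rq */
};

/* Sibling pick: a cookie-matched task beats force idle, eligible or not. */
static struct model_task *model_sibling_pick(struct model_task *candidate,
					     unsigned long core_cookie)
{
	if (candidate && candidate->cookie == core_cookie)
		return candidate;	/* picked even if !candidate->eligible */

	return NULL;			/* force idle */
}
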
Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling policy")
Link: https://lore.kernel.org/r/20250401123622.584018-1-sieberf@amazon.com
Link: https://lore.kernel.org/r/20250911095113.203439-1-sieberf@amazon.com
Signed-off-by: Fernand Sieber <sieberf@amazon.com>
Changes in v2:
- Implement vruntime forfeiting approach suggested by Peter Zijlstra
- Updated commit name
- Previous Reviewed-by tags removed due to algorithm change
Changes in v3:
- Only increase vruntime for eligible tasks to avoid runaway vruntime with
core scheduling
Link: https://lore.kernel.org/r/20250916140228.452231-1-sieberf@amazon.com
---
kernel/sched/fair.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b173a059315c..46e5a976f402 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8921,7 +8921,19 @@ static void yield_task_fair(struct rq *rq)
 	 */
 	rq_clock_skip_update(rq);
 
-	se->deadline += calc_delta_fair(se->slice, se);
+	/*
+	 * Forfeit the remaining vruntime, only if the entity is eligible. This
+	 * condition is necessary because in core scheduling we prefer to run
+	 * ineligible tasks rather than force idling. If this happens we may
+	 * end up in a loop where the core scheduler picks the yielding task,
+	 * which yields immediately again; without the condition the vruntime
+	 * ends up quickly running away.
+	 */
+	if (entity_eligible(cfs_rq, se)) {
+		se->vruntime = se->deadline;
+		se->deadline += calc_delta_fair(se->slice, se);
+		update_min_vruntime(cfs_rq);
+	}
 }
 
 static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
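
On the update_min_vruntime() call added above (my understanding, prompted by
the earlier "Need we update_min_vruntime here?" question): cfs_rq->min_vruntime
is a monotonically advancing lower bound on the queued vruntimes, and
forfeiting can move the yielding task's vruntime past that bound, so it is
re-derived right away. A toy model of the monotone update:

#include <stdint.h>

/*
 * min_vruntime never moves backward; it only catches up to the smallest
 * vruntime still in the queue.
 */
static uint64_t model_update_min_vruntime(uint64_t cur_min,
					  uint64_t smallest_queued)
{
	return smallest_queued > cur_min ? smallest_queued : cur_min;
}
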
--
2.34.1
Hello,

we reported "a 55.9% improvement of stress-ng.wait.ops_per_sec" in
https://lore.kernel.org/all/202509241501.f14b210a-lkp@intel.com/
now we noticed there is also a regression in our tests. report again FYI.

one thing we want to mention is the "stress-ng.sockpair.MB_written_per_sec"
is in "miscellaneous metrics" of this stress-ng test. for the major part,
"stress-ng.sockpair.ops_per_sec", it's just a small difference.

0d4eaf8caf8cd633 15bf8c7b35e31295b26241425c0
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    551.38           -90.5%      52.18        stress-ng.sockpair.MB_written_per_sec
    781743            -2.3%     764106        stress-ng.sockpair.ops_per_sec

below is a test example for 15bf8c7b35:

2025-09-25 15:48:21 stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --oom-avoid --sockpair 192
stress-ng: info:  [8371] setting to a 1 min run per stressor
stress-ng: info:  [8371] dispatching hogs: 192 sockpair
stress-ng: info:  [8371] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
stress-ng: metrc: [8371] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [8371]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [8371] sockpair       49874197     65.44     72.08  12219.54    762108.28        4057.58        97.82          3132
stress-ng: metrc: [8371] miscellaneous metrics:
stress-ng: metrc: [8371] sockpair           27717.04 socketpair calls sec (harmonic mean of 192 instances)
stress-ng: metrc: [8371] sockpair              53.01 MB written per sec (harmonic mean of 192 instances)
stress-ng: info:  [8371] for a 66.13s run time:
stress-ng: info:  [8371]   12696.46s available CPU time
stress-ng: info:  [8371]      72.07s user time   (  0.57%)
stress-ng: info:  [8371]   12219.63s system time ( 96.24%)
stress-ng: info:  [8371]   12291.70s total time  ( 96.81%)
stress-ng: info:  [8371] load average: 190.99 57.46 19.94
stress-ng: info:  [8371] skipped: 0
stress-ng: info:  [8371] passed: 192: sockpair (192)
stress-ng: info:  [8371] failed: 0
stress-ng: info:  [8371] metrics untrustworthy: 0
stress-ng: info:  [8371] successful run completed in 1 min, 6.13 secs

below is an example from 0d4eaf8caf:

2025-09-25 18:04:37 stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --oom-avoid --sockpair 192
stress-ng: info:  [8360] setting to a 1 min run per stressor
stress-ng: info:  [8360] dispatching hogs: 192 sockpair
stress-ng: info:  [8360] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
stress-ng: metrc: [8360] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [8360]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [8360] sockpair       51705787     65.08     56.75  12254.39    794448.25        4199.92        98.52          5160
stress-ng: metrc: [8360] miscellaneous metrics:
stress-ng: metrc: [8360] sockpair           28156.62 socketpair calls sec (harmonic mean of 192 instances)
stress-ng: metrc: [8360] sockpair             562.18 MB written per sec (harmonic mean of 192 instances)
stress-ng: info:  [8360] for a 65.40s run time:
stress-ng: info:  [8360]   12556.08s available CPU time
stress-ng: info:  [8360]      56.75s user time   (  0.45%)
stress-ng: info:  [8360]   12254.48s system time ( 97.60%)
stress-ng: info:  [8360]   12311.23s total time  ( 98.05%)
stress-ng: info:  [8360] load average: 239.81 72.31 25.10
stress-ng: info:  [8360] skipped: 0
stress-ng: info:  [8360] passed: 192: sockpair (192)
stress-ng: info:  [8360] failed: 0
stress-ng: info:  [8360] metrics untrustworthy: 0
stress-ng: info:  [8360] successful run completed in 1 min, 5.40 secs

below is the full report.

kernel test robot noticed a 90.5% regression of stress-ng.sockpair.MB_written_per_sec on:

commit: 15bf8c7b35e31295b26241425c0a61102e92109f ("[PATCH v3] sched/fair: Forfeit vruntime on yield")
url: https://github.com/intel-lab-lkp/linux/commits/Fernand-Sieber/sched-fair-Forfeit-vruntime-on-yield/20250918-231320
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 0d4eaf8caf8cd633b23e949e2996b420052c2d45
patch link: https://lore.kernel.org/all/20250918150528.292620-1-sieberf@amazon.com/
patch subject: [PATCH v3] sched/fair: Forfeit vruntime on yield

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sockpair
	cpufreq_governor: performance

If you fix the issue in a separate patch/commit (i.e.
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202509261113.a87577ce-lkp@intel.com Details are as below: --------------------------------------------------------------------------------------------------> The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20250926/202509261113.a87577ce-lkp@intel.com ========================================================================================= compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime: gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp3/sockpair/stress-ng/60s commit: 0d4eaf8caf ("sched/fair: Do not balance task to a throttled cfs_rq") 15bf8c7b35 ("sched/fair: Forfeit vruntime on yield") 0d4eaf8caf8cd633 15bf8c7b35e31295b26241425c0 ---------------- --------------------------- %stddev %change %stddev \ | \ 0.78 ± 2% +0.2 1.02 mpstat.cpu.all.usr% 19.57 -36.8% 12.36 ± 70% turbostat.RAMWatt 4.073e+08 ± 6% +23.1% 5.013e+08 ± 5% cpuidle..time 266261 ± 9% +46.4% 389733 ± 9% cpuidle..usage 451887 ± 77% +160.9% 1178929 ± 33% numa-vmstat.node0.nr_file_pages 192819 ± 30% +101.3% 388191 ± 43% numa-vmstat.node1.nr_shmem 1807416 ± 77% +161.0% 4716665 ± 33% numa-meminfo.node0.FilePages 8980121 -9.0% 8174177 numa-meminfo.node0.SUnreclaim 25356157 ± 8% -22.0% 19772595 ± 9% numa-meminfo.node1.MemUsed 771480 ± 30% +101.4% 1553932 ± 43% numa-meminfo.node1.Shmem 551.38 -90.5% 52.18 stress-ng.sockpair.MB_written_per_sec 51092272 -2.2% 49968621 stress-ng.sockpair.ops 781743 -2.3% 764106 stress-ng.sockpair.ops_per_sec 21418332 ± 4% +69.2% 36232510 stress-ng.time.involuntary_context_switches 56.36 +27.4% 71.81 stress-ng.time.user_time 150809 ± 21% +17217.1% 26115838 ± 3% stress-ng.time.voluntary_context_switches 2165914 ± 7% +92.3% 4165197 ± 4% meminfo.Active 2165898 ± 7% +92.3% 4165181 ± 4% meminfo.Active(anon) 4926568 +39.6% 6875228 meminfo.Cached 6826363 +28.1% 8744371 meminfo.Committed_AS 513281 ± 8% +98.7% 1019681 ± 6% meminfo.Mapped 48472806 ± 2% -14.8% 41314088 meminfo.Memused 1276164 +152.7% 3224818 ± 3% meminfo.Shmem 53022761 ± 2% -15.7% 44672632 meminfo.max_used_kB 0.53 -81.0% 0.10 ± 4% perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown] 0.53 -81.0% 0.10 ± 4% perf-sched.total_sch_delay.average.ms 2.03 -68.4% 0.64 ± 4% perf-sched.total_wait_and_delay.average.ms 1811449 +200.9% 5449776 ± 4% perf-sched.total_wait_and_delay.count.ms 1.50 -64.0% 0.54 ± 4% perf-sched.total_wait_time.average.ms 2.03 -68.4% 0.64 ± 4% perf-sched.wait_and_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown] 1811449 +200.9% 5449776 ± 4% perf-sched.wait_and_delay.count.[unknown].[unknown].[unknown].[unknown].[unknown] 1.50 -64.0% 0.54 ± 4% perf-sched.wait_time.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown] 541937 ± 7% +92.5% 1043389 ± 4% proc-vmstat.nr_active_anon 5242293 +3.5% 5423918 proc-vmstat.nr_dirty_background_threshold 10497404 +3.5% 10861099 proc-vmstat.nr_dirty_threshold 1232280 +39.7% 1721251 proc-vmstat.nr_file_pages 52782357 +3.4% 54601330 proc-vmstat.nr_free_pages 52117733 +3.8% 54073313 proc-vmstat.nr_free_pages_blocks 128259 ± 8% +100.8% 257594 ± 6% proc-vmstat.nr_mapped 319681 +153.0% 808650 ± 3% proc-vmstat.nr_shmem 4489133 -8.9% 4089704 proc-vmstat.nr_slab_unreclaimable 541937 ± 7% +92.5% 1043389 ± 4% proc-vmstat.nr_zone_active_anon 77303955 +2.5% 79201972 
proc-vmstat.pgalloc_normal 519724 +5.2% 546556 proc-vmstat.pgfault 76456707 +1.7% 77739095 proc-vmstat.pgfree 12794131 ± 6% -27.4% 9288185 sched_debug.cfs_rq:/.avg_vruntime.max 4610143 ± 8% -14.9% 3923890 ± 5% sched_debug.cfs_rq:/.avg_vruntime.min 1.03 -20.1% 0.83 ± 2% sched_debug.cfs_rq:/.h_nr_queued.avg 1.03 -20.8% 0.82 ± 2% sched_debug.cfs_rq:/.h_nr_runnable.avg 895.00 ± 70% +89.0% 1691 ± 2% sched_debug.cfs_rq:/.load.min 0.67 ± 55% +125.0% 1.50 sched_debug.cfs_rq:/.load_avg.min 12794131 ± 6% -27.4% 9288185 sched_debug.cfs_rq:/.min_vruntime.max 4610143 ± 8% -14.9% 3923896 ± 5% sched_debug.cfs_rq:/.min_vruntime.min 1103 -20.2% 880.86 sched_debug.cfs_rq:/.runnable_avg.avg 428.26 ± 6% -63.4% 156.94 ± 22% sched_debug.cfs_rq:/.util_est.avg 1775 ± 6% -39.3% 1077 ± 15% sched_debug.cfs_rq:/.util_est.max 396.33 ± 6% -50.0% 198.03 ± 17% sched_debug.cfs_rq:/.util_est.stddev 50422 ± 6% -34.7% 32915 ± 18% sched_debug.cpu.avg_idle.min 456725 ± 10% +39.4% 636811 ± 4% sched_debug.cpu.avg_idle.stddev 611566 ± 5% +25.0% 764424 ± 2% sched_debug.cpu.max_idle_balance_cost.avg 190657 ± 12% +36.1% 259410 ± 5% sched_debug.cpu.max_idle_balance_cost.stddev 1.04 -20.4% 0.82 ± 2% sched_debug.cpu.nr_running.avg 57214 ± 4% +183.5% 162228 ± 2% sched_debug.cpu.nr_switches.avg 253314 ± 4% +39.3% 352777 ± 4% sched_debug.cpu.nr_switches.max 59410 ± 6% +31.6% 78186 ± 10% sched_debug.cpu.nr_switches.stddev 3.33 -27.9% 2.40 perf-stat.i.MPKI 1.207e+10 +11.3% 1.344e+10 perf-stat.i.branch-instructions 0.21 ± 7% +0.0 0.24 ± 5% perf-stat.i.branch-miss-rate% 23462655 ± 6% +27.4% 29896517 ± 3% perf-stat.i.branch-misses 75.74 -4.4 71.33 perf-stat.i.cache-miss-rate% 1.861e+08 -21.5% 1.462e+08 perf-stat.i.cache-misses 2.435e+08 -17.1% 2.017e+08 perf-stat.i.cache-references 323065 ± 5% +191.4% 941425 ± 2% perf-stat.i.context-switches 10.73 -9.7% 9.69 perf-stat.i.cpi 353.45 +39.0% 491.13 ± 4% perf-stat.i.cpu-migrations 3589 +30.5% 4685 perf-stat.i.cycles-between-cache-misses 5.645e+10 +12.0% 6.323e+10 perf-stat.i.instructions 0.09 +12.1% 0.11 perf-stat.i.ipc 1.66 ± 5% +193.9% 4.89 ± 2% perf-stat.i.metric.K/sec 6247 +5.7% 6603 ± 2% perf-stat.i.minor-faults 6248 +5.7% 6604 ± 2% perf-stat.i.page-faults 3.33 -29.7% 2.34 perf-stat.overall.MPKI 0.20 ± 7% +0.0 0.23 ± 4% perf-stat.overall.branch-miss-rate% 76.67 -3.9 72.79 perf-stat.overall.cache-miss-rate% 10.54 -11.1% 9.37 perf-stat.overall.cpi 3168 +26.5% 4007 perf-stat.overall.cycles-between-cache-misses 0.09 +12.5% 0.11 perf-stat.overall.ipc 1.204e+10 +11.1% 1.337e+10 perf-stat.ps.branch-instructions 23586580 ± 7% +29.7% 30600100 ± 4% perf-stat.ps.branch-misses 1.873e+08 -21.4% 1.471e+08 perf-stat.ps.cache-misses 2.443e+08 -17.3% 2.021e+08 perf-stat.ps.cache-references 324828 ± 5% +187.0% 932274 ± 2% perf-stat.ps.context-switches 335.13 ± 2% +41.7% 474.95 ± 5% perf-stat.ps.cpu-migrations 5.632e+10 +11.7% 6.293e+10 perf-stat.ps.instructions 6282 +6.5% 6690 ± 2% perf-stat.ps.minor-faults 6284 +6.5% 6692 ± 2% perf-stat.ps.page-faults 3.764e+12 +12.2% 4.224e+12 perf-stat.total.instructions Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
Hello, kernel test robot noticed a 55.9% improvement of stress-ng.wait.ops_per_sec on: commit: 15bf8c7b35e31295b26241425c0a61102e92109f ("[PATCH v3] sched/fair: Forfeit vruntime on yield") url: https://github.com/intel-lab-lkp/linux/commits/Fernand-Sieber/sched-fair-Forfeit-vruntime-on-yield/20250918-231320 base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 0d4eaf8caf8cd633b23e949e2996b420052c2d45 patch link: https://lore.kernel.org/all/20250918150528.292620-1-sieberf@amazon.com/ patch subject: [PATCH v3] sched/fair: Forfeit vruntime on yield testcase: stress-ng config: x86_64-rhel-9.4 compiler: gcc-14 test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E CPU @ 2.4GHz (Sierra Forest) with 256G memory parameters: nr_threads: 100% testtime: 60s test: wait cpufreq_governor: performance In addition to that, the commit also has significant impact on the following tests: +------------------+---------------------------------------------------------+ | testcase: change | stress-ng: stress-ng.alarm.ops_per_sec 1.3% improvement | | test machine | 104 threads 2 sockets (Skylake) with 192G memory | | test parameters | cpufreq_governor=performance | | | nr_threads=100% | | | test=alarm | | | testtime=60s | +------------------+---------------------------------------------------------+ Details are as below: --------------------------------------------------------------------------------------------------> The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20250924/202509241501.f14b210a-lkp@intel.com ========================================================================================= compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime: gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp3/wait/stress-ng/60s commit: 0d4eaf8caf ("sched/fair: Do not balance task to a throttled cfs_rq") 15bf8c7b35 ("sched/fair: Forfeit vruntime on yield") 0d4eaf8caf8cd633 15bf8c7b35e31295b26241425c0 ---------------- --------------------------- %stddev %change %stddev \ | \ 20935372 ± 13% -74.1% 5416590 ± 38% cpuidle..usage 0.22 ± 6% -0.1 0.15 ± 6% mpstat.cpu.all.irq% 1.56 ± 3% +0.6 2.16 ± 4% mpstat.cpu.all.usr% 2928651 ± 48% +63.3% 4781087 ± 7% numa-numastat.node1.local_node 2986407 ± 47% +63.0% 4867647 ± 8% numa-numastat.node1.numa_hit 65592344 ± 22% +408.5% 3.335e+08 ± 6% stress-ng.time.involuntary_context_switches 64507 ± 3% -10.6% 57643 ± 5% stress-ng.time.minor_page_faults 268.43 +58.0% 424.24 stress-ng.time.user_time 94660203 ± 3% +32.0% 1.25e+08 stress-ng.time.voluntary_context_switches 8733656 ± 3% +55.9% 13619248 stress-ng.wait.ops 145711 ± 3% +55.9% 227211 stress-ng.wait.ops_per_sec 9901871 ± 23% +33.6% 13230903 ± 9% meminfo.Active 9901855 ± 23% +33.6% 13230887 ± 9% meminfo.Active(anon) 12749041 ± 18% +26.5% 16122685 ± 7% meminfo.Cached 14843475 ± 15% +22.4% 18175107 ± 5% meminfo.Committed_AS 16718698 ± 13% +19.8% 20027386 ± 5% meminfo.Memused 9098551 ± 25% +37.1% 12472304 ± 9% meminfo.Shmem 16772967 ± 13% +19.8% 20096231 ± 6% meminfo.max_used_kB 7828333 ± 51% +66.6% 13041791 ± 9% numa-meminfo.node1.Active 7828325 ± 51% +66.6% 13041784 ± 9% numa-meminfo.node1.Active(anon) 7314210 ± 52% +85.0% 13533714 ± 10% numa-meminfo.node1.FilePages 61743 ± 26% +43.3% 88498 ± 20% numa-meminfo.node1.KReclaimable 9385294 ± 42% +66.0% 15578695 ± 9% numa-meminfo.node1.MemUsed 61743 ± 26% +43.3% 88498 ± 20% numa-meminfo.node1.SReclaimable 7219596 ± 53% +72.1% 12426234 ± 9% numa-meminfo.node1.Shmem 
1958162 ± 51% +66.6% 3262251 ± 9% numa-vmstat.node1.nr_active_anon 1829587 ± 52% +85.0% 3385199 ± 10% numa-vmstat.node1.nr_file_pages 1805933 ± 53% +72.1% 3108329 ± 9% numa-vmstat.node1.nr_shmem 15439 ± 26% +43.4% 22139 ± 20% numa-vmstat.node1.nr_slab_reclaimable 1958158 ± 51% +66.6% 3262247 ± 9% numa-vmstat.node1.nr_zone_active_anon 2985336 ± 47% +63.0% 4867285 ± 8% numa-vmstat.node1.numa_hit 2927581 ± 48% +63.3% 4780725 ± 7% numa-vmstat.node1.numa_local 2475878 ± 23% +33.7% 3310125 ± 9% proc-vmstat.nr_active_anon 201955 ± 2% -5.5% 190887 ± 3% proc-vmstat.nr_anon_pages 3187672 ± 18% +26.5% 4033035 ± 7% proc-vmstat.nr_file_pages 2275048 ± 25% +37.2% 3120439 ± 9% proc-vmstat.nr_shmem 43269 ± 3% +4.5% 45201 proc-vmstat.nr_slab_reclaimable 2475878 ± 23% +33.7% 3310125 ± 9% proc-vmstat.nr_zone_active_anon 4045331 ± 20% +29.0% 5218368 ± 7% proc-vmstat.numa_hit 3847426 ± 21% +30.5% 5020327 ± 7% proc-vmstat.numa_local 4094249 ± 19% +28.8% 5274030 ± 7% proc-vmstat.pgalloc_normal 9011996 ± 5% +23.4% 11121508 ± 5% sched_debug.cfs_rq:/.avg_vruntime.max 3236082 ± 2% +19.6% 3869616 sched_debug.cfs_rq:/.avg_vruntime.min 1260971 ± 4% +25.1% 1577635 ± 9% sched_debug.cfs_rq:/.avg_vruntime.stddev 0.53 ± 5% -8.9% 0.49 ± 3% sched_debug.cfs_rq:/.h_nr_queued.stddev 0.54 ± 4% -8.7% 0.49 ± 3% sched_debug.cfs_rq:/.h_nr_runnable.stddev 9011996 ± 5% +23.4% 11121508 ± 5% sched_debug.cfs_rq:/.min_vruntime.max 3236082 ± 2% +19.6% 3869616 sched_debug.cfs_rq:/.min_vruntime.min 1260972 ± 4% +25.1% 1577635 ± 9% sched_debug.cfs_rq:/.min_vruntime.stddev 1261 ± 4% -16.4% 1054 ± 6% sched_debug.cfs_rq:/.util_avg.max 170.04 ± 4% -30.0% 119.10 ± 6% sched_debug.cfs_rq:/.util_avg.stddev 390.34 ± 2% +34.0% 523.00 ± 2% sched_debug.cfs_rq:/.util_est.avg 219.06 ± 5% +22.5% 268.29 ± 4% sched_debug.cfs_rq:/.util_est.stddev 765966 ± 3% -13.1% 665650 ± 3% sched_debug.cpu.max_idle_balance_cost.avg 296999 ± 5% -22.6% 229736 ± 5% sched_debug.cpu.max_idle_balance_cost.stddev 0.53 ± 6% -10.2% 0.48 ± 3% sched_debug.cpu.nr_running.stddev 467856 ± 5% +154.2% 1189068 ± 4% sched_debug.cpu.nr_switches.avg 1091334 ± 35% +458.8% 6098488 ± 11% sched_debug.cpu.nr_switches.max 156457 ± 39% +579.7% 1063429 ± 12% sched_debug.cpu.nr_switches.stddev 1.522e+10 ± 2% +33.0% 2.025e+10 ± 4% perf-stat.i.branch-instructions 26461017 ± 8% +25.3% 33152871 ± 4% perf-stat.i.branch-misses 80419215 ± 6% +22.5% 98514949 perf-stat.i.cache-references 2950621 ± 6% +154.2% 7499768 ± 4% perf-stat.i.context-switches 8.86 -23.8% 6.75 perf-stat.i.cpi 4890 ± 16% -56.2% 2140 ± 15% perf-stat.i.cpu-migrations 44725 ± 7% -16.0% 37555 ± 3% perf-stat.i.cycles-between-cache-misses 7.212e+10 ± 2% +31.4% 9.48e+10 ± 4% perf-stat.i.instructions 0.12 ± 3% +32.7% 0.17 ± 7% perf-stat.i.ipc 15.37 ± 6% +154.2% 39.06 ± 4% perf-stat.i.metric.K/sec 8.17 -23.4% 6.26 perf-stat.overall.cpi 0.12 +30.5% 0.16 perf-stat.overall.ipc 1.498e+10 ± 2% +33.0% 1.993e+10 ± 4% perf-stat.ps.branch-instructions 26034509 ± 8% +25.3% 32622824 ± 4% perf-stat.ps.branch-misses 79145687 ± 6% +22.5% 96950950 perf-stat.ps.cache-references 2903516 ± 6% +154.2% 7379460 ± 4% perf-stat.ps.context-switches 4802 ± 16% -56.3% 2099 ± 15% perf-stat.ps.cpu-migrations 7.098e+10 ± 2% +31.4% 9.33e+10 ± 4% perf-stat.ps.instructions 4.42e+12 +30.9% 5.787e+12 perf-stat.total.instructions *************************************************************************************************** lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory ========================================================================================= 
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime: gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-skl-fpga01/alarm/stress-ng/60s commit: 0d4eaf8caf ("sched/fair: Do not balance task to a throttled cfs_rq") 15bf8c7b35 ("sched/fair: Forfeit vruntime on yield") 0d4eaf8caf8cd633 15bf8c7b35e31295b26241425c0 ---------------- --------------------------- %stddev %change %stddev \ | \ 13051 ± 26% +40.8% 18378 ± 6% numa-meminfo.node1.PageTables 230411 ± 15% -24.0% 175131 ± 19% numa-numastat.node0.local_node 122.83 ± 10% +24.6% 153.00 ± 9% sched_debug.cfs_rq:/.runnable_avg.min 229700 ± 15% -24.0% 174608 ± 19% numa-vmstat.node0.numa_local 3264 ± 26% +40.4% 4584 ± 6% numa-vmstat.node1.nr_page_table_pages 34.64 -0.5 34.15 turbostat.C1% 1.25 ± 2% -0.3 0.92 ± 6% turbostat.C1E% 1.227e+08 +1.3% 1.243e+08 stress-ng.alarm.ops 2044889 +1.3% 2071190 stress-ng.alarm.ops_per_sec 17839864 +33.4% 23790385 stress-ng.time.involuntary_context_switches 5045 +1.6% 5127 stress-ng.time.percent_of_cpu_this_job_got 1938 +1.8% 1972 stress-ng.time.system_time 1094 +1.4% 1109 stress-ng.time.user_time 1.402e+10 +1.2% 1.419e+10 perf-stat.i.branch-instructions 9.466e+08 +2.1% 9.661e+08 perf-stat.i.cache-references 6720093 +2.3% 6874753 perf-stat.i.context-switches 2.01e+11 +1.4% 2.038e+11 perf-stat.i.cpu-cycles 2173629 +3.4% 2247122 perf-stat.i.cpu-migrations 6.961e+10 +1.2% 7.047e+10 perf-stat.i.instructions 85.51 +2.6% 87.75 perf-stat.i.metric.K/sec 1.373e+10 +1.2% 1.39e+10 perf-stat.ps.branch-instructions 9.333e+08 +2.1% 9.53e+08 perf-stat.ps.cache-references 6626920 +2.3% 6780505 perf-stat.ps.context-switches 1.979e+11 +1.4% 2.007e+11 perf-stat.ps.cpu-cycles 2146232 +3.4% 2219100 perf-stat.ps.cpu-migrations 6.82e+10 +1.2% 6.905e+10 perf-stat.ps.instructions 16.99 -0.7 16.30 perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call 0.63 -0.4 0.25 ±100% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.do_nanosleep 0.76 ± 15% -0.3 0.43 ± 73% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle 33.81 -0.3 33.51 perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64 32.55 -0.3 32.25 perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary 32.48 -0.3 32.19 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry 1.06 -0.1 0.93 perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.do_nanosleep.hrtimer_nanosleep 5.84 -0.1 5.74 perf-profile.calltrace.cycles-pp.schedule.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep 5.66 -0.1 5.56 perf-profile.calltrace.cycles-pp.__schedule.schedule.do_nanosleep.hrtimer_nanosleep.common_nsleep 8.87 -0.1 8.79 perf-profile.calltrace.cycles-pp.__x64_sys_clock_nanosleep.do_syscall_64.entry_SYSCALL_64_after_hwframe 8.02 -0.1 7.94 perf-profile.calltrace.cycles-pp.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep.do_syscall_64 8.38 -0.1 8.31 perf-profile.calltrace.cycles-pp.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep.do_syscall_64.entry_SYSCALL_64_after_hwframe 8.42 -0.1 8.35 perf-profile.calltrace.cycles-pp.common_nsleep.__x64_sys_clock_nanosleep.do_syscall_64.entry_SYSCALL_64_after_hwframe 1.92 +0.0 1.95 
perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule 1.40 +0.0 1.44 perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.enqueue_task.ttwu_do_activate.sched_ttwu_pending 1.18 +0.0 1.22 perf-profile.calltrace.cycles-pp.select_idle_cpu.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up 0.68 +0.0 0.72 perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.complete_signal 2.48 +0.0 2.52 perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.do_nanosleep.hrtimer_nanosleep 2.10 +0.0 2.14 perf-profile.calltrace.cycles-pp.try_to_wake_up.complete_signal.__send_signal_locked.do_send_sig_info.kill_pid_info_type 2.38 +0.0 2.42 perf-profile.calltrace.cycles-pp.dequeue_task_fair.try_to_block_task.__schedule.schedule.do_nanosleep 0.99 +0.0 1.03 perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.complete_signal.__send_signal_locked 2.32 +0.0 2.36 perf-profile.calltrace.cycles-pp.complete_signal.__send_signal_locked.do_send_sig_info.kill_pid_info_type.kill_something_info 2.24 +0.0 2.28 perf-profile.calltrace.cycles-pp.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule.schedule 3.46 +0.0 3.50 perf-profile.calltrace.cycles-pp.__send_signal_locked.do_send_sig_info.kill_pid_info_type.kill_something_info.__x64_sys_kill 1.79 +0.0 1.84 perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue 1.73 +0.1 1.78 perf-profile.calltrace.cycles-pp.enqueue_task_fair.enqueue_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue 1.06 +0.1 1.11 perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.complete_signal.__send_signal_locked.do_send_sig_info 2.36 +0.1 2.41 perf-profile.calltrace.cycles-pp.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle 4.26 +0.1 4.32 perf-profile.calltrace.cycles-pp.kill_pid_info_type.kill_something_info.__x64_sys_kill.do_syscall_64.entry_SYSCALL_64_after_hwframe 6.72 +0.1 6.78 perf-profile.calltrace.cycles-pp.alarm 0.73 +0.1 0.80 perf-profile.calltrace.cycles-pp.pick_task_fair.pick_next_task_fair.__pick_next_task.__schedule.schedule 2.86 +0.1 2.92 perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry 3.26 +0.1 3.33 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary 3.72 +0.1 3.80 perf-profile.calltrace.cycles-pp.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary.common_startup_64 0.85 +0.1 0.94 perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.__x64_sys_sched_yield 0.88 +0.1 0.97 perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64 2.02 +0.1 2.15 perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield 1.54 +0.1 1.67 perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe 1.57 +0.1 1.71 perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield 2.88 +0.2 3.04 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield 2.34 +0.2 2.51 
perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield 5.50 +0.2 5.68 perf-profile.calltrace.cycles-pp.__sched_yield 0.52 +0.5 1.04 perf-profile.calltrace.cycles-pp.select_idle_core.select_idle_cpu.select_idle_sibling.select_task_rq_fair.select_task_rq 34.13 -0.3 33.82 perf-profile.children.cycles-pp.cpuidle_idle_call 32.84 -0.3 32.54 perf-profile.children.cycles-pp.cpuidle_enter 32.79 -0.3 32.50 perf-profile.children.cycles-pp.cpuidle_enter_state 13.10 -0.3 12.81 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 0.78 ± 13% -0.2 0.58 ± 20% perf-profile.children.cycles-pp.intel_idle 8.88 -0.1 8.80 perf-profile.children.cycles-pp.__x64_sys_clock_nanosleep 8.05 -0.1 7.97 perf-profile.children.cycles-pp.do_nanosleep 8.39 -0.1 8.31 perf-profile.children.cycles-pp.hrtimer_nanosleep 8.46 -0.1 8.39 perf-profile.children.cycles-pp.common_nsleep 1.22 -0.1 1.17 perf-profile.children.cycles-pp.pick_task_fair 3.10 -0.0 3.06 perf-profile.children.cycles-pp.__pick_next_task 2.60 -0.0 2.56 perf-profile.children.cycles-pp.pick_next_task_fair 0.10 ± 3% -0.0 0.08 ± 5% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length 0.09 ± 5% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.sigprocmask 0.91 +0.0 0.94 perf-profile.children.cycles-pp.switch_mm_irqs_off 1.85 +0.0 1.89 perf-profile.children.cycles-pp.enqueue_entity 2.41 +0.0 2.45 perf-profile.children.cycles-pp.enqueue_task 2.39 +0.0 2.43 perf-profile.children.cycles-pp.dequeue_task_fair 2.48 +0.0 2.52 perf-profile.children.cycles-pp.try_to_block_task 1.42 +0.0 1.46 perf-profile.children.cycles-pp.available_idle_cpu 2.32 +0.0 2.37 perf-profile.children.cycles-pp.complete_signal 2.32 +0.0 2.36 perf-profile.children.cycles-pp.enqueue_task_fair 3.46 +0.0 3.51 perf-profile.children.cycles-pp.__send_signal_locked 4.27 +0.1 4.32 perf-profile.children.cycles-pp.kill_pid_info_type 4.03 +0.1 4.08 perf-profile.children.cycles-pp.do_send_sig_info 6.84 +0.1 6.90 perf-profile.children.cycles-pp.alarm 3.09 +0.1 3.15 perf-profile.children.cycles-pp.ttwu_do_activate 1.95 +0.1 2.02 perf-profile.children.cycles-pp.select_idle_core 2.23 +0.1 2.30 perf-profile.children.cycles-pp.select_idle_cpu 3.12 +0.1 3.19 perf-profile.children.cycles-pp.sched_ttwu_pending 3.58 +0.1 3.65 perf-profile.children.cycles-pp.__flush_smp_call_function_queue 2.62 +0.1 2.70 perf-profile.children.cycles-pp.select_idle_sibling 6.14 +0.1 6.22 perf-profile.children.cycles-pp.try_to_wake_up 3.78 +0.1 3.86 perf-profile.children.cycles-pp.flush_smp_call_function_queue 3.05 +0.1 3.14 perf-profile.children.cycles-pp.select_task_rq_fair 3.17 +0.1 3.26 perf-profile.children.cycles-pp.select_task_rq 2.03 +0.1 2.17 perf-profile.children.cycles-pp.__x64_sys_sched_yield 5.56 +0.2 5.75 perf-profile.children.cycles-pp.__sched_yield 0.78 ± 13% -0.2 0.58 ± 20% perf-profile.self.cycles-pp.intel_idle 0.22 ± 2% +0.0 0.23 perf-profile.self.cycles-pp.exit_to_user_mode_loop 0.80 +0.0 0.83 perf-profile.self.cycles-pp.switch_mm_irqs_off 1.40 +0.0 1.45 perf-profile.self.cycles-pp.available_idle_cpu Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki