[PATCH v2] sched/fair: Forfeit vruntime on yield

Posted by Fernand Sieber 2 weeks, 1 day ago
If a task yields, the scheduler may decide to pick it again. The task in
turn may decide to yield immediately or shortly after, leading to a tight
loop of yields.

If there's another runnable task at this point, the deadline will be
increased by the slice at each loop. This can cause the deadline to run away
pretty quickly, and lead to elevated run delays later on as the task
doesn't get picked again. The reason the scheduler can pick the same task
again and again despite its deadline increasing is because it may be the
only eligible task at that point.

Fix this by making the task forfeit its remaining vruntime and pushing
the deadline one slice ahead. This implements yield behavior more
authentically.

Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling policy")
Link: https://lore.kernel.org/r/20250401123622.584018-1-sieberf@amazon.com
Link: https://lore.kernel.org/r/20250911095113.203439-1-sieberf@amazon.com
Signed-off-by: Fernand Sieber <sieberf@amazon.com>

Changes in v2:
- Implement vruntime forfeiting approach suggested by Peter Zijlstra
- Updated commit name
- Previous Reviewed-by tags removed due to algorithm change
---
 kernel/sched/fair.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a14da5396fb..cc4ef7213d43 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9036,6 +9036,7 @@ static void yield_task_fair(struct rq *rq)
 	 */
 	rq_clock_skip_update(rq);

+	se->vruntime = se->deadline;
 	se->deadline += calc_delta_fair(se->slice, se);
 }
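
For illustration, the deadline runaway described above and the effect of the
forfeit can be modelled with a small standalone userspace program. This is a
simplified sketch only: weights are ignored (calc_delta_fair(slice, se) is
treated as just "slice"), the 3 ms slice is an arbitrary example value, and
the yielding task is assumed to accrue negligible runtime between yields.

#include <stdio.h>

int main(void)
{
	const unsigned long long slice = 3000000ULL;	/* 3 ms in ns */
	unsigned long long vruntime, deadline;
	int i;

	/* Old behaviour: a tight yield loop only pushes the deadline out. */
	vruntime = 0;
	deadline = slice;
	for (i = 0; i < 5; i++)
		deadline += slice;
	printf("old: deadline - vruntime after 5 yields = %llu ns\n",
	       deadline - vruntime);

	/* New behaviour: forfeit the remaining vruntime on every yield. */
	vruntime = 0;
	deadline = slice;
	for (i = 0; i < 5; i++) {
		vruntime = deadline;
		deadline += slice;
	}
	printf("new: deadline - vruntime after 5 yields = %llu ns\n",
	       deadline - vruntime);

	return 0;
}

In the old case the gap between deadline and vruntime grows by one slice per
yield (18 ms here), which is the runaway; with the forfeit it stays bounded
at a single slice (3 ms).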

--
2.34.1




Amazon Development Centre (South Africa) (Proprietary) Limited
29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
Registration Number: 2004 / 034463 / 07
Re: [PATCH v2] sched/fair: Forfeit vruntime on yield
Posted by Xuewen Yan 2 weeks ago
On Tue, Sep 16, 2025 at 10:33 PM Fernand Sieber <sieberf@amazon.com> wrote:
>
> If a task yields, the scheduler may decide to pick it again. The task in
> turn may decide to yield immediately or shortly after, leading to a tight
> loop of yields.
>
> If there's another runnable task at this point, the deadline will be
> increased by the slice at each loop. This can cause the deadline to run away
> pretty quickly, and lead to elevated run delays later on as the task
> doesn't get picked again. The reason the scheduler can pick the same task
> again and again despite its deadline increasing is because it may be the
> only eligible task at that point.
>
> Fix this by making the task forfeit its remaining vruntime and pushing
> the deadline one slice ahead. This implements yield behavior more
> authentically.
>
> Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling policy")
> Link: https://lore.kernel.org/r/20250401123622.584018-1-sieberf@amazon.com
> Link: https://lore.kernel.org/r/20250911095113.203439-1-sieberf@amazon.com
> Signed-off-by: Fernand Sieber <sieberf@amazon.com>
>
> Changes in v2:
> - Implement vruntime forfeiting approach suggested by Peter Zijlstra
> - Updated commit name
> - Previous Reviewed-by tags removed due to algorithm change
> ---
>  kernel/sched/fair.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7a14da5396fb..cc4ef7213d43 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9036,6 +9036,7 @@ static void yield_task_fair(struct rq *rq)
>          */
>         rq_clock_skip_update(rq);
>
> +       se->vruntime = se->deadline;
>         se->deadline += calc_delta_fair(se->slice, se);

Do we need update_min_vruntime() here?

>  }
>
> --
> 2.34.1
>
>
>
>
> Amazon Development Centre (South Africa) (Proprietary) Limited
> 29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
> Registration Number: 2004 / 034463 / 07
>
>
Re: [PATCH v2] sched/fair: Forfeit vruntime on yield
Posted by Fernand Sieber 2 weeks ago
Hi Peter,

I noticed you have pulled the change in sched/urgent.
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=78f8764d34c0a1912ce209bb2a428a94d062707f

However, I'd appreciate it if you could weigh in on my concern regarding this
iteration not working well with core scheduling. Since the scheduler prefers to
run the yielding task again regardless of its eligibility rather than putting
the task in force idle, it can cause the yielding task's vruntime to run away
quickly. This scenario causes severe run delays later on. Please see my
previous reply with data supporting this concern. I think the best approach to
address it would be to clamp the vruntime. I'm not sure how exactly; a simple
approach would be to increment the vruntime by one slice until the task becomes
ineligible. If you have any suggestions, let me know. I'll run some testing
soon when I get a chance.
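
For concreteness, the "increment the vruntime by one slice until the task
becomes ineligible" idea above might look roughly like the untested sketch
below inside yield_task_fair(), reusing the existing entity_eligible() and
calc_delta_fair() helpers. This is purely illustrative; it is not code that
was posted or merged in this thread, and the handling of the deadline for an
already ineligible task is simply left as in the current code.

	/*
	 * Hypothetical clamp, for illustration only: advance vruntime by
	 * at most one slice per yield, and only while the entity is still
	 * eligible, so a forced core scheduling pick of the yielding task
	 * cannot make its vruntime run away.
	 */
	if (entity_eligible(cfs_rq, se))
		se->vruntime += calc_delta_fair(se->slice, se);
	se->deadline += calc_delta_fair(se->slice, se);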

Thanks, Fernand



Amazon Development Centre (South Africa) (Proprietary) Limited
29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
Registration Number: 2004 / 034463 / 07
Re: [PATCH v2] sched/fair: Forfeit vruntime on yield
Posted by Fernand Sieber 2 weeks, 1 day ago
After further testing I think we should stick with the original approach or
iterate on the vruntime forfeiting.

The vruntime forfeiting doesn't work well with core scheduling. The core
scheduler picks the best task across the SMT mask, and then the siblings run a
matching task no matter what. This means the core scheduler can keep picking
the yielding task on the sibling even after it becomes ineligible (because it's
preferable to force idling). In this scenario the vruntime of the yielding
task runs away rapidly, which causes problematic imbalances later on.

Perhaps an alternative is to forfeit the vruntime (set it to the deadline), but
only once. I.e., don't do it if the task is already ineligible? If the task is
ineligible then we simply increment the deadline as in my original patch?

Peter, let me know your thoughts on this.

Testing data below shows that the vruntime forfeit yields bad max run delays:

vruntime forfeit:
• yield_loop: 4.37s runtime, max delay 272.99ms
• busy_loop: 13.54s runtime, max delay 552.01ms

deadline clamp:
• busy_loop: 9.26s runtime, max delay 4.11ms
• yield_loop: 9.25s runtime, max delay 7.77ms

Test program:
/* Fallback definitions for older <sys/prctl.h> that lack the core
 * scheduling interface; values taken from <linux/prctl.h>. */
#ifndef PR_SCHED_CORE
#define PR_SCHED_CORE 62
#define PR_SCHED_CORE_CREATE 1
#define PR_SCHED_CORE_SHARE_TO 2
#define PR_SCHED_CORE_SCOPE_THREAD 0
#define PR_SCHED_CORE_SCOPE_THREAD_GROUP 1
#endif

#include <sched.h>
#include <time.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int should_yield = (argc > 1) ? atoi(argv[1]) : 1;
    time_t program_start = time(NULL);

    // Create core cookie for current process
    prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0, PR_SCHED_CORE_SCOPE_THREAD, 0);

    pid_t pid = fork();

    if (pid == 0) {
        // Child: yield for 5s then busy loop (if should_yield is 1)
        if (should_yield) {
            time_t start = time(NULL);
            while (time(NULL) - start < 5 && time(NULL) - program_start < 30) {
                sched_yield();
            }
        }
        while (time(NULL) - program_start < 30) {
            // busy loop
        }
    } else {
        // Parent: share cookie with child, then busy loop
        prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, pid, PR_SCHED_CORE_SCOPE_THREAD, 0);
        while (time(NULL) - program_start < 30) {
            // busy loop
        }
    }

    return 0;
}

Repro:
taskset -c 0,1 core_yield_loop 1 &  #arg 1 = do yield
taskset -c 0,1 core_yield_loop 0 &  #arg 0 = don't yield



Amazon Development Centre (South Africa) (Proprietary) Limited
29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
Registration Number: 2004 / 034463 / 07
Re: [PATCH v2] sched/fair: Forfeit vruntime on yield
Posted by Peter Zijlstra 2 weeks ago
On Tue, Sep 16, 2025 at 06:00:35PM +0200, Fernand Sieber wrote:
> After further testing I think we should stick with the original approach or
> iterate on the vruntime forfeiting.
> 
> The vruntime forfeiting doesn't work well with core scheduling. The core
> scheduler picks the best task across the SMT mask, and then the siblings run a
> matching task no matter what. This means the core scheduler can keep picking
> the yielding task on the sibling even after it becomes ineligible (because it's
> preferable to force idling). In this scenario the vruntime of the yielding
> task runs away rapidly, which causes problematic imbalances later on.
> 
> Perhaps an alternative is to forfeit the vruntime (set it to the deadline), but
> only once. I.e., don't do it if the task is already ineligible? If the task is
> ineligible then we simply increment the deadline as in my original patch?
> 
> Peter, let me know your thoughts on this.

Sorry, I missed this email earlier. I'll go ponder it a bit -- my brain
is esp. slow today due to a cold :/
Re: [PATCH v2] sched/fair: Forfeit vruntime on yield
Posted by Peter Zijlstra 2 weeks ago
On Thu, Sep 18, 2025 at 08:43:00AM +0200, Peter Zijlstra wrote:
> On Tue, Sep 16, 2025 at 06:00:35PM +0200, Fernand Sieber wrote:
> > After further testing I think we should stick with the original approach or
> > iterate on the vruntime forfeiting.
> > 
> > The vruntime forfeiting doesn't work well with core scheduling. The core
> > scheduler picks the best task across the SMT mask, and then the siblings run a
> > matching task no matter what. This means the core scheduler can keep picking
> > the yielding task on the sibling even after it becomes ineligible (because it's
> > preferable to force idling). In this scenario the vruntime of the yielding
> > task runs away rapidly, which causes problematic imbalances later on.
> > 
> > Perhaps an alternative is to forfeit the vruntime (set it to the deadline), but
> > only once. I.e., don't do it if the task is already ineligible? If the task is
> > ineligible then we simply increment the deadline as in my original patch?
> > 
> > Peter, let me know your thoughts on this.
> 
> Sorry, I missed this email earlier. I'll go ponder it a bit -- my brain
> is esp. slow today due to a cold :/

Right; so you're saying something like the below, right?

Yeah, I suppose we can do that; please write a coherent comment on it
though, so we can remember why, later on.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5c94caa93085..e75abf3c256d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9012,8 +9012,13 @@ static void yield_task_fair(struct rq *rq)
 	 */
 	rq_clock_skip_update(rq);
 
-	se->vruntime = se->deadline;
-	se->deadline += calc_delta_fair(se->slice, se);
+	/*
+	 * comment...
+	 */
+	if (entity_eligible(cfs_rq, se)) {
+		se->vruntime = se->deadline;
+		se->deadline += calc_delta_fair(se->slice, se);
+	}
 }
 
 static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
[PATCH v3] sched/fair: Forfeit vruntime on yield
Posted by Fernand Sieber 1 week, 6 days ago
If a task yields, the scheduler may decide to pick it again. The task in
turn may decide to yield immediately or shortly after, leading to a tight
loop of yields.

If there's another runnable task at this point, the deadline will be
increased by the slice at each loop. This can cause the deadline to run away
pretty quickly, and lead to elevated run delays later on as the task
doesn't get picked again. The reason the scheduler can pick the same task
again and again despite its deadline increasing is because it may be the
only eligible task at that point.

Fix this by making the task forfeit its remaining vruntime and pushing
the deadline one slice ahead. This implements yield behavior more
authentically.

We limit the forfeiting to eligible tasks. This is because core scheduling
prefers running ineligible tasks rather than force idling. As such, without
the condition, we can end up in a yield loop which makes the vruntime
increase rapidly, leading to anomalous run delays later on.

Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling policy")
Link: https://lore.kernel.org/r/20250401123622.584018-1-sieberf@amazon.com
Link: https://lore.kernel.org/r/20250911095113.203439-1-sieberf@amazon.com
Signed-off-by: Fernand Sieber <sieberf@amazon.com>

Changes in v2:
- Implement vruntime forfeiting approach suggested by Peter Zijlstra
- Updated commit name
- Previous Reviewed-by tags removed due to algorithm change

Changes in v3:
- Only increase vruntime for eligible tasks to avoid runaway vruntime with
  core scheduling

Link: https://lore.kernel.org/r/20250916140228.452231-1-sieberf@amazon.com
---
 kernel/sched/fair.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b173a059315c..46e5a976f402 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8921,7 +8921,19 @@ static void yield_task_fair(struct rq *rq)
 	 */
 	rq_clock_skip_update(rq);
 
-	se->deadline += calc_delta_fair(se->slice, se);
+	/*
+	 * Forfeit the remaining vruntime, only if the entity is eligible. This
+	 * condition is necessary because in core scheduling we prefer to run
+	 * ineligible tasks rather than force idling. If this happens we may
+	 * end up in a loop where the core scheduler picks the yielding task,
+	 * which yields immediately again; without the condition the vruntime
+	 * ends up quickly running away.
+	 */
+	if (entity_eligible(cfs_rq, se)) {
+		se->vruntime = se->deadline;
+		se->deadline += calc_delta_fair(se->slice, se);
+		update_min_vruntime(cfs_rq);
+	}
 }
 
 static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
-- 
2.34.1




Amazon Development Centre (South Africa) (Proprietary) Limited
29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
Registration Number: 2004 / 034463 / 07
Re: [PATCH v3] sched/fair: Forfeit vruntime on yield
Posted by kernel test robot 6 days, 6 hours ago
Hello,


We reported "a 55.9% improvement of stress-ng.wait.ops_per_sec"
in https://lore.kernel.org/all/202509241501.f14b210a-lkp@intel.com/

Now we have noticed that there is also a regression in our tests, so we are
reporting it again FYI.

One thing we want to mention is that "stress-ng.sockpair.MB_written_per_sec"
is in the "miscellaneous metrics" of this stress-ng test. For the major
metric, "stress-ng.sockpair.ops_per_sec", there is just a small difference.

0d4eaf8caf8cd633 15bf8c7b35e31295b26241425c0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    551.38           -90.5%      52.18        stress-ng.sockpair.MB_written_per_sec
    781743            -2.3%     764106        stress-ng.sockpair.ops_per_sec


Below is a test example for 15bf8c7b35:

2025-09-25 15:48:21 stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --oom-avoid --sockpair 192
stress-ng: info:  [8371] setting to a 1 min run per stressor
stress-ng: info:  [8371] dispatching hogs: 192 sockpair
stress-ng: info:  [8371] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
stress-ng: metrc: [8371] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [8371]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [8371] sockpair       49874197     65.44     72.08  12219.54    762108.28        4057.58        97.82          3132
stress-ng: metrc: [8371] miscellaneous metrics:
stress-ng: metrc: [8371] sockpair           27717.04 socketpair calls sec (harmonic mean of 192 instances)
stress-ng: metrc: [8371] sockpair              53.01 MB written per sec (harmonic mean of 192 instances)
stress-ng: info:  [8371] for a 66.13s run time:
stress-ng: info:  [8371]   12696.46s available CPU time
stress-ng: info:  [8371]      72.07s user time   (  0.57%)
stress-ng: info:  [8371]   12219.63s system time ( 96.24%)
stress-ng: info:  [8371]   12291.70s total time  ( 96.81%)
stress-ng: info:  [8371] load average: 190.99 57.46 19.94
stress-ng: info:  [8371] skipped: 0
stress-ng: info:  [8371] passed: 192: sockpair (192)
stress-ng: info:  [8371] failed: 0
stress-ng: info:  [8371] metrics untrustworthy: 0
stress-ng: info:  [8371] successful run completed in 1 min, 6.13 secs


Below is an example from 0d4eaf8caf:

2025-09-25 18:04:37 stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --oom-avoid --sockpair 192
stress-ng: info:  [8360] setting to a 1 min run per stressor
stress-ng: info:  [8360] dispatching hogs: 192 sockpair
stress-ng: info:  [8360] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
stress-ng: metrc: [8360] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [8360]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [8360] sockpair       51705787     65.08     56.75  12254.39    794448.25        4199.92        98.52          5160
stress-ng: metrc: [8360] miscellaneous metrics:
stress-ng: metrc: [8360] sockpair           28156.62 socketpair calls sec (harmonic mean of 192 instances)
stress-ng: metrc: [8360] sockpair             562.18 MB written per sec (harmonic mean of 192 instances)
stress-ng: info:  [8360] for a 65.40s run time:
stress-ng: info:  [8360]   12556.08s available CPU time
stress-ng: info:  [8360]      56.75s user time   (  0.45%)
stress-ng: info:  [8360]   12254.48s system time ( 97.60%)
stress-ng: info:  [8360]   12311.23s total time  ( 98.05%)
stress-ng: info:  [8360] load average: 239.81 72.31 25.10
stress-ng: info:  [8360] skipped: 0
stress-ng: info:  [8360] passed: 192: sockpair (192)
stress-ng: info:  [8360] failed: 0
stress-ng: info:  [8360] metrics untrustworthy: 0
stress-ng: info:  [8360] successful run completed in 1 min, 5.40 secs


Below is the full report.


kernel test robot noticed a 90.5% regression of stress-ng.sockpair.MB_written_per_sec on:


commit: 15bf8c7b35e31295b26241425c0a61102e92109f ("[PATCH v3] sched/fair: Forfeit vruntime on yield")
url: https://github.com/intel-lab-lkp/linux/commits/Fernand-Sieber/sched-fair-Forfeit-vruntime-on-yield/20250918-231320
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 0d4eaf8caf8cd633b23e949e2996b420052c2d45
patch link: https://lore.kernel.org/all/20250918150528.292620-1-sieberf@amazon.com/
patch subject: [PATCH v3] sched/fair: Forfeit vruntime on yield

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sockpair
	cpufreq_governor: performance



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202509261113.a87577ce-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250926/202509261113.a87577ce-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp3/sockpair/stress-ng/60s

commit: 
  0d4eaf8caf ("sched/fair: Do not balance task to a throttled cfs_rq")
  15bf8c7b35 ("sched/fair: Forfeit vruntime on yield")

0d4eaf8caf8cd633 15bf8c7b35e31295b26241425c0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.78 ±  2%      +0.2        1.02        mpstat.cpu.all.usr%
     19.57           -36.8%      12.36 ± 70%  turbostat.RAMWatt
 4.073e+08 ±  6%     +23.1%  5.013e+08 ±  5%  cpuidle..time
    266261 ±  9%     +46.4%     389733 ±  9%  cpuidle..usage
    451887 ± 77%    +160.9%    1178929 ± 33%  numa-vmstat.node0.nr_file_pages
    192819 ± 30%    +101.3%     388191 ± 43%  numa-vmstat.node1.nr_shmem
   1807416 ± 77%    +161.0%    4716665 ± 33%  numa-meminfo.node0.FilePages
   8980121            -9.0%    8174177        numa-meminfo.node0.SUnreclaim
  25356157 ±  8%     -22.0%   19772595 ±  9%  numa-meminfo.node1.MemUsed
    771480 ± 30%    +101.4%    1553932 ± 43%  numa-meminfo.node1.Shmem
    551.38           -90.5%      52.18        stress-ng.sockpair.MB_written_per_sec
  51092272            -2.2%   49968621        stress-ng.sockpair.ops
    781743            -2.3%     764106        stress-ng.sockpair.ops_per_sec
  21418332 ±  4%     +69.2%   36232510        stress-ng.time.involuntary_context_switches
     56.36           +27.4%      71.81        stress-ng.time.user_time
    150809 ± 21%  +17217.1%   26115838 ±  3%  stress-ng.time.voluntary_context_switches
   2165914 ±  7%     +92.3%    4165197 ±  4%  meminfo.Active
   2165898 ±  7%     +92.3%    4165181 ±  4%  meminfo.Active(anon)
   4926568           +39.6%    6875228        meminfo.Cached
   6826363           +28.1%    8744371        meminfo.Committed_AS
    513281 ±  8%     +98.7%    1019681 ±  6%  meminfo.Mapped
  48472806 ±  2%     -14.8%   41314088        meminfo.Memused
   1276164          +152.7%    3224818 ±  3%  meminfo.Shmem
  53022761 ±  2%     -15.7%   44672632        meminfo.max_used_kB
      0.53           -81.0%       0.10 ±  4%  perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
      0.53           -81.0%       0.10 ±  4%  perf-sched.total_sch_delay.average.ms
      2.03           -68.4%       0.64 ±  4%  perf-sched.total_wait_and_delay.average.ms
   1811449          +200.9%    5449776 ±  4%  perf-sched.total_wait_and_delay.count.ms
      1.50           -64.0%       0.54 ±  4%  perf-sched.total_wait_time.average.ms
      2.03           -68.4%       0.64 ±  4%  perf-sched.wait_and_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
   1811449          +200.9%    5449776 ±  4%  perf-sched.wait_and_delay.count.[unknown].[unknown].[unknown].[unknown].[unknown]
      1.50           -64.0%       0.54 ±  4%  perf-sched.wait_time.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
    541937 ±  7%     +92.5%    1043389 ±  4%  proc-vmstat.nr_active_anon
   5242293            +3.5%    5423918        proc-vmstat.nr_dirty_background_threshold
  10497404            +3.5%   10861099        proc-vmstat.nr_dirty_threshold
   1232280           +39.7%    1721251        proc-vmstat.nr_file_pages
  52782357            +3.4%   54601330        proc-vmstat.nr_free_pages
  52117733            +3.8%   54073313        proc-vmstat.nr_free_pages_blocks
    128259 ±  8%    +100.8%     257594 ±  6%  proc-vmstat.nr_mapped
    319681          +153.0%     808650 ±  3%  proc-vmstat.nr_shmem
   4489133            -8.9%    4089704        proc-vmstat.nr_slab_unreclaimable
    541937 ±  7%     +92.5%    1043389 ±  4%  proc-vmstat.nr_zone_active_anon
  77303955            +2.5%   79201972        proc-vmstat.pgalloc_normal
    519724            +5.2%     546556        proc-vmstat.pgfault
  76456707            +1.7%   77739095        proc-vmstat.pgfree
  12794131 ±  6%     -27.4%    9288185        sched_debug.cfs_rq:/.avg_vruntime.max
   4610143 ±  8%     -14.9%    3923890 ±  5%  sched_debug.cfs_rq:/.avg_vruntime.min
      1.03           -20.1%       0.83 ±  2%  sched_debug.cfs_rq:/.h_nr_queued.avg
      1.03           -20.8%       0.82 ±  2%  sched_debug.cfs_rq:/.h_nr_runnable.avg
    895.00 ± 70%     +89.0%       1691 ±  2%  sched_debug.cfs_rq:/.load.min
      0.67 ± 55%    +125.0%       1.50        sched_debug.cfs_rq:/.load_avg.min
  12794131 ±  6%     -27.4%    9288185        sched_debug.cfs_rq:/.min_vruntime.max
   4610143 ±  8%     -14.9%    3923896 ±  5%  sched_debug.cfs_rq:/.min_vruntime.min
      1103           -20.2%     880.86        sched_debug.cfs_rq:/.runnable_avg.avg
    428.26 ±  6%     -63.4%     156.94 ± 22%  sched_debug.cfs_rq:/.util_est.avg
      1775 ±  6%     -39.3%       1077 ± 15%  sched_debug.cfs_rq:/.util_est.max
    396.33 ±  6%     -50.0%     198.03 ± 17%  sched_debug.cfs_rq:/.util_est.stddev
     50422 ±  6%     -34.7%      32915 ± 18%  sched_debug.cpu.avg_idle.min
    456725 ± 10%     +39.4%     636811 ±  4%  sched_debug.cpu.avg_idle.stddev
    611566 ±  5%     +25.0%     764424 ±  2%  sched_debug.cpu.max_idle_balance_cost.avg
    190657 ± 12%     +36.1%     259410 ±  5%  sched_debug.cpu.max_idle_balance_cost.stddev
      1.04           -20.4%       0.82 ±  2%  sched_debug.cpu.nr_running.avg
     57214 ±  4%    +183.5%     162228 ±  2%  sched_debug.cpu.nr_switches.avg
    253314 ±  4%     +39.3%     352777 ±  4%  sched_debug.cpu.nr_switches.max
     59410 ±  6%     +31.6%      78186 ± 10%  sched_debug.cpu.nr_switches.stddev
      3.33           -27.9%       2.40        perf-stat.i.MPKI
 1.207e+10           +11.3%  1.344e+10        perf-stat.i.branch-instructions
      0.21 ±  7%      +0.0        0.24 ±  5%  perf-stat.i.branch-miss-rate%
  23462655 ±  6%     +27.4%   29896517 ±  3%  perf-stat.i.branch-misses
     75.74            -4.4       71.33        perf-stat.i.cache-miss-rate%
 1.861e+08           -21.5%  1.462e+08        perf-stat.i.cache-misses
 2.435e+08           -17.1%  2.017e+08        perf-stat.i.cache-references
    323065 ±  5%    +191.4%     941425 ±  2%  perf-stat.i.context-switches
     10.73            -9.7%       9.69        perf-stat.i.cpi
    353.45           +39.0%     491.13 ±  4%  perf-stat.i.cpu-migrations
      3589           +30.5%       4685        perf-stat.i.cycles-between-cache-misses
 5.645e+10           +12.0%  6.323e+10        perf-stat.i.instructions
      0.09           +12.1%       0.11        perf-stat.i.ipc
      1.66 ±  5%    +193.9%       4.89 ±  2%  perf-stat.i.metric.K/sec
      6247            +5.7%       6603 ±  2%  perf-stat.i.minor-faults
      6248            +5.7%       6604 ±  2%  perf-stat.i.page-faults
      3.33           -29.7%       2.34        perf-stat.overall.MPKI
      0.20 ±  7%      +0.0        0.23 ±  4%  perf-stat.overall.branch-miss-rate%
     76.67            -3.9       72.79        perf-stat.overall.cache-miss-rate%
     10.54           -11.1%       9.37        perf-stat.overall.cpi
      3168           +26.5%       4007        perf-stat.overall.cycles-between-cache-misses
      0.09           +12.5%       0.11        perf-stat.overall.ipc
 1.204e+10           +11.1%  1.337e+10        perf-stat.ps.branch-instructions
  23586580 ±  7%     +29.7%   30600100 ±  4%  perf-stat.ps.branch-misses
 1.873e+08           -21.4%  1.471e+08        perf-stat.ps.cache-misses
 2.443e+08           -17.3%  2.021e+08        perf-stat.ps.cache-references
    324828 ±  5%    +187.0%     932274 ±  2%  perf-stat.ps.context-switches
    335.13 ±  2%     +41.7%     474.95 ±  5%  perf-stat.ps.cpu-migrations
 5.632e+10           +11.7%  6.293e+10        perf-stat.ps.instructions
      6282            +6.5%       6690 ±  2%  perf-stat.ps.minor-faults
      6284            +6.5%       6692 ±  2%  perf-stat.ps.page-faults
 3.764e+12           +12.2%  4.224e+12        perf-stat.total.instructions



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH v3] sched/fair: Forfeit vruntime on yield
Posted by kernel test robot 1 week, 1 day ago

Hello,

kernel test robot noticed a 55.9% improvement of stress-ng.wait.ops_per_sec on:


commit: 15bf8c7b35e31295b26241425c0a61102e92109f ("[PATCH v3] sched/fair: Forfeit vruntime on yield")
url: https://github.com/intel-lab-lkp/linux/commits/Fernand-Sieber/sched-fair-Forfeit-vruntime-on-yield/20250918-231320
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 0d4eaf8caf8cd633b23e949e2996b420052c2d45
patch link: https://lore.kernel.org/all/20250918150528.292620-1-sieberf@amazon.com/
patch subject: [PATCH v3] sched/fair: Forfeit vruntime on yield

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: wait
	cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+---------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.alarm.ops_per_sec 1.3% improvement |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory        |
| test parameters  | cpufreq_governor=performance                            |
|                  | nr_threads=100%                                         |
|                  | test=alarm                                              |
|                  | testtime=60s                                            |
+------------------+---------------------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250924/202509241501.f14b210a-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp3/wait/stress-ng/60s

commit: 
  0d4eaf8caf ("sched/fair: Do not balance task to a throttled cfs_rq")
  15bf8c7b35 ("sched/fair: Forfeit vruntime on yield")

0d4eaf8caf8cd633 15bf8c7b35e31295b26241425c0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  20935372 ± 13%     -74.1%    5416590 ± 38%  cpuidle..usage
      0.22 ±  6%      -0.1        0.15 ±  6%  mpstat.cpu.all.irq%
      1.56 ±  3%      +0.6        2.16 ±  4%  mpstat.cpu.all.usr%
   2928651 ± 48%     +63.3%    4781087 ±  7%  numa-numastat.node1.local_node
   2986407 ± 47%     +63.0%    4867647 ±  8%  numa-numastat.node1.numa_hit
  65592344 ± 22%    +408.5%  3.335e+08 ±  6%  stress-ng.time.involuntary_context_switches
     64507 ±  3%     -10.6%      57643 ±  5%  stress-ng.time.minor_page_faults
    268.43           +58.0%     424.24        stress-ng.time.user_time
  94660203 ±  3%     +32.0%   1.25e+08        stress-ng.time.voluntary_context_switches
   8733656 ±  3%     +55.9%   13619248        stress-ng.wait.ops
    145711 ±  3%     +55.9%     227211        stress-ng.wait.ops_per_sec
   9901871 ± 23%     +33.6%   13230903 ±  9%  meminfo.Active
   9901855 ± 23%     +33.6%   13230887 ±  9%  meminfo.Active(anon)
  12749041 ± 18%     +26.5%   16122685 ±  7%  meminfo.Cached
  14843475 ± 15%     +22.4%   18175107 ±  5%  meminfo.Committed_AS
  16718698 ± 13%     +19.8%   20027386 ±  5%  meminfo.Memused
   9098551 ± 25%     +37.1%   12472304 ±  9%  meminfo.Shmem
  16772967 ± 13%     +19.8%   20096231 ±  6%  meminfo.max_used_kB
   7828333 ± 51%     +66.6%   13041791 ±  9%  numa-meminfo.node1.Active
   7828325 ± 51%     +66.6%   13041784 ±  9%  numa-meminfo.node1.Active(anon)
   7314210 ± 52%     +85.0%   13533714 ± 10%  numa-meminfo.node1.FilePages
     61743 ± 26%     +43.3%      88498 ± 20%  numa-meminfo.node1.KReclaimable
   9385294 ± 42%     +66.0%   15578695 ±  9%  numa-meminfo.node1.MemUsed
     61743 ± 26%     +43.3%      88498 ± 20%  numa-meminfo.node1.SReclaimable
   7219596 ± 53%     +72.1%   12426234 ±  9%  numa-meminfo.node1.Shmem
   1958162 ± 51%     +66.6%    3262251 ±  9%  numa-vmstat.node1.nr_active_anon
   1829587 ± 52%     +85.0%    3385199 ± 10%  numa-vmstat.node1.nr_file_pages
   1805933 ± 53%     +72.1%    3108329 ±  9%  numa-vmstat.node1.nr_shmem
     15439 ± 26%     +43.4%      22139 ± 20%  numa-vmstat.node1.nr_slab_reclaimable
   1958158 ± 51%     +66.6%    3262247 ±  9%  numa-vmstat.node1.nr_zone_active_anon
   2985336 ± 47%     +63.0%    4867285 ±  8%  numa-vmstat.node1.numa_hit
   2927581 ± 48%     +63.3%    4780725 ±  7%  numa-vmstat.node1.numa_local
   2475878 ± 23%     +33.7%    3310125 ±  9%  proc-vmstat.nr_active_anon
    201955 ±  2%      -5.5%     190887 ±  3%  proc-vmstat.nr_anon_pages
   3187672 ± 18%     +26.5%    4033035 ±  7%  proc-vmstat.nr_file_pages
   2275048 ± 25%     +37.2%    3120439 ±  9%  proc-vmstat.nr_shmem
     43269 ±  3%      +4.5%      45201        proc-vmstat.nr_slab_reclaimable
   2475878 ± 23%     +33.7%    3310125 ±  9%  proc-vmstat.nr_zone_active_anon
   4045331 ± 20%     +29.0%    5218368 ±  7%  proc-vmstat.numa_hit
   3847426 ± 21%     +30.5%    5020327 ±  7%  proc-vmstat.numa_local
   4094249 ± 19%     +28.8%    5274030 ±  7%  proc-vmstat.pgalloc_normal
   9011996 ±  5%     +23.4%   11121508 ±  5%  sched_debug.cfs_rq:/.avg_vruntime.max
   3236082 ±  2%     +19.6%    3869616        sched_debug.cfs_rq:/.avg_vruntime.min
   1260971 ±  4%     +25.1%    1577635 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.53 ±  5%      -8.9%       0.49 ±  3%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.54 ±  4%      -8.7%       0.49 ±  3%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
   9011996 ±  5%     +23.4%   11121508 ±  5%  sched_debug.cfs_rq:/.min_vruntime.max
   3236082 ±  2%     +19.6%    3869616        sched_debug.cfs_rq:/.min_vruntime.min
   1260972 ±  4%     +25.1%    1577635 ±  9%  sched_debug.cfs_rq:/.min_vruntime.stddev
      1261 ±  4%     -16.4%       1054 ±  6%  sched_debug.cfs_rq:/.util_avg.max
    170.04 ±  4%     -30.0%     119.10 ±  6%  sched_debug.cfs_rq:/.util_avg.stddev
    390.34 ±  2%     +34.0%     523.00 ±  2%  sched_debug.cfs_rq:/.util_est.avg
    219.06 ±  5%     +22.5%     268.29 ±  4%  sched_debug.cfs_rq:/.util_est.stddev
    765966 ±  3%     -13.1%     665650 ±  3%  sched_debug.cpu.max_idle_balance_cost.avg
    296999 ±  5%     -22.6%     229736 ±  5%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.53 ±  6%     -10.2%       0.48 ±  3%  sched_debug.cpu.nr_running.stddev
    467856 ±  5%    +154.2%    1189068 ±  4%  sched_debug.cpu.nr_switches.avg
   1091334 ± 35%    +458.8%    6098488 ± 11%  sched_debug.cpu.nr_switches.max
    156457 ± 39%    +579.7%    1063429 ± 12%  sched_debug.cpu.nr_switches.stddev
 1.522e+10 ±  2%     +33.0%  2.025e+10 ±  4%  perf-stat.i.branch-instructions
  26461017 ±  8%     +25.3%   33152871 ±  4%  perf-stat.i.branch-misses
  80419215 ±  6%     +22.5%   98514949        perf-stat.i.cache-references
   2950621 ±  6%    +154.2%    7499768 ±  4%  perf-stat.i.context-switches
      8.86           -23.8%       6.75        perf-stat.i.cpi
      4890 ± 16%     -56.2%       2140 ± 15%  perf-stat.i.cpu-migrations
     44725 ±  7%     -16.0%      37555 ±  3%  perf-stat.i.cycles-between-cache-misses
 7.212e+10 ±  2%     +31.4%   9.48e+10 ±  4%  perf-stat.i.instructions
      0.12 ±  3%     +32.7%       0.17 ±  7%  perf-stat.i.ipc
     15.37 ±  6%    +154.2%      39.06 ±  4%  perf-stat.i.metric.K/sec
      8.17           -23.4%       6.26        perf-stat.overall.cpi
      0.12           +30.5%       0.16        perf-stat.overall.ipc
 1.498e+10 ±  2%     +33.0%  1.993e+10 ±  4%  perf-stat.ps.branch-instructions
  26034509 ±  8%     +25.3%   32622824 ±  4%  perf-stat.ps.branch-misses
  79145687 ±  6%     +22.5%   96950950        perf-stat.ps.cache-references
   2903516 ±  6%    +154.2%    7379460 ±  4%  perf-stat.ps.context-switches
      4802 ± 16%     -56.3%       2099 ± 15%  perf-stat.ps.cpu-migrations
 7.098e+10 ±  2%     +31.4%   9.33e+10 ±  4%  perf-stat.ps.instructions
  4.42e+12           +30.9%  5.787e+12        perf-stat.total.instructions


***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-skl-fpga01/alarm/stress-ng/60s

commit: 
  0d4eaf8caf ("sched/fair: Do not balance task to a throttled cfs_rq")
  15bf8c7b35 ("sched/fair: Forfeit vruntime on yield")

0d4eaf8caf8cd633 15bf8c7b35e31295b26241425c0 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     13051 ± 26%     +40.8%      18378 ±  6%  numa-meminfo.node1.PageTables
    230411 ± 15%     -24.0%     175131 ± 19%  numa-numastat.node0.local_node
    122.83 ± 10%     +24.6%     153.00 ±  9%  sched_debug.cfs_rq:/.runnable_avg.min
    229700 ± 15%     -24.0%     174608 ± 19%  numa-vmstat.node0.numa_local
      3264 ± 26%     +40.4%       4584 ±  6%  numa-vmstat.node1.nr_page_table_pages
     34.64            -0.5       34.15        turbostat.C1%
      1.25 ±  2%      -0.3        0.92 ±  6%  turbostat.C1E%
 1.227e+08            +1.3%  1.243e+08        stress-ng.alarm.ops
   2044889            +1.3%    2071190        stress-ng.alarm.ops_per_sec
  17839864           +33.4%   23790385        stress-ng.time.involuntary_context_switches
      5045            +1.6%       5127        stress-ng.time.percent_of_cpu_this_job_got
      1938            +1.8%       1972        stress-ng.time.system_time
      1094            +1.4%       1109        stress-ng.time.user_time
 1.402e+10            +1.2%  1.419e+10        perf-stat.i.branch-instructions
 9.466e+08            +2.1%  9.661e+08        perf-stat.i.cache-references
   6720093            +2.3%    6874753        perf-stat.i.context-switches
  2.01e+11            +1.4%  2.038e+11        perf-stat.i.cpu-cycles
   2173629            +3.4%    2247122        perf-stat.i.cpu-migrations
 6.961e+10            +1.2%  7.047e+10        perf-stat.i.instructions
     85.51            +2.6%      87.75        perf-stat.i.metric.K/sec
 1.373e+10            +1.2%   1.39e+10        perf-stat.ps.branch-instructions
 9.333e+08            +2.1%   9.53e+08        perf-stat.ps.cache-references
   6626920            +2.3%    6780505        perf-stat.ps.context-switches
 1.979e+11            +1.4%  2.007e+11        perf-stat.ps.cpu-cycles
   2146232            +3.4%    2219100        perf-stat.ps.cpu-migrations
  6.82e+10            +1.2%  6.905e+10        perf-stat.ps.instructions
     16.99            -0.7       16.30        perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      0.63            -0.4        0.25 ±100%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.do_nanosleep
      0.76 ± 15%      -0.3        0.43 ± 73%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     33.81            -0.3       33.51        perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
     32.55            -0.3       32.25        perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     32.48            -0.3       32.19        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      1.06            -0.1        0.93        perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.do_nanosleep.hrtimer_nanosleep
      5.84            -0.1        5.74        perf-profile.calltrace.cycles-pp.schedule.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      5.66            -0.1        5.56        perf-profile.calltrace.cycles-pp.__schedule.schedule.do_nanosleep.hrtimer_nanosleep.common_nsleep
      8.87            -0.1        8.79        perf-profile.calltrace.cycles-pp.__x64_sys_clock_nanosleep.do_syscall_64.entry_SYSCALL_64_after_hwframe
      8.02            -0.1        7.94        perf-profile.calltrace.cycles-pp.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep.do_syscall_64
      8.38            -0.1        8.31        perf-profile.calltrace.cycles-pp.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep.do_syscall_64.entry_SYSCALL_64_after_hwframe
      8.42            -0.1        8.35        perf-profile.calltrace.cycles-pp.common_nsleep.__x64_sys_clock_nanosleep.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.92            +0.0        1.95        perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule
      1.40            +0.0        1.44        perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.enqueue_task.ttwu_do_activate.sched_ttwu_pending
      1.18            +0.0        1.22        perf-profile.calltrace.cycles-pp.select_idle_cpu.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up
      0.68            +0.0        0.72        perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.complete_signal
      2.48            +0.0        2.52        perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.do_nanosleep.hrtimer_nanosleep
      2.10            +0.0        2.14        perf-profile.calltrace.cycles-pp.try_to_wake_up.complete_signal.__send_signal_locked.do_send_sig_info.kill_pid_info_type
      2.38            +0.0        2.42        perf-profile.calltrace.cycles-pp.dequeue_task_fair.try_to_block_task.__schedule.schedule.do_nanosleep
      0.99            +0.0        1.03        perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.complete_signal.__send_signal_locked
      2.32            +0.0        2.36        perf-profile.calltrace.cycles-pp.complete_signal.__send_signal_locked.do_send_sig_info.kill_pid_info_type.kill_something_info
      2.24            +0.0        2.28        perf-profile.calltrace.cycles-pp.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule.schedule
      3.46            +0.0        3.50        perf-profile.calltrace.cycles-pp.__send_signal_locked.do_send_sig_info.kill_pid_info_type.kill_something_info.__x64_sys_kill
      1.79            +0.0        1.84        perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue
      1.73            +0.1        1.78        perf-profile.calltrace.cycles-pp.enqueue_task_fair.enqueue_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue
      1.06            +0.1        1.11        perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.complete_signal.__send_signal_locked.do_send_sig_info
      2.36            +0.1        2.41        perf-profile.calltrace.cycles-pp.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle
      4.26            +0.1        4.32        perf-profile.calltrace.cycles-pp.kill_pid_info_type.kill_something_info.__x64_sys_kill.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.72            +0.1        6.78        perf-profile.calltrace.cycles-pp.alarm
      0.73            +0.1        0.80        perf-profile.calltrace.cycles-pp.pick_task_fair.pick_next_task_fair.__pick_next_task.__schedule.schedule
      2.86            +0.1        2.92        perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
      3.26            +0.1        3.33        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary
      3.72            +0.1        3.80        perf-profile.calltrace.cycles-pp.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      0.85            +0.1        0.94        perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.__x64_sys_sched_yield
      0.88            +0.1        0.97        perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64
      2.02            +0.1        2.15        perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      1.54            +0.1        1.67        perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.57            +0.1        1.71        perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      2.88            +0.2        3.04        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield
      2.34            +0.2        2.51        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      5.50            +0.2        5.68        perf-profile.calltrace.cycles-pp.__sched_yield
      0.52            +0.5        1.04        perf-profile.calltrace.cycles-pp.select_idle_core.select_idle_cpu.select_idle_sibling.select_task_rq_fair.select_task_rq
     34.13            -0.3       33.82        perf-profile.children.cycles-pp.cpuidle_idle_call
     32.84            -0.3       32.54        perf-profile.children.cycles-pp.cpuidle_enter
     32.79            -0.3       32.50        perf-profile.children.cycles-pp.cpuidle_enter_state
     13.10            -0.3       12.81        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.78 ± 13%      -0.2        0.58 ± 20%  perf-profile.children.cycles-pp.intel_idle
      8.88            -0.1        8.80        perf-profile.children.cycles-pp.__x64_sys_clock_nanosleep
      8.05            -0.1        7.97        perf-profile.children.cycles-pp.do_nanosleep
      8.39            -0.1        8.31        perf-profile.children.cycles-pp.hrtimer_nanosleep
      8.46            -0.1        8.39        perf-profile.children.cycles-pp.common_nsleep
      1.22            -0.1        1.17        perf-profile.children.cycles-pp.pick_task_fair
      3.10            -0.0        3.06        perf-profile.children.cycles-pp.__pick_next_task
      2.60            -0.0        2.56        perf-profile.children.cycles-pp.pick_next_task_fair
      0.10 ±  3%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
      0.09 ±  5%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.sigprocmask
      0.91            +0.0        0.94        perf-profile.children.cycles-pp.switch_mm_irqs_off
      1.85            +0.0        1.89        perf-profile.children.cycles-pp.enqueue_entity
      2.41            +0.0        2.45        perf-profile.children.cycles-pp.enqueue_task
      2.39            +0.0        2.43        perf-profile.children.cycles-pp.dequeue_task_fair
      2.48            +0.0        2.52        perf-profile.children.cycles-pp.try_to_block_task
      1.42            +0.0        1.46        perf-profile.children.cycles-pp.available_idle_cpu
      2.32            +0.0        2.37        perf-profile.children.cycles-pp.complete_signal
      2.32            +0.0        2.36        perf-profile.children.cycles-pp.enqueue_task_fair
      3.46            +0.0        3.51        perf-profile.children.cycles-pp.__send_signal_locked
      4.27            +0.1        4.32        perf-profile.children.cycles-pp.kill_pid_info_type
      4.03            +0.1        4.08        perf-profile.children.cycles-pp.do_send_sig_info
      6.84            +0.1        6.90        perf-profile.children.cycles-pp.alarm
      3.09            +0.1        3.15        perf-profile.children.cycles-pp.ttwu_do_activate
      1.95            +0.1        2.02        perf-profile.children.cycles-pp.select_idle_core
      2.23            +0.1        2.30        perf-profile.children.cycles-pp.select_idle_cpu
      3.12            +0.1        3.19        perf-profile.children.cycles-pp.sched_ttwu_pending
      3.58            +0.1        3.65        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      2.62            +0.1        2.70        perf-profile.children.cycles-pp.select_idle_sibling
      6.14            +0.1        6.22        perf-profile.children.cycles-pp.try_to_wake_up
      3.78            +0.1        3.86        perf-profile.children.cycles-pp.flush_smp_call_function_queue
      3.05            +0.1        3.14        perf-profile.children.cycles-pp.select_task_rq_fair
      3.17            +0.1        3.26        perf-profile.children.cycles-pp.select_task_rq
      2.03            +0.1        2.17        perf-profile.children.cycles-pp.__x64_sys_sched_yield
      5.56            +0.2        5.75        perf-profile.children.cycles-pp.__sched_yield
      0.78 ± 13%      -0.2        0.58 ± 20%  perf-profile.self.cycles-pp.intel_idle
      0.22 ±  2%      +0.0        0.23        perf-profile.self.cycles-pp.exit_to_user_mode_loop
      0.80            +0.0        0.83        perf-profile.self.cycles-pp.switch_mm_irqs_off
      1.40            +0.0        1.45        perf-profile.self.cycles-pp.available_idle_cpu





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki