This ensures hot VMAs get scanned with priority, irrespective of
whether the current task has accessed them.
Suggested-by: Bharata B Rao <bharata@amd.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com>
---
kernel/sched/fair.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ae2a1a3ef5c..6529da7f370a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2971,8 +2971,22 @@ static inline bool vma_test_access_pid_history(struct vm_area_struct *vma)
 	return test_bit(pid_bit, &pids);
 }
 
+static inline bool vma_accessed_recent(struct vm_area_struct *vma)
+{
+	unsigned long *pids, pid_idx;
+
+	pid_idx = vma->numab_state->access_pid_idx;
+	pids = vma->numab_state->access_pids + pid_idx;
+
+	return (bitmap_weight(pids, BITS_PER_LONG) >= 1);
+}
+
 static bool vma_is_accessed(struct vm_area_struct *vma)
 {
+	/* Check if at least one task had accessed VMA recently. */
+	if (vma_accessed_recent(vma))
+		return true;
+
 	/* Check if the current task had historically accessed VMA. */
 	if (vma_test_access_pid_history(vma))
 		return true;
--
2.34.1
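
To make the new check concrete, below is a minimal, self-contained userspace
sketch of the idea: each VMA keeps a per-scan-window PID bitmap, any task that
faults on the VMA sets a bit in the current window, and the recent-access check
reports the VMA as hot if any bit is set there, no matter which task is asking.
This is an illustration only: NR_WINDOWS, pid_bit(), and struct vma_sketch are
simplified stand-ins for the kernel's numab_state layout, and
__builtin_popcountl() stands in for bitmap_weight().

	#include <stdbool.h>
	#include <stdio.h>

	#define NR_WINDOWS 4	/* illustrative; the series defines its own history depth */

	struct vma_sketch {
		unsigned long access_pids[NR_WINDOWS];	/* one access bitmap per scan window */
		unsigned int access_pid_idx;		/* index of the current window */
	};

	/* Hash a pid down to one bit position of the per-window bitmap. */
	static unsigned int pid_bit(int pid)
	{
		return (unsigned int)pid % (8 * sizeof(unsigned long));
	}

	/* A task touching the VMA marks itself in the current window. */
	static void record_access(struct vma_sketch *vma, int pid)
	{
		vma->access_pids[vma->access_pid_idx] |= 1UL << pid_bit(pid);
	}

	/*
	 * Mirrors the patch: the VMA counts as recently accessed if any
	 * task left a mark in the current window, irrespective of the caller.
	 */
	static bool vma_accessed_recent(struct vma_sketch *vma)
	{
		return __builtin_popcountl(vma->access_pids[vma->access_pid_idx]) >= 1;
	}

	int main(void)
	{
		struct vma_sketch vma = { { 0 }, 0 };

		printf("recent before any access: %d\n", vma_accessed_recent(&vma)); /* 0 */
		record_access(&vma, 1234);	/* some other task faults on the VMA */
		printf("recent after an access:   %d\n", vma_accessed_recent(&vma)); /* 1 */
		return 0;
	}

Because the recent-window check runs before the per-task history check in
vma_is_accessed(), a hot VMA gets scanned by whichever task reaches it first
rather than waiting for a task that has personally touched it.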
Hello,
kernel test robot noticed a -33.6% improvement of autonuma-benchmark.numa02.seconds on:
commit: af46f3c9ca2d16485912f8b9c896ef48bbfe1388 ("[RFC PATCH V1 5/6] sched/numa: Allow recently accessed VMAs to be scanned")
url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
patch link: https://lore.kernel.org/all/109ca1ea59b9dd6f2daf7b7fbc74e83ae074fbdf.1693287931.git.raghavendra.kt@amd.com/
patch subject: [RFC PATCH V1 5/6] sched/numa: Allow recently accessed VMAs to be scanned
testcase: autonuma-benchmark
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:
iterations: 4x
test: numa01_THREAD_ALLOC
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230910/202309102311.84b42068-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark
commit:
167773d1dd ("sched/numa: Increase tasks' access history")
af46f3c9ca ("sched/numa: Allow recently accessed VMAs to be scanned")
167773d1ddb5ffdd af46f3c9ca2d16485912f8b9c89
---------------- ---------------------------
%stddev %change %stddev
\ | \
2.534e+10 ± 10% -13.0% 2.204e+10 ± 7% cpuidle..time
26431366 ± 10% -13.2% 22948978 ± 7% cpuidle..usage
0.15 ± 4% -0.0 0.12 ± 3% mpstat.cpu.all.soft%
2.92 ± 3% +0.4 3.32 ± 4% mpstat.cpu.all.sys%
2243 ± 2% -12.7% 1957 ± 3% uptime.boot
29811 ± 8% -11.1% 26507 ± 6% uptime.idle
5.32 ± 79% -64.2% 1.91 ± 60% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
2.70 ± 18% +37.8% 3.72 ± 9% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
0.64 ±137% +26644.2% 169.91 ±220% perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
0.08 ± 20% +0.0 0.12 ± 10% perf-profile.children.cycles-pp.terminate_walk
0.10 ± 25% +0.0 0.14 ± 10% perf-profile.children.cycles-pp.wake_up_q
0.06 ± 50% +0.0 0.10 ± 10% perf-profile.children.cycles-pp.vfs_readlink
0.15 ± 36% +0.1 0.22 ± 13% perf-profile.children.cycles-pp.readlink
1.31 ± 19% +0.4 1.69 ± 12% perf-profile.children.cycles-pp.unmap_vmas
2.46 ± 19% +0.5 2.99 ± 4% perf-profile.children.cycles-pp.exit_mmap
311653 ± 10% -23.7% 237884 ± 9% turbostat.C1E
26018024 ± 10% -13.1% 22597563 ± 7% turbostat.C6
6.41 ± 9% -13.6% 5.54 ± 8% turbostat.CPU%c1
2.47 ± 11% +36.0% 3.36 ± 6% turbostat.CPU%c6
2.881e+08 ± 2% -12.8% 2.513e+08 ± 3% turbostat.IRQ
212.86 +2.8% 218.84 turbostat.RAMWatt
341.49 -4.1% 327.42 ± 2% autonuma-benchmark.numa01.seconds
186.67 ± 6% -27.1% 136.12 ± 7% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
21.17 ± 7% -33.6% 14.05 autonuma-benchmark.numa02.seconds
2200 ± 2% -13.0% 1913 ± 3% autonuma-benchmark.time.elapsed_time
2200 ± 2% -13.0% 1913 ± 3% autonuma-benchmark.time.elapsed_time.max
1159380 ± 2% -12.0% 1019969 ± 3% autonuma-benchmark.time.involuntary_context_switches
3363550 -5.0% 3194802 autonuma-benchmark.time.minor_page_faults
243046 ± 2% -13.3% 210725 ± 3% autonuma-benchmark.time.user_time
7494239 -6.8% 6984234 proc-vmstat.numa_hit
118829 ± 6% +13.7% 135136 ± 6% proc-vmstat.numa_huge_pte_updates
6207618 -8.4% 5686795 ± 2% proc-vmstat.numa_local
8834573 ± 3% +20.2% 10616944 ± 4% proc-vmstat.numa_pages_migrated
61094857 ± 6% +13.6% 69409875 ± 6% proc-vmstat.numa_pte_updates
8602789 -9.0% 7827793 ± 2% proc-vmstat.pgfault
8834573 ± 3% +20.2% 10616944 ± 4% proc-vmstat.pgmigrate_success
371818 -10.1% 334391 ± 2% proc-vmstat.pgreuse
17200 ± 3% +20.3% 20686 ± 4% proc-vmstat.thp_migration_success
16401792 ± 2% -12.7% 14322816 ± 3% proc-vmstat.unevictable_pgs_scanned
1.606e+08 ± 2% -13.8% 1.385e+08 ± 3% sched_debug.cfs_rq:/.avg_vruntime.avg
1.666e+08 ± 2% -14.0% 1.433e+08 ± 3% sched_debug.cfs_rq:/.avg_vruntime.max
1.364e+08 ± 2% -11.7% 1.204e+08 ± 3% sched_debug.cfs_rq:/.avg_vruntime.min
4795327 ± 7% -17.5% 3956991 ± 7% sched_debug.cfs_rq:/.avg_vruntime.stddev
1.606e+08 ± 2% -13.8% 1.385e+08 ± 3% sched_debug.cfs_rq:/.min_vruntime.avg
1.666e+08 ± 2% -14.0% 1.433e+08 ± 3% sched_debug.cfs_rq:/.min_vruntime.max
1.364e+08 ± 2% -11.7% 1.204e+08 ± 3% sched_debug.cfs_rq:/.min_vruntime.min
4795327 ± 7% -17.5% 3956991 ± 7% sched_debug.cfs_rq:/.min_vruntime.stddev
364.96 ± 6% +16.6% 425.70 ± 5% sched_debug.cfs_rq:/.util_est_enqueued.avg
1099114 -13.0% 956021 ± 2% sched_debug.cpu.clock.avg
1099477 -13.0% 956344 ± 2% sched_debug.cpu.clock.max
1098702 -13.0% 955643 ± 2% sched_debug.cpu.clock.min
1080712 -13.0% 940415 ± 2% sched_debug.cpu.clock_task.avg
1085309 -13.1% 943557 ± 2% sched_debug.cpu.clock_task.max
1064613 -13.0% 925993 ± 2% sched_debug.cpu.clock_task.min
28890 ± 3% -11.7% 25504 ± 3% sched_debug.cpu.curr->pid.avg
35200 -11.0% 31344 sched_debug.cpu.curr->pid.max
862245 ± 3% -8.7% 786984 sched_debug.cpu.max_idle_balance_cost.max
74019 ± 9% -28.2% 53158 ± 7% sched_debug.cpu.max_idle_balance_cost.stddev
15507 -11.9% 13667 ± 2% sched_debug.cpu.nr_switches.avg
57616 ± 6% -19.0% 46642 ± 8% sched_debug.cpu.nr_switches.max
8460 ± 6% -12.9% 7368 ± 5% sched_debug.cpu.nr_switches.stddev
1098689 -13.0% 955631 ± 2% sched_debug.cpu_clk
1097964 -13.0% 954907 ± 2% sched_debug.ktime
0.00 +15.0% 0.00 ± 2% sched_debug.rt_rq:.rt_nr_migratory.avg
0.03 +15.0% 0.03 ± 2% sched_debug.rt_rq:.rt_nr_migratory.max
0.00 +15.0% 0.00 ± 2% sched_debug.rt_rq:.rt_nr_migratory.stddev
0.00 +15.0% 0.00 ± 2% sched_debug.rt_rq:.rt_nr_running.avg
0.03 +15.0% 0.03 ± 2% sched_debug.rt_rq:.rt_nr_running.max
0.00 +15.0% 0.00 ± 2% sched_debug.rt_rq:.rt_nr_running.stddev
1099511 -13.0% 956501 ± 2% sched_debug.sched_clk
1162 ± 2% +15.2% 1339 ± 3% perf-stat.i.MPKI
1.656e+08 +3.6% 1.716e+08 perf-stat.i.branch-instructions
0.95 ± 4% +0.1 1.03 perf-stat.i.branch-miss-rate%
1538367 ± 6% +11.0% 1707146 ± 2% perf-stat.i.branch-misses
6.327e+08 ± 3% +18.7% 7.513e+08 ± 4% perf-stat.i.cache-misses
8.282e+08 ± 2% +15.2% 9.542e+08 ± 3% perf-stat.i.cache-references
658.12 ± 3% -11.4% 582.98 ± 6% perf-stat.i.cycles-between-cache-misses
2.201e+08 +2.8% 2.263e+08 perf-stat.i.dTLB-loads
579771 +0.9% 584915 perf-stat.i.dTLB-store-misses
1.122e+08 +1.4% 1.138e+08 perf-stat.i.dTLB-stores
8.278e+08 +3.1% 8.538e+08 perf-stat.i.instructions
13.98 ± 2% +14.3% 15.98 ± 3% perf-stat.i.metric.M/sec
3797 +4.3% 3958 perf-stat.i.minor-faults
258749 +8.0% 279391 ± 2% perf-stat.i.node-load-misses
261169 ± 2% +7.4% 280417 ± 5% perf-stat.i.node-loads
40.91 ± 3% -3.0 37.89 ± 3% perf-stat.i.node-store-miss-rate%
3.841e+08 ± 6% +27.6% 4.902e+08 ± 7% perf-stat.i.node-stores
3797 +4.3% 3958 perf-stat.i.page-faults
998.24 ± 2% +11.8% 1116 ± 2% perf-stat.overall.MPKI
463.91 -3.2% 448.99 perf-stat.overall.cpi
604.23 ± 3% -15.9% 508.08 ± 4% perf-stat.overall.cycles-between-cache-misses
0.00 +3.3% 0.00 perf-stat.overall.ipc
39.20 ± 5% -4.5 34.70 ± 6% perf-stat.overall.node-store-miss-rate%
1.636e+08 +3.8% 1.698e+08 perf-stat.ps.branch-instructions
1499760 ± 6% +11.1% 1665855 ± 2% perf-stat.ps.branch-misses
6.296e+08 ± 3% +19.0% 7.489e+08 ± 4% perf-stat.ps.cache-misses
8.178e+08 ± 2% +15.5% 9.447e+08 ± 3% perf-stat.ps.cache-references
2.18e+08 +2.9% 2.244e+08 perf-stat.ps.dTLB-loads
578148 +0.9% 583328 perf-stat.ps.dTLB-store-misses
1.117e+08 +1.4% 1.132e+08 perf-stat.ps.dTLB-stores
8.192e+08 +3.3% 8.46e+08 perf-stat.ps.instructions
3744 +4.3% 3906 perf-stat.ps.minor-faults
255974 +8.2% 276924 ± 2% perf-stat.ps.node-load-misses
263796 ± 2% +7.7% 284110 ± 5% perf-stat.ps.node-loads
3.82e+08 ± 6% +27.7% 4.879e+08 ± 7% perf-stat.ps.node-stores
3744 +4.3% 3906 perf-stat.ps.page-faults
1.805e+12 ± 2% -10.1% 1.622e+12 ± 2% perf-stat.total.instructions
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On 9/10/2023 8:59 PM, kernel test robot wrote:
> 341.49 -4.1% 327.42 ± 2% autonuma-benchmark.numa01.seconds
> 186.67 ± 6% -27.1% 136.12 ± 7% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
> 21.17 ± 7% -33.6% 14.05 autonuma-benchmark.numa02.seconds
> 2200 ± 2% -13.0% 1913 ± 3% autonuma-benchmark.time.elapsed_time

Hello Oliver/Kernel test robot,

Thank you a lot for testing.

Results are impressive. Can I take this result as
positive for the whole series too?

Mel/PeterZ,

Whenever time permits, can you please let us know your comments/concerns
on the series?

Thanks and Regards
- Raghu
hi, Raghu,
On Mon, Sep 11, 2023 at 04:55:56PM +0530, Raghavendra K T wrote:
> On 9/10/2023 8:59 PM, kernel test robot wrote:
> > 341.49 -4.1% 327.42 ± 2% autonuma-benchmark.numa01.seconds
> > 186.67 ± 6% -27.1% 136.12 ± 7% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
> > 21.17 ± 7% -33.6% 14.05 autonuma-benchmark.numa02.seconds
> > 2200 ± 2% -13.0% 1913 ± 3% autonuma-benchmark.time.elapsed_time
>
> Hello Oliver/Kernel test robot,
> Thank you a lot for testing.
>
> Results are impressive. Can I take this result as
> positive for the whole series too?
FYI, we applied your patch set as below:
68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned
167773d1ddb5f sched/numa: Increase tasks' access history
fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic
2a806eab1c2e1 sched/numa: Move up the access pid reset logic
2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well
In our tests, we also tested 68cfe9439a1ba; comparing it to af46f3c9ca2d1:
=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark
commit:
af46f3c9ca ("sched/numa: Allow recently accessed VMAs to be scanned")
68cfe9439a ("sched/numa: Allow scanning of shared VMA")
af46f3c9ca2d1648 68cfe9439a1baa642e05883fa64
---------------- ---------------------------
%stddev %change %stddev
\ | \
327.42 ± 2% -1.1% 323.83 ± 3% autonuma-benchmark.numa01.seconds
136.12 ± 7% -25.1% 101.90 ± 2% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
14.05 +1.5% 14.26 autonuma-benchmark.numa02.seconds
1913 ± 3% -7.9% 1763 ± 2% autonuma-benchmark.time.elapsed_time
Below is the full comparison, FYI.
af46f3c9ca2d1648 68cfe9439a1baa642e05883fa64
---------------- ---------------------------
%stddev %change %stddev
\ | \
36437 ± 9% +20.4% 43867 ± 10% meminfo.Mapped
0.02 ± 17% +0.0 0.03 ± 8% mpstat.cpu.all.iowait%
71.00 ± 2% +6.3% 75.50 turbostat.PkgTmp
3956991 ± 7% -15.0% 3361998 ± 5% sched_debug.cfs_rq:/.avg_vruntime.stddev
3956991 ± 7% -15.0% 3361997 ± 5% sched_debug.cfs_rq:/.min_vruntime.stddev
-30.18 +27.8% -38.56 sched_debug.cpu.nr_uninterruptible.min
1913 ± 3% -7.9% 1763 ± 2% time.elapsed_time
1913 ± 3% -7.9% 1763 ± 2% time.elapsed_time.max
3194802 -2.4% 3117907 time.minor_page_faults
210725 ± 3% -8.7% 192483 ± 3% time.user_time
327.42 ± 2% -1.1% 323.83 ± 3% autonuma-benchmark.numa01.seconds
136.12 ± 7% -25.1% 101.90 ± 2% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
14.05 +1.5% 14.26 autonuma-benchmark.numa02.seconds
1913 ± 3% -7.9% 1763 ± 2% autonuma-benchmark.time.elapsed_time
1913 ± 3% -7.9% 1763 ± 2% autonuma-benchmark.time.elapsed_time.max
3194802 -2.4% 3117907 autonuma-benchmark.time.minor_page_faults
210725 ± 3% -8.7% 192483 ± 3% autonuma-benchmark.time.user_time
1.33 ± 91% -88.0% 0.16 ± 14% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.09 ±194% +3204.2% 3.03 ± 66% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
3.72 ± 9% -24.8% 2.80 ± 21% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
41.00 ±147% +2060.2% 885.67 ±105% perf-sched.wait_and_delay.count.io_schedule.migration_entry_wait_on_locked.__handle_mm_fault.handle_mm_fault
18.61 ± 18% -28.5% 13.30 ± 21% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
7.84 ±100% +354.6% 35.66 ± 89% perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
9285 ± 8% +20.1% 11152 ± 10% proc-vmstat.nr_mapped
6984234 -4.0% 6706018 proc-vmstat.numa_hit
5686795 ± 2% -5.2% 5390176 proc-vmstat.numa_local
10616944 ± 4% +15.7% 12279801 ± 3% proc-vmstat.numa_pages_migrated
7827793 ± 2% -5.2% 7421440 ± 2% proc-vmstat.pgfault
10616944 ± 4% +15.7% 12279801 ± 3% proc-vmstat.pgmigrate_success
334391 ± 2% -8.6% 305628 ± 2% proc-vmstat.pgreuse
20686 ± 4% +15.7% 23939 ± 3% proc-vmstat.thp_migration_success
14322816 ± 3% -8.2% 13147392 ± 2% proc-vmstat.unevictable_pgs_scanned
1339 ± 3% +8.6% 1454 ± 2% perf-stat.i.MPKI
1.716e+08 +2.8% 1.764e+08 perf-stat.i.branch-instructions
1.03 +0.1 1.11 ± 3% perf-stat.i.branch-miss-rate%
1707146 ± 2% +9.5% 1869960 ± 4% perf-stat.i.branch-misses
7.513e+08 ± 4% +11.1% 8.351e+08 ± 3% perf-stat.i.cache-misses
9.542e+08 ± 3% +8.9% 1.04e+09 ± 3% perf-stat.i.cache-references
534.57 -1.5% 526.34 perf-stat.i.cpi
158.57 +1.6% 161.11 perf-stat.i.cpu-migrations
582.98 ± 6% -11.4% 516.40 ± 3% perf-stat.i.cycles-between-cache-misses
2.263e+08 +2.2% 2.312e+08 perf-stat.i.dTLB-loads
8.538e+08 +2.5% 8.753e+08 perf-stat.i.instructions
15.98 ± 3% +8.9% 17.40 ± 3% perf-stat.i.metric.M/sec
3958 +3.0% 4075 perf-stat.i.minor-faults
37.89 ± 3% -3.6 34.28 ± 5% perf-stat.i.node-store-miss-rate%
2.585e+08 ± 4% -7.7% 2.385e+08 ± 3% perf-stat.i.node-store-misses
4.902e+08 ± 7% +21.1% 5.937e+08 ± 7% perf-stat.i.node-stores
3958 +2.9% 4075 perf-stat.i.page-faults
1116 ± 2% +6.2% 1186 ± 2% perf-stat.overall.MPKI
0.98 +0.1 1.04 ± 3% perf-stat.overall.branch-miss-rate%
448.99 -2.8% 436.60 perf-stat.overall.cpi
508.08 ± 4% -10.1% 456.56 ± 4% perf-stat.overall.cycles-between-cache-misses
0.00 +2.8% 0.00 perf-stat.overall.ipc
34.70 ± 6% -5.7 29.02 ± 7% perf-stat.overall.node-store-miss-rate%
1.698e+08 +2.8% 1.746e+08 perf-stat.ps.branch-instructions
1665855 ± 2% +9.5% 1824511 ± 3% perf-stat.ps.branch-misses
7.489e+08 ± 4% +10.9% 8.306e+08 ± 4% perf-stat.ps.cache-misses
9.447e+08 ± 3% +8.9% 1.029e+09 ± 3% perf-stat.ps.cache-references
158.05 +1.4% 160.31 perf-stat.ps.cpu-migrations
2.244e+08 +2.1% 2.292e+08 perf-stat.ps.dTLB-loads
8.46e+08 +2.5% 8.672e+08 perf-stat.ps.instructions
3906 +2.9% 4020 perf-stat.ps.minor-faults
284110 ± 5% +12.0% 318166 ± 2% perf-stat.ps.node-loads
2.584e+08 ± 3% -7.3% 2.395e+08 ± 3% perf-stat.ps.node-store-misses
4.879e+08 ± 7% +20.6% 5.883e+08 ± 7% perf-stat.ps.node-stores
3906 +2.9% 4020 perf-stat.ps.page-faults
1.622e+12 ± 2% -5.7% 1.53e+12 ± 2% perf-stat.total.instructions
6.29 ± 13% -2.2 4.11 ± 24% perf-profile.calltrace.cycles-pp.read
6.22 ± 13% -2.2 4.05 ± 24% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
6.21 ± 13% -2.2 4.04 ± 24% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
6.04 ± 13% -2.1 3.90 ± 24% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
6.09 ± 13% -2.1 3.96 ± 24% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
3.68 ± 17% -1.4 2.25 ± 36% perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.22 ± 16% -1.4 1.79 ± 27% perf-profile.calltrace.cycles-pp.open64
3.66 ± 16% -1.4 2.24 ± 36% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
3.88 ± 13% -1.4 2.49 ± 20% perf-profile.calltrace.cycles-pp.seq_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.83 ± 13% -1.4 2.48 ± 19% perf-profile.calltrace.cycles-pp.seq_read_iter.seq_read.vfs_read.ksys_read.do_syscall_64
3.03 ± 17% -1.3 1.71 ± 26% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
3.09 ± 17% -1.3 1.77 ± 27% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
3.08 ± 17% -1.3 1.76 ± 27% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
3.04 ± 17% -1.3 1.73 ± 26% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
2.61 ± 14% -1.0 1.60 ± 20% perf-profile.calltrace.cycles-pp.proc_single_show.seq_read_iter.seq_read.vfs_read.ksys_read
2.58 ± 13% -1.0 1.58 ± 21% perf-profile.calltrace.cycles-pp.do_task_stat.proc_single_show.seq_read_iter.seq_read.vfs_read
0.99 ± 17% -0.5 0.46 ± 75% perf-profile.calltrace.cycles-pp.__xstat64
0.97 ± 18% -0.5 0.46 ± 75% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__xstat64
0.96 ± 18% -0.5 0.46 ± 75% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
0.95 ± 18% -0.5 0.45 ± 75% perf-profile.calltrace.cycles-pp.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
0.92 ± 19% -0.5 0.45 ± 75% perf-profile.calltrace.cycles-pp.vfs_fstatat.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
0.72 ± 12% -0.3 0.40 ± 71% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
7.12 ± 13% -2.4 4.73 ± 22% perf-profile.children.cycles-pp.ksys_read
6.91 ± 12% -2.3 4.57 ± 23% perf-profile.children.cycles-pp.vfs_read
6.30 ± 13% -2.2 4.12 ± 24% perf-profile.children.cycles-pp.read
5.34 ± 12% -1.9 3.46 ± 25% perf-profile.children.cycles-pp.seq_read_iter
4.65 ± 13% -1.7 2.98 ± 31% perf-profile.children.cycles-pp.do_sys_openat2
4.67 ± 13% -1.7 3.01 ± 30% perf-profile.children.cycles-pp.__x64_sys_openat
4.43 ± 13% -1.6 2.86 ± 29% perf-profile.children.cycles-pp.do_filp_open
4.41 ± 13% -1.6 2.85 ± 29% perf-profile.children.cycles-pp.path_openat
3.23 ± 16% -1.4 1.80 ± 27% perf-profile.children.cycles-pp.open64
3.89 ± 13% -1.4 2.49 ± 20% perf-profile.children.cycles-pp.seq_read
2.61 ± 14% -1.0 1.60 ± 20% perf-profile.children.cycles-pp.proc_single_show
2.59 ± 13% -1.0 1.58 ± 21% perf-profile.children.cycles-pp.do_task_stat
1.66 ± 12% -0.7 0.96 ± 36% perf-profile.children.cycles-pp.lookup_fast
1.43 ± 16% -0.6 0.86 ± 29% perf-profile.children.cycles-pp.walk_component
1.50 ± 14% -0.5 0.96 ± 30% perf-profile.children.cycles-pp.link_path_walk
1.24 ± 10% -0.5 0.77 ± 32% perf-profile.children.cycles-pp.do_open
1.53 ± 7% -0.4 1.08 ± 19% perf-profile.children.cycles-pp.sched_setaffinity
1.02 ± 15% -0.4 0.64 ± 33% perf-profile.children.cycles-pp.__xstat64
1.10 ± 18% -0.4 0.72 ± 31% perf-profile.children.cycles-pp.__do_sys_newstat
1.09 ± 18% -0.4 0.73 ± 30% perf-profile.children.cycles-pp.path_lookupat
1.10 ± 18% -0.4 0.74 ± 29% perf-profile.children.cycles-pp.filename_lookup
1.07 ± 19% -0.4 0.72 ± 32% perf-profile.children.cycles-pp.vfs_fstatat
0.97 ± 9% -0.4 0.62 ± 34% perf-profile.children.cycles-pp.do_dentry_open
0.82 ± 19% -0.4 0.48 ± 34% perf-profile.children.cycles-pp.__d_lookup_rcu
0.94 ± 18% -0.3 0.61 ± 35% perf-profile.children.cycles-pp.vfs_statx
0.61 ± 11% -0.3 0.33 ± 32% perf-profile.children.cycles-pp.pid_revalidate
0.78 ± 14% -0.3 0.50 ± 29% perf-profile.children.cycles-pp.tlb_finish_mmu
0.64 ± 15% -0.3 0.37 ± 29% perf-profile.children.cycles-pp.getdents64
0.62 ± 16% -0.3 0.35 ± 28% perf-profile.children.cycles-pp.proc_pid_readdir
0.64 ± 15% -0.3 0.37 ± 29% perf-profile.children.cycles-pp.__x64_sys_getdents64
0.64 ± 15% -0.3 0.37 ± 29% perf-profile.children.cycles-pp.iterate_dir
0.61 ± 15% -0.3 0.35 ± 24% perf-profile.children.cycles-pp.__percpu_counter_init
0.96 ± 8% -0.3 0.71 ± 20% perf-profile.children.cycles-pp.evlist_cpu_iterator__next
1.03 ± 12% -0.2 0.78 ± 15% perf-profile.children.cycles-pp.__libc_read
0.75 ± 8% -0.2 0.53 ± 17% perf-profile.children.cycles-pp.__x64_sys_sched_setaffinity
0.39 ± 13% -0.2 0.19 ± 24% perf-profile.children.cycles-pp.__entry_text_start
0.40 ± 18% -0.2 0.22 ± 25% perf-profile.children.cycles-pp.ptrace_may_access
0.62 ± 7% -0.2 0.45 ± 17% perf-profile.children.cycles-pp.__sched_setaffinity
0.36 ± 16% -0.2 0.20 ± 25% perf-profile.children.cycles-pp.proc_fill_cache
0.57 ± 6% -0.2 0.40 ± 20% perf-profile.children.cycles-pp.__set_cpus_allowed_ptr
0.42 ± 21% -0.2 0.27 ± 38% perf-profile.children.cycles-pp.inode_permission
0.36 ± 20% -0.1 0.22 ± 25% perf-profile.children.cycles-pp._find_next_bit
0.39 ± 14% -0.1 0.25 ± 22% perf-profile.children.cycles-pp.__kmem_cache_alloc_node
0.44 ± 12% -0.1 0.30 ± 26% perf-profile.children.cycles-pp.pick_link
0.25 ± 18% -0.1 0.12 ± 19% perf-profile.children.cycles-pp.security_ptrace_access_check
0.32 ± 15% -0.1 0.19 ± 22% perf-profile.children.cycles-pp.__x64_sys_readlink
0.22 ± 13% -0.1 0.11 ± 33% perf-profile.children.cycles-pp.readlink
0.31 ± 14% -0.1 0.19 ± 22% perf-profile.children.cycles-pp.do_readlinkat
0.32 ± 11% -0.1 0.22 ± 30% perf-profile.children.cycles-pp.vfs_fstat
0.26 ± 19% -0.1 0.15 ± 26% perf-profile.children.cycles-pp.load_elf_interp
0.22 ± 17% -0.1 0.12 ± 32% perf-profile.children.cycles-pp.d_hash_and_lookup
0.21 ± 31% -0.1 0.12 ± 31% perf-profile.children.cycles-pp.may_open
0.30 ± 14% -0.1 0.21 ± 18% perf-profile.children.cycles-pp.copy_strings
0.24 ± 18% -0.1 0.14 ± 32% perf-profile.children.cycles-pp.unlink_anon_vmas
0.19 ± 19% -0.1 0.10 ± 32% perf-profile.children.cycles-pp.__kmalloc_node
0.29 ± 8% -0.1 0.21 ± 10% perf-profile.children.cycles-pp.affine_move_task
0.24 ± 21% -0.1 0.16 ± 24% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.22 ± 10% -0.1 0.14 ± 28% perf-profile.children.cycles-pp.mas_preallocate
0.24 ± 12% -0.1 0.16 ± 30% perf-profile.children.cycles-pp.mas_alloc_nodes
0.21 ± 14% -0.1 0.14 ± 20% perf-profile.children.cycles-pp.__d_alloc
0.10 ± 19% -0.1 0.03 ±100% perf-profile.children.cycles-pp.pid_task
0.14 ± 24% -0.1 0.06 ± 50% perf-profile.children.cycles-pp.single_open
0.20 ± 11% -0.1 0.12 ± 12% perf-profile.children.cycles-pp.cpu_stop_queue_work
0.18 ± 16% -0.1 0.11 ± 25% perf-profile.children.cycles-pp.generic_fillattr
0.14 ± 19% -0.1 0.07 ± 29% perf-profile.children.cycles-pp.apparmor_ptrace_access_check
0.14 ± 23% -0.1 0.08 ± 30% perf-profile.children.cycles-pp.native_flush_tlb_one_user
0.10 ± 10% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.vfs_readlink
0.09 ± 19% -0.1 0.03 ±100% perf-profile.children.cycles-pp.aa_get_task_label
0.14 ± 25% -0.1 0.08 ± 23% perf-profile.children.cycles-pp.proc_pid_get_link
0.16 ± 21% -0.1 0.10 ± 28% perf-profile.children.cycles-pp.thread_group_cputime_adjusted
0.19 ± 15% -0.1 0.13 ± 27% perf-profile.children.cycles-pp.strnlen_user
0.18 ± 27% -0.1 0.11 ± 21% perf-profile.children.cycles-pp.wq_worker_comm
0.18 ± 13% -0.1 0.11 ± 36% perf-profile.children.cycles-pp.vfs_getattr_nosec
0.17 ± 16% -0.1 0.11 ± 24% perf-profile.children.cycles-pp.proc_pid_cmdline_read
0.12 ± 10% -0.1 0.06 ± 48% perf-profile.children.cycles-pp.terminate_walk
0.14 ± 18% -0.1 0.09 ± 27% perf-profile.children.cycles-pp.thread_group_cputime
0.13 ± 21% -0.0 0.08 ± 27% perf-profile.children.cycles-pp.get_obj_cgroup_from_current
0.14 ± 18% -0.0 0.10 ± 26% perf-profile.children.cycles-pp.get_mm_cmdline
0.14 ± 10% -0.0 0.10 ± 17% perf-profile.children.cycles-pp.wake_up_q
1.37 ± 16% -0.6 0.81 ± 23% perf-profile.self.cycles-pp.do_task_stat
0.80 ± 18% -0.3 0.46 ± 34% perf-profile.self.cycles-pp.__d_lookup_rcu
0.39 ± 15% -0.2 0.19 ± 33% perf-profile.self.cycles-pp.pid_revalidate
0.37 ± 11% -0.2 0.18 ± 22% perf-profile.self.cycles-pp.__entry_text_start
0.36 ± 14% -0.2 0.21 ± 37% perf-profile.self.cycles-pp.do_dentry_open
0.44 ± 17% -0.1 0.31 ± 24% perf-profile.self.cycles-pp.gather_pte_stats
0.23 ± 15% -0.1 0.14 ± 14% perf-profile.self.cycles-pp.__kmem_cache_alloc_node
0.10 ± 18% -0.1 0.03 ±100% perf-profile.self.cycles-pp.pid_task
0.21 ± 17% -0.1 0.14 ± 25% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.14 ± 23% -0.1 0.08 ± 30% perf-profile.self.cycles-pp.native_flush_tlb_one_user
0.16 ± 23% -0.1 0.09 ± 26% perf-profile.self.cycles-pp.generic_fillattr
0.09 ± 20% -0.1 0.03 ±101% perf-profile.self.cycles-pp.unlink_anon_vmas
0.10 ± 25% -0.1 0.04 ± 76% perf-profile.self.cycles-pp.proc_fill_cache
0.12 ± 20% -0.1 0.06 ± 58% perf-profile.self.cycles-pp.lookup_fast
>
> Mel/PeterZ,
>
> Whenever time permits can you please let us know your comments/concerns
> on the series?
>
> Thanks and Regards
> - Raghu
>
On 9/12/2023 7:52 AM, Oliver Sang wrote:
> hi, Raghu,
>
> On Mon, Sep 11, 2023 at 04:55:56PM +0530, Raghavendra K T wrote:
>> On 9/10/2023 8:59 PM, kernel test robot wrote:
>>> 341.49 -4.1% 327.42 ± 2% autonuma-benchmark.numa01.seconds
>>> 186.67 ± 6% -27.1% 136.12 ± 7% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
>>> 21.17 ± 7% -33.6% 14.05 autonuma-benchmark.numa02.seconds
>>> 2200 ± 2% -13.0% 1913 ± 3% autonuma-benchmark.time.elapsed_time
>>
>> Hello Oliver/Kernel test robot,
>> Thank you a lot for testing.
>>
>> Results are impressive. Can I take this result as
>> positive for the whole series too?
>
> FYI, we applied your patch set as below:
>
> 68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
> af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned
> 167773d1ddb5f sched/numa: Increase tasks' access history
> fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
> 1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic
> 2a806eab1c2e1 sched/numa: Move up the access pid reset logic
> 2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well
>
> In our tests, we also tested 68cfe9439a1ba; comparing it to af46f3c9ca2d1:
>
> =========================================================================================
> compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
> gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark
>
> commit:
> af46f3c9ca ("sched/numa: Allow recently accessed VMAs to be scanned")
> 68cfe9439a ("sched/numa: Allow scanning of shared VMA")
>
> af46f3c9ca2d1648 68cfe9439a1baa642e05883fa64
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 327.42 ± 2% -1.1% 323.83 ± 3% autonuma-benchmark.numa01.seconds
> 136.12 ± 7% -25.1% 101.90 ± 2% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
> 14.05 +1.5% 14.26 autonuma-benchmark.numa02.seconds
> 1913 ± 3% -7.9% 1763 ± 2% autonuma-benchmark.time.elapsed_time
>
>
> Below is the full comparison, FYI.
>
Thanks a lot for the further run and details.
Combining this result with the previous one, we have a very good
result overall for LKP.
167773d1dd ("sched/numa: Increase tasks' access history")
af46f3c9ca ("sched/numa: Allow recently accessed VMAs to be scanned")
167773d1ddb5ffdd af46f3c9ca2d16485912f8b9c89
---------------- ---------------------------
%stddev %change %stddev
341.49 -4.1% 327.42 ± 2% autonuma-benchmark.numa01.seconds
186.67 ± 6% -27.1% 136.12 ± 7% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
21.17 ± 7% -33.6% 14.05 autonuma-benchmark.numa02.seconds
2200 ± 2% -13.0% 1913 ± 3% autonuma-benchmark.time.elapsed_time
Thanks and Regards
- Raghu
>
>
>
>>
>> Mel/PeterZ,
>>
>> Whenever time permits can you please let us know your comments/concerns
>> on the series?
>>
>> Thanks and Regards
>> - Raghu
>>