schbench (https://github.com/masoncl/schbench.git) is showing a
regression from previous production kernels that bisected down to:
sched/fair: Remove sysctl_sched_migration_cost condition (c5b0a7eefc)
The schbench command line was:
schbench -L -m 4 -M auto -t 256 -n 0 -r 0 -s 0
This creates 4 message threads pinned to CPUs 0-3, and 256x4 worker
threads spread across the rest of the CPUs. Neither the worker threads
nor the message threads do any work; they just wake each other up and go
back to sleep as soon as possible.
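A minimal sketch of that wake/sleep pattern (plain pthreads as a
stand-in; schbench's real wakeup path differs, so this shows only the
shape of the workload, not its implementation):

#include <pthread.h>

#define WORKERS 4	/* schbench runs 256 per message thread */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int generation;	/* bumped by the message thread each round */

static void *worker(void *arg)
{
	int seen = 0;

	(void)arg;
	for (;;) {
		pthread_mutex_lock(&lock);
		while (generation == seen)	/* sleep until woken ... */
			pthread_cond_wait(&cond, &lock);
		seen = generation;		/* ... do no work ... */
		pthread_mutex_unlock(&lock);	/* ... and sleep again */
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[WORKERS];

	for (int i = 0; i < WORKERS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);

	for (;;) {			/* the "message" thread's loop */
		pthread_mutex_lock(&lock);
		generation++;
		pthread_cond_broadcast(&cond);	/* wake all workers */
		pthread_mutex_unlock(&lock);
	}
}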
The end result is the first 4 CPUs are pegged waking up those 1024
workers, and the rest of the CPUs are constantly banging in and out of
idle. If I take a v6.9 Linus kernel and revert that one commit,
performance goes from 3.4M RPS to 5.4M RPS.
schedstat shows there are ~100x more new idle balance operations, and
profiling shows the worker threads are spending ~20% of their CPU time
on new idle balance. schedstat also shows that almost all of these new
idle balance attempts are failing to find busy groups.
The fix used here is to crank up the cost of newidle balance whenever it
fails. Since we don't want sd->max_newidle_lb_cost to grow out of
control, this also changes update_newidle_cost() to use
sysctl_sched_migration_cost as the upper limit on max_newidle_lb_cost.
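The resulting feedback loop can be modeled in userspace. With the
default sysctl_sched_migration_cost of 500000 (ns) and the 3/2 bump and
+200 slack used in the patch below, the cost of a repeatedly failing
domain grows geometrically and then pins at the clamp instead of
growing unbounded (the starting cost here is just an example value):

#include <stdio.h>

int main(void)
{
	unsigned long long sysctl_sched_migration_cost = 500000; /* default */
	unsigned long long max_newidle_lb_cost = 0;
	unsigned long long cost = 15000; /* example measured cost, ns */

	for (int i = 0; i < 15; i++) {
		/* update_newidle_cost(), with the new clamp applied */
		if (cost > max_newidle_lb_cost) {
			unsigned long long lim = sysctl_sched_migration_cost + 200;

			max_newidle_lb_cost = cost < lim ? cost : lim;
		}
		printf("attempt %2d: max_newidle_lb_cost = %llu\n",
		       i, max_newidle_lb_cost);
		/* sched_balance_newidle(): a failed pull bumps the next cost */
		cost = (3 * max_newidle_lb_cost) / 2;
	}
	return 0;
}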
Signed-off-by: Chris Mason <clm@fb.com>
---
kernel/sched/fair.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a14da5396fb2..042ab0863ccc0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12174,8 +12174,14 @@ static inline bool update_newidle_cost(struct sched_domain *sd, u64 cost)
/*
* Track max cost of a domain to make sure to not delay the
* next wakeup on the CPU.
+ *
+ * sched_balance_newidle() bumps the cost whenever newidle
+ * balance fails, and we don't want things to grow out of
+ * control. Use the sysctl_sched_migration_cost as the upper
+ * limit, plus a little extra to avoid off-by-one errors.
*/
- sd->max_newidle_lb_cost = cost;
+ sd->max_newidle_lb_cost =
+ min(cost, sysctl_sched_migration_cost + 200);
sd->last_decay_max_lb_cost = jiffies;
} else if (time_after(jiffies, sd->last_decay_max_lb_cost + HZ)) {
/*
@@ -12867,10 +12873,17 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
t1 = sched_clock_cpu(this_cpu);
domain_cost = t1 - t0;
- update_newidle_cost(sd, domain_cost);
-
curr_cost += domain_cost;
t0 = t1;
+
+ /*
+ * Failing newidle means it is not effective;
+ * bump the cost so we end up doing less of it.
+ */
+ if (!pulled_task)
+ domain_cost = (3 * sd->max_newidle_lb_cost) / 2;
+
+ update_newidle_cost(sd, domain_cost);
}
/*
--
2.47.1
On 26/06/2025 15:39, Chris Mason wrote:
> schbench (https://github.com/masoncl/schbench.git) is showing a
> regression from previous production kernels that bisected down to:
>
>   sched/fair: Remove sysctl_sched_migration_cost condition (c5b0a7eefc)
[...]

Hi,

I'm seeing a ~25% regression in requests per second for an nginx workload in
6.17-rc4 compared with 6.16, when the number of simulated clients (threads) is
high (1000). Bisection led me to this patch. The workload is running on an
AmpereOne (arm64) system with 192 CPUs. FWIW, I don't see the regression on an
AWS Graviton3 system.

I'm also seeing a 10% regression on the same system for a MySQL workload; but I
haven't yet bisected that one - I'll report back if that turns out to be due to
this too.

I saw that there was a regression raised against this patch by kernel test robot
for unixbench.throughput back in July, but it didn't look like it got resolved.

I can repro this easily so happy to try out any candidate fixes.
Here is the bisect log:

# good: [038d61fd642278bab63ee8ef722c50d10ab01e8f] Linux 6.16
git bisect good 038d61fd642278bab63ee8ef722c50d10ab01e8f
# status: waiting for bad commit, 1 good commit known
# bad: [8f5ae30d69d7543eee0d70083daf4de8fe15d585] Linux 6.17-rc1
git bisect bad 8f5ae30d69d7543eee0d70083daf4de8fe15d585
# bad: [8be4d31cb8aaeea27bde4b7ddb26e28a89062ebf] Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad 8be4d31cb8aaeea27bde4b7ddb26e28a89062ebf
# good: [115e74a29b530d121891238e9551c4bcdf7b04b5] Merge tag 'soc-dt-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good 115e74a29b530d121891238e9551c4bcdf7b04b5
# good: [49f02e6877d1bec848048dc6366859c30bbc0a04] Octeontx2-af: Debugfs support for firmware data
git bisect good 49f02e6877d1bec848048dc6366859c30bbc0a04
# good: [14bed9bc81bae64db98349319f367bfc7dab0afd] Merge tag 'x86_sev_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 14bed9bc81bae64db98349319f367bfc7dab0afd
# good: [c6dc26df6b4883de63cb237b4070feba92b01a87] Merge tag 'nf-next-25-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
git bisect good c6dc26df6b4883de63cb237b4070feba92b01a87
# bad: [3bb38c52719baa7f9cdbf200016ed481b4498290] Merge tag 'm68k-for-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
git bisect bad 3bb38c52719baa7f9cdbf200016ed481b4498290
# bad: [bcb48dd3b344592cc33732de640b99264c073df1] Merge tag 'perf-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad bcb48dd3b344592cc33732de640b99264c073df1
# good: [d403a3689af5c3a3e3ac6e282958d0eaa69ca47f] sched/fair: Move max_cfs_quota_period decl and default_cfs_period() def from fair.c to sched.h
git bisect good d403a3689af5c3a3e3ac6e282958d0eaa69ca47f
# bad: [9fdb12c88e9ba75e2d831fb397dd27f03a534968] tools/sched: Add root_domains_dump.py which dumps root domains info
git bisect bad 9fdb12c88e9ba75e2d831fb397dd27f03a534968
# bad: [570c8efd5eb79c3725ba439ce105ed1bedc5acd9] sched/psi: Optimize psi_group_change() cpu_clock() usage
git bisect bad 570c8efd5eb79c3725ba439ce105ed1bedc5acd9
# good: [11867144ff81ab98f4b11c99716c3e8b714b8755] rust: sync: Mark PollCondVar::drop() inline
git bisect good 11867144ff81ab98f4b11c99716c3e8b714b8755
# good: [7e611710acf966df1e14bcf4e067385e38e549a1] rust: task: Add Rust version of might_sleep()
git bisect good 7e611710acf966df1e14bcf4e067385e38e549a1
# bad: [155213a2aed42c85361bf4f5c817f5cb68951c3b] sched/fair: Bump sd->max_newidle_lb_cost when newidle balance fails
git bisect bad 155213a2aed42c85361bf4f5c817f5cb68951c3b
# good: [d398a68e8bcf430e231cccfbaa27cb25a7a6f224] Merge tag 'rust-sched.2025.06.24' of git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux into sched/core
git bisect good d398a68e8bcf430e231cccfbaa27cb25a7a6f224
# first bad commit: [155213a2aed42c85361bf4f5c817f5cb68951c3b] sched/fair: Bump sd->max_newidle_lb_cost when newidle balance fails

Thanks,
Ryan
On 07.09.25 20:21, Ryan Roberts wrote:
> On 26/06/2025 15:39, Chris Mason wrote:

[...]

> I'm seeing a ~25% regression in requests per second for an nginx workload in
> 6.17-rc4 compared with 6.16, when the number of simulated clients (threads) is
> high (1000). Bisection led me to this patch. The workload is running on an
> AmpereOne (arm64) system with 192 CPUs. FWIW, I don't see the regression on an
> AWS Graviton3 system.

Can you look for any sched domain hierarchy differences between AmpereOne and
Grav3? I assume Grav3 is bare-metal?

> I'm also seeing a 10% regression on the same system for a MySQL workload; but I
> haven't yet bisected that one - I'll report back if that turns out to be due to
> this too.

Is this the hammerdb test in which the SUT hosts the mysqld? Which params
(#Virtual Users, #Warehouses, ...) were you using?

> I saw that there was a regression raised against this patch by kernel test robot
> for unixbench.throughput back in July, but it didn't look like it got resolved.

Can you run the unixbench shell1 test on AmpereOne to see if the regression in
nginx is related to this unixbench test?

[...]
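With CONFIG_SCHED_DEBUG, the sched domain hierarchy is exported under
/sys/kernel/debug/sched/domains/, so the two machines can be compared
with a small walker along the lines of the sketch below (a sketch only,
assuming that debugfs layout with debugfs mounted at /sys/kernel/debug;
run as root):

#include <stdio.h>
#include <string.h>

int main(void)
{
	char path[128], name[64];

	for (int cpu = 0; ; cpu++) {
		int seen = 0;

		for (int dom = 0; ; dom++) {
			FILE *f;

			snprintf(path, sizeof(path),
				 "/sys/kernel/debug/sched/domains/cpu%d/domain%d/name",
				 cpu, dom);
			f = fopen(path, "r");
			if (!f)
				break;
			seen = 1;
			if (fgets(name, sizeof(name), f)) {
				name[strcspn(name, "\n")] = '\0';
				printf("cpu%d domain%d: %s\n", cpu, dom, name);
			}
			fclose(f);
		}
		/* a CPU with no domains ends the walk; fine for a sketch */
		if (!seen)
			break;
	}
	return 0;
}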
On 07/09/2025 19:21, Ryan Roberts wrote:
> On 26/06/2025 15:39, Chris Mason wrote:
[...]
> I'm also seeing a 10% regression on the same system for a MySQL workload; but I
> haven't yet bisected that one - I'll report back if that turns out to be due to
> this too.

Just to add that this MySQL workload regression was also bisected to the same
patch.

Thanks,
Ryan
Hello,

kernel test robot noticed a 22.9% regression of unixbench.throughput on:

commit: ac34cb39e8aea9915ec2f4e08c979eb2ed1d7561 ("[PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
url: https://github.com/intel-lab-lkp/linux/commits/Chris-Mason/sched-fair-bump-sd-max_newidle_lb_cost-when-newidle-balance-fails/20250626-224805
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 5bc34be478d09c4d16009e665e020ad0fcd0deea
patch link: https://lore.kernel.org/all/20250626144017.1510594-2-clm@fb.com/
patch subject: [PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails

testcase: unixbench
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	runtime: 300s
	nr_task: 100%
	test: shell1
	cpufreq_governor: performance

In addition to that, the commit also has significant impact on the following tests:

+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | unixbench: unixbench.throughput 20.3% regression                                          |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | nr_task=100%                                                                              |
|                  | runtime=300s                                                                              |
|                  | test=shell16                                                                              |
+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | unixbench: unixbench.throughput 26.2% regression                                          |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | nr_task=100%                                                                              |
|                  | runtime=300s                                                                              |
|                  | test=shell8                                                                               |
+------------------+-------------------------------------------------------------------------------------------+

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202507150846.538fc133-lkp@intel.com

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250715/202507150846.538fc133-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp9/shell1/unixbench

commit:
  5bc34be478 ("sched/core: Reorganize cgroup bandwidth control interface file writes")
  ac34cb39e8 ("sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")

5bc34be478d09c4d ac34cb39e8aea9915ec2f4e08c9
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     17957           +38.8%      24925        uptime.idle
 1.611e+10           +42.4%  2.294e+10        cpuidle..time
 2.162e+08           -23.9%  1.645e+08        cpuidle..usage
     40.56           +41.4%      57.37        vmstat.cpu.id
    656646           -33.9%     434227        vmstat.system.cs
    507937           -33.5%     337601        vmstat.system.in
[...]
     40.37           +16.9       57.24        mpstat.cpu.all.idle%
     35159           -22.9%      27105        unixbench.score
    149076           -22.9%     114925        unixbench.throughput
  11331167           -24.6%    8538458        unixbench.time.involuntary_context_switches
 1.994e+08           -34.0%  1.316e+08        unixbench.time.voluntary_context_switches
  94067103           -23.0%   72441376        unixbench.workload
[...]
    659355           -33.9%     436041        perf-stat.i.context-switches
    166359           -50.8%      81842        perf-stat.i.cpu-migrations
[...]
    151069           +75.1%     264508 ±  3%  sched_debug.cpu.avg_idle.avg
    500485          +182.2%    1412208        sched_debug.cpu.max_idle_balance_cost.avg
    516654          +188.9%    1492616        sched_debug.cpu.max_idle_balance_cost.max
    500000          +166.3%    1331544        sched_debug.cpu.max_idle_balance_cost.min
      2516 ± 29%   +1714.3%      45656 ±  5%  sched_debug.cpu.max_idle_balance_cost.stddev
   3086131           -33.6%    2048649        sched_debug.cpu.nr_switches.avg
[...]

***************************************************************************************************
lkp-icl-2sp9: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp9/shell16/unixbench

commit:
  5bc34be478 ("sched/core: Reorganize cgroup bandwidth control interface file writes")
  ac34cb39e8 ("sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")

5bc34be478d09c4d ac34cb39e8aea9915ec2f4e08c9
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     36.28           +18.4       54.67        mpstat.cpu.all.idle%
    611600 ±  2%     -36.8%     386273        vmstat.system.cs
     10028           -20.3%       7993        unixbench.throughput
 2.227e+08           -42.0%  1.292e+08        unixbench.time.voluntary_context_switches
   6341093           -20.2%    5058834        unixbench.workload
[...]
    170679 ±  2%     +88.6%     321939 ±  2%  sched_debug.cpu.avg_idle.avg
    500271          +171.7%    1359127        sched_debug.cpu.max_idle_balance_cost.avg
    513437          +190.7%    1492784        sched_debug.cpu.max_idle_balance_cost.max
      1752 ± 41%   +7584.0%     134642 ± 47%  sched_debug.cpu.max_idle_balance_cost.stddev
[...]
      0.01 ± 22%     +381.2%       0.04 ± 90%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.01           +109.5%       0.01 ± 10%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      0.01 ± 21%      +38.2%       0.01 ± 20%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      0.02 ± 18%      +38.5%       0.03 ±  5%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.open_last_lookups
      0.00 ± 21%      +78.9%       0.01 ± 26%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.__mmap_new_vma
      0.01 ± 18%      +65.1%       0.02 ± 17%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
      0.01 ±  4%      -12.9%       0.01        perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.03 ±  3%      -19.5%       0.03 ±  2%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.14 ± 36%      +69.3%       0.24 ± 34%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
      0.14 ± 68%     +754.0%       1.18 ± 34%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio
      0.17 ± 46%     +350.2%       0.76 ± 51%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.wp_page_copy
      0.16 ± 31%     +176.8%       0.46 ± 47%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      0.09 ± 37%     +296.4%       0.37 ± 46%  perf-sched.sch_delay.max.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
      0.06 ± 32%     +811.7%       0.56 ± 57%  perf-sched.sch_delay.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      0.07 ± 24%     +137.0%       0.16 ± 32%  perf-sched.sch_delay.max.ms.__cond_resched.do_close_on_exec.begin_new_exec.load_elf_binary.exec_binprm
      0.13 ± 59%     +817.1%       1.24 ± 56%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
      0.13 ± 13%    +9995.6%      12.72 ±212%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.unlink_anon_vmas.free_pgtables.exit_mmap
      0.07 ± 40%     +669.3%       0.50 ± 49%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_prepare.__split_vma.vma_modify
      0.07 ± 67%     +176.5%       0.21 ± 35%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
      0.10 ± 26%     +125.2%       0.23 ± 26%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.elf_load.load_elf_interp
      0.18 ± 41%    +4953.1%       8.97 ±146%  perf-sched.sch_delay.max.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
      0.23 ± 52%     +333.0%       0.99 ± 41%  perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
      0.10 ± 25%     +108.5%       0.22 ± 30%  perf-sched.sch_delay.max.ms.__cond_resched.exit_mmap.__mmput.exec_mmap.begin_new_exec
      0.06 ± 53%     +914.3%       0.58 ± 47%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
      0.04 ± 30%     +476.7%       0.24 ±129%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
      0.07 ± 49%     +182.2%       0.18 ± 41%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_shrink
      0.10 ± 37%     +480.3%       0.57 ± 49%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      0.09 ± 53%     +130.5%       0.20 ± 21%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc._install_special_mapping.map_vdso
      0.18 ± 38%     +189.6%       0.52 ± 32%  perf-sched.sch_delay.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
      0.07 ± 46%     +290.1%       0.28 ± 45%  perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
      0.03 ±111%    +1015.2%       0.37 ±101%  perf-sched.sch_delay.max.ms.d_alloc_parallel.__lookup_slow.walk_component.path_lookupat
      0.08 ± 41%     +249.3%       0.26 ± 28%  perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
      0.17 ± 42%     +346.9%       0.77 ± 27%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.26 ± 30%     +490.9%       1.54 ± 18%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      0.08 ± 48%     +638.6%       0.61 ± 67%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      0.06 ± 74%     +523.0%       0.38 ± 64%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      0.16 ± 55%     +289.8%       0.63 ± 82%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      0.24 ± 68%     +124.3%       0.53 ± 27%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.__mmap_new_vma
      1962 ±  9%      +52.1%       2986 ± 26%  perf-sched.total_wait_and_delay.max.ms
      1962 ±  9%      +52.1%       2986 ± 26%  perf-sched.total_wait_time.max.ms
      6.10 ±  3%      +16.5%       7.11 ±  3%  perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ±  2%     +172.7%       0.06        perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      2.86 ±  3%      +16.5%       3.33 ±  4%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
    457.67 ±  5%      -13.3%     397.00 ±  4%  perf-sched.wait_and_delay.count.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
     31.00 ± 12%      -36.6%      19.67 ± 19%  perf-sched.wait_and_delay.count.__cond_resched.pidfs_exit.release_task.wait_task_zombie.__do_wait
      2114 ±  5%      +18.8%       2512 ±  5%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1443 ±  2%      -11.9%       1272 ±  2%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     73.43 ± 13%      -49.8%      36.86 ± 72%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_pseudo.alloc_file_pseudo
      8.47 ±  9%      +34.4%      11.39 ±  8%  perf-sched.wait_and_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      5.00 ± 13%     +512.7%      30.62 ± 20%  perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      1962 ±  9%      +52.1%       2986 ± 26%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.08 ±207%     +356.6%       0.36 ± 70%  perf-sched.wait_time.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
      3.33 ±  6%      +10.2%       3.67 ±  7%  perf-sched.wait_time.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      3.90 ±  6%       +9.7%       4.27 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.open_last_lookups.path_openat
      6.09 ±  3%      +16.5%       7.10 ±  3%  perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3.12 ±  3%      +16.5%       3.63 ±  6%  perf-sched.wait_time.avg.ms.__cond_resched.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat
      0.02 ±  2%     +172.7%       0.06        perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      1.47 ±  3%      +13.9%       1.67        perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
      2.94 ± 19%      +53.8%       4.52 ±  9%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      2.84 ±  3%      +16.1%       3.30 ±  3%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
      6.50 ±  5%      +25.4%       8.15 ±  7%  perf-sched.wait_time.max.ms.__cond_resched.down_read.walk_component.link_path_walk.path_lookupat
      6.15 ±  8%      +30.1%       8.00 ± 18%  perf-sched.wait_time.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
      7.61 ±  6%     +106.6%      15.72 ± 87%  perf-sched.wait_time.max.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
      5.59 ± 13%      +38.8%       7.75 ± 10%  perf-sched.wait_time.max.ms.__cond_resched.dput.step_into.open_last_lookups.path_openat
     73.42 ± 13%      -48.7%      37.66 ± 68%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_pseudo.alloc_file_pseudo
      6.55 ±  5%      +14.5%       7.50 ±  7%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
      8.46 ±  9%      +34.3%      11.36 ±  8%  perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      5.00 ± 13%     +512.7%      30.62 ± 20%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
     18.28 ±130%     +313.9%      75.66 ± 16%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      1962 ±  9%      +52.1%       2986 ± 26%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm


***************************************************************************************************
lkp-icl-2sp9: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp9/shell8/unixbench

commit:
  5bc34be478 ("sched/core: Reorganize cgroup bandwidth control interface file writes")
  ac34cb39e8 ("sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")

5bc34be478d09c4d ac34cb39e8aea9915ec2f4e08c9
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     22.83 ± 60%     +140.9%      55.00 ± 56%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      3708            +24.7%       4625 ±  2%  uptime.idle
 1.461e+09 ±  2%      +56.0%  2.279e+09 ±  2%  cpuidle..time
  19339144 ±  3%      -26.4%   14234823 ±  3%  cpuidle..usage
     36.38            +53.9%      56.00        vmstat.cpu.id
    631990 ±  2%      -40.3%     377172 ±  2%  vmstat.system.cs
    507992 ±  2%      -40.5%     302370        vmstat.system.in
  54790263            -26.6%   40229835        numa-numastat.node0.local_node
  54813627            -26.6%   40257329        numa-numastat.node0.numa_hit
  54280109            -25.6%   40400612        numa-numastat.node1.local_node
  54337234            -25.6%   40440402        numa-numastat.node1.numa_hit
  54813360            -26.6%   40257724        numa-vmstat.node0.numa_hit
  54789997            -26.6%   40230231        numa-vmstat.node0.numa_local
  54337173            -25.6%   40439842        numa-vmstat.node1.numa_hit
  54280054            -25.6%   40400056        numa-vmstat.node1.numa_local
     34.50            +20.2       54.74        mpstat.cpu.all.idle%
      0.00 ± 43%       +0.0        0.00 ± 20%  mpstat.cpu.all.iowait%
      1.30 ±  2%       -0.5        0.81        mpstat.cpu.all.irq%
      0.62             -0.2        0.46        mpstat.cpu.all.soft%
     53.55            -16.2       37.32 ±  2%  mpstat.cpu.all.sys%
     10.03             -3.4        6.67        mpstat.cpu.all.usr%
     72.84            -28.5%      52.04 ±  3%  mpstat.max_utilization_pct
     24992             -4.7%      23818        proc-vmstat.nr_page_table_pages
     50622             -2.0%      49590        proc-vmstat.nr_slab_unreclaimable
 1.092e+08            -26.1%   80699262        proc-vmstat.numa_hit
 1.091e+08            -26.1%   80631976        proc-vmstat.numa_local
  1.13e+08            -26.3%   83331586        proc-vmstat.pgalloc_normal
 1.302e+08            -25.8%   96571708        proc-vmstat.pgfault
 1.129e+08            -26.3%   83193858        proc-vmstat.pgfree
   6371181            -26.1%    4708130        proc-vmstat.pgreuse
      5521 ±  2%      -25.3%       4122        proc-vmstat.thp_fault_alloc
   2323235            -26.1%    1716008        proc-vmstat.unevictable_pgs_culled
     34426            -26.2%      25392        unixbench.score
     20655            -26.2%      15235        unixbench.throughput
   1172933            -26.4%     863436 ±  4%  unixbench.time.involuntary_context_switches
     68518            -25.7%      50886 ±  2%  unixbench.time.major_page_faults
 1.295e+08            -26.0%   95855227        unixbench.time.minor_page_faults
      3974            -28.2%       2852 ±  2%  unixbench.time.percent_of_cpu_this_job_got
      1994            -28.5%       1426 ±  3%  unixbench.time.system_time
    538.91            -29.3%     381.20        unixbench.time.user_time
  23209907            -44.1%   12967611        unixbench.time.voluntary_context_switches
   1301317            -26.2%     959835        unixbench.workload
      3.20            -15.9%       2.69        perf-stat.i.MPKI
 1.691e+10            -28.2%  1.214e+10        perf-stat.i.branch-instructions
      1.75             -0.1        1.62 ±  2%  perf-stat.i.branch-miss-rate%
 2.685e+08            -33.5%  1.786e+08        perf-stat.i.branch-misses
     23.70             -0.4       23.28        perf-stat.i.cache-miss-rate%
 2.751e+08            -39.9%  1.654e+08        perf-stat.i.cache-misses
 1.154e+09            -38.8%  7.061e+08        perf-stat.i.cache-references
    662789 ±  2%      -40.6%     393867 ±  2%  perf-stat.i.context-switches
      1.86             -2.2%       1.82        perf-stat.i.cpi
 1.585e+11            -30.1%  1.108e+11 ±  2%  perf-stat.i.cpu-cycles
    238322            -63.0%      88291 ±  3%  perf-stat.i.cpu-migrations
    625.36            +15.2%     720.11 ±  2%  perf-stat.i.cycles-between-cache-misses
  8.25e+10            -28.2%  5.924e+10        perf-stat.i.instructions
      1074            -25.5%     800.25 ±  2%  perf-stat.i.major-faults
     76.11            -29.7%      53.52        perf-stat.i.metric.K/sec
   1989387            -25.8%    1476846        perf-stat.i.minor-faults
   1990462            -25.8%    1477646        perf-stat.i.page-faults
    479.50 ± 44%      +39.7%     669.77        perf-stat.overall.cycles-between-cache-misses
      0.43 ± 44%      +23.1%       0.53        perf-stat.overall.ipc
      0.47 ±154%       +1.2        1.65 ± 35%  perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      1.20 ± 83%       +2.1        3.26 ± 41%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      1.34 ± 80%       +2.4        3.72 ± 34%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      1.34 ± 80%       +2.6        3.92 ± 27%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
      1.34 ± 80%       +3.1        4.42 ± 33%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault
      1.20 ± 83%       +3.3        4.50 ± 75%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.05 ±223%       +0.7        0.74 ± 75%  perf-profile.children.cycles-pp.try_to_wake_up
      0.40 ± 98%       +0.7        1.14 ± 54%  perf-profile.children.cycles-pp.elf_load
      0.25 ±101%       +0.9        1.11 ± 54%  perf-profile.children.cycles-pp.next_uptodate_folio
      0.00             +0.9        0.88 ± 98%  perf-profile.children.cycles-pp.sched_balance_update_blocked_averages
      0.04 ±223%       +1.0        1.06 ±109%  perf-profile.children.cycles-pp.do_anonymous_page
      0.37 ± 91%       +1.0        1.41 ± 41%  perf-profile.children.cycles-pp.wp_page_copy
      0.00             +1.1        1.06 ±110%  perf-profile.children.cycles-pp.clockevents_program_event
      0.28 ±126%       +1.2        1.44 ± 39%  perf-profile.children.cycles-pp.ktime_get
      0.60 ±110%       +1.3        1.91 ± 25%  perf-profile.children.cycles-pp.zap_present_ptes
      0.36 ±116%       +1.3        1.67 ± 81%  perf-profile.children.cycles-pp.lookup_fast
      0.60 ±111%       +1.5        2.07 ± 25%  perf-profile.children.cycles-pp.walk_component
      0.77 ± 88%       +2.2        3.01 ± 33%  perf-profile.children.cycles-pp.link_path_walk
      1.53 ± 74%       +4.2        5.74 ± 58%  perf-profile.children.cycles-pp.__handle_mm_fault
      1.92 ± 74%       +4.5        6.46 ± 40%  perf-profile.children.cycles-pp.do_user_addr_fault
      1.58 ± 74%       +4.6        6.13 ± 52%  perf-profile.children.cycles-pp.handle_mm_fault
      1.92 ± 74%       +4.6        6.51 ± 39%  perf-profile.children.cycles-pp.exc_page_fault
      1.96 ± 75%       +5.3        7.22 ± 39%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.04 ±223%       +0.6        0.69 ± 48%  perf-profile.self.cycles-pp.folio_remove_rmap_ptes
      0.31 ±141%       +0.7        1.02 ± 25%  perf-profile.self.cycles-pp.zap_present_ptes
      0.12 ±150%       +1.2        1.29 ± 45%  perf-profile.self.cycles-pp.ktime_get
      0.84 ± 98%       +1.3        2.10 ± 17%  perf-profile.self.cycles-pp._raw_spin_lock
   1458020            -31.4%     999894 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.avg
   1335243            -31.5%     914846 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.min
      0.61 ± 23%      +49.0%       0.91 ± 34%  sched_debug.cfs_rq:/.h_nr_queued.stddev
   1428441 ±  6%      -44.3%     795691 ± 44%  sched_debug.cfs_rq:/.left_deadline.max
    198879 ± 18%      -38.9%     121555 ± 48%  sched_debug.cfs_rq:/.left_deadline.stddev
   1428352 ±  6%      -44.3%     795639 ± 44%  sched_debug.cfs_rq:/.left_vruntime.max
    198866 ± 18%      -38.9%     121547 ± 48%  sched_debug.cfs_rq:/.left_vruntime.stddev
   1458020            -31.4%     999894 ±  2%  sched_debug.cfs_rq:/.min_vruntime.avg
   1335243            -31.5%     914846 ±  2%  sched_debug.cfs_rq:/.min_vruntime.min
   1428352 ±  6%      -44.3%     795639 ± 44%  sched_debug.cfs_rq:/.right_vruntime.max
    198866 ± 18%      -38.9%     121547 ± 48%  sched_debug.cfs_rq:/.right_vruntime.stddev
    142.08 ± 23%      +42.8%     202.83 ± 14%  sched_debug.cfs_rq:/.runnable_avg.min
    147.33 ± 21%      +41.8%     208.92 ±  6%  sched_debug.cfs_rq:/.util_avg.min
    322.73 ±  9%      -17.1%     267.44 ±  3%  sched_debug.cfs_rq:/.util_avg.stddev
    388303 ±  2%      +51.7%     589066 ±  4%  sched_debug.cpu.avg_idle.avg
    661789 ± 11%     +152.6%    1672006 ±  3%  sched_debug.cpu.avg_idle.max
    151173 ±  6%     +114.9%     324892 ±  8%  sched_debug.cpu.avg_idle.stddev
    392290 ± 10%      -44.0%     219613 ± 21%  sched_debug.cpu.curr->pid.avg
    631181            -26.0%     467219        sched_debug.cpu.curr->pid.max
    302713 ±  2%      -24.8%     227699 ±  2%  sched_debug.cpu.curr->pid.stddev
    501647           +103.6%    1021336 ±  5%  sched_debug.cpu.max_idle_balance_cost.avg
    575980 ± 10%     +158.5%    1488876        sched_debug.cpu.max_idle_balance_cost.max
     10437 ± 76%    +3252.4%     349893 ±  5%  sched_debug.cpu.max_idle_balance_cost.stddev
    310053 ±  2%      -40.3%     185018        sched_debug.cpu.nr_switches.avg
    336326            -40.1%     201480 ±  4%  sched_debug.cpu.nr_switches.max
    284218 ±  2%      -39.7%     171324 ±  2%  sched_debug.cpu.nr_switches.min
     11002 ±  8%      -58.5%       4569 ± 14%  sched_debug.cpu.nr_switches.stddev
     28877 ± 23%      -97.4%     755.50 ± 20%  sched_debug.cpu.nr_uninterruptible.max
    -36786            -98.4%    -578.42        sched_debug.cpu.nr_uninterruptible.min
     17077 ± 10%      -98.4%     268.08 ± 10%  sched_debug.cpu.nr_uninterruptible.stddev


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On 7/15/2025 3:08 PM, kernel test robot wrote:
>
> Hello,
>
> kernel test robot noticed a 22.9% regression of unixbench.throughput on:
>
> commit: ac34cb39e8aea9915ec2f4e08c979eb2ed1d7561 ("[PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
> url: https://github.com/intel-lab-lkp/linux/commits/Chris-Mason/sched-fair-bump-sd-max_newidle_lb_cost-when-newidle-balance-fails/20250626-224805
> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 5bc34be478d09c4d16009e665e020ad0fcd0deea
> patch link: https://lore.kernel.org/all/20250626144017.1510594-2-clm@fb.com/
> patch subject: [PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails
>
> testcase: unixbench
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
>
>   runtime: 300s
>   nr_task: 100%
>   test: shell1
>   cpufreq_governor: performance
>
> ...
>
> commit:
>   5bc34be478 ("sched/core: Reorganize cgroup bandwidth control interface file writes")
>   ac34cb39e8 ("sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
>
> 5bc34be478d09c4d ac34cb39e8aea9915ec2f4e08c9
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
> ...
>      40.37            +16.9       57.24        mpstat.cpu.all.idle%

This commit inhibits newidle balance. It seems that some workloads
do not like newidle balance, like schbench, which runs short-duration
tasks, while other workloads want newidle balance to pull at its best
effort, like the unixbench shell test case.

Just wondering if we can check the sched domain's average utilization to
decide how hard we should trigger newidle balance, or whether we can check
the overutilized flag to decide whether we should launch
newidle balance. Something like the following is what I was thinking of:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9e24038fa000..6c7420ed484e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13759,7 +13759,8 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 	sd = rcu_dereference_check_sched_domain(this_rq->sd);
 
 	if (!get_rd_overloaded(this_rq->rd) ||
-	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
+	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost &&
+	     !READ_ONCE(this_rq->rd->overutilized))) {
 
 		if (sd)
 			update_next_balance(sd, &next_balance);

thanks,
Chenyu
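A minimal sketch of the average-utilization half of that idea, assuming the
PELT signal in rq->cfs.avg.util_avg is a good enough proxy for how busy the
CPU has recently been. The helper name and the 25% threshold below are
invented for illustration; they are not from any posted patch:

static inline bool newidle_worth_trying(struct rq *rq)
{
	/*
	 * Illustrative sketch only: compare this CPU's recent cfs
	 * utilization against its capacity. On a mostly idle system,
	 * repeated newidle scans are unlikely to find anything to
	 * pull, so let the backoff win; on a busy system keep scanning.
	 */
	unsigned long util = READ_ONCE(rq->cfs.avg.util_avg);
	unsigned long cap = arch_scale_cpu_capacity(cpu_of(rq));

	return util * 4 >= cap;	/* >= 25% recent utilization */
}

A gate like this would slot into the same condition the diff above touches;
picking the threshold is the hard part, which is presumably why the thread
keeps coming back to measured behaviour.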
On Tue, Jul 15, 2025 at 06:08:43PM +0800, Chen, Yu C wrote:
> On 7/15/2025 3:08 PM, kernel test robot wrote:
> >
> > Hello,
> >
> > kernel test robot noticed a 22.9% regression of unixbench.throughput on:
> >
> > commit: ac34cb39e8aea9915ec2f4e08c979eb2ed1d7561 ("[PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
> > url: https://github.com/intel-lab-lkp/linux/commits/Chris-Mason/sched-fair-bump-sd-max_newidle_lb_cost-when-newidle-balance-fails/20250626-224805
> > base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 5bc34be478d09c4d16009e665e020ad0fcd0deea
> > patch link: https://lore.kernel.org/all/20250626144017.1510594-2-clm@fb.com/
> > patch subject: [PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails
> >
> > testcase: unixbench
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> > parameters:
> >
> >   runtime: 300s
> >   nr_task: 100%
> >   test: shell1
> >   cpufreq_governor: performance
> >
> ...
> >
> > commit:
> >   5bc34be478 ("sched/core: Reorganize cgroup bandwidth control interface file writes")
> >   ac34cb39e8 ("sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
> >
> > 5bc34be478d09c4d ac34cb39e8aea9915ec2f4e08c9
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> ...
> >      40.37            +16.9       57.24        mpstat.cpu.all.idle%
>
> This commit inhibits newidle balance.

When not successful. So when newidle balance is not succeeding in
pulling tasks, it is backing off and doing less of it.

> It seems that some workloads
> do not like newidle balance, like schbench, which runs short-duration
> tasks, while other workloads want newidle balance to pull at its best
> effort, like the unixbench shell test case.
> Just wondering if we can check the sched domain's average utilization to
> decide how hard we should trigger newidle balance, or whether we can check
> the overutilized flag to decide whether we should launch
> newidle balance. Something like the following is what I was thinking of:

Looking at the actual util signal might be interesting, but as Chris
already noted, overutilized isn't the right thing to look at. Simply
taking rq->cfs.avg.util_avg might be more useful. Very high util and
failure to pull might indicate new-idle just isn't very important /
effective. While low util and failure might mean we should try harder.

Other things to look at:

 - if the sysctl_sched_migration_cost limit isn't artificially limiting
   actual scanning costs. Eg. very large domains might perhaps have
   costs that are genuinely larger than that somewhat random number.

 - if despite the apparent failure to pull, we do already have something
   to run (eg. wakeups).

 - if the 3/2 backoff is perhaps too aggressive vs the 1% per second
   decay.
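Read as code, the util_avg suggestion could scale the failure bump inside
sched_balance_newidle() itself. A sketch under the same caveat, where the
50% cutoff and the gentler 9/8 factor are invented example values rather
than a posted patch:

	if (!pulled_task) {
		unsigned long util = READ_ONCE(this_rq->cfs.avg.util_avg);
		unsigned long cap = arch_scale_cpu_capacity(this_cpu);

		/*
		 * A busy CPU that fails to pull suggests newidle
		 * balance is not effective here: back off hard. A
		 * mostly idle CPU that fails should keep trying
		 * harder, so back off gently.
		 */
		if (util * 2 >= cap)
			domain_cost = (3 * sd->max_newidle_lb_cost) / 2;
		else
			domain_cost = (9 * sd->max_newidle_lb_cost) / 8;
	}

	update_newidle_cost(sd, domain_cost);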
On 7/16/2025 7:25 PM, Peter Zijlstra wrote:
> On Tue, Jul 15, 2025 at 06:08:43PM +0800, Chen, Yu C wrote:
>> On 7/15/2025 3:08 PM, kernel test robot wrote:
>>>
>>> Hello,
>>>
>>> kernel test robot noticed a 22.9% regression of unixbench.throughput on:
>>>
>>> commit: ac34cb39e8aea9915ec2f4e08c979eb2ed1d7561 ("[PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
>>> url: https://github.com/intel-lab-lkp/linux/commits/Chris-Mason/sched-fair-bump-sd-max_newidle_lb_cost-when-newidle-balance-fails/20250626-224805
>>> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 5bc34be478d09c4d16009e665e020ad0fcd0deea
>>> patch link: https://lore.kernel.org/all/20250626144017.1510594-2-clm@fb.com/
>>> patch subject: [PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails
>>>
>>> testcase: unixbench
>>> config: x86_64-rhel-9.4
>>> compiler: gcc-12
>>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
>>> parameters:
>>>
>>>   runtime: 300s
>>>   nr_task: 100%
>>>   test: shell1
>>>   cpufreq_governor: performance
>>>
>> ...
>>
>>> commit:
>>>   5bc34be478 ("sched/core: Reorganize cgroup bandwidth control interface file writes")
>>>   ac34cb39e8 ("sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
>>>
>>> 5bc34be478d09c4d ac34cb39e8aea9915ec2f4e08c9
>>> ---------------- ---------------------------
>>>          %stddev     %change         %stddev
>>>              \          |                \
>> ...
>>
>>>      40.37            +16.9       57.24        mpstat.cpu.all.idle%
>>
>> This commit inhibits newidle balance.
>
> When not successful. So when newidle balance is not succeeding in
> pulling tasks, it is backing off and doing less of it.
>
>> It seems that some workloads
>> do not like newidle balance, like schbench, which runs short-duration
>> tasks, while other workloads want newidle balance to pull at its best
>> effort, like the unixbench shell test case.
>> Just wondering if we can check the sched domain's average utilization to
>> decide how hard we should trigger newidle balance, or whether we can check
>> the overutilized flag to decide whether we should launch
>> newidle balance. Something like the following is what I was thinking of:
>
> Looking at the actual util signal might be interesting, but as Chris
> already noted, overutilized isn't the right thing to look at. Simply
> taking rq->cfs.avg.util_avg might be more useful. Very high util and
> failure to pull might indicate new-idle just isn't very important /
> effective. While low util and failure might mean we should try harder.
>
> Other things to look at:
>
>  - if the sysctl_sched_migration_cost limit isn't artificially limiting
>    actual scanning costs. Eg. very large domains might perhaps have
>    costs that are genuinely larger than that somewhat random number.
>
>  - if despite the apparent failure to pull, we do already have something
>    to run (eg. wakeups).
>
>  - if the 3/2 backoff is perhaps too aggressive vs the 1% per second
>    decay.

Thanks for the suggestions. Let me try to reproduce this issue locally
and see what the proper way to address it is.

thanks,
Chenyu
On 7/15/25 6:08 AM, Chen, Yu C wrote:
> On 7/15/2025 3:08 PM, kernel test robot wrote:
>>
>> Hello,
>>
>> kernel test robot noticed a 22.9% regression of unixbench.throughput on:
>>
>> commit: ac34cb39e8aea9915ec2f4e08c979eb2ed1d7561 ("[PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
>> url: https://github.com/intel-lab-lkp/linux/commits/Chris-Mason/sched-fair-bump-sd-max_newidle_lb_cost-when-newidle-balance-fails/20250626-224805
>> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 5bc34be478d09c4d16009e665e020ad0fcd0deea
>> patch link: https://lore.kernel.org/all/20250626144017.1510594-2-clm@fb.com/
>> patch subject: [PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails

[ ... ]

>>
>> commit:
>>   5bc34be478 ("sched/core: Reorganize cgroup bandwidth control interface file writes")
>>   ac34cb39e8 ("sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
>>
>> 5bc34be478d09c4d ac34cb39e8aea9915ec2f4e08c9
>> ---------------- ---------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
> ...
>
>>      40.37            +16.9       57.24        mpstat.cpu.all.idle%
>
> This commit inhibits newidle balance. It seems that some workloads
> do not like newidle balance, like schbench, which runs short-duration
> tasks, while other workloads want newidle balance to pull at its best
> effort, like the unixbench shell test case.
> Just wondering if we can check the sched domain's average utilization to
> decide how hard we should trigger newidle balance, or whether we can check
> the overutilized flag to decide whether we should launch
> newidle balance. Something like the following is what I was thinking of:

Thanks for looking at this.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9e24038fa000..6c7420ed484e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -13759,7 +13759,8 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
> 	sd = rcu_dereference_check_sched_domain(this_rq->sd);
>
> 	if (!get_rd_overloaded(this_rq->rd) ||
> -	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
> +	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost &&
> +	     !READ_ONCE(this_rq->rd->overutilized))) {
>
> 		if (sd)
> 			update_next_balance(sd, &next_balance);

Looking at rd->overutilized, I think we only set it when
sched_energy_enabled(). I'm not sure if that's true often enough to use
as a fix for unixbench?

-chris
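For reference on the EAS point: the flag only ever gets written behind a
sched_energy_enabled() check, so on a non-EAS system it stays clear and the
gate in the quoted diff would never trigger. From memory, the setter in
kernel/sched/fair.c looks roughly like the following on recent kernels
(the exact shape varies by version):

static inline void set_rd_overutilized(struct root_domain *rd, bool flag)
{
	/* rd->overutilized is an EAS-only signal; leave it alone otherwise. */
	if (!sched_energy_enabled())
		return;

	WRITE_ONCE(rd->overutilized, flag);
	trace_sched_overutilized_tp(rd, flag);
}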
On 7/15/2025 11:38 PM, Chris Mason wrote:
> On 7/15/25 6:08 AM, Chen, Yu C wrote:
>> On 7/15/2025 3:08 PM, kernel test robot wrote:
>>>
>>> Hello,
>>>
>>> kernel test robot noticed a 22.9% regression of unixbench.throughput on:
>>>
>>> commit: ac34cb39e8aea9915ec2f4e08c979eb2ed1d7561 ("[PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
>>> url: https://github.com/intel-lab-lkp/linux/commits/Chris-Mason/sched-fair-bump-sd-max_newidle_lb_cost-when-newidle-balance-fails/20250626-224805
>>> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 5bc34be478d09c4d16009e665e020ad0fcd0deea
>>> patch link: https://lore.kernel.org/all/20250626144017.1510594-2-clm@fb.com/
>>> patch subject: [PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails
>
> [ ... ]
>
>>> commit:
>>>   5bc34be478 ("sched/core: Reorganize cgroup bandwidth control interface file writes")
>>>   ac34cb39e8 ("sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
>>>
>>> 5bc34be478d09c4d ac34cb39e8aea9915ec2f4e08c9
>>> ---------------- ---------------------------
>>>          %stddev     %change         %stddev
>>>              \          |                \
>> ...
>>
>>>      40.37            +16.9       57.24        mpstat.cpu.all.idle%
>>
>> This commit inhibits newidle balance. It seems that some workloads
>> do not like newidle balance, like schbench, which runs short-duration
>> tasks, while other workloads want newidle balance to pull at its best
>> effort, like the unixbench shell test case.
>> Just wondering if we can check the sched domain's average utilization to
>> decide how hard we should trigger newidle balance, or whether we can check
>> the overutilized flag to decide whether we should launch
>> newidle balance. Something like the following is what I was thinking of:
>
> Thanks for looking at this.
>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 9e24038fa000..6c7420ed484e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -13759,7 +13759,8 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
>> 	sd = rcu_dereference_check_sched_domain(this_rq->sd);
>>
>> 	if (!get_rd_overloaded(this_rq->rd) ||
>> -	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
>> +	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost &&
>> +	     !READ_ONCE(this_rq->rd->overutilized))) {
>>
>> 		if (sd)
>> 			update_next_balance(sd, &next_balance);
>
> Looking at rd->overutilized, I think we only set it when
> sched_energy_enabled(). I'm not sure if that's true often enough to use
> as a fix for unixbench?
>
OK, overutilized is only used for EAS.

I gave it a try but cannot reproduce this issue on a 240-CPU system
using unixbench:

./Run shell1 -i 30 -c 240

I will need to double-check with lkp/0day to figure this out.

thanks,
Chenyu

> -chris
>
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 155213a2aed42c85361bf4f5c817f5cb68951c3b
Gitweb: https://git.kernel.org/tip/155213a2aed42c85361bf4f5c817f5cb68951c3b
Author: Chris Mason <clm@fb.com>
AuthorDate: Thu, 26 Jun 2025 07:39:10 -07:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Jul 2025 13:40:21 +02:00
sched/fair: Bump sd->max_newidle_lb_cost when newidle balance fails
schbench (https://github.com/masoncl/schbench.git) is showing a
regression from previous production kernels that bisected down to:
sched/fair: Remove sysctl_sched_migration_cost condition (c5b0a7eefc)
The schbench command line was:
schbench -L -m 4 -M auto -t 256 -n 0 -r 0 -s 0
This creates 4 message threads pinned to CPUs 0-3, and 256x4 worker
threads spread across the rest of the CPUs. Neither the worker threads
or the message threads do any work, they just wake each other up and go
back to sleep as soon as possible.
The end result is the first 4 CPUs are pegged waking up those 1024
workers, and the rest of the CPUs are constantly banging in and out of
idle. If I take a v6.9 Linus kernel and revert that one commit,
performance goes from 3.4M RPS to 5.4M RPS.
schedstat shows there are ~100x more new idle balance operations, and
profiling shows the worker threads are spending ~20% of their CPU time
on new idle balance. schedstats also shows that almost all of these new
idle balance attempts are failing to find busy groups.
The fix used here is to crank up the cost of the newidle balance whenever it
fails. Since we don't want sd->max_newidle_lb_cost to grow out of
control, this also changes update_newidle_cost() to use
sysctl_sched_migration_cost as the upper limit on max_newidle_lb_cost.
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20250626144017.1510594-2-clm@fb.com
---
kernel/sched/fair.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7e2963e..ab0822c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12064,8 +12064,14 @@ static inline bool update_newidle_cost(struct sched_domain *sd, u64 cost)
/*
* Track max cost of a domain to make sure to not delay the
* next wakeup on the CPU.
+ *
+ * sched_balance_newidle() bumps the cost whenever newidle
+ * balance fails, and we don't want things to grow out of
+ * control. Use the sysctl_sched_migration_cost as the upper
+ * limit, plus a little extra to avoid off by ones.
*/
- sd->max_newidle_lb_cost = cost;
+ sd->max_newidle_lb_cost =
+ min(cost, sysctl_sched_migration_cost + 200);
sd->last_decay_max_lb_cost = jiffies;
} else if (time_after(jiffies, sd->last_decay_max_lb_cost + HZ)) {
/*
@@ -12757,10 +12763,17 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
t1 = sched_clock_cpu(this_cpu);
domain_cost = t1 - t0;
- update_newidle_cost(sd, domain_cost);
-
curr_cost += domain_cost;
t0 = t1;
+
+ /*
+ * Failing newidle means it is not effective;
+ * bump the cost so we end up doing less of it.
+ */
+ if (!pulled_task)
+ domain_cost = (3 * sd->max_newidle_lb_cost) / 2;
+
+ update_newidle_cost(sd, domain_cost);
}
/*
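To get a feel for how quickly the failure bump saturates against the new
clamp, here is a small standalone model (not kernel code). It assumes the
default sysctl_sched_migration_cost of 500000 ns and a made-up initial
domain scan cost of 10 us:

#include <stdio.h>

int main(void)
{
	/* Clamp applied in update_newidle_cost(): sysctl default + 200. */
	unsigned long long cap = 500000ULL + 200;
	/* Assumed initial measured scan cost for one domain, in ns. */
	unsigned long long cost = 10000;
	int fails = 0;

	while (cost < cap) {
		cost = (3 * cost) / 2;	/* one failed newidle balance */
		if (cost > cap)
			cost = cap;
		fails++;
	}
	printf("failures to saturate: %d (final cost %llu ns)\n", fails, cost);
	return 0;
}

Ten consecutive failures take a 10 us cost to the 500200 ns clamp, after
which a CPU whose avg_idle stays under roughly half a millisecond skips the
domain entirely; with only the roughly 1% per second decay mentioned earlier
in the thread pulling it back down, that explains both the schbench win and
why bursty-but-busy workloads can get stuck on the backed-off side.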