Chris Mason reported a performance regression on big iron. Reports of
this kind were usually reported as part of a micro benchmark but Chris'
test did mimic his real workload. This makes it a real regression.

The root cause is rcuref_get() which is invoked during each futex
operation. If all threads of an application do this simultaneously then
it leads to cache line bouncing and the performance drops.

Disable FUTEX_PRIVATE_HASH entirely for this cycle. The performance
regression will be addressed in the following cycle enabling the option
again.

Reported-by: Chris Mason <clm@meta.com>
Closes: https://lore.kernel.org/all/3ad05298-351e-4d61-9972-ca45a0a50e33@meta.com/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
init/Kconfig | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/init/Kconfig b/init/Kconfig
index af4c2f0854554..666783eb50abd 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1716,9 +1716,13 @@ config FUTEX_PI
 	depends on FUTEX && RT_MUTEXES
 	default y
 
+#
+# marked broken for performance reasons; gives us one more cycle to sort things out.
+#
 config FUTEX_PRIVATE_HASH
 	bool
 	depends on FUTEX && !BASE_SMALL && MMU
+	depends on BROKEN
 	default y
 
 config FUTEX_MPOL
--
2.50.0
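
The cache line bouncing named as the root cause in the commit message can be illustrated in user space. The following is a minimal sketch, not the kernel's rcuref implementation: every thread either increments one shared atomic counter (the contended case, comparable to all threads taking a reference on the same object) or its own padded counter. The names `run_counters`, `NTHREADS` and `CACHELINE`, and the 64-byte line size, are assumptions made for this illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS  4   /* hypothetical thread count for the demo */
#define CACHELINE 64  /* assumed cache line size in bytes */

/* One counter shared by all threads: every increment has to pull the
 * cache line over to the incrementing CPU, so the line bounces. */
static atomic_long shared_ref;

/* One counter per thread, aligned so that no two share a cache line. */
struct padded_counter {
	_Alignas(CACHELINE) atomic_long val;
};
static struct padded_counter local_ref[NTHREADS];

struct worker_arg {
	atomic_long *counter;
	long iterations;
};

static void *worker(void *p)
{
	struct worker_arg *arg = p;

	for (long i = 0; i < arg->iterations; i++)
		atomic_fetch_add_explicit(arg->counter, 1, memory_order_relaxed);
	return NULL;
}

/* Run NTHREADS workers against the shared counter (shared != 0) or
 * against per-thread counters (shared == 0). Returns the sum of all
 * counters, or -1 if a thread could not be created. */
long run_counters(int shared, long iterations)
{
	pthread_t tid[NTHREADS];
	struct worker_arg args[NTHREADS];
	int started = 0;
	long total;

	atomic_store(&shared_ref, 0);
	for (int i = 0; i < NTHREADS; i++)
		atomic_store(&local_ref[i].val, 0);

	for (int i = 0; i < NTHREADS; i++) {
		args[i].counter = shared ? &shared_ref : &local_ref[i].val;
		args[i].iterations = iterations;
		if (pthread_create(&tid[i], NULL, worker, &args[i]) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	if (started != NTHREADS)
		return -1;

	total = atomic_load(&shared_ref);
	for (int i = 0; i < NTHREADS; i++)
		total += atomic_load(&local_ref[i].val);
	return total;
}
```

Both modes perform the same number of increments and return the same total; only the memory traffic differs. Timing `run_counters(1, n)` against `run_counters(0, n)` for a large `n` on a multi-socket machine would be one way to observe the effect, though the size of the gap depends on the hardware.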
Hello,

from the commit message, we know this 'temporary disable' is to address a
performance regression. so we still send out this report FYI what's the possible
performance impact. however, our team focus on micro benchmark, so, anyway, just FYI.


kernel test robot noticed a 1.9% improvement of perf-bench-futex.ops/s on:


commit: bc1aa469e545fe16a62d501e095630cccc3fe1c4 ("[PATCH] futex: Temporary disable FUTEX_PRIVATE_HASH")
url: https://github.com/intel-lab-lkp/linux/commits/Sebastian-Andrzej-Siewior/futex-Temporary-disable-FUTEX_PRIVATE_HASH/20250630-225317
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git d0b3b7b22dfa1f4b515fd3a295b3fd958f9e81af
patch link: https://lore.kernel.org/all/20250630145034.8JnINEaS@linutronix.de/
patch subject: [PATCH] futex: Temporary disable FUTEX_PRIVATE_HASH

testcase: perf-bench-futex
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	runtime: 300s
	nr_task: 100%
	test: hash
	shared: shared
	cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250716/202507160223.2a483e8b-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/shared/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/300s/shared/lkp-srf-2sp2/hash/perf-bench-futex

commit:
  v6.16-rc4
  bc1aa469e5 ("futex: Temporary disable FUTEX_PRIVATE_HASH")

       v6.16-rc4 bc1aa469e545fe16a62d501e095
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   2249734            +1.9%    2291792        perf-bench-futex.ops/s
      6622            +3.7%       6868        perf-bench-futex.time.user_time
    115.83 ± 12%     +40.1%     162.33 ± 18%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      4.40 ± 19%    +343.0%      19.51 ± 64%  perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      4.15 ± 17%    +347.3%      18.55 ± 69%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     21.00 ± 63%    +121.4%      46.50 ± 11%  perf-c2c.DRAM.local
      4129 ± 57%    +225.5%      13444        perf-c2c.DRAM.remote
     67558 ± 56%     +66.3%     112351        perf-c2c.HITM.local
      4045 ± 57%    +227.2%      13237        perf-c2c.HITM.remote
     71603 ± 57%     +75.4%     125588        perf-c2c.HITM.total
 1.674e+08 ± 20%     +19.0%  1.991e+08        perf-stat.i.cache-misses
 4.096e+08 ± 20%     +19.4%  4.891e+08        perf-stat.i.cache-references
      0.36            +2.9%       0.37        perf-stat.overall.MPKI
      3127            -1.8%       3070        perf-stat.overall.cycles-between-cache-misses
 1.669e+08 ± 20%     +18.9%  1.985e+08        perf-stat.ps.cache-misses
 4.085e+08 ± 20%     +19.3%  4.874e+08        perf-stat.ps.cache-references
 1.622e+14            -1.0%  1.606e+14        perf-stat.total.instructions


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

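
The %change column in the report above appears to be the relative difference between the two commits' values. As a small sketch of that arithmetic (the function name `percent_change` is made up for this example, and the robot presumably works from unrounded samples, so the last digit can differ):

```c
/* Relative change of `patched` against `base`, in percent.
 * Hypothetical helper for illustration; not part of lkp-tests. */
double percent_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

For the headline metric, `percent_change(2249734, 2291792)` is about 1.87, which matches the reported +1.9% after rounding. Applied to perf-c2c.HITM.remote, `percent_change(4045, 13237)` is about 227.2.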
The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     9a57c3773152a3ff2c35cc8325e088d011c9f83b
Gitweb:        https://git.kernel.org/tip/9a57c3773152a3ff2c35cc8325e088d011c9f83b
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Mon, 30 Jun 2025 16:50:34 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Jul 2025 15:02:05 +02:00

futex: Temporary disable FUTEX_PRIVATE_HASH

Chris Mason reported a performance regression on big iron. Reports of
this kind were usually reported as part of a micro benchmark but Chris'
test did mimic his real workload. This makes it a real regression.

The root cause is rcuref_get() which is invoked during each futex
operation. If all threads of an application do this simultaneously then
it leads to cache line bouncing and the performance drops.

Disable FUTEX_PRIVATE_HASH entirely for this cycle. The performance
regression will be addressed in the following cycle enabling the option
again.

Closes: https://lore.kernel.org/all/3ad05298-351e-4d61-9972-ca45a0a50e33@meta.com/
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250630145034.8JnINEaS@linutronix.de
---
init/Kconfig | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/init/Kconfig b/init/Kconfig
index af4c2f0..666783e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1716,9 +1716,13 @@ config FUTEX_PI
 	depends on FUTEX && RT_MUTEXES
 	default y
 
+#
+# marked broken for performance reasons; gives us one more cycle to sort things out.
+#
 config FUTEX_PRIVATE_HASH
 	bool
 	depends on FUTEX && !BASE_SMALL && MMU
+	depends on BROKEN
 	default y
 
 config FUTEX_MPOL
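
The practical effect of the added `depends on BROKEN` line is that, in a configuration where BROKEN is not set, CONFIG_FUTEX_PRIVATE_HASH is never defined, so any code guarded by it is compiled out. A minimal user-space sketch of that gating follows; `demo_uses_private_hash` is a hypothetical name and not a kernel function, and the macro is deliberately left undefined to mirror such a build.

```c
#include <stdbool.h>

/* In a kernel build CONFIG_FUTEX_PRIVATE_HASH would come from the
 * generated configuration. It is intentionally not defined here,
 * which is the assumed state after this patch with BROKEN unset. */

/* Hypothetical helper: reports which branch was compiled in. */
bool demo_uses_private_hash(void)
{
#ifdef CONFIG_FUTEX_PRIVATE_HASH
	return true;   /* private hash code would be built */
#else
	return false;  /* private hash code is compiled out */
#endif
}
```

Compiled as shown, the function returns false. Building the same file with the macro defined on the compiler command line selects the other branch.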