[PATCH v2] sched/topology: Optimize sched_numa_find_nth_cpu() by inlining bsearch()

Kuan-Wei Chiu posted 1 patch 10 months ago
When CONFIG_MITIGATION_RETPOLINE is enabled, indirect function calls
become costly because they are routed through retpoline thunks.
Replace bsearch() with __inline_bsearch(), which lets the compiler
inline the search loop and resolve the comparator call directly,
avoiding the indirect-call overhead. This also reduces code size by
91 bytes on x86-64:

$ ./scripts/bloat-o-meter ./build_utility.o.old ./build_utility.o.new
add/remove: 0/1 grow/shrink: 1/0 up/down: 19/-110 (-91)
Function                                     old     new   delta
sched_numa_find_nth_cpu                      442     461     +19
hop_cmp                                      110       -    -110
Total: Before=46675, After=46584, chg -0.19%

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
Changes in v2:
Use bloat-o-meter to measure code size impact.

v1: https://lore.kernel.org/lkml/20241205162336.1675428-1-visitorckw@gmail.com/

 kernel/sched/topology.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index c49aea8c1025..3ba1476a97de 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2171,7 +2171,8 @@ int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
 	if (!k.masks)
 		goto unlock;
 
-	hop_masks = bsearch(&k, k.masks, sched_domains_numa_levels, sizeof(k.masks[0]), hop_cmp);
+	hop_masks = __inline_bsearch(&k, k.masks, sched_domains_numa_levels, sizeof(k.masks[0]),
+				     hop_cmp);
 	hop = hop_masks	- k.masks;
 
 	ret = hop ?
-- 
2.34.1