[PATCH v5] sched/psi: Skip CPUs with zero non-idle delta in per-CPU aggregation

Zhan Xusheng posted 1 patch 1 month ago
collect_percpu_times() iterates over every possible CPU to build a
non-idle-weighted average of the PSI state times. When a CPU has no
PSI_NONIDLE delta for the current sampling interval, its weight is
zero:

  nonidle     = nsecs_to_jiffies(times[PSI_NONIDLE]) = 0
  deltas[s]  += times[s] * nonidle               /* += 0 */

so that CPU contributes nothing to the weighted sum.

get_recent_times() already sets the PSI_NONIDLE bit in
cpu_changed_states iff the PSI_NONIDLE delta is non-zero. Use that
bit to skip such CPUs early, as suggested by Johannes, avoiding the
nsecs_to_jiffies() call and the PSI_NONIDLE u64 multiply-adds (one
per state below PSI_NONIDLE) that follow.

No functional change: on the skipped path the old code adds zero to
deltas[] and zero to nonidle_total, which is exactly the result of
not iterating.

Measured on an i7-8700 (6C/12T), with the same mainline base and the
same build flags for both kernels. The reader is a pinned userspace
loop of open()+read()+close() on /proc/pressure/cpu, 100k iterations,
run inside a KVM guest with -smp matching the host's logical CPU
count (12):
                            baseline    patched     diff
  idle             p50       2438 ns    2270 ns    -6.9%
  idle             p99       2598 ns    2449 ns    -5.7%
  1 busy / 12      p50       2479 ns    2281 ns    -8.0%
  all 12 busy      p50       3738 ns    3537 ns    -5.4%

The all-busy improvement shows the skip also kicks in when the box
is hot: between two samples, many CPUs record no PSI_NONIDLE state
transition even if they've been 100% utilised.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
---
 kernel/sched/psi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index d9c9d9480a45..f220debc3fe0 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -386,6 +386,9 @@ static void collect_percpu_times(struct psi_group *group,
 				&cpu_changed_states);
 		changed_states |= cpu_changed_states;
 
+		if (!(cpu_changed_states & (1 << PSI_NONIDLE)))
+			continue;
+
 		nonidle = nsecs_to_jiffies(times[PSI_NONIDLE]);
 		nonidle_total += nonidle;
 
-- 
2.43.0