collect_percpu_times() iterates over every possible CPU to build a
non-idle-weighted average of the PSI state times. When a CPU has
no PSI_NONIDLE delta for the current sampling interval:
nonidle = nsecs_to_jiffies(times[PSI_NONIDLE]) = 0
deltas[s] += times[s] * nonidle /* += 0 */
so the weighted accumulation contributes nothing.
get_recent_times() already sets the PSI_NONIDLE bit in
cpu_changed_states iff the PSI_NONIDLE delta is non-zero. Use that
bit to skip such CPUs early, as suggested by Johannes, avoiding the
nsecs_to_jiffies() call.
No functional change intended.
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
---
v4:
- Drop the incorrect Reviewed-by added in v2/v3; replace with
Suggested-by. Johannes' "Makes sense." on v1 was an
acknowledgement and an implementation suggestion, not a review
tag.
- Reword the commit message to describe "PSI_NONIDLE delta"
rather than "non-idle jiffies", matching the actual check.
v3: https://lore.kernel.org/all/20260313034847.1422-1-zhanxusheng@xiaomi.com/
- Resend of v2.
v2: https://lore.kernel.org/all/20260204022328.23938-1-zhanxusheng@xiaomi.com/
- Use cpu_changed_states & (1 << PSI_NONIDLE) per Johannes'
suggestion, saving the nsecs_to_jiffies() call.
v1: https://lore.kernel.org/all/20260203100007.22044-1-zhanxusheng@xiaomi.com/
---
kernel/sched/psi.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index d9c9d9480a45..cd1174f0b5e5 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -384,6 +384,13 @@ static void collect_percpu_times(struct psi_group *group,
get_recent_times(group, cpu, aggregator, times,
&cpu_changed_states);
+ /*
+ * If this CPU's PSI_NONIDLE delta is zero, it contributes
+ * nothing to nonidle_total or to any deltas[] entry below,
+ * so skip it early.
+ */
+ if (!(cpu_changed_states & (1 << PSI_NONIDLE)))
+ continue;
changed_states |= cpu_changed_states;
nonidle = nsecs_to_jiffies(times[PSI_NONIDLE]);
--
2.43.0
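The equivalence argued in the commit message can be checked with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: the enum values, `MODEL_NS_PER_TICK` (HZ=250 assumed), `model_ns_to_ticks()`, `struct totals`, `collect()` and `same_totals()` are hypothetical names, and the "changed" bit is modelled simply as "the delta is non-zero".

```c
#include <stdint.h>

/* Hypothetical stand-ins for the PSI state indices; NONIDLE is last. */
enum { ST_SOME, ST_FULL, ST_NONIDLE, NR_ST };

/* Assumed tick length for the model: 4 ms, i.e. HZ=250. */
#define MODEL_NS_PER_TICK 4000000ULL

struct totals {
	uint64_t deltas[ST_NONIDLE];	/* one weighted sum per state below NONIDLE */
	uint64_t nonidle_total;
};

/* Simplified model of nsecs_to_jiffies(): truncating division. */
static uint64_t model_ns_to_ticks(uint64_t ns)
{
	return ns / MODEL_NS_PER_TICK;
}

/*
 * Accumulate per-CPU deltas weighted by each CPU's non-idle ticks.
 * With skip_idle set, CPUs whose NONIDLE delta is zero are skipped
 * before the conversion and the multiply-adds.
 */
static struct totals collect(const uint32_t (*times)[NR_ST], int ncpus,
			     int skip_idle)
{
	struct totals t = {0};

	for (int cpu = 0; cpu < ncpus; cpu++) {
		uint32_t changed = 0;

		/* Model assumption: a state's bit is set iff its delta is non-zero. */
		for (int s = 0; s < NR_ST; s++)
			if (times[cpu][s])
				changed |= 1u << s;

		if (skip_idle && !(changed & (1u << ST_NONIDLE)))
			continue;

		uint64_t nonidle = model_ns_to_ticks(times[cpu][ST_NONIDLE]);

		t.nonidle_total += nonidle;
		for (int s = 0; s < ST_NONIDLE; s++)
			t.deltas[s] += (uint64_t)times[cpu][s] * nonidle;
	}
	return t;
}

/* Returns 1 when both accumulations produced identical results. */
static int same_totals(const struct totals *a, const struct totals *b)
{
	if (a->nonidle_total != b->nonidle_total)
		return 0;
	for (int s = 0; s < ST_NONIDLE; s++)
		if (a->deltas[s] != b->deltas[s])
			return 0;
	return 1;
}
```

Calling `collect()` once with `skip_idle` = 0 and once with `skip_idle` = 1 on the same input and comparing the results with `same_totals()` exercises the claim: a CPU whose NONIDLE delta is zero has a weight of zero, so skipping it leaves every sum unchanged.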
On Wed, Apr 29, 2026 at 06:05:55PM +0800, Zhan Xusheng wrote:
> collect_percpu_times() iterates over every possible CPU to build a
> non-idle-weighted average of the PSI state times. When a CPU has
> no PSI_NONIDLE delta for the current sampling interval:
> nonidle = nsecs_to_jiffies(times[PSI_NONIDLE]) = 0
> deltas[s] += times[s] * nonidle /* += 0 */
>
> so the weighted accumulation contributes nothing.
>
> get_recent_times() already sets the PSI_NONIDLE bit in
> cpu_changed_states iff the PSI_NONIDLE delta is non-zero. Use that
> bit to skip such CPUs early, as suggested by Johannes, avoiding the
> nsecs_to_jiffies() call.
>
> No functional change intended.

So presumably this is an optimization. Where is the data that justifies
this?
collect_percpu_times() iterates over every possible CPU to build a
non-idle-weighted average of the PSI state times. When a CPU has no
PSI_NONIDLE delta for the current sampling interval:
nonidle = nsecs_to_jiffies(times[PSI_NONIDLE]) = 0
deltas[s] += times[s] * nonidle /* += 0 */
so the weighted accumulation contributes nothing.
get_recent_times() already sets the PSI_NONIDLE bit in
cpu_changed_states iff the PSI_NONIDLE delta is non-zero. Use that
bit to skip such CPUs early, as suggested by Johannes, avoiding the
nsecs_to_jiffies() call and the u64 multiply-adds that follow, one
per state below PSI_NONIDLE.
No functional change: on the skipped path the old code adds zero to
deltas[] and zero to nonidle_total, which is exactly the result of
not iterating.
Measured on an i7-8700 (6C/12T), with the same mainline base and the
same build flags for both kernels. The reader is a pinned userspace
loop of open()+read()+close() on /proc/pressure/cpu, 100k iterations,
inside a KVM guest with -smp matching the host's logical CPU count
(12):
                     baseline    patched    diff
  idle         p50   2438 ns     2270 ns    -6.9%
  idle         p99   2598 ns     2449 ns    -5.7%
  1 busy / 12  p50   2479 ns     2281 ns    -8.0%
  all 12 busy  p50   3738 ns     3537 ns    -5.4%
The all-busy improvement shows the skip also kicks in when the box
is hot: between two samples, many CPUs record no PSI_NONIDLE state
transition even if they've been 100% utilised.
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
---
kernel/sched/psi.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index d9c9d9480a45..f220debc3fe0 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -386,6 +386,9 @@ static void collect_percpu_times(struct psi_group *group,
&cpu_changed_states);
changed_states |= cpu_changed_states;
+ if (!(cpu_changed_states & (1 << PSI_NONIDLE)))
+ continue;
+
nonidle = nsecs_to_jiffies(times[PSI_NONIDLE]);
nonidle_total += nonidle;
--
2.43.0