[RESEND PATCH] tick/nohz: Fix wrong NOHZ idle CPU state

Shubhang Kaushik posted 1 patch 4 days, 9 hours ago
kernel/time/tick-sched.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Under CONFIG_NO_HZ_FULL, the scheduler tick can get stopped earlier via
tick_nohz_full_stop_tick() before the CPU subsequently enters the idle
path. In this case, tick_nohz_idle_stop_tick() observes TS_FLAG_STOPPED
already set and skips nohz_balance_enter_idle() because the !was_stopped
condition assumes tick-stop and idle-entry are coupled.
This leaves a tickless idle CPU absent from nohz.idle_cpus_mask, making
it invisible to NOHZ idle load balancing while periodic balancing is
also suppressed.

Fix this by decoupling tick-stop transition accounting from
scheduler bookkeeping. idle_jiffies remains updated only on the
tick-stop transition, while nohz_balance_enter_idle() is invoked
whenever a CPU enters idle with the tick already stopped, relying on its
existing idempotent guard to avoid duplicate registration.

Tested on Ampere Altra on 6.19.0-rc8 with CONFIG_NO_HZ_FULL enabled:
- This change improves load distribution by ensuring that tickless idle
  CPUs are visible to NOHZ idle load balancing. In llama-batched-bench,
  throughput improves by up to ~14% across multiple thread counts.
- Hackbench single-process results improve by 5% and multi-process
  results improve by up to ~26%, consistent with reduced scheduler
  jitter and earlier utilization of fully idle cores.
- No regressions were observed.

Signed-off-by: Shubhang Kaushik <shubhang@os.amperecomputing.com>
Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
Reviewed-by: Christoph Lameter (Ampere) <cl@gentwo.org>
Reviewed-by: Shubhang Kaushik <shubhang@os.amperecomputing.com>
---
This is a resend of the original patch to ensure visibility.
Previous resend: https://lkml.org/lkml/2025/8/21/170
Original thread: https://lkml.org/lkml/2025/8/21/171

The patch addresses a performance regression in NOHZ idle load balancing 
observed under CONFIG_NO_HZ_FULL, where idle CPUs were becoming 
invisible to the balancer.
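
For reviewers who want to see the sequence in isolation, here is a minimal
userspace sketch of the old and new conditions. It is only a model, not
kernel code: struct model_cpu and every model_* function are hypothetical
names invented for illustration, in_idle_mask stands in for membership in
nohz.idle_cpus_mask, and the guard in model_balance_enter_idle() is assumed
to behave like the idempotent check mentioned above.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_cpu {
	bool tick_stopped;          /* stands in for TS_FLAG_STOPPED */
	bool in_idle_mask;          /* stands in for nohz.idle_cpus_mask membership */
	unsigned long idle_jiffies;
	unsigned long last_jiffies;
};

/* Models the idempotent guard: a repeated call is a no-op. */
void model_balance_enter_idle(struct model_cpu *c)
{
	if (c->in_idle_mask)
		return;
	c->in_idle_mask = true;
}

/* Old condition: register only on the tick-stop transition. */
void model_idle_stop_tick_old(struct model_cpu *c)
{
	bool was_stopped = c->tick_stopped;

	c->tick_stopped = true;     /* assume the tick ends up stopped */
	if (!was_stopped && c->tick_stopped) {
		c->idle_jiffies = c->last_jiffies;
		model_balance_enter_idle(c);
	}
}

/* New condition: register whenever idle is entered with the tick stopped. */
void model_idle_stop_tick_new(struct model_cpu *c)
{
	bool was_stopped = c->tick_stopped;

	c->tick_stopped = true;     /* assume the tick ends up stopped */
	if (c->tick_stopped) {
		/* idle_jiffies is still updated only on the transition. */
		if (!was_stopped)
			c->idle_jiffies = c->last_jiffies;
		model_balance_enter_idle(c);
	}
}
```

Starting both variants from a CPU whose tick_stopped is already true (the
nohz_full case), the old variant leaves in_idle_mask clear while the new one
sets it, and neither touches idle_jiffies.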
---
 kernel/time/tick-sched.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2f8a7923fa279409ffe950f770ff2eac868f6ece..eee6fcebe78c2f8d93464a55fe332e12fe9c164e 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1250,8 +1250,9 @@ void tick_nohz_idle_stop_tick(void)
 		ts->idle_sleeps++;
 		ts->idle_expires = expires;
 
-		if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
-			ts->idle_jiffies = ts->last_jiffies;
+		if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
+			if (!was_stopped)
+				ts->idle_jiffies = ts->last_jiffies;
 			nohz_balance_enter_idle(cpu);
 		}
 	} else {

---
base-commit: 18f7fcd5e69a04df57b563360b88be72471d6b62
change-id: 20260203-fix-nohz-idle-b2838276cb91

Best regards,
-- 
Shubhang Kaushik <shubhang@os.amperecomputing.com>