From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, "Rafael J. Wysocki", Frederic Weisbecker
Subject: [PATCH RFC] tick/sched: Prevent pointless NOHZ transitions
Date: Tue, 24 Feb 2026 09:32:50 +0100
Message-ID: <875x7mv8wd.ffs@tglx>

During a hackbench run with a fully loaded machine CPUs go briefly idle
when they run out of tasks, which is expected. What's not expected are
pointless NOHZ transitions like this:

   hackbench-1915 [001] d..2. 84.086755: sched_switch: prev_comm=hackbench prev_pid=1915 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
1)       <idle>-0 [001] dn.2. 84.086757: hrtimer_start: hrtimer=00000000db1ede74 function=tick_nohz_handler expires=305340000000 softexpires=305340000000 mode=ABS|PINNED|HARD was_armed=1
         <idle>-0 [001] dn.2. 84.086757: hrtimer_rearm: next_event=83885523974 deferred=0
2)       <idle>-0 [001] dN.2. 84.086761: hrtimer_start: hrtimer=00000000db1ede74 function=tick_nohz_handler expires=82950000000 softexpires=82950000000 mode=ABS|PINNED|HARD was_armed=1
         <idle>-0 [001] dN.2. 84.086761: hrtimer_rearm: next_event=82950000000 deferred=0
         <idle>-0 [001] d..2. 84.086767: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=hackbench next_pid=2138 next_prio=120
   hackbench-2138 [001] d..2. 84.086779: sched_switch: prev_comm=hackbench prev_pid=2138 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120

#1 switches to NOHZ mode targeting the next expiring timer and #2 switches
back to tick mode a whopping 4us later.

This happens with both TEO and MENU governors in a VM guest.
That's not only pointless, it's also a performance issue as each rearm of
the timer implies a VM exit.

Keep track of the idle time with a moving average and check it for being
larger than TICK_NSEC in can_stop_idle_tick(). That cures this behaviour
while still allowing the system to go into long idle sleeps once the
workload has stopped.

Signed-off-by: Thomas Gleixner
---
 kernel/time/tick-sched.c |   20 +++++++++++++++++---
 kernel/time/tick-sched.h |    9 +++++++++
 2 files changed, 26 insertions(+), 3 deletions(-)

--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -751,6 +751,16 @@ static void tick_nohz_update_jiffies(kti
 	touch_softlockup_watchdog_sched();
 }
 
+static void tick_nohz_update_idle_duration(struct tick_sched *ts, ktime_t now)
+{
+	ktime_t delta = now - ts->idle_dur_entry;
+	unsigned int idx = ts->idle_dur_idx;
+
+	ts->idle_dur_sum += delta - ts->idle_dur[idx];
+	ts->idle_dur[idx] = delta;
+	ts->idle_dur_idx = (idx + 1) & IDLE_DUR_MASK;
+}
+
 static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 {
 	ktime_t delta;
@@ -760,6 +770,8 @@ static void tick_nohz_stop_idle(struct t
 
 	delta = ktime_sub(now, ts->idle_entrytime);
 
+	tick_nohz_update_idle_duration(ts, now);
+
 	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	if (nr_iowait_cpu(smp_processor_id()) > 0)
 		ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
@@ -1224,7 +1236,7 @@ static bool can_stop_idle_tick(int cpu,
 			return false;
 	}
 
-	return true;
+	return ts->idle_dur_sum > TICK_NSEC * IDLE_DUR_ENTRIES;
 }
 
 /**
@@ -1292,6 +1304,7 @@ void tick_nohz_idle_enter(void)
 
 	tick_sched_flag_set(ts, TS_FLAG_INIDLE);
 	tick_nohz_start_idle(ts);
+	ts->idle_dur_entry = ts->idle_entrytime;
 
 	local_irq_enable();
 }
@@ -1490,11 +1503,12 @@ void tick_nohz_idle_exit(void)
 	idle_active = tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE);
 	tick_stopped = tick_sched_flag_test(ts, TS_FLAG_STOPPED);
 
-	if (idle_active || tick_stopped)
-		now = ktime_get();
+	now = ktime_get();
 
 	if (idle_active)
 		tick_nohz_stop_idle(ts, now);
+	else
+		tick_nohz_update_idle_duration(ts, now);
 
 	if (tick_stopped)
 		tick_nohz_idle_update_tick(ts, now);
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -30,6 +30,9 @@ struct tick_device {
 /* High resolution tick mode */
 #define TS_FLAG_HIGHRES			BIT(5)
 
+#define IDLE_DUR_ENTRIES	8
+#define IDLE_DUR_MASK		(IDLE_DUR_ENTRIES - 1)
+
 /**
  * struct tick_sched - sched tick emulation and no idle tick control/stats
  *
@@ -95,6 +98,12 @@ struct tick_sched {
 	ktime_t				idle_sleeptime;
 	ktime_t				iowait_sleeptime;
 
+	/* Idle duration */
+	ktime_t				idle_dur[IDLE_DUR_ENTRIES];
+	ktime_t				idle_dur_entry;
+	ktime_t				idle_dur_sum;
+	unsigned int			idle_dur_idx;
+
 	/* Full dynticks handling */
 	atomic_t			tick_dep_mask;
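
The window accounting in the diff above can be tried outside the kernel.
What follows is a minimal userspace sketch and not part of the patch. It
mirrors the arithmetic of tick_nohz_update_idle_duration() and the new
return condition of can_stop_idle_tick(). The names struct idle_window,
idle_window_update(), idle_window_allows_stop() and SKETCH_TICK_NSEC are
made up for illustration, int64_t stands in for ktime_t, and a 1 ms tick
(HZ=1000) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IDLE_DUR_ENTRIES	8
#define IDLE_DUR_MASK		(IDLE_DUR_ENTRIES - 1)

/* Assumption: HZ=1000, so one tick is 1 ms. The kernel derives TICK_NSEC from HZ. */
#define SKETCH_TICK_NSEC	1000000LL

/* The index wrap via '&' only works when the window size is a power of two. */
static_assert((IDLE_DUR_ENTRIES & IDLE_DUR_MASK) == 0,
	      "IDLE_DUR_ENTRIES must be a power of two");

/* Hypothetical stand-in for the four fields the patch adds to struct tick_sched. */
struct idle_window {
	int64_t		dur[IDLE_DUR_ENTRIES];	/* last N idle durations in ns */
	int64_t		sum;			/* running sum over dur[] */
	unsigned int	idx;			/* next slot to overwrite */
};

/* Replace the oldest sample with @delta and keep the running sum in sync. */
static void idle_window_update(struct idle_window *w, int64_t delta)
{
	w->sum += delta - w->dur[w->idx];
	w->dur[w->idx] = delta;
	w->idx = (w->idx + 1) & IDLE_DUR_MASK;
}

/* Same comparison as the patch: sum > TICK_NSEC * N, i.e. average > one tick. */
static bool idle_window_allows_stop(const struct idle_window *w)
{
	return w->sum > SKETCH_TICK_NSEC * IDLE_DUR_ENTRIES;
}
```

Comparing the sum against TICK_NSEC * IDLE_DUR_ENTRIES avoids a division
in the idle path. Because the test is on the sum, a single idle period
longer than eight ticks passes it even when the other seven entries are
a few microseconds each.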