From nobody Mon Jun 8 05:26:39 2026
Date: Tue, 02 Jun 2026 19:30:14 -0000
From: "tip-bot2 for Frederic Weisbecker"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: timers/nohz] tick/sched: Unify idle cputime accounting
Cc: Xin Zhao, Frederic Weisbecker, Thomas Gleixner, Shrikanth Hegde, x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260508131647.43868-8-frederic@kernel.org>
References: <20260508131647.43868-8-frederic@kernel.org>
Message-ID: <178042861493.710.13681566128851078879.tip-bot2@tip-bot2>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The following commit has been merged into the timers/nohz branch of tip:

Commit-ID:     cf6444c3e1bb7dd5974441bbd74840e9821d36f9
Gitweb:        https://git.kernel.org/tip/cf6444c3e1bb7dd5974441bbd74840e9821d36f9
Author:        Frederic Weisbecker
AuthorDate:    Fri, 08 May 2026 15:16:39 +02:00
Committer:     Thomas Gleixner
CommitterDate: Tue, 02 Jun 2026 21:27:25 +02:00

tick/sched: Unify idle cputime accounting

The non-vtime dynticks-idle cputime accounting is a big mess that
accumulates within two concurrent statistics, each having their own
shortcomings:

* The accounting for online CPUs which is based on the delta between
  tick_nohz_start_idle() and tick_nohz_stop_idle().

  Pros:
       - Works when the tick is off

       - Has nsecs granularity

  Cons:
       - Accounts idle steal time but doesn't subtract it from idle
         cputime.

       - Assumes CONFIG_IRQ_TIME_ACCOUNTING by not accounting IRQs but
         the IRQ time is simply ignored when
         CONFIG_IRQ_TIME_ACCOUNTING=n

       - The windows between 1) idle task scheduling and the first call
         to tick_nohz_start_idle() and 2) the last
         tick_nohz_stop_idle() and the rest of the idle time are
         blind spots wrt. cputime accounting (though mostly of
         insignificant amount)

       - Relies on private fields outside of kernel stats, with specific
         accessors.

* The accounting for offline CPUs which is based on ticks and the
  jiffies delta during which the tick was stopped.

  Pros:
       - Handles steal time correctly

       - Handles CONFIG_IRQ_TIME_ACCOUNTING=y and
         CONFIG_IRQ_TIME_ACCOUNTING=n correctly.

       - Handles the whole idle task

       - Accounts directly to kernel stats, without midlayer accumulator.

  Cons:
       - Doesn't elapse when the tick is off, which doesn't make it
         suitable for online CPUs.

       - Has TICK_NSEC granularity (jiffies)

       - Needs to track the dyntick-idle ticks that were accounted and
         subtract them from the total jiffies time spent while the tick
         was stopped. This is an ugly workaround.

Having two different accountings for a single context is not the only
problem: since those accountings are of different natures, it is
possible to observe the global idle time going backward after a CPU goes
offline.

Clean up the situation by introducing a hybrid approach that stays
coherent and works for both online and offline CPUs:

* Tick based or native vtime accounting operates before the idle loop
  is entered and resumes once the idle loop prepares to exit.

* When the idle loop starts, switch to dynticks-idle accounting as is
  done currently, except that the statistics accumulate directly to the
  relevant kernel stat fields.

* Private dyntick cputime accounting fields are removed.

* Works in both the online and offline cases.

Further improvements will include:

* Only switch to dynticks-idle cputime accounting when the tick actually
  goes in dynticks mode.

* Handle CONFIG_IRQ_TIME_ACCOUNTING=n correctly such that the
  dynticks-idle accounting still elapses while on IRQs.

* Correctly subtract idle steal cputime from idle time

Reported-by: Xin Zhao
Signed-off-by: Frederic Weisbecker
Signed-off-by: Thomas Gleixner
Tested-by: Shrikanth Hegde
Link: https://patch.msgid.link/20260508131647.43868-8-frederic@kernel.org
---
 include/linux/kernel_stat.h | 24 ++++++++++---
 include/linux/vtime.h       |  7 +++-
 kernel/sched/cputime.c      | 62 +++++++++++++++-----------------
 kernel/time/tick-sched.c    | 69 +++++++++---------------------------
 4 files changed, 75 insertions(+), 87 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index dd020ec..ba65aad 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -34,6 +34,9 @@ enum cpu_usage_stat {
 };
 
 struct kernel_cpustat {
+#ifdef CONFIG_NO_HZ_COMMON
+	int idle_dyntick;
+#endif
 	u64 cpustat[NR_STATS];
 };
 
@@ -99,6 +102,20 @@ static inline unsigned long kstat_cpu_irqs_sum(unsigned int cpu)
 	return kstat_cpu(cpu).irqs_sum;
 }
 
+#ifdef CONFIG_NO_HZ_COMMON
+extern void kcpustat_dyntick_start(void);
+extern void kcpustat_dyntick_stop(void);
+static inline bool kcpustat_idle_dyntick(void)
+{
+	return __this_cpu_read(kernel_cpustat.idle_dyntick);
+}
+#else
+static inline bool kcpustat_idle_dyntick(void)
+{
+	return false;
+}
+#endif /* CONFIG_NO_HZ_COMMON */
+
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 extern u64 kcpustat_field(enum cpu_usage_stat usage, int cpu);
 extern void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu);
@@ -113,7 +130,7 @@ static inline void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu)
 	*dst = kcpustat_cpu(cpu);
 }
 
-#endif
+#endif /* !CONFIG_VIRT_CPU_ACCOUNTING_GEN */
 
 extern void account_user_time(struct task_struct *, u64);
 extern void account_guest_time(struct task_struct *, u64);
@@ -127,14 +144,13 @@ extern u64 get_idle_time(struct kernel_cpustat *kcs, int cpu);
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 static inline void account_process_tick(struct task_struct *tsk, int user)
 {
-	vtime_flush(tsk);
+	if (!kcpustat_idle_dyntick())
+		vtime_flush(tsk);
 }
 #else
 extern void account_process_tick(struct task_struct *, int user);
 #endif
 
-extern void account_idle_ticks(unsigned long ticks);
-
 #ifdef CONFIG_SCHED_CORE
 extern void __account_forceidle_time(struct task_struct *tsk, u64 delta);
 #endif
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 5516ad5..9dc25b0 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -31,6 +31,11 @@ static inline bool vtime_generic_enabled_cpu(int cpu)
 	return context_tracking_enabled_cpu(cpu);
 }
 
+static inline bool vtime_generic_enabled_this_cpu(void)
+{
+	return context_tracking_enabled_this_cpu();
+}
+
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 extern void vtime_account_idle(struct task_struct *tsk);
 extern void vtime_account_irq(struct task_struct *tsk, unsigned int offset);
@@ -90,7 +95,7 @@ static inline bool vtime_accounting_enabled_cpu(int cpu)
 
 static inline bool vtime_accounting_enabled_this_cpu(void)
 {
-	return context_tracking_enabled_this_cpu();
+	return vtime_generic_enabled_this_cpu();
 }
 
 extern void vtime_task_switch_generic(struct task_struct *prev);
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 815d0f7..a573378 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -414,16 +414,30 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	}
 }
 
-static void irqtime_account_idle_ticks(int ticks)
-{
-	irqtime_account_process_tick(current, 0, ticks);
-}
 #else /* !CONFIG_IRQ_TIME_ACCOUNTING: */
-static inline void irqtime_account_idle_ticks(int ticks) { }
 static inline void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 						int nr_ticks) { }
 #endif /* !CONFIG_IRQ_TIME_ACCOUNTING */
 
+#ifdef CONFIG_NO_HZ_COMMON
+void kcpustat_dyntick_start(void)
+{
+	if (!vtime_generic_enabled_this_cpu()) {
+		vtime_dyntick_start();
+		__this_cpu_write(kernel_cpustat.idle_dyntick, 1);
+	}
+}
+
+void kcpustat_dyntick_stop(void)
+{
+	if (!vtime_generic_enabled_this_cpu()) {
+		__this_cpu_write(kernel_cpustat.idle_dyntick, 0);
+		vtime_dyntick_stop();
+		steal_account_process_time(ULONG_MAX);
+	}
+}
+#endif /* CONFIG_NO_HZ_COMMON */
+
 /*
  * Use precise platform statistics if available:
  */
@@ -437,11 +451,15 @@ void vtime_account_irq(struct task_struct *tsk, unsigned int offset)
 		vtime_account_hardirq(tsk);
 	} else if (pc & SOFTIRQ_OFFSET) {
 		vtime_account_softirq(tsk);
-	} else if (!IS_ENABLED(CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE) &&
-		   is_idle_task(tsk)) {
-		vtime_account_idle(tsk);
+	} else if (!kcpustat_idle_dyntick()) {
+		if (!IS_ENABLED(CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE) &&
+		    is_idle_task(tsk)) {
+			vtime_account_idle(tsk);
+		} else {
+			vtime_account_kernel(tsk);
+		}
 	} else {
-		vtime_account_kernel(tsk);
+		vtime_reset();
 	}
 }
 
@@ -483,6 +501,9 @@ void account_process_tick(struct task_struct *p, int user_tick)
 	if (vtime_accounting_enabled_this_cpu())
 		return;
 
+	if (kcpustat_idle_dyntick())
+		return;
+
 	if (irqtime_enabled()) {
 		irqtime_account_process_tick(p, user_tick, 1);
 		return;
@@ -505,29 +526,6 @@ void account_process_tick(struct task_struct *p, int user_tick)
 }
 
 /*
- * Account multiple ticks of idle time.
- * @ticks: number of stolen ticks
- */
-void account_idle_ticks(unsigned long ticks)
-{
-	u64 cputime, steal;
-
-	if (irqtime_enabled()) {
-		irqtime_account_idle_ticks(ticks);
-		return;
-	}
-
-	cputime = ticks * TICK_NSEC;
-	steal = steal_account_process_time(ULONG_MAX);
-
-	if (steal >= cputime)
-		return;
-
-	cputime -= steal;
-	account_idle_time(cputime);
-}
-
-/*
  * Adjust tick based cputime random precision against scheduler runtime
  * accounting.
  *
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 597c3a0..c3efd35 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -285,8 +285,6 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) &&
 	    tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
 		touch_softlockup_watchdog_sched();
-		if (is_idle_task(current))
-			ts->idle_jiffies++;
 		/*
 		 * In case the current tick fired too early past its expected
 		 * expiration, make sure we don't bypass the next clock reprogramming
@@ -753,8 +751,12 @@ static void tick_nohz_update_jiffies(ktime_t now)
 
 static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 {
+	u64 *cpustat = kcpustat_this_cpu->cpustat;
 	ktime_t delta;
 
+	if (vtime_generic_enabled_this_cpu())
+		return;
+
 	if (WARN_ON_ONCE(!tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE)))
 		return;
 
@@ -762,9 +764,9 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 
 	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	if (nr_iowait_cpu(smp_processor_id()) > 0)
-		ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+		cpustat[CPUTIME_IOWAIT] = ktime_add(cpustat[CPUTIME_IOWAIT], delta);
 	else
-		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+		cpustat[CPUTIME_IDLE] = ktime_add(cpustat[CPUTIME_IDLE], delta);
 
 	ts->idle_entrytime = now;
 	tick_sched_flag_clear(ts, TS_FLAG_IDLE_ACTIVE);
@@ -775,18 +777,21 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 
 static void tick_nohz_start_idle(struct tick_sched *ts)
 {
+	if (vtime_generic_enabled_this_cpu())
+		return;
+
 	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	ts->idle_entrytime = ktime_get();
 	tick_sched_flag_set(ts, TS_FLAG_IDLE_ACTIVE);
 	write_seqcount_end(&ts->idle_sleeptime_seq);
-
 	sched_clock_idle_sleep_event();
 }
 
-static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, ktime_t *sleeptime,
+static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx,
 				 bool compute_delta, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	u64 *cpustat = kcpustat_cpu(cpu).cpustat;
 	ktime_t now, idle;
 	unsigned int seq;
 
@@ -812,7 +817,7 @@ static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, ktime_t *slee
 			delta = ktime_sub(now, ts->idle_entrytime);
 		}
 
-		idle = ktime_add(*sleeptime, delta);
+		idle = ktime_add(cpustat[idx], delta);
 	} while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq));
 
 	return ktime_to_us(idle);
@@ -838,9 +843,7 @@ static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, ktime_t *slee
  */
 u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 {
-	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-
-	return get_cpu_sleep_time_us(cpu, CPUTIME_IDLE, &ts->idle_sleeptime,
+	return get_cpu_sleep_time_us(cpu, CPUTIME_IDLE,
 				     !nr_iowait_cpu(cpu), last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
@@ -864,9 +867,7 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
  */
 u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 {
-	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-
-	return get_cpu_sleep_time_us(cpu, CPUTIME_IOWAIT, &ts->iowait_sleeptime,
+	return get_cpu_sleep_time_us(cpu, CPUTIME_IOWAIT,
 				     nr_iowait_cpu(cpu), last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
@@ -1279,10 +1280,8 @@ void tick_nohz_idle_stop_tick(void)
 		ts->idle_sleeps++;
 		ts->idle_expires = expires;
 
-		if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
-			ts->idle_jiffies = ts->last_jiffies;
+		if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED))
 			nohz_balance_enter_idle(cpu);
-		}
 	} else {
 		tick_nohz_retain_tick(ts);
 	}
@@ -1311,6 +1310,7 @@ void tick_nohz_idle_enter(void)
 	WARN_ON_ONCE(ts->timer_expires_base);
 
 	tick_sched_flag_set(ts, TS_FLAG_INIDLE);
+	kcpustat_dyntick_start();
 	tick_nohz_start_idle(ts);
 
 	local_irq_enable();
@@ -1436,37 +1436,12 @@ unsigned long tick_nohz_get_idle_calls_cpu(int cpu)
 	return ts->idle_calls;
 }
 
-static void tick_nohz_account_idle_time(struct tick_sched *ts,
-					ktime_t now)
-{
-	unsigned long ticks;
-
-	ts->idle_exittime = now;
-
-	if (vtime_accounting_enabled_this_cpu())
-		return;
-	/*
-	 * We stopped the tick in idle. update_process_times() would miss the
-	 * time we slept, as it does only a 1 tick accounting.
-	 * Enforce that this is accounted to idle !
-	 */
-	ticks = jiffies - ts->idle_jiffies;
-	/*
-	 * We might be one off. Do not randomly account a huge number of ticks!
-	 */
-	if (ticks && ticks < LONG_MAX)
-		account_idle_ticks(ticks);
-}
-
 void tick_nohz_idle_restart_tick(void)
 {
 	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
 
-	if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
-		ktime_t now = ktime_get();
-		tick_nohz_restart_sched_tick(ts, now);
-		tick_nohz_account_idle_time(ts, now);
-	}
+	if (tick_sched_flag_test(ts, TS_FLAG_STOPPED))
+		tick_nohz_restart_sched_tick(ts, ktime_get());
 }
 
 static void tick_nohz_idle_update_tick(struct tick_sched *ts, ktime_t now)
@@ -1475,8 +1450,6 @@ static void tick_nohz_idle_update_tick(struct tick_sched *ts, ktime_t now)
 		__tick_nohz_full_update_tick(ts, now);
 	else
 		tick_nohz_restart_sched_tick(ts, now);
-
-	tick_nohz_account_idle_time(ts, now);
 }
 
 /**
@@ -1518,6 +1491,7 @@ void tick_nohz_idle_exit(void)
 
 	if (tick_stopped)
 		tick_nohz_idle_update_tick(ts, now);
+	kcpustat_dyntick_stop();
 
 	local_irq_enable();
 }
@@ -1655,20 +1629,15 @@ void tick_setup_sched_timer(bool hrtimer)
 void tick_sched_timer_dying(int cpu)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-	ktime_t idle_sleeptime, iowait_sleeptime;
 	unsigned long idle_calls, idle_sleeps;
 
 	/* This must happen before hrtimers are migrated! */
 	if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES))
 		hrtimer_cancel(&ts->sched_timer);
 
-	idle_sleeptime = ts->idle_sleeptime;
-	iowait_sleeptime = ts->iowait_sleeptime;
 	idle_calls = ts->idle_calls;
 	idle_sleeps = ts->idle_sleeps;
 	memset(ts, 0, sizeof(*ts));
-	ts->idle_sleeptime = idle_sleeptime;
-	ts->iowait_sleeptime = iowait_sleeptime;
 	ts->idle_calls = idle_calls;
 	ts->idle_sleeps = idle_sleeps;
 }
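
Illustration only, not part of the commit: the hybrid scheme described in
the changelog can be modelled in userspace roughly as below. This is a
minimal sketch under stated assumptions. struct cpu_model, the model_*()
helpers and MODEL_TICK_NSEC (HZ=250 assumed) are hypothetical names that
only mirror the patch; seqcount protection, steal time, IRQ time and vtime
are left out, and the tick is assumed to hit the idle task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: names mirror the patch, this is not kernel code. */
enum { CPUTIME_IDLE, CPUTIME_IOWAIT, NR_STATS };

#define MODEL_TICK_NSEC 4000000ULL	/* assumption: HZ=250 */

struct cpu_model {
	bool idle_dyntick;		/* models kernel_cpustat.idle_dyntick */
	uint64_t idle_entrytime;	/* models ts->idle_entrytime, in ns */
	uint64_t cpustat[NR_STATS];	/* models kernel_cpustat.cpustat[] */
};

/* Idle loop entry: switch from tick based to delta based accounting. */
static void model_idle_enter(struct cpu_model *c, uint64_t now)
{
	c->idle_dyntick = true;
	c->idle_entrytime = now;
}

/* Tick: accounts one tick of idle time only while dyntick accounting is off. */
static void model_tick(struct cpu_model *c)
{
	if (c->idle_dyntick)
		return;
	c->cpustat[CPUTIME_IDLE] += MODEL_TICK_NSEC;
}

/* Idle loop exit: fold the nanosecond delta directly into the stat field. */
static void model_idle_exit(struct cpu_model *c, uint64_t now, bool iowait)
{
	uint64_t delta = now - c->idle_entrytime;

	c->cpustat[iowait ? CPUTIME_IOWAIT : CPUTIME_IDLE] += delta;
	c->idle_dyntick = false;
}

/* Reader: accumulated idle time plus the in-flight delta while still idle. */
static uint64_t model_idle_time(const struct cpu_model *c, uint64_t now)
{
	uint64_t delta = c->idle_dyntick ? now - c->idle_entrytime : 0;

	return c->cpustat[CPUTIME_IDLE] + delta;
}
```

Since both paths feed the same cpustat[] array and only one of them is
active at a time, the model has a single source for the idle total, which
is the property the changelog relies on to keep the global idle time from
going backward.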