From: Frederic Weisbecker <frederic@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker, "Christophe Leroy (CS GROUP)", "Rafael J. Wysocki",
    Alexander Gordeev, Anna-Maria Behnsen, Ben Segall, Boqun Feng,
    Christian Borntraeger, Dietmar Eggemann, Heiko Carstens, Ingo Molnar,
    Jan Kiszka, Joel Fernandes, Juri Lelli, Kieran Bingham,
    Madhavan Srinivasan, Mel Gorman, Michael Ellerman, Neeraj Upadhyay,
    Nicholas Piggin, "Paul E. McKenney", Peter Zijlstra, Steven Rostedt,
    Sven Schnelle, Thomas Gleixner, Uladzislau Rezki, Valentin Schneider,
    Vasily Gorbik, Vincent Guittot, Viresh Kumar, Xin Zhao,
    linux-pm@vger.kernel.org, linux-s390@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, Shrikanth Hegde
Subject: [PATCH 06/15] tick/sched: Unify idle cputime accounting
Date: Fri, 6 Feb 2026 15:22:36 +0100
Message-ID: <20260206142245.58987-7-frederic@kernel.org>
In-Reply-To: <20260206142245.58987-1-frederic@kernel.org>
References: <20260206142245.58987-1-frederic@kernel.org>
X-Mailer: git-send-email 2.51.1

The non-vtime dynticks-idle cputime accounting is a mess that
accumulates into two concurrent sets of statistics, each with its own
shortcomings:

* The accounting for online CPUs, based on the delta between
  tick_nohz_start_idle() and tick_nohz_stop_idle().

  Pros:
  - Works when the tick is off
  - Has nsecs granularity

  Cons:
  - Accounts idle steal time but doesn't subtract it from idle cputime.
  - Assumes CONFIG_IRQ_TIME_ACCOUNTING=y by not accounting IRQs itself,
    so IRQ time is simply ignored when CONFIG_IRQ_TIME_ACCOUNTING=n.
  - The windows between 1) the idle task being scheduled and the first
    call to tick_nohz_start_idle() and 2) the last call to
    tick_nohz_stop_idle() and the end of the idle task's run are blind
    spots wrt. cputime accounting (though the amounts involved are
    mostly insignificant).
  - Relies on private fields outside of the kernel stats, with specific
    accessors.

* The accounting for offline CPUs, based on ticks and on the jiffies
  delta during which the tick was stopped.

  Pros:
  - Handles steal time correctly
  - Handles both CONFIG_IRQ_TIME_ACCOUNTING=y and
    CONFIG_IRQ_TIME_ACCOUNTING=n correctly.
  - Covers the whole idle task
  - Accounts directly to the kernel stats, without a midlayer
    accumulator.

  Cons:
  - Doesn't elapse when the tick is off, which makes it unsuitable for
    online CPUs.
  - Has TICK_NSEC granularity (jiffies)
  - Needs to track the dyntick-idle ticks that were already accounted
    and subtract them from the total jiffies time spent while the tick
    was stopped. This is an ugly workaround.

Having two different accountings for a single context is not the only
problem: since those accountings are of different natures, the global
idle time can be observed going backward after a CPU goes offline.
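
That regression is visible from userspace by sampling the aggregate
idle field on the first "cpu" line of /proc/stat (in USER_HZ units)
while offlining CPUs. A minimal sketch of such a monitor (illustrative
only, not part of this patch):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            unsigned long long usr, nic, sys, idle, prev = 0;

            for (;;) {
                    FILE *f = fopen("/proc/stat", "r");

                    if (!f)
                            return 1;
                    /* First "cpu" line: user nice system idle ... */
                    if (fscanf(f, "cpu %llu %llu %llu %llu",
                               &usr, &nic, &sys, &idle) == 4) {
                            if (idle < prev)
                                    printf("idle went backward: %llu -> %llu\n",
                                           prev, idle);
                            prev = idle;
                    }
                    fclose(f);
                    sleep(1);
            }
    }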
Clean up the situation by introducing a hybrid approach that stays
coherent and works for both online and offline CPUs:

* Tick based or native vtime accounting operates before the idle loop
  is entered and resumes once the idle loop prepares to exit.

* When the idle loop starts, switch to dynticks-idle accounting as is
  done currently, except that the statistics accumulate directly into
  the relevant kernel stat fields.

* Private dyntick cputime accounting fields are removed.

* Works in both the online and offline cases.

Further improvements will include:

* Only switch to dynticks-idle cputime accounting when the tick
  actually goes into dynticks mode.

* Handle CONFIG_IRQ_TIME_ACCOUNTING=n correctly, such that the
  dynticks-idle accounting still elapses while IRQs execute.

* Correctly subtract idle steal cputime from idle time.
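
For reference, a simplified sketch of where the accounting switch now
sits in the idle path (function names as in the diff below; intermediate
calls and the vtime_generic case elided):

    tick_nohz_idle_enter()
        kcpustat_dyntick_start()     /* idle_dyntick = 1, vtime_dyntick_start() */
        tick_nohz_start_idle()       /* record ts->idle_entrytime */

        ...

    tick_nohz_idle_exit()
        tick_nohz_stop_idle()        /* flush delta to CPUTIME_IDLE/CPUTIME_IOWAIT */
        tick_nohz_idle_update_tick() /* restart the tick if it was stopped */
        kcpustat_dyntick_stop()      /* idle_dyntick = 0, flush steal time */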
Reported-by: Xin Zhao
Signed-off-by: Frederic Weisbecker
---
 include/linux/kernel_stat.h | 24 ++++++++++---
 include/linux/vtime.h       |  7 +++-
 kernel/sched/cputime.c      | 62 ++++++++++++++++----------------
 kernel/time/tick-sched.c    | 71 +++++++++++--------------------------
 4 files changed, 76 insertions(+), 88 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index dd020ecaf67b..ba65aad308a1 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -34,6 +34,9 @@ enum cpu_usage_stat {
 };
 
 struct kernel_cpustat {
+#ifdef CONFIG_NO_HZ_COMMON
+	int idle_dyntick;
+#endif
 	u64 cpustat[NR_STATS];
 };
 
@@ -99,6 +102,20 @@ static inline unsigned long kstat_cpu_irqs_sum(unsigned int cpu)
 	return kstat_cpu(cpu).irqs_sum;
 }
 
+#ifdef CONFIG_NO_HZ_COMMON
+extern void kcpustat_dyntick_start(void);
+extern void kcpustat_dyntick_stop(void);
+static inline bool kcpustat_idle_dyntick(void)
+{
+	return __this_cpu_read(kernel_cpustat.idle_dyntick);
+}
+#else
+static inline bool kcpustat_idle_dyntick(void)
+{
+	return false;
+}
+#endif /* CONFIG_NO_HZ_COMMON */
+
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 extern u64 kcpustat_field(enum cpu_usage_stat usage, int cpu);
 extern void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu);
@@ -113,7 +130,7 @@ static inline void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu)
 	*dst = kcpustat_cpu(cpu);
 }
 
-#endif
+#endif /* !CONFIG_VIRT_CPU_ACCOUNTING_GEN */
 
 extern void account_user_time(struct task_struct *, u64);
 extern void account_guest_time(struct task_struct *, u64);
@@ -127,14 +144,13 @@ extern u64 get_idle_time(struct kernel_cpustat *kcs, int cpu);
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 static inline void account_process_tick(struct task_struct *tsk, int user)
 {
-	vtime_flush(tsk);
+	if (!kcpustat_idle_dyntick())
+		vtime_flush(tsk);
 }
 #else
 extern void account_process_tick(struct task_struct *, int user);
 #endif
 
-extern void account_idle_ticks(unsigned long ticks);
-
 #ifdef CONFIG_SCHED_CORE
 extern void __account_forceidle_time(struct task_struct *tsk, u64 delta);
 #endif
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 61b94c12d7dd..a4506336002d 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -31,6 +31,11 @@ static inline bool vtime_generic_enabled_cpu(int cpu)
 	return context_tracking_enabled_cpu(cpu);
 }
 
+static inline bool vtime_generic_enabled_this_cpu(void)
+{
+	return context_tracking_enabled_this_cpu();
+}
+
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 extern void vtime_account_idle(struct task_struct *tsk);
 extern void vtime_account_irq(struct task_struct *tsk, unsigned int offset);
@@ -90,7 +95,7 @@ static inline bool vtime_accounting_enabled_cpu(int cpu)
 
 static inline bool vtime_accounting_enabled_this_cpu(void)
 {
-	return context_tracking_enabled_this_cpu();
+	return vtime_generic_enabled_this_cpu();
 }
 
 extern void vtime_task_switch_generic(struct task_struct *prev);
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5613838d0307..d67f93e845a7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -400,16 +400,30 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	}
 }
 
-static void irqtime_account_idle_ticks(int ticks)
-{
-	irqtime_account_process_tick(current, 0, ticks);
-}
 #else /* !CONFIG_IRQ_TIME_ACCOUNTING: */
-static inline void irqtime_account_idle_ticks(int ticks) { }
 static inline void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 						int nr_ticks) { }
 #endif /* !CONFIG_IRQ_TIME_ACCOUNTING */
 
+#ifdef CONFIG_NO_HZ_COMMON
+void kcpustat_dyntick_start(void)
+{
+	if (!vtime_generic_enabled_this_cpu()) {
+		vtime_dyntick_start();
+		__this_cpu_write(kernel_cpustat.idle_dyntick, 1);
+	}
+}
+
+void kcpustat_dyntick_stop(void)
+{
+	if (!vtime_generic_enabled_this_cpu()) {
+		__this_cpu_write(kernel_cpustat.idle_dyntick, 0);
+		vtime_dyntick_stop();
+		steal_account_process_time(ULONG_MAX);
+	}
+}
+#endif /* CONFIG_NO_HZ_COMMON */
+
 /*
  * Use precise platform statistics if available:
  */
@@ -423,11 +437,15 @@ void vtime_account_irq(struct task_struct *tsk, unsigned int offset)
 		vtime_account_hardirq(tsk);
 	} else if (pc & SOFTIRQ_OFFSET) {
 		vtime_account_softirq(tsk);
-	} else if (!IS_ENABLED(CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE) &&
-		   is_idle_task(tsk)) {
-		vtime_account_idle(tsk);
+	} else if (!kcpustat_idle_dyntick()) {
+		if (!IS_ENABLED(CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE) &&
+		    is_idle_task(tsk)) {
+			vtime_account_idle(tsk);
+		} else {
+			vtime_account_kernel(tsk);
+		}
 	} else {
-		vtime_account_kernel(tsk);
+		vtime_reset();
 	}
 }
 
@@ -469,6 +487,9 @@ void account_process_tick(struct task_struct *p, int user_tick)
 	if (vtime_accounting_enabled_this_cpu())
 		return;
 
+	if (kcpustat_idle_dyntick())
+		return;
+
 	if (irqtime_enabled()) {
 		irqtime_account_process_tick(p, user_tick, 1);
 		return;
@@ -490,29 +511,6 @@ void account_process_tick(struct task_struct *p, int user_tick)
 	account_idle_time(cputime);
 }
 
-/*
- * Account multiple ticks of idle time.
- * @ticks: number of stolen ticks
- */
-void account_idle_ticks(unsigned long ticks)
-{
-	u64 cputime, steal;
-
-	if (irqtime_enabled()) {
-		irqtime_account_idle_ticks(ticks);
-		return;
-	}
-
-	cputime = ticks * TICK_NSEC;
-	steal = steal_account_process_time(ULONG_MAX);
-
-	if (steal >= cputime)
-		return;
-
-	cputime -= steal;
-	account_idle_time(cputime);
-}
-
 /*
  * Adjust tick based cputime random precision against scheduler runtime
  * accounting.
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9632066aea4d..21ac561a8545 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -285,8 +285,6 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) &&
 	    tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
 		touch_softlockup_watchdog_sched();
-		if (is_idle_task(current))
-			ts->idle_jiffies++;
 		/*
 		 * In case the current tick fired too early past its expected
 		 * expiration, make sure we don't bypass the next clock reprogramming
@@ -744,8 +742,12 @@ static void tick_nohz_update_jiffies(ktime_t now)
 
 static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 {
+	u64 *cpustat = kcpustat_this_cpu->cpustat;
 	ktime_t delta;
 
+	if (vtime_generic_enabled_this_cpu())
+		return;
+
 	if (WARN_ON_ONCE(!tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE)))
 		return;
 
@@ -753,9 +755,9 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 
 	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	if (nr_iowait_cpu(smp_processor_id()) > 0)
-		ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+		cpustat[CPUTIME_IOWAIT] = ktime_add(cpustat[CPUTIME_IOWAIT], delta);
 	else
-		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+		cpustat[CPUTIME_IDLE] = ktime_add(cpustat[CPUTIME_IDLE], delta);
 
 	ts->idle_entrytime = now;
 	tick_sched_flag_clear(ts, TS_FLAG_IDLE_ACTIVE);
@@ -766,18 +768,21 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 
 static void tick_nohz_start_idle(struct tick_sched *ts)
 {
+	if (vtime_generic_enabled_this_cpu())
+		return;
+
 	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	ts->idle_entrytime = ktime_get();
 	tick_sched_flag_set(ts, TS_FLAG_IDLE_ACTIVE);
 	write_seqcount_end(&ts->idle_sleeptime_seq);
 
-	sched_clock_idle_sleep_event();
 }
 
-static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, ktime_t *sleeptime,
+static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx,
 				 bool compute_delta, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	u64 *cpustat = kcpustat_cpu(cpu).cpustat;
 	ktime_t now, idle;
 	unsigned int seq;
 
@@ -799,9 +804,9 @@ static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, ktime_t *sleeptime,
 		if (tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE) && compute_delta) {
 			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
 
-			idle = ktime_add(*sleeptime, delta);
+			idle = ktime_add(cpustat[idx], delta);
 		} else {
-			idle = *sleeptime;
+			idle = cpustat[idx];
 		}
 	} while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq));
 
@@ -828,9 +833,7 @@ static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, ktime_t *sleeptime,
  */
u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 {
-	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-
-	return get_cpu_sleep_time_us(cpu, CPUTIME_IDLE, &ts->idle_sleeptime,
+	return get_cpu_sleep_time_us(cpu, CPUTIME_IDLE, !nr_iowait_cpu(cpu),
 				     last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
@@ -854,9 +857,7 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
  */
 u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 {
-	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-
-	return get_cpu_sleep_time_us(cpu, CPUTIME_IOWAIT, &ts->iowait_sleeptime,
+	return get_cpu_sleep_time_us(cpu, CPUTIME_IOWAIT, nr_iowait_cpu(cpu),
 				     last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
@@ -1256,10 +1257,8 @@ void tick_nohz_idle_stop_tick(void)
 		ts->idle_sleeps++;
 		ts->idle_expires = expires;
 
-		if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
-			ts->idle_jiffies = ts->last_jiffies;
+		if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED))
 			nohz_balance_enter_idle(cpu);
-		}
 	} else {
 		tick_nohz_retain_tick(ts);
 	}
@@ -1288,6 +1287,7 @@ void tick_nohz_idle_enter(void)
 	WARN_ON_ONCE(ts->timer_expires_base);
 
 	tick_sched_flag_set(ts, TS_FLAG_INIDLE);
+	kcpustat_dyntick_start();
 	tick_nohz_start_idle(ts);
 
 	local_irq_enable();
@@ -1413,37 +1413,12 @@ unsigned long tick_nohz_get_idle_calls_cpu(int cpu)
 	return ts->idle_calls;
 }
 
-static void tick_nohz_account_idle_time(struct tick_sched *ts,
-					ktime_t now)
-{
-	unsigned long ticks;
-
-	ts->idle_exittime = now;
-
-	if (vtime_accounting_enabled_this_cpu())
-		return;
-	/*
-	 * We stopped the tick in idle. update_process_times() would miss the
-	 * time we slept, as it does only a 1 tick accounting.
-	 * Enforce that this is accounted to idle !
-	 */
-	ticks = jiffies - ts->idle_jiffies;
-	/*
-	 * We might be one off. Do not randomly account a huge number of ticks!
-	 */
-	if (ticks && ticks < LONG_MAX)
-		account_idle_ticks(ticks);
-}
-
 void tick_nohz_idle_restart_tick(void)
 {
 	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
 
-	if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
-		ktime_t now = ktime_get();
-		tick_nohz_restart_sched_tick(ts, now);
-		tick_nohz_account_idle_time(ts, now);
-	}
+	if (tick_sched_flag_test(ts, TS_FLAG_STOPPED))
+		tick_nohz_restart_sched_tick(ts, ktime_get());
 }
 
 static void tick_nohz_idle_update_tick(struct tick_sched *ts, ktime_t now)
@@ -1452,8 +1427,6 @@ static void tick_nohz_idle_update_tick(struct tick_sched *ts, ktime_t now)
 		__tick_nohz_full_update_tick(ts, now);
 	else
 		tick_nohz_restart_sched_tick(ts, now);
-
-	tick_nohz_account_idle_time(ts, now);
 }
 
 /**
@@ -1495,6 +1468,7 @@ void tick_nohz_idle_exit(void)
 
 	if (tick_stopped)
 		tick_nohz_idle_update_tick(ts, now);
+	kcpustat_dyntick_stop();
 
 	local_irq_enable();
 }
@@ -1631,20 +1605,15 @@ void tick_setup_sched_timer(bool hrtimer)
 void tick_sched_timer_dying(int cpu)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-	ktime_t idle_sleeptime, iowait_sleeptime;
 	unsigned long idle_calls, idle_sleeps;
 
 	/* This must happen before hrtimers are migrated! */
 	if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES))
 		hrtimer_cancel(&ts->sched_timer);
 
-	idle_sleeptime = ts->idle_sleeptime;
-	iowait_sleeptime = ts->iowait_sleeptime;
 	idle_calls = ts->idle_calls;
 	idle_sleeps = ts->idle_sleeps;
 	memset(ts, 0, sizeof(*ts));
-	ts->idle_sleeptime = idle_sleeptime;
-	ts->iowait_sleeptime = iowait_sleeptime;
 	ts->idle_calls = idle_calls;
 	ts->idle_sleeps = idle_sleeps;
 }
-- 
2.51.1