From nobody Wed Apr 1 09:43:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B8B61311954; Tue, 31 Mar 2026 13:16:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963000; cv=none; b=uuM2+MsA4TAdtT5hkvz5DgHB+WZhDiL4nG0z/9eiTZ1UXUSSmQD6GQCplOMWgEHY9jL3bOBI/Rzma/kPSFjAVZ3VB4vTPNkJdBm4KuNQjgu3T41q9A4qOba0DPPzC5s/tGNTAzrgWtaDQa+GgRH4AyZngzMd0xVSKQOVlz16nsw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963000; c=relaxed/simple; bh=HS19yv/P2kO5uwZtZn6WI3D14aCIjKAvAWdMPBQbuaw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GYOlzwtc7tuS0TfHk2/gYUSEZ0lTZ/Q5zvNo8PtOG/xWfCiPPoRzQUdvG8XVxF8dYF+lE+OsauMwQHOocchy2741tCPjC/+7HrjftC9X3m5mnFBMXIRwG3jw68gDgZ08ZXRJPx/2mwf1tcoh11z829vX8IWmEz8b+ZqGTARe3Gw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QOd19h22; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QOd19h22" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 83B22C2BCB1; Tue, 31 Mar 2026 13:16:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774963000; bh=HS19yv/P2kO5uwZtZn6WI3D14aCIjKAvAWdMPBQbuaw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=QOd19h22X/RosrIg36sqR4C547PJmOMUOXt8k01W677Q/+RAOIFeVhFiiIG3+TE7i cEvV7ifj6S5hMQ9Qt0gKuz7aAICwb+07P0jD4HkZsSG4WgYxmcLrxbZ7CNEpwQYFaC 155JAD301rzIjSi9iS9GCaKWPMy24AxgjmpEMnBLjTeqBhSqPcUJgapbOoYDiBGbAW gFIQXoFMXIj7EFweb4d76Xur0LXUrumcFb8U69WbG2lvZNZna2ck9OCggdrGD8j3Mr 
b4xrQfFkgwpL0VrKHQ2sHNRU6njozzz+aDPA/o+UH+4//C+LDkvpopk8p7AFypemWl 1gbQ+yOSVjQsg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "Christophe Leroy (CS GROUP)" , "Rafael J. Wysocki" , Alexander Gordeev , Anna-Maria Behnsen , Ben Segall , Boqun Feng , Christian Borntraeger , Dietmar Eggemann , Heiko Carstens , Ingo Molnar , Jan Kiszka , Joel Fernandes , Juri Lelli , Kieran Bingham , Madhavan Srinivasan , Mel Gorman , Michael Ellerman , Neeraj Upadhyay , Nicholas Piggin , "Paul E . McKenney" , Peter Zijlstra , Shrikanth Hegde , Steven Rostedt , Sven Schnelle , Thomas Gleixner , Uladzislau Rezki , Valentin Schneider , Vasily Gorbik , Vincent Guittot , Viresh Kumar , Xin Zhao , linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 01/14] sched/idle: Handle offlining first in idle loop Date: Tue, 31 Mar 2026 15:16:09 +0200 Message-ID: <20260331131622.30505-2-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Offline handling happens from within the inner idle loop, after the beginning of dyntick cputime accounting, nohz idle load balancing and TIF_NEED_RESCHED polling.

This is not necessary and even buggy because:

* There is no dyntick handling to do. And calling tick_nohz_idle_enter() messes up the struct tick_sched reset that was performed on tick_sched_timer_dying().
* There is no nohz idle balancing to do.
* Polling on TIF_NEED_RESCHED is irrelevant at this stage: no more tasks are allowed to run.
* There is no need to check need_resched() before offline handling since stop_machine is done and all per-CPU kthreads should be done with their job.

Therefore move the offline handling to the beginning of the idle loop. This will also ease the idle cputime unification later by not elapsing idle time while offline through the call to: tick_nohz_idle_enter() -> tick_nohz_start_idle() Signed-off-by: Frederic Weisbecker Reviewed-by: Shrikanth Hegde Tested-by: Shrikanth Hegde Reviewed-by: Rafael J. Wysocki (Intel) --- kernel/sched/idle.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 3681b6ad9276..521a076d8845 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -260,6 +260,14 @@ static void do_idle(void) { int cpu = smp_processor_id(); + if (cpu_is_offline(cpu)) { + local_irq_disable(); + /* All per-CPU kernel threads should be done by now. */ + WARN_ON_ONCE(need_resched()); + cpuhp_report_idle_dead(); + arch_cpu_idle_dead(); + } + /* * Check if we need to update blocked load */ @@ -311,11 +319,6 @@ static void do_idle(void) */ local_irq_disable(); - if (cpu_is_offline(cpu)) { - cpuhp_report_idle_dead(); - arch_cpu_idle_dead(); - } - arch_cpu_idle_enter(); rcu_nocb_flush_deferred_wakeup(); -- 2.53.0 From nobody Wed Apr 1 09:43:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7FD093DFC86; Tue, 31 Mar 2026 13:16:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963008; cv=none; b=s6IuxwWt6EeC9znFCjenXK1BGQ0WLZ79h87q5k6bwVvY6fuFkkteXYlmxfMpZTyr8+IqTsKJ3BTs2ig0Wuuhf/mTlioUbnZGXj/pFbxJxVlddSQuC3rvhgT1kjGAFxwSpWRh3y6mR+BnhZbPtyKgaAMiWy7eTa5U2YZsRG6fUPQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963008; c=relaxed/simple; 
bh=ToJm1amU84B7C3xkdXohgmvzVkmSBy0/3/PURrIxwTw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OyAEAPCaMx2p8yplmTezObc0wvSSSZsMIrALiZ5H08X1p+NX2DhGTMxLqByp+/byw0l3kMEsT96fF5HzaIw+vdYXOw756eJRoD1rWXCTuxBCUBFuPB8WnAgh8+psutsUfwGe7qs/p3dFrfvZ3QUh0PejxtRNaN40L92E0o00sDs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=utJ0Romu; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="utJ0Romu" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D5A4EC2BCB2; Tue, 31 Mar 2026 13:16:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774963008; bh=ToJm1amU84B7C3xkdXohgmvzVkmSBy0/3/PURrIxwTw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=utJ0Romu/Lw2Vfmju3QNiFc/EA0xnYIUlSWdQssecRsJ7FgzJFMgHPfVH2LvAMqja TLf2iG+Sd85g3nB/SE+Jb6dRTNC3m4WMt03izWbNdoYezhFxXjb8wqYBbJ1bUEVQp0 WnXJoAtGAnZHjLjg/rFM1mKp9xtk0vnD+9+2zi/SElWUWNzQn/c+oGTEudxRE+dIje 0INTYB2sNBultg56aZn3AInnqKCzbh6XpCMtAQ5LTN5t7e659wtaW28LUMwyF032f+ w94H++y/FvtDVUGczlKUyuMLCSzrD5tuW13D4sOabe9FDFBbMKMHCmIHkZpN6mBff4 jfFtn4MlU++Lg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "Christophe Leroy (CS GROUP)" , "Rafael J. Wysocki" , Alexander Gordeev , Anna-Maria Behnsen , Ben Segall , Boqun Feng , Christian Borntraeger , Dietmar Eggemann , Heiko Carstens , Ingo Molnar , Jan Kiszka , Joel Fernandes , Juri Lelli , Kieran Bingham , Madhavan Srinivasan , Mel Gorman , Michael Ellerman , Neeraj Upadhyay , Nicholas Piggin , "Paul E . 
McKenney" , Peter Zijlstra , Shrikanth Hegde , Steven Rostedt , Sven Schnelle , Thomas Gleixner , Uladzislau Rezki , Valentin Schneider , Vasily Gorbik , Vincent Guittot , Viresh Kumar , Xin Zhao , linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 02/14] sched/cputime: Remove superfluous and error prone kcpustat_field() parameter Date: Tue, 31 Mar 2026 15:16:10 +0200 Message-ID: <20260331131622.30505-3-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The first parameter to kcpustat_field() is a pointer to the CPU kcpustat to be fetched from. This parameter is error prone because a copy of a kcpustat could be passed by accident instead of the original one. Also, the kcpustat structure can already be retrieved with the help of the mandatory CPU argument. Remove the needless parameter. 
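The hazard described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: the `_model` names, the two-entry array standing in for the per-CPU kcpustat, and the trimmed-down enum are assumptions made for the illustration. It shows how the old three-argument shape accepts a stale copy without complaint, while the two-argument shape always resolves the structure from the CPU number.

```c
/* Userspace model only, not kernel code. Every *_model name is hypothetical. */
#include <assert.h>
#include <stdint.h>

enum cpu_usage_stat { CPUTIME_USER, CPUTIME_NICE, NR_STATS };

struct kernel_cpustat {
	uint64_t cpustat[NR_STATS];
};

#define NR_CPUS_MODEL 2

/* Stand-in for the per-CPU structure that kcpustat_cpu(cpu) resolves to. */
struct kernel_cpustat kcpustat_model[NR_CPUS_MODEL];

/* Old shape: the caller supplies the structure; nothing ties it to @cpu. */
uint64_t kcpustat_field_old_model(struct kernel_cpustat *kcpustat,
				  enum cpu_usage_stat usage, int cpu)
{
	(void)cpu;
	return kcpustat->cpustat[usage];
}

/* New shape: the structure is derived from @cpu, so no copy can sneak in. */
uint64_t kcpustat_field_new_model(enum cpu_usage_stat usage, int cpu)
{
	return kcpustat_model[cpu].cpustat[usage];
}

/* Returns how far a stale copy lags behind the live per-CPU value. */
uint64_t stale_copy_lag_model(int cpu, uint64_t elapsed)
{
	/* Snapshot taken by accident, e.g. a local struct filled earlier. */
	struct kernel_cpustat copy = kcpustat_model[cpu];
	uint64_t stale, live;

	/* Time keeps being accounted to the original structure. */
	kcpustat_model[cpu].cpustat[CPUTIME_NICE] += elapsed;

	stale = kcpustat_field_old_model(&copy, CPUTIME_NICE, cpu);
	live = kcpustat_field_new_model(CPUTIME_NICE, cpu);

	assert(live >= stale);
	return live - stale;
}
```

Calling stale_copy_lag_model() from a small test harness returns the amount of cputime the copy missed. With the two-argument shape there is no structure argument left to get wrong.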
Signed-off-by: Frederic Weisbecker Reviewed-by: Shrikanth Hegde Tested-by: Shrikanth Hegde --- drivers/cpufreq/cpufreq_governor.c | 6 +++--- drivers/macintosh/rack-meter.c | 2 +- include/linux/kernel_stat.h | 8 +++----- kernel/rcu/tree.c | 9 +++------ kernel/rcu/tree_stall.h | 7 +++---- kernel/sched/cputime.c | 5 ++--- 6 files changed, 15 insertions(+), 22 deletions(-) diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_g= overnor.c index 36eb7aee4bcd..ed36d22195f6 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -105,7 +105,7 @@ void gov_update_cpu_data(struct dbs_data *dbs_data) j_cdbs->prev_cpu_idle =3D get_cpu_idle_time(j, &j_cdbs->prev_update_tim= e, dbs_data->io_is_busy); if (dbs_data->ignore_nice_load) - j_cdbs->prev_cpu_nice =3D kcpustat_field(&kcpustat_cpu(j), CPUTIME_NIC= E, j); + j_cdbs->prev_cpu_nice =3D kcpustat_field(CPUTIME_NICE, j); } } } @@ -165,7 +165,7 @@ unsigned int dbs_update(struct cpufreq_policy *policy) j_cdbs->prev_cpu_idle =3D cur_idle_time; =20 if (ignore_nice) { - u64 cur_nice =3D kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j); + u64 cur_nice =3D kcpustat_field(CPUTIME_NICE, j); =20 idle_time +=3D div_u64(cur_nice - j_cdbs->prev_cpu_nice, NSEC_PER_USEC); j_cdbs->prev_cpu_nice =3D cur_nice; @@ -539,7 +539,7 @@ int cpufreq_dbs_governor_start(struct cpufreq_policy *p= olicy) j_cdbs->prev_load =3D 0; =20 if (ignore_nice) - j_cdbs->prev_cpu_nice =3D kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE= , j); + j_cdbs->prev_cpu_nice =3D kcpustat_field(CPUTIME_NICE, j); } =20 gov->start(policy); diff --git a/drivers/macintosh/rack-meter.c b/drivers/macintosh/rack-meter.c index 8a1e2c08b096..26cb93191ede 100644 --- a/drivers/macintosh/rack-meter.c +++ b/drivers/macintosh/rack-meter.c @@ -87,7 +87,7 @@ static inline u64 get_cpu_idle_time(unsigned int cpu) kcpustat->cpustat[CPUTIME_IOWAIT]; =20 if (rackmeter_ignore_nice) - retval +=3D kcpustat_field(kcpustat, CPUTIME_NICE, cpu); + 
retval +=3D kcpustat_field(CPUTIME_NICE, cpu); =20 return retval; } diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index b97ce2df376f..dd020ecaf67b 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -100,14 +100,12 @@ static inline unsigned long kstat_cpu_irqs_sum(unsign= ed int cpu) } =20 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN -extern u64 kcpustat_field(struct kernel_cpustat *kcpustat, - enum cpu_usage_stat usage, int cpu); +extern u64 kcpustat_field(enum cpu_usage_stat usage, int cpu); extern void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu); #else -static inline u64 kcpustat_field(struct kernel_cpustat *kcpustat, - enum cpu_usage_stat usage, int cpu) +static inline u64 kcpustat_field(enum cpu_usage_stat usage, int cpu) { - return kcpustat->cpustat[usage]; + return kcpustat_cpu(cpu).cpustat[usage]; } =20 static inline void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 55df6d37145e..3cbf79bee976 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -969,14 +969,11 @@ static int rcu_watching_snap_recheck(struct rcu_data = *rdp) if (rcu_cpu_stall_cputime && rdp->snap_record.gp_seq !=3D rdp->gp_seq) { int cpu =3D rdp->cpu; struct rcu_snap_record *rsrp; - struct kernel_cpustat *kcsp; - - kcsp =3D &kcpustat_cpu(cpu); =20 rsrp =3D &rdp->snap_record; - rsrp->cputime_irq =3D kcpustat_field(kcsp, CPUTIME_IRQ, cpu); - rsrp->cputime_softirq =3D kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu); - rsrp->cputime_system =3D kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu); + rsrp->cputime_irq =3D kcpustat_field(CPUTIME_IRQ, cpu); + rsrp->cputime_softirq =3D kcpustat_field(CPUTIME_SOFTIRQ, cpu); + rsrp->cputime_system =3D kcpustat_field(CPUTIME_SYSTEM, cpu); rsrp->nr_hardirqs =3D kstat_cpu_irqs_sum(cpu) + arch_irq_stat_cpu(cpu); rsrp->nr_softirqs =3D kstat_cpu_softirqs_sum(cpu); rsrp->nr_csw =3D nr_context_switches_cpu(cpu); diff --git a/kernel/rcu/tree_stall.h 
b/kernel/rcu/tree_stall.h index b67532cb8770..cf7ae51cba40 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -479,7 +479,6 @@ static void print_cpu_stat_info(int cpu) { struct rcu_snap_record rsr, *rsrp; struct rcu_data *rdp =3D per_cpu_ptr(&rcu_data, cpu); - struct kernel_cpustat *kcsp =3D &kcpustat_cpu(cpu); =20 if (!rcu_cpu_stall_cputime) return; @@ -488,9 +487,9 @@ static void print_cpu_stat_info(int cpu) if (rsrp->gp_seq !=3D rdp->gp_seq) return; =20 - rsr.cputime_irq =3D kcpustat_field(kcsp, CPUTIME_IRQ, cpu); - rsr.cputime_softirq =3D kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu); - rsr.cputime_system =3D kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu); + rsr.cputime_irq =3D kcpustat_field(CPUTIME_IRQ, cpu); + rsr.cputime_softirq =3D kcpustat_field(CPUTIME_SOFTIRQ, cpu); + rsr.cputime_system =3D kcpustat_field(CPUTIME_SYSTEM, cpu); =20 pr_err("\t hardirqs softirqs csw/system\n"); pr_err("\t number: %8lld %10d %12lld\n", diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index fbf31db0d2f3..caaaf0a04ced 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -975,10 +975,9 @@ static int kcpustat_field_vtime(u64 *cpustat, return 0; } =20 -u64 kcpustat_field(struct kernel_cpustat *kcpustat, - enum cpu_usage_stat usage, int cpu) +u64 kcpustat_field(enum cpu_usage_stat usage, int cpu) { - u64 *cpustat =3D kcpustat->cpustat; + u64 *cpustat =3D kcpustat_cpu(cpu).cpustat; u64 val =3D cpustat[usage]; struct rq *rq; int err; --=20 2.53.0 From nobody Wed Apr 1 09:43:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D664F3E5ECE; Tue, 31 Mar 2026 13:16:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1774963015; cv=none; b=YuSHrb3EjWeTucQtYkZaJOEfNLGJJjTPi4rjUWR+t4uNBHCw9uQ+kSq7hNrf9NNlsNcJBcsUCtqPMYUARp95z/mD4ZNwtsPnuimbyWLDRfflZu42ONQJT6ongBf1gCz2nTUcQZJpoNHIJR0vSS9tYkEpf/LGRJKW20mVVCbLFHI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963015; c=relaxed/simple; bh=FIR8yqoziWF4eq4PotWtg16qL3+UqHBknbFqZfHWHpw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rKYzmI88BFC/6FEXXe1E2cdYKMHQJq7KXNZxK1DRWdbsTErTfBWdzjdtLU+lvheTzGQNoZWjLyzb//xm4rsKKUF7eW+nw3K9RHD7+yLhNicewraVX+jRdVAiTh23gxuPCmBVFAnMmcQ6uF8wjwxTws2e7jA+Quw2UN9l/fEPYkE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=GjHPWfWe; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="GjHPWfWe" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7D595C2BCB1; Tue, 31 Mar 2026 13:16:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774963015; bh=FIR8yqoziWF4eq4PotWtg16qL3+UqHBknbFqZfHWHpw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=GjHPWfWeJs98FD35R2fPBM/zzmkEZNP43NEG8bt/MegfdEYdnKCh5WmYWfZWCazWx 1eYNFoge1C83N5pjm7tgqvVKBJJIJRujQVxC2NU9bTTPEF+1d64YqtUL269s5YfVa0 8ElSA+AtQL/V7u1rkD5dWiQKPOxdOP7t/0Z6e50NBTmXRGV0f4MirErofvjArrGC2M FGP9F59UajMqE/n/m2ctoV1sZdHnu9OVUoPdt7cyOYJOPrSyYMjII4xNScaX/VzAp9 3E8VdNHxps0W5rAPdb5wb3gJKH8LQcOpxjsQJ20r6sJjnwOOOF1L99TDlzgW6zjXqH 8Zl0g0x06IITQ== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "Christophe Leroy (CS GROUP)" , "Rafael J. Wysocki" , Alexander Gordeev , Anna-Maria Behnsen , Ben Segall , Boqun Feng , Christian Borntraeger , Dietmar Eggemann , Heiko Carstens , Ingo Molnar , Jan Kiszka , Joel Fernandes , Juri Lelli , Kieran Bingham , Madhavan Srinivasan , Mel Gorman , Michael Ellerman , Neeraj Upadhyay , Nicholas Piggin , "Paul E . 
McKenney" , Peter Zijlstra , Shrikanth Hegde , Steven Rostedt , Sven Schnelle , Thomas Gleixner , Uladzislau Rezki , Valentin Schneider , Vasily Gorbik , Vincent Guittot , Viresh Kumar , Xin Zhao , linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 03/14] sched/cputime: Correctly support generic vtime idle time Date: Tue, 31 Mar 2026 15:16:11 +0200 Message-ID: <20260331131622.30505-4-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Currently, whether generic vtime is running or not, the idle cputime is fetched from the nohz accounting. However, generic vtime already does its own idle cputime accounting; only the kernel stat accessors are not plugged in to support it. Read the idle cputime from generic vtime when it is running. This will later allow a clearer split between nohz and vtime cputime accounting. 
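As a rough illustration of the accessor behaviour described above, the following userspace sketch models how a pending idle delta could be attributed to either CPUTIME_IDLE or CPUTIME_IOWAIT depending on the runqueue iowait count. It is an assumption-laden model, not the kernel implementation: `struct model_cpu`, `model_field()` and the plain integer clock are hypothetical, and the seqcount retry loop of the real accessors is left out.

```c
/* Userspace model only, not kernel code. struct model_cpu and model_field() are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* Same ordering as the kernel's enum vtime_state. */
enum vtime_state { VTIME_INACTIVE, VTIME_IDLE, VTIME_SYS, VTIME_USER, VTIME_GUEST };
enum cpu_usage_stat { CPUTIME_IDLE, CPUTIME_IOWAIT, NR_STATS };

struct model_cpu {
	uint64_t cpustat[NR_STATS];	/* cputime already flushed, in ns */
	enum vtime_state state;		/* what the CPU's current task is doing */
	uint64_t starttime;		/* when the current state began, in ns */
	int nr_iowait;			/* tasks on this CPU blocked on I/O */
};

/*
 * Return the flushed value plus the not-yet-flushed idle delta, attributed
 * to iowait when at least one task is waiting on I/O and to idle otherwise.
 */
uint64_t model_field(const struct model_cpu *c, enum cpu_usage_stat usage,
		     uint64_t now)
{
	uint64_t val = c->cpustat[usage];
	uint64_t delta;

	if (c->state != VTIME_IDLE)
		return val;

	delta = now - c->starttime;

	if (usage == CPUTIME_IDLE && c->nr_iowait == 0)
		val += delta;
	else if (usage == CPUTIME_IOWAIT && c->nr_iowait > 0)
		val += delta;

	return val;
}

/* Small self-check: the pending delta lands in exactly one of the two fields. */
void model_selfcheck(void)
{
	struct model_cpu c = { { 1000, 500 }, VTIME_IDLE, 100, 0 };

	assert(model_field(&c, CPUTIME_IDLE, 350) +
	       model_field(&c, CPUTIME_IOWAIT, 350) == 1000 + 500 + 250);
}
```

The point of the model is that the pending delta is never counted twice: for a given nr_iowait value it is added to only one of the two fields.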
Signed-off-by: Frederic Weisbecker Tested-by: Shrikanth Hegde --- include/linux/vtime.h | 9 +++++++-- kernel/sched/cputime.c | 38 +++++++++++++++++++++++++++++--------- kernel/time/tick-sched.c | 12 +++++++++--- 3 files changed, 45 insertions(+), 14 deletions(-) diff --git a/include/linux/vtime.h b/include/linux/vtime.h index 29dd5b91dd7d..336875bea767 100644 --- a/include/linux/vtime.h +++ b/include/linux/vtime.h @@ -10,7 +10,6 @@ */ #ifdef CONFIG_VIRT_CPU_ACCOUNTING extern void vtime_account_kernel(struct task_struct *tsk); -extern void vtime_account_idle(struct task_struct *tsk); #endif /* !CONFIG_VIRT_CPU_ACCOUNTING */ =20 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN @@ -27,7 +26,13 @@ static inline void vtime_guest_exit(struct task_struct *= tsk) { } static inline void vtime_init_idle(struct task_struct *tsk, int cpu) { } #endif =20 +static inline bool vtime_generic_enabled_cpu(int cpu) +{ + return context_tracking_enabled_cpu(cpu); +} + #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE +extern void vtime_account_idle(struct task_struct *tsk); extern void vtime_account_irq(struct task_struct *tsk, unsigned int offset= ); extern void vtime_account_softirq(struct task_struct *tsk); extern void vtime_account_hardirq(struct task_struct *tsk); @@ -74,7 +79,7 @@ static inline bool vtime_accounting_enabled(void) =20 static inline bool vtime_accounting_enabled_cpu(int cpu) { - return context_tracking_enabled_cpu(cpu); + return vtime_generic_enabled_cpu(cpu); } =20 static inline bool vtime_accounting_enabled_this_cpu(void) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index caaaf0a04ced..d91b495457ec 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -773,9 +773,9 @@ void vtime_guest_exit(struct task_struct *tsk) } EXPORT_SYMBOL_GPL(vtime_guest_exit); =20 -void vtime_account_idle(struct task_struct *tsk) +static void __vtime_account_idle(struct vtime *vtime) { - account_idle_time(get_vtime_delta(&tsk->vtime)); + 
account_idle_time(get_vtime_delta(vtime)); } =20 void vtime_task_switch_generic(struct task_struct *prev) @@ -784,7 +784,7 @@ void vtime_task_switch_generic(struct task_struct *prev) =20 write_seqcount_begin(&vtime->seqcount); if (vtime->state =3D=3D VTIME_IDLE) - vtime_account_idle(prev); + __vtime_account_idle(vtime); else __vtime_account_kernel(prev, vtime); vtime->state =3D VTIME_INACTIVE; @@ -926,6 +926,7 @@ static int kcpustat_field_vtime(u64 *cpustat, int cpu, u64 *val) { struct vtime *vtime =3D &tsk->vtime; + struct rq *rq =3D cpu_rq(cpu); unsigned int seq; =20 do { @@ -967,6 +968,14 @@ static int kcpustat_field_vtime(u64 *cpustat, if (state =3D=3D VTIME_GUEST && task_nice(tsk) > 0) *val +=3D vtime->gtime + vtime_delta(vtime); break; + case CPUTIME_IDLE: + if (state =3D=3D VTIME_IDLE && !atomic_read(&rq->nr_iowait)) + *val +=3D vtime_delta(vtime); + break; + case CPUTIME_IOWAIT: + if (state =3D=3D VTIME_IDLE && atomic_read(&rq->nr_iowait) > 0) + *val +=3D vtime_delta(vtime); + break; default: break; } @@ -1029,8 +1038,8 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_cpu= stat *dst, *dst =3D *src; cpustat =3D dst->cpustat; =20 - /* Task is sleeping, dead or idle, nothing to add */ - if (state < VTIME_SYS) + /* Task is sleeping or dead, nothing to add */ + if (state < VTIME_IDLE) continue; =20 delta =3D vtime_delta(vtime); @@ -1039,15 +1048,17 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_c= pustat *dst, * Task runs either in user (including guest) or kernel space, * add pending nohz time to the right place. 
*/ - if (state =3D=3D VTIME_SYS) { + switch (vtime->state) { + case VTIME_SYS: cpustat[CPUTIME_SYSTEM] +=3D vtime->stime + delta; - } else if (state =3D=3D VTIME_USER) { + break; + case VTIME_USER: if (task_nice(tsk) > 0) cpustat[CPUTIME_NICE] +=3D vtime->utime + delta; else cpustat[CPUTIME_USER] +=3D vtime->utime + delta; - } else { - WARN_ON_ONCE(state !=3D VTIME_GUEST); + break; + case VTIME_GUEST: if (task_nice(tsk) > 0) { cpustat[CPUTIME_GUEST_NICE] +=3D vtime->gtime + delta; cpustat[CPUTIME_NICE] +=3D vtime->gtime + delta; @@ -1055,6 +1066,15 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_cp= ustat *dst, cpustat[CPUTIME_GUEST] +=3D vtime->gtime + delta; cpustat[CPUTIME_USER] +=3D vtime->gtime + delta; } + break; + case VTIME_IDLE: + if (atomic_read(&cpu_rq(cpu)->nr_iowait) > 0) + cpustat[CPUTIME_IOWAIT] +=3D delta; + else + cpustat[CPUTIME_IDLE] +=3D delta; + break; + default: + WARN_ON_ONCE(1); } } while (read_seqcount_retry(&vtime->seqcount, seq)); =20 diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index f7907fadd63f..7224a50d9c44 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -783,9 +783,10 @@ static void tick_nohz_start_idle(struct tick_sched *ts) sched_clock_idle_sleep_event(); } =20 -static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime, +static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, ktime_t= *sleeptime, bool compute_delta, u64 *last_update_time) { + struct tick_sched *ts =3D &per_cpu(tick_cpu_sched, cpu); ktime_t now, idle; unsigned int seq; =20 @@ -796,6 +797,11 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts= , ktime_t *sleeptime, if (last_update_time) *last_update_time =3D ktime_to_us(now); =20 + if (vtime_generic_enabled_cpu(cpu)) { + idle =3D kcpustat_field(idx, cpu); + return ktime_to_us(idle); + } + do { seq =3D read_seqcount_begin(&ts->idle_sleeptime_seq); =20 @@ -833,7 +839,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) { struct 
tick_sched *ts =3D &per_cpu(tick_cpu_sched, cpu); =20 - return get_cpu_sleep_time_us(ts, &ts->idle_sleeptime, + return get_cpu_sleep_time_us(cpu, CPUTIME_IDLE, &ts->idle_sleeptime, !nr_iowait_cpu(cpu), last_update_time); } EXPORT_SYMBOL_GPL(get_cpu_idle_time_us); @@ -859,7 +865,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_ti= me) { struct tick_sched *ts =3D &per_cpu(tick_cpu_sched, cpu); =20 - return get_cpu_sleep_time_us(ts, &ts->iowait_sleeptime, + return get_cpu_sleep_time_us(cpu, CPUTIME_IOWAIT, &ts->iowait_sleeptime, nr_iowait_cpu(cpu), last_update_time); } EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us); --=20 2.53.0 From nobody Wed Apr 1 09:43:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6C08B3B47D7; Tue, 31 Mar 2026 13:17:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963023; cv=none; b=fWWfpRIDXWECrpbTZa1Hrna7aKfkXVpjlGxO8z5ksKvxFJ+SiNna6INOG6C9hqZ1mKVVzp8vk5fS+PJMEDIslRyfmplDR0up/btRnJr4krzVME7ERcrzmIEC9Aj4lTjJhJKjjqlwaLZwGxToZqFxRFMnTj6ocDits5ERDmvFlFo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963023; c=relaxed/simple; bh=3JGza6TEAhXjrijNJ2v2eGWf9KE7T/4fipw25vHTRnU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KjundfHDGv82fn48il8n1Rv5cfNQbIU1wI2B1LJuff21t5seboguCqJuTzbCeW9Qb5usIh6hG2EGq0vNgV8APHUqVvrWZZvcZA8IueZN45fzAfRTppmsiPYoVyK8ec08B/C7MgZDFr98/F8dNiJlzGu2WBfPDiaU/tizGpxJDjs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=doY1r4lb; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=kernel.org header.i=@kernel.org header.b="doY1r4lb" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0E89DC2BCB1; Tue, 31 Mar 2026 13:16:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774963023; bh=3JGza6TEAhXjrijNJ2v2eGWf9KE7T/4fipw25vHTRnU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=doY1r4lbmnoEkhdUq/s6RcT8z+9x9eEAV7C5WI9HDy3AfoCMn0B1xHRV3ENfZJmC8 O4ksxko+GCSlaH3rSSUzugQdojRmm5BLBr0eb2AjWzHIEaM8ltZr8Eilg/JruBmn1G u35tngVYQjeQZMggBo03qZP5vzr1pnw5ESHSyvnzbmgx052jV35aQ8QB0WyiDsCtdl JUCfA5rFQ/jtluXSjaMgjeLWwaWDLEV87RJCA6eRNc4G0JgriBeMg7Th/otuFupIfu cC7ANEwSHfxt2grOivxUGdaiUDbwLy6DJx1h/mnArkKs6kfxB+bH6gV5/J+vg3HnK7 4SRXMXsaAGqUw== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "Christophe Leroy (CS GROUP)" , "Rafael J. Wysocki" , Alexander Gordeev , Anna-Maria Behnsen , Ben Segall , Boqun Feng , Christian Borntraeger , Dietmar Eggemann , Heiko Carstens , Ingo Molnar , Jan Kiszka , Joel Fernandes , Juri Lelli , Kieran Bingham , Madhavan Srinivasan , Mel Gorman , Michael Ellerman , Neeraj Upadhyay , Nicholas Piggin , "Paul E . 
McKenney" , Peter Zijlstra , Shrikanth Hegde , Steven Rostedt , Sven Schnelle , Thomas Gleixner , Uladzislau Rezki , Valentin Schneider , Vasily Gorbik , Vincent Guittot , Viresh Kumar , Xin Zhao , linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 04/14] powerpc/time: Prepare to stop elapsing in dynticks-idle Date: Tue, 31 Mar 2026 15:16:12 +0200 Message-ID: <20260331131622.30505-5-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Currently the tick subsystem stores the idle cputime accounting in private fields, allowing cohabitation with architecture idle vtime accounting. The former is fetched on online CPUs, the latter on offline CPUs.

For consolidation purposes, architecture vtime accounting will continue to account the cputime but will pause when the idle tick is stopped. The dyntick cputime accounting will then be relayed by the tick subsystem so that the idle cputime is still seen advancing coherently even when the tick isn't there to flush the idle vtime.

Prepare for that and introduce three new APIs which will be used in subsequent patches:

- vtime_dyntick_start() is meant to be called when idle enters dyntick mode. The idle cputime that elapsed so far is accumulated.
- vtime_dyntick_stop() is meant to be called when idle exits from dyntick mode. The vtime entry clocks are fast-forwarded to current time so that idle accounting restarts elapsing from now.
- vtime_reset() is meant to be called from dynticks idle IRQ entry to fast-forward the clock to current time so that the IRQ time is still accounted by vtime while nohz cputime is paused.

Also, accumulated vtime won't be flushed from dyntick-idle ticks, to avoid accounting the idle cputime twice along with nohz accounting. Signed-off-by: Frederic Weisbecker Reviewed-by: Shrikanth Hegde Tested-by: Shrikanth Hegde --- arch/powerpc/kernel/time.c | 41 ++++++++++++++++++++++++++++++++++++++ include/linux/vtime.h | 6 ++++++ 2 files changed, 47 insertions(+) diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c index 4bbeb8644d3d..18506740f4a4 100644 --- a/arch/powerpc/kernel/time.c +++ b/arch/powerpc/kernel/time.c @@ -376,6 +376,47 @@ void vtime_task_switch(struct task_struct *prev) acct->starttime = acct0->starttime; } } + +#ifdef CONFIG_NO_HZ_COMMON +/** + * vtime_reset - Fast forward vtime entry clocks + * + * Called from dynticks idle IRQ entry to fast-forward the clocks to current time + * so that the IRQ time is still accounted by vtime while nohz cputime is paused. + */ +void vtime_reset(void) +{ + struct cpu_accounting_data *acct = get_accounting(current); + + acct->starttime = mftb(); +#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME + acct->startspurr = read_spurr(acct->starttime); +#endif +} + +/** + * vtime_dyntick_start - Inform vtime about entry to idle-dynticks + * + * Called when idle enters dyntick mode. The idle cputime that elapsed so far + * is accumulated and the tick subsystem takes over the idle cputime accounting. + */ +void vtime_dyntick_start(void) +{ + vtime_account_idle(current); +} + +/** + * vtime_dyntick_stop - Inform vtime about exit from idle-dynticks + * + * Called when idle exits from dyntick mode. The vtime entry clocks are + * fast-forwarded to current time so that idle accounting restarts elapsing from + * now. 
+ */ +void vtime_dyntick_stop(void) +{ + vtime_reset(); +} +#endif /* CONFIG_NO_HZ_COMMON */ #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */ void __no_kcsan __delay(unsigned long loops) diff --git a/include/linux/vtime.h b/include/linux/vtime.h index 336875bea767..61b94c12d7dd 100644 --- a/include/linux/vtime.h +++ b/include/linux/vtime.h @@ -37,11 +37,17 @@ extern void vtime_account_irq(struct task_struct *tsk, unsigned int offset); extern void vtime_account_softirq(struct task_struct *tsk); extern void vtime_account_hardirq(struct task_struct *tsk); extern void vtime_flush(struct task_struct *tsk); +extern void vtime_reset(void); +extern void vtime_dyntick_start(void); +extern void vtime_dyntick_stop(void); #else /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */ static inline void vtime_account_irq(struct task_struct *tsk, unsigned int offset) { } static inline void vtime_account_softirq(struct task_struct *tsk) { } static inline void vtime_account_hardirq(struct task_struct *tsk) { } static inline void vtime_flush(struct task_struct *tsk) { } +static inline void vtime_reset(void) { } +static inline void vtime_dyntick_start(void) { } +static inline void vtime_dyntick_stop(void) { } #endif /* -- 2.53.0 From nobody Wed Apr 1 09:43:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B0F3E3DFC86; Tue, 31 Mar 2026 13:17:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963030; cv=none; b=qYIE35yYiA8MKdD5MTEb13qEsL9iQ1snFwCnR1bdwIC2cJnzxHNKeL6Cwz4xIpXZb3cEcCaRK5d+Ptw64aMyPIZ2hE51ij/HDk2Ume9U6P/M+AaRdO4+9A42Q1l1ys9gyt4+j90CtWTPCfrWK1KLbGh2RTlNt2t/ezfD08J7LyY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1774963030; c=relaxed/simple; bh=bK9FrHwYdtzlfjtR+gYpB517UvY6FDy8GPtD0U1/TCo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=A7Q4vYTpqWB7j+TR8IDbeHATItqfUNpINI8/AE0rXoIq97Ew+E8QFOYtzEOcGZdwRaxLo9x4TnEnzTa9NztWnBq4loeM9WQ8y1XxrMNoQerOD5HnALMcAjuCUmMfA46wVmpRBx77g5T+8Fh71GEjpRT4Iio1PZGVQ6FWjjG93CI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QOr8z5aO; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QOr8z5aO" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7ECAEC19423; Tue, 31 Mar 2026 13:17:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774963030; bh=bK9FrHwYdtzlfjtR+gYpB517UvY6FDy8GPtD0U1/TCo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=QOr8z5aObamx4kProSnztxVwdQlHn9xLwz3QWRSh54AQnpHHqbKZLhGNZ8tOlSwht aJGFYaGh7Q8hKi6bouEvSeeWyEytZ8tkdEXdbWJeS17SWm9fmh2LJgL2pGW/wVorf+ YdlB4Zfd9HeHHgTa5TyngS/chK8jPejZkVxL69tKzNc7I0fQRRml/6UTYbWisjBoS1 xNp9oJkag50xrbV02FIhJ324cQz44XqfTCYVw9H+j3NYGcZWEGWNYp/YjAwqdhjjMp 1LF9/OTtA4QLU8bI0+lmlPgD8uU4C7vLjIiylBJx9BlT/+3U2dCvQd0TySn++3HPIo fCmn8xbE6Rdkg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "Christophe Leroy (CS GROUP)" , "Rafael J. Wysocki" , Alexander Gordeev , Anna-Maria Behnsen , Ben Segall , Boqun Feng , Christian Borntraeger , Dietmar Eggemann , Heiko Carstens , Ingo Molnar , Jan Kiszka , Joel Fernandes , Juri Lelli , Kieran Bingham , Madhavan Srinivasan , Mel Gorman , Michael Ellerman , Neeraj Upadhyay , Nicholas Piggin , "Paul E . 
McKenney" , Peter Zijlstra , Shrikanth Hegde , Steven Rostedt , Sven Schnelle , Thomas Gleixner , Uladzislau Rezki , Valentin Schneider , Vasily Gorbik , Vincent Guittot , Viresh Kumar , Xin Zhao , linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 05/14] s390/time: Prepare to stop elapsing in dynticks-idle Date: Tue, 31 Mar 2026 15:16:13 +0200 Message-ID: <20260331131622.30505-6-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Currently the tick subsystem stores the idle cputime accounting in private fields, allowing cohabitation with architecture idle vtime accounting. The former is fetched on online CPUs, the latter on offline CPUs. For consolidation purpose, architecture vtime accounting will continue to account the cputime but will make a break when the idle tick is stopped. The dyntick cputime accounting will then be relayed by the tick subsystem so that the idle cputime is still seen advancing coherently even when the tick isn't there to flush the idle vtime. Prepare for that and introduce three new APIs which will be used in subsequent patches: _ vtime_dynticks_start() is deemed to be called when idle enters in dyntick mode. The idle cputime that elapsed so far is accumulated and accounted. Also idle time accounting is ignored. - vtime_dynticks_stop() is deemed to be called when idle exits from dyntick mode. The vtime entry clocks are fast-forward to current time so that idle accounting restarts elapsing from now. Also idle time accounting is resumed. 
- vtime_reset() is deemed to be called from dynticks idle IRQ entry to fast-forward the clock to current time so that the IRQ time is still accounted by vtime while nohz cputime is paused. Also accumulated vtime won't be flushed from dyntick-idle ticks to avoid accounting twice the idle cputime, along with nohz accounting. Signed-off-by: Frederic Weisbecker Co-developed-by: Heiko Carstens Signed-off-by: Heiko Carstens Tested-by: Shrikanth Hegde --- arch/s390/include/asm/idle.h | 2 ++ arch/s390/kernel/idle.c | 5 +++- arch/s390/kernel/vtime.c | 57 ++++++++++++++++++++++++++++++++---- 3 files changed, 57 insertions(+), 7 deletions(-) diff --git a/arch/s390/include/asm/idle.h b/arch/s390/include/asm/idle.h index 32536ee34aa0..e4ad09a22400 100644 --- a/arch/s390/include/asm/idle.h +++ b/arch/s390/include/asm/idle.h @@ -8,10 +8,12 @@ #ifndef _S390_IDLE_H #define _S390_IDLE_H =20 +#include #include #include =20 struct s390_idle_data { + bool idle_dyntick; unsigned long idle_count; unsigned long idle_time; unsigned long clock_idle_enter; diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c index 1f1b06b6b4ef..4685d7c5bc51 100644 --- a/arch/s390/kernel/idle.c +++ b/arch/s390/kernel/idle.c @@ -31,7 +31,10 @@ void account_idle_time_irq(void) /* Account time spent with enabled wait psw loaded as idle time. 
*/ __atomic64_add(idle_time, &idle->idle_time); __atomic64_add_const(1, &idle->idle_count); - account_idle_time(cputime_to_nsecs(idle_time)); + + /* Dyntick idle time accounted by nohz/scheduler */ + if (!idle->idle_dyntick) + account_idle_time(cputime_to_nsecs(idle_time)); } =20 void noinstr arch_cpu_idle(void) diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c index bf48744d0912..b1c7700d082c 100644 --- a/arch/s390/kernel/vtime.c +++ b/arch/s390/kernel/vtime.c @@ -17,6 +17,7 @@ #include #include #include +#include #include =20 #include "entry.h" @@ -110,6 +111,16 @@ static void account_system_index_scaled(struct task_st= ruct *p, u64 cputime, account_system_index_time(p, cputime_to_nsecs(cputime), index); } =20 +static inline void vtime_reset_last_update(struct lowcore *lc) +{ + asm volatile( + " stpt %0\n" /* Store current cpu timer value */ + " stckf %1" /* Store current tod clock value */ + : "=3DQ" (lc->last_update_timer), + "=3DQ" (lc->last_update_clock) + : : "cc"); +} + /* * Update process times based on virtual cpu times stored by entry.S * to the lowcore fields user_timer, system_timer & steal_clock. @@ -121,12 +132,9 @@ static int do_account_vtime(struct task_struct *tsk) =20 timer =3D lc->last_update_timer; clock =3D lc->last_update_clock; - asm volatile( - " stpt %0\n" /* Store current cpu timer value */ - " stckf %1" /* Store current tod clock value */ - : "=3DQ" (lc->last_update_timer), - "=3DQ" (lc->last_update_clock) - : : "cc"); + + vtime_reset_last_update(lc); + clock =3D lc->last_update_clock - clock; timer -=3D lc->last_update_timer; =20 @@ -239,6 +247,43 @@ void vtime_account_hardirq(struct task_struct *tsk) get_lowcore()->hardirq_timer +=3D vtime_delta(); } =20 +#ifdef CONFIG_NO_HZ_COMMON +/** + * vtime_reset - Fast forward vtime entry clocks + * + * Called from dynticks idle IRQ entry to fast-forward the clocks to curre= nt time + * so that the IRQ time is still accounted by vtime while nohz cputime is = paused. 
+ */ +void vtime_reset(void) +{ + vtime_reset_last_update(get_lowcore()); +} + +/** + * vtime_dyntick_start - Inform vtime about entry to idle-dynticks + * + * Called when idle enters in dyntick mode. The idle cputime that elapsed = so far + * is flushed and the tick subsystem takes over the idle cputime accountin= g. + */ +void vtime_dyntick_start(void) +{ + __this_cpu_write(s390_idle.idle_dyntick, true); + vtime_flush(current); +} + +/** + * vtime_dyntick_stop - Inform vtime about exit from idle-dynticks + * + * Called when idle exits from dyntick mode. The vtime entry clocks are + * fast-forward to current time and idle accounting resumes. + */ +void vtime_dyntick_stop(void) +{ + vtime_reset_last_update(get_lowcore()); + __this_cpu_write(s390_idle.idle_dyntick, false); +} +#endif /* CONFIG_NO_HZ_COMMON */ + /* * Sorted add to a list. List is linear searched until first bigger * element is found. --=20 2.53.0
From nobody Wed Apr 1 09:43:20 2026 From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker, "Christophe Leroy (CS GROUP)", "Rafael J. Wysocki", Alexander Gordeev, Anna-Maria Behnsen, Ben Segall, Boqun Feng, Christian Borntraeger, Dietmar Eggemann, Heiko Carstens, Ingo Molnar, Jan Kiszka, Joel Fernandes, Juri Lelli, Kieran Bingham, Madhavan Srinivasan, Mel Gorman, Michael Ellerman, Neeraj Upadhyay, Nicholas Piggin, "Paul E . McKenney", Peter Zijlstra, Shrikanth Hegde, Steven Rostedt, Sven Schnelle, Thomas Gleixner, Uladzislau Rezki, Valentin Schneider, Vasily Gorbik, Vincent Guittot, Viresh Kumar, Xin Zhao, linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 06/14] tick/sched: Unify idle cputime accounting Date: Tue, 31 Mar 2026 15:16:14 +0200 Message-ID: <20260331131622.30505-7-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"
The non-vtime dynticks-idle cputime accounting is a big mess that accumulates within two concurrent statistics, each with its own shortcomings: * The accounting for online CPUs, which is based on the delta between tick_nohz_start_idle() and tick_nohz_stop_idle(). Pros: - Works when the tick is off - Has nsecs granularity Cons: - Accounts idle steal time but doesn't subtract it from idle cputime. - Assumes CONFIG_IRQ_TIME_ACCOUNTING by not accounting IRQs, but the IRQ time is simply ignored when CONFIG_IRQ_TIME_ACCOUNTING=3Dn - The windows between 1) idle task scheduling and the first call to tick_nohz_start_idle() and 2) the last call to tick_nohz_stop_idle() and the rest of the idle time are blind spots wrt. cputime accounting (though mostly an insignificant amount) - Relies on private fields outside of kernel stats, with specific accessors. * The accounting for offline CPUs, which is based on ticks and the jiffies delta during which the tick was stopped. Pros: - Handles steal time correctly - Handles CONFIG_IRQ_TIME_ACCOUNTING=3Dy and CONFIG_IRQ_TIME_ACCOUNTING=3Dn correctly. - Handles the whole idle task - Accounts directly to kernel stats, without midlayer accumulator.
Cons: - Doesn't elapse when the tick is off, which makes it unsuitable for online CPUs. - Has TICK_NSEC granularity (jiffies) - Needs to track the dyntick-idle ticks that were accounted and subtract them from the total jiffies time spent while the tick was stopped. This is an ugly workaround. Having two different accountings for a single context is not the only problem: since those accountings are of different natures, it is possible to observe the global idle time going backward after a CPU goes offline. Clean up the situation by introducing a hybrid approach that stays coherent and works for both online and offline CPUs: * Tick based or native vtime accounting operates before the idle loop is entered and resumes once the idle loop prepares to exit. * When the idle loop starts, switch to dynticks-idle accounting as is done currently, except that the statistics accumulate directly to the relevant kernel stat fields. * Private dyntick cputime accounting fields are removed. * Works in both the online and offline cases. Further improvements will include: * Only switch to dynticks-idle cputime accounting when the tick actually goes in dynticks mode. * Handle CONFIG_IRQ_TIME_ACCOUNTING=3Dn correctly such that the dynticks-idle accounting still elapses while on IRQs.
* Correctly subtract idle steal cputime from idle time Reported-by: Xin Zhao Signed-off-by: Frederic Weisbecker Tested-by: Shrikanth Hegde --- include/linux/kernel_stat.h | 24 ++++++++++--- include/linux/vtime.h | 7 +++- kernel/sched/cputime.c | 62 ++++++++++++++++---------------- kernel/time/tick-sched.c | 71 +++++++++++-------------------------- 4 files changed, 76 insertions(+), 88 deletions(-) diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index dd020ecaf67b..ba65aad308a1 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -34,6 +34,9 @@ enum cpu_usage_stat { }; =20 struct kernel_cpustat { +#ifdef CONFIG_NO_HZ_COMMON + int idle_dyntick; +#endif u64 cpustat[NR_STATS]; }; =20 @@ -99,6 +102,20 @@ static inline unsigned long kstat_cpu_irqs_sum(unsigned= int cpu) return kstat_cpu(cpu).irqs_sum; } =20 +#ifdef CONFIG_NO_HZ_COMMON +extern void kcpustat_dyntick_start(void); +extern void kcpustat_dyntick_stop(void); +static inline bool kcpustat_idle_dyntick(void) +{ + return __this_cpu_read(kernel_cpustat.idle_dyntick); +} +#else +static inline bool kcpustat_idle_dyntick(void) +{ + return false; +} +#endif /* CONFIG_NO_HZ_COMMON */ + #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN extern u64 kcpustat_field(enum cpu_usage_stat usage, int cpu); extern void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu); @@ -113,7 +130,7 @@ static inline void kcpustat_cpu_fetch(struct kernel_cpu= stat *dst, int cpu) *dst =3D kcpustat_cpu(cpu); } =20 -#endif +#endif /* !CONFIG_VIRT_CPU_ACCOUNTING_GEN */ =20 extern void account_user_time(struct task_struct *, u64); extern void account_guest_time(struct task_struct *, u64); @@ -127,14 +144,13 @@ extern u64 get_idle_time(struct kernel_cpustat *kcs, = int cpu); #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE static inline void account_process_tick(struct task_struct *tsk, int user) { - vtime_flush(tsk); + if (!kcpustat_idle_dyntick()) + vtime_flush(tsk); } #else extern void account_process_tick(struct
task_struct *, int user); #endif =20 -extern void account_idle_ticks(unsigned long ticks); - #ifdef CONFIG_SCHED_CORE extern void __account_forceidle_time(struct task_struct *tsk, u64 delta); #endif diff --git a/include/linux/vtime.h b/include/linux/vtime.h index 61b94c12d7dd..a4506336002d 100644 --- a/include/linux/vtime.h +++ b/include/linux/vtime.h @@ -31,6 +31,11 @@ static inline bool vtime_generic_enabled_cpu(int cpu) return context_tracking_enabled_cpu(cpu); } =20 +static inline bool vtime_generic_enabled_this_cpu(void) +{ + return context_tracking_enabled_this_cpu(); +} + #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE extern void vtime_account_idle(struct task_struct *tsk); extern void vtime_account_irq(struct task_struct *tsk, unsigned int offset= ); @@ -90,7 +95,7 @@ static inline bool vtime_accounting_enabled_cpu(int cpu) =20 static inline bool vtime_accounting_enabled_this_cpu(void) { - return context_tracking_enabled_this_cpu(); + return vtime_generic_enabled_this_cpu(); } =20 extern void vtime_task_switch_generic(struct task_struct *prev); diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index d91b495457ec..4934c537f5e3 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -414,16 +414,30 @@ static void irqtime_account_process_tick(struct task_= struct *p, int user_tick, } } =20 -static void irqtime_account_idle_ticks(int ticks) -{ - irqtime_account_process_tick(current, 0, ticks); -} #else /* !CONFIG_IRQ_TIME_ACCOUNTING: */ -static inline void irqtime_account_idle_ticks(int ticks) { } static inline void irqtime_account_process_tick(struct task_struct *p, int= user_tick, int nr_ticks) { } #endif /* !CONFIG_IRQ_TIME_ACCOUNTING */ =20 +#ifdef CONFIG_NO_HZ_COMMON +void kcpustat_dyntick_start(void) +{ + if (!vtime_generic_enabled_this_cpu()) { + vtime_dyntick_start(); + __this_cpu_write(kernel_cpustat.idle_dyntick, 1); + } +} + +void kcpustat_dyntick_stop(void) +{ + if (!vtime_generic_enabled_this_cpu()) { + 
__this_cpu_write(kernel_cpustat.idle_dyntick, 0); + vtime_dyntick_stop(); + steal_account_process_time(ULONG_MAX); + } +} +#endif /* CONFIG_NO_HZ_COMMON */ + /* * Use precise platform statistics if available: */ @@ -437,11 +451,15 @@ void vtime_account_irq(struct task_struct *tsk, unsig= ned int offset) vtime_account_hardirq(tsk); } else if (pc & SOFTIRQ_OFFSET) { vtime_account_softirq(tsk); - } else if (!IS_ENABLED(CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE) && - is_idle_task(tsk)) { - vtime_account_idle(tsk); + } else if (!kcpustat_idle_dyntick()) { + if (!IS_ENABLED(CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE) && + is_idle_task(tsk)) { + vtime_account_idle(tsk); + } else { + vtime_account_kernel(tsk); + } } else { - vtime_account_kernel(tsk); + vtime_reset(); } } =20 @@ -483,6 +501,9 @@ void account_process_tick(struct task_struct *p, int us= er_tick) if (vtime_accounting_enabled_this_cpu()) return; =20 + if (kcpustat_idle_dyntick()) + return; + if (irqtime_enabled()) { irqtime_account_process_tick(p, user_tick, 1); return; @@ -504,29 +525,6 @@ void account_process_tick(struct task_struct *p, int u= ser_tick) account_idle_time(cputime); } =20 -/* - * Account multiple ticks of idle time. - * @ticks: number of stolen ticks - */ -void account_idle_ticks(unsigned long ticks) -{ - u64 cputime, steal; - - if (irqtime_enabled()) { - irqtime_account_idle_ticks(ticks); - return; - } - - cputime =3D ticks * TICK_NSEC; - steal =3D steal_account_process_time(ULONG_MAX); - - if (steal >=3D cputime) - return; - - cputime -=3D steal; - account_idle_time(cputime); -} - /* * Adjust tick based cputime random precision against scheduler runtime * accounting. 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 7224a50d9c44..2c0f0b81f452 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -285,8 +285,6 @@ static void tick_sched_handle(struct tick_sched *ts, st= ruct pt_regs *regs) if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) { touch_softlockup_watchdog_sched(); - if (is_idle_task(current)) - ts->idle_jiffies++; /* * In case the current tick fired too early past its expected * expiration, make sure we don't bypass the next clock reprogramming @@ -753,8 +751,12 @@ static void tick_nohz_update_jiffies(ktime_t now) =20 static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now) { + u64 *cpustat =3D kcpustat_this_cpu->cpustat; ktime_t delta; =20 + if (vtime_generic_enabled_this_cpu()) + return; + if (WARN_ON_ONCE(!tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE))) return; =20 @@ -762,9 +764,9 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, = ktime_t now) =20 write_seqcount_begin(&ts->idle_sleeptime_seq); if (nr_iowait_cpu(smp_processor_id()) > 0) - ts->iowait_sleeptime =3D ktime_add(ts->iowait_sleeptime, delta); + cpustat[CPUTIME_IOWAIT] =3D ktime_add(cpustat[CPUTIME_IOWAIT], delta); else - ts->idle_sleeptime =3D ktime_add(ts->idle_sleeptime, delta); + cpustat[CPUTIME_IDLE] =3D ktime_add(cpustat[CPUTIME_IDLE], delta); =20 ts->idle_entrytime =3D now; tick_sched_flag_clear(ts, TS_FLAG_IDLE_ACTIVE); @@ -775,18 +777,21 @@ static void tick_nohz_stop_idle(struct tick_sched *ts= , ktime_t now) =20 static void tick_nohz_start_idle(struct tick_sched *ts) { + if (vtime_generic_enabled_this_cpu()) + return; + write_seqcount_begin(&ts->idle_sleeptime_seq); ts->idle_entrytime =3D ktime_get(); tick_sched_flag_set(ts, TS_FLAG_IDLE_ACTIVE); write_seqcount_end(&ts->idle_sleeptime_seq); - sched_clock_idle_sleep_event(); } =20 -static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, ktime_t= *sleeptime, +static u64 get_cpu_sleep_time_us(int 
cpu, enum cpu_usage_stat idx, bool compute_delta, u64 *last_update_time) { struct tick_sched *ts =3D &per_cpu(tick_cpu_sched, cpu); + u64 *cpustat =3D kcpustat_cpu(cpu).cpustat; ktime_t now, idle; unsigned int seq; =20 @@ -808,9 +813,9 @@ static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usag= e_stat idx, ktime_t *slee if (tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE) && compute_delta) { ktime_t delta =3D ktime_sub(now, ts->idle_entrytime); =20 - idle =3D ktime_add(*sleeptime, delta); + idle =3D ktime_add(cpustat[idx], delta); } else { - idle =3D *sleeptime; + idle =3D cpustat[idx]; } } while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq)); =20 @@ -837,9 +842,7 @@ static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usag= e_stat idx, ktime_t *slee */ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) { - struct tick_sched *ts =3D &per_cpu(tick_cpu_sched, cpu); - - return get_cpu_sleep_time_us(cpu, CPUTIME_IDLE, &ts->idle_sleeptime, + return get_cpu_sleep_time_us(cpu, CPUTIME_IDLE, !nr_iowait_cpu(cpu), last_update_time); } EXPORT_SYMBOL_GPL(get_cpu_idle_time_us); @@ -863,9 +866,7 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us); */ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time) { - struct tick_sched *ts =3D &per_cpu(tick_cpu_sched, cpu); - - return get_cpu_sleep_time_us(cpu, CPUTIME_IOWAIT, &ts->iowait_sleeptime, + return get_cpu_sleep_time_us(cpu, CPUTIME_IOWAIT, nr_iowait_cpu(cpu), last_update_time); } EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us); @@ -1265,10 +1266,8 @@ void tick_nohz_idle_stop_tick(void) ts->idle_sleeps++; ts->idle_expires =3D expires; =20 - if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) { - ts->idle_jiffies =3D ts->last_jiffies; + if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) nohz_balance_enter_idle(cpu); - } } else { tick_nohz_retain_tick(ts); } @@ -1297,6 +1296,7 @@ void tick_nohz_idle_enter(void) WARN_ON_ONCE(ts->timer_expires_base); =20 tick_sched_flag_set(ts, TS_FLAG_INIDLE); + 
kcpustat_dyntick_start(); tick_nohz_start_idle(ts); =20 local_irq_enable(); @@ -1422,37 +1422,12 @@ unsigned long tick_nohz_get_idle_calls_cpu(int cpu) return ts->idle_calls; } =20 -static void tick_nohz_account_idle_time(struct tick_sched *ts, - ktime_t now) -{ - unsigned long ticks; - - ts->idle_exittime =3D now; - - if (vtime_accounting_enabled_this_cpu()) - return; - /* - * We stopped the tick in idle. update_process_times() would miss the - * time we slept, as it does only a 1 tick accounting. - * Enforce that this is accounted to idle ! - */ - ticks =3D jiffies - ts->idle_jiffies; - /* - * We might be one off. Do not randomly account a huge number of ticks! - */ - if (ticks && ticks < LONG_MAX) - account_idle_ticks(ticks); -} - void tick_nohz_idle_restart_tick(void) { struct tick_sched *ts =3D this_cpu_ptr(&tick_cpu_sched); =20 - if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) { - ktime_t now =3D ktime_get(); - tick_nohz_restart_sched_tick(ts, now); - tick_nohz_account_idle_time(ts, now); - } + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) + tick_nohz_restart_sched_tick(ts, ktime_get()); } =20 static void tick_nohz_idle_update_tick(struct tick_sched *ts, ktime_t now) @@ -1461,8 +1436,6 @@ static void tick_nohz_idle_update_tick(struct tick_sc= hed *ts, ktime_t now) __tick_nohz_full_update_tick(ts, now); else tick_nohz_restart_sched_tick(ts, now); - - tick_nohz_account_idle_time(ts, now); } =20 /** @@ -1504,6 +1477,7 @@ void tick_nohz_idle_exit(void) =20 if (tick_stopped) tick_nohz_idle_update_tick(ts, now); + kcpustat_dyntick_stop(); =20 local_irq_enable(); } @@ -1640,20 +1614,15 @@ void tick_setup_sched_timer(bool hrtimer) void tick_sched_timer_dying(int cpu) { struct tick_sched *ts =3D &per_cpu(tick_cpu_sched, cpu); - ktime_t idle_sleeptime, iowait_sleeptime; unsigned long idle_calls, idle_sleeps; =20 /* This must happen before hrtimers are migrated! 
*/ if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES)) hrtimer_cancel(&ts->sched_timer); =20 - idle_sleeptime =3D ts->idle_sleeptime; - iowait_sleeptime =3D ts->iowait_sleeptime; idle_calls =3D ts->idle_calls; idle_sleeps =3D ts->idle_sleeps; memset(ts, 0, sizeof(*ts)); - ts->idle_sleeptime =3D idle_sleeptime; - ts->iowait_sleeptime =3D iowait_sleeptime; ts->idle_calls =3D idle_calls; ts->idle_sleeps =3D idle_sleeps; } --=20 2.53.0
From nobody Wed Apr 1 09:43:20 2026 From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker, "Christophe Leroy (CS GROUP)", "Rafael J. Wysocki", Alexander Gordeev, Anna-Maria Behnsen, Ben Segall, Boqun Feng, Christian Borntraeger, Dietmar Eggemann, Heiko Carstens, Ingo Molnar, Jan Kiszka, Joel Fernandes, Juri Lelli, Kieran Bingham, Madhavan Srinivasan, Mel Gorman, Michael Ellerman, Neeraj Upadhyay, Nicholas Piggin, "Paul E . McKenney", Peter Zijlstra, Shrikanth Hegde, Steven Rostedt, Sven Schnelle, Thomas Gleixner, Uladzislau Rezki, Valentin Schneider, Vasily Gorbik, Vincent Guittot, Viresh Kumar, Xin Zhao, linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 07/14] tick/sched: Remove nohz disabled special case in cputime fetch Date: Tue, 31 Mar 2026 15:16:15 +0200 Message-ID: <20260331131622.30505-8-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"
Even when nohz is not runtime enabled, the dynticks idle cputime accounting can run and the common idle cputime accessors are still relevant. Remove the nohz disabled special case accordingly.
Signed-off-by: Frederic Weisbecker Tested-by: Shrikanth Hegde --- kernel/time/tick-sched.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 2c0f0b81f452..6cd7777755dd 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -795,9 +795,6 @@ static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usag= e_stat idx, ktime_t now, idle; unsigned int seq; =20 - if (!tick_nohz_active) - return -1; - now =3D ktime_get(); if (last_update_time) *last_update_time =3D ktime_to_us(now); @@ -838,7 +835,7 @@ static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usag= e_stat idx, * This time is measured via accounting rather than sampling, * and is as accurate as ktime_get() is. * - * Return: -1 if NOHZ is not enabled, else total idle time of the @cpu + * Return: -1 if generic vtime is enabled, else total idle time of the @cpu */ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) { @@ -862,7 +859,7 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us); * This time is measured via accounting rather than sampling, * and is as accurate as ktime_get() is. 
* - * Return: -1 if NOHZ is not enabled, else total iowait time of @cpu + * Return: -1 if generic vtime is enabled, else total iowait time of @cpu */ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time) { --=20 2.53.0 From nobody Wed Apr 1 09:43:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A140D3E3155; Tue, 31 Mar 2026 13:17:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963052; cv=none; b=BInYWpKw2zRUxZSY/tpS5Q9ivS5LzTFXYU6cn0RCsNf5O6I5LMM4fT6mZp/6NU3kJL1Gydaac6+wdBufJscYV7/fbxQtqyLSErzjTYJmBJZePwBwwJXVM9z2AK0KMXa1jRuhVB7l/BhlSk0iUW18aavRledMdJUTc76RiXYA5JY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963052; c=relaxed/simple; bh=fNJ/wOIp6ZL2N5olFop6YLLXoKkblq8P3RraqV3W0Bk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=up7ao+9/mxaQNFo0ZrfuTxmV8oPzcDZqT8dS6FpilOLuHuC+2Q9Ee5BCKjeExcUN6tN7lNtzJlYoCTWtwDIjU3Ly5u+jE83qeON4QEY4cpbiSZdhSUr2DN+L0OGIGegehuU+CXndZZQ4FFYtJvE0RM1hMog+ZiUvoiZTODB//XM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=IyNg2bGv; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="IyNg2bGv" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 778F3C2BCB1; Tue, 31 Mar 2026 13:17:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774963052; bh=fNJ/wOIp6ZL2N5olFop6YLLXoKkblq8P3RraqV3W0Bk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; 
b=IyNg2bGvIjMzMN1RqgNoFjJLrp2MyjlWYmLE/RJmrZONO+C3e6m9GHSltE8x1RleT I9GERowGleebRMcIPfR347ccW1ZqaI6NV0xFl6SNvrBoI1rMQUBCBysHIN8c4CenM4 sNmaIDsmg4ESgnFDDkqfOdV1k1AHIrF2j0Bi1L/dwKv7T89wq69iGCPwBvkVSnQZCc 0wDCfjiZQ42OSLDoKXcoTd79C+F7UlGBbXKYAViMmXLNv9fhX2DKHjp6KqtfkMl1fq p0n9xtCxQIEjR3OYOwDRyDQ5UfkYWvJppIHfkpQeiRKj1PrVOpQ0ttoFUMrWKv77s9 D4tVDpWZW95zw== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "Christophe Leroy (CS GROUP)" , "Rafael J. Wysocki" , Alexander Gordeev , Anna-Maria Behnsen , Ben Segall , Boqun Feng , Christian Borntraeger , Dietmar Eggemann , Heiko Carstens , Ingo Molnar , Jan Kiszka , Joel Fernandes , Juri Lelli , Kieran Bingham , Madhavan Srinivasan , Mel Gorman , Michael Ellerman , Neeraj Upadhyay , Nicholas Piggin , "Paul E . McKenney" , Peter Zijlstra , Shrikanth Hegde , Steven Rostedt , Sven Schnelle , Thomas Gleixner , Uladzislau Rezki , Valentin Schneider , Vasily Gorbik , Vincent Guittot , Viresh Kumar , Xin Zhao , linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 08/14] tick/sched: Move dyntick-idle cputime accounting to cputime code Date: Tue, 31 Mar 2026 15:16:16 +0200 Message-ID: <20260331131622.30505-9-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Although the dynticks-idle cputime accounting is necessarily tied to the tick subsystem, the actual related accounting code has no business residing there and should be part of the scheduler cputime code. Move away the relevant pieces and state machine to where they belong. 
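As a rough illustration of the state machine being moved, here is a single-threaded userspace sketch. It is not the kernel code: every model_* name is hypothetical, the seqcount protection used by the real accessors is left out, and the iowait_pending argument stands in for nr_iowait_cpu() > 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the idle-related part of struct kernel_cpustat. */
enum { MODEL_IDLE, MODEL_IOWAIT, MODEL_NR_STATS };

struct model_cpustat {
    bool idle_elapse;         /* an idle period is currently being timed */
    uint64_t idle_entrytime;  /* timestamp (ns) at which that period began */
    uint64_t cpustat[MODEL_NR_STATS];
};

/* Begin timing an idle period: idle entry, or IRQ exit while idle. */
static void model_idle_start(struct model_cpustat *kc, uint64_t now)
{
    kc->idle_entrytime = now;
    kc->idle_elapse = true;
}

/* Close the current period: idle exit, or IRQ entry while idle. */
static void model_idle_stop(struct model_cpustat *kc, uint64_t now,
                            bool iowait_pending)
{
    if (!kc->idle_elapse)
        return;

    assert(now >= kc->idle_entrytime);
    kc->cpustat[iowait_pending ? MODEL_IOWAIT : MODEL_IDLE] +=
        now - kc->idle_entrytime;
    kc->idle_entrytime = now;
    kc->idle_elapse = false;
}

/* Reader side: add the still-running period on top of the stored sum. */
static uint64_t model_read(const struct model_cpustat *kc, int idx,
                           bool compute_delta, uint64_t now)
{
    uint64_t val = kc->cpustat[idx];

    if (kc->idle_elapse && compute_delta)
        val += now - kc->idle_entrytime;
    return val;
}

/* One idle trip interrupted by a single IRQ; returns the idle time in ns. */
uint64_t model_demo_idle_ns(void)
{
    struct model_cpustat kc = { 0 };

    model_idle_start(&kc, 100);        /* idle entry */
    model_idle_stop(&kc, 250, false);  /* IRQ entry: 150 ns accounted */
    model_idle_start(&kc, 300);        /* IRQ exit: the 50 ns IRQ is skipped */

    /* A reader at t=350 sees the stored 150 ns plus 50 ns in flight. */
    assert(model_read(&kc, MODEL_IDLE, true, 350) == 200);

    model_idle_stop(&kc, 400, false);  /* idle exit: 100 ns more */
    return kc.cpustat[MODEL_IDLE];
}
```

In this model the time spent in the IRQ (250 to 300) is charged to neither idle nor iowait, which mirrors the kcpustat_irq_enter()/kcpustat_irq_exit() pairing in the diff below.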
Signed-off-by: Frederic Weisbecker Tested-by: Shrikanth Hegde --- include/linux/kernel_stat.h | 14 +++- kernel/sched/cputime.c | 149 +++++++++++++++++++++++++++++++-- kernel/time/tick-sched.c | 162 +++++++----------------------------- 3 files changed, 184 insertions(+), 141 deletions(-) diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index ba65aad308a1..9343353ac7a3 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -35,9 +35,12 @@ enum cpu_usage_stat { =20 struct kernel_cpustat { #ifdef CONFIG_NO_HZ_COMMON - int idle_dyntick; + bool idle_dyntick; + bool idle_elapse; + seqcount_t idle_sleeptime_seq; + u64 idle_entrytime; #endif - u64 cpustat[NR_STATS]; + u64 cpustat[NR_STATS]; }; =20 struct kernel_stat { @@ -103,8 +106,11 @@ static inline unsigned long kstat_cpu_irqs_sum(unsigne= d int cpu) } =20 #ifdef CONFIG_NO_HZ_COMMON -extern void kcpustat_dyntick_start(void); -extern void kcpustat_dyntick_stop(void); +extern void kcpustat_dyntick_start(u64 now); +extern void kcpustat_dyntick_stop(u64 now); +extern void kcpustat_irq_enter(u64 now); +extern void kcpustat_irq_exit(u64 now); + static inline bool kcpustat_idle_dyntick(void) { return __this_cpu_read(kernel_cpustat.idle_dyntick); diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 4934c537f5e3..4a259f2700a1 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -2,6 +2,7 @@ /* * Simple CPU accounting cgroup controller */ +#include #include #include #include "sched.h" @@ -420,22 +421,156 @@ static inline void irqtime_account_process_tick(stru= ct task_struct *p, int user_ #endif /* !CONFIG_IRQ_TIME_ACCOUNTING */ =20 #ifdef CONFIG_NO_HZ_COMMON -void kcpustat_dyntick_start(void) +static void kcpustat_idle_stop(struct kernel_cpustat *kc, u64 now) { - if (!vtime_generic_enabled_this_cpu()) { - vtime_dyntick_start(); - __this_cpu_write(kernel_cpustat.idle_dyntick, 1); - } + u64 *cpustat =3D kc->cpustat; + u64 delta; + + if (!kc->idle_elapse) + 
return; + + delta =3D now - kc->idle_entrytime; + + write_seqcount_begin(&kc->idle_sleeptime_seq); + if (nr_iowait_cpu(smp_processor_id()) > 0) + cpustat[CPUTIME_IOWAIT] +=3D delta; + else + cpustat[CPUTIME_IDLE] +=3D delta; + + kc->idle_entrytime =3D now; + kc->idle_elapse =3D false; + write_seqcount_end(&kc->idle_sleeptime_seq); } =20 -void kcpustat_dyntick_stop(void) +static void kcpustat_idle_start(struct kernel_cpustat *kc, u64 now) { + write_seqcount_begin(&kc->idle_sleeptime_seq); + kc->idle_entrytime =3D now; + kc->idle_elapse =3D true; + write_seqcount_end(&kc->idle_sleeptime_seq); +} + +void kcpustat_dyntick_stop(u64 now) +{ + struct kernel_cpustat *kc =3D kcpustat_this_cpu; + if (!vtime_generic_enabled_this_cpu()) { - __this_cpu_write(kernel_cpustat.idle_dyntick, 0); + WARN_ON_ONCE(!kc->idle_dyntick); + kcpustat_idle_stop(kc, now); + kc->idle_dyntick =3D false; vtime_dyntick_stop(); steal_account_process_time(ULONG_MAX); } } + +void kcpustat_dyntick_start(u64 now) +{ + struct kernel_cpustat *kc =3D kcpustat_this_cpu; + + if (!vtime_generic_enabled_this_cpu()) { + vtime_dyntick_start(); + kc->idle_dyntick =3D true; + kcpustat_idle_start(kc, now); + } +} + +void kcpustat_irq_enter(u64 now) +{ + struct kernel_cpustat *kc =3D kcpustat_this_cpu; + + if (!vtime_generic_enabled_this_cpu()) + kcpustat_idle_stop(kc, now); +} + +void kcpustat_irq_exit(u64 now) +{ + struct kernel_cpustat *kc =3D kcpustat_this_cpu; + + if (!vtime_generic_enabled_this_cpu()) + kcpustat_idle_start(kc, now); +} + +static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, + bool compute_delta, u64 *last_update_time) +{ + struct kernel_cpustat *kc =3D &kcpustat_cpu(cpu); + u64 *cpustat =3D kc->cpustat; + unsigned int seq; + ktime_t now; + u64 idle; + + now =3D ktime_get(); + if (last_update_time) + *last_update_time =3D ktime_to_us(now); + + if (vtime_generic_enabled_cpu(cpu)) { + idle =3D kcpustat_field(idx, cpu); + goto to_us; + } + + do { + seq =3D 
read_seqcount_begin(&kc->idle_sleeptime_seq); + + if (kc->idle_elapse && compute_delta) + idle =3D cpustat[idx] + (now - kc->idle_entrytime); + else + idle =3D cpustat[idx]; + } while (read_seqcount_retry(&kc->idle_sleeptime_seq, seq)); + +to_us: + do_div(idle, NSEC_PER_USEC); + + return idle; +} + +/** + * get_cpu_idle_time_us - get the total idle time of a CPU + * @cpu: CPU number to query + * @last_update_time: variable to store update time in. Do not update + * counters if NULL. + * + * Return the cumulative idle time (since boot) for a given + * CPU, in microseconds. Note that this is partially broken due to + * the counter of iowait tasks that can be remotely updated without + * any synchronization. Therefore it is possible to observe backward + * values within two consecutive reads. + * + * This time is measured via accounting rather than sampling, + * and is as accurate as ktime_get() is. + * + * Return: -1 if generic vtime is enabled, else total idle time of the @cpu + */ +u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) +{ + return get_cpu_sleep_time_us(cpu, CPUTIME_IDLE, + !nr_iowait_cpu(cpu), last_update_time); +} +EXPORT_SYMBOL_GPL(get_cpu_idle_time_us); + +/** + * get_cpu_iowait_time_us - get the total iowait time of a CPU + * @cpu: CPU number to query + * @last_update_time: variable to store update time in. Do not update + * counters if NULL. + * + * Return the cumulative iowait time (since boot) for a given + * CPU, in microseconds. Note this is partially broken due to + * the counter of iowait tasks that can be remotely updated without + * any synchronization. Therefore it is possible to observe backward + * values within two consecutive reads. + * + * This time is measured via accounting rather than sampling, + * and is as accurate as ktime_get() is. 
+ * + * Return: -1 if generic vtime is enabled, else total iowait time of @cpu + */ +u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time) +{ + return get_cpu_sleep_time_us(cpu, CPUTIME_IOWAIT, + nr_iowait_cpu(cpu), last_update_time); +} +EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us); + #endif /* CONFIG_NO_HZ_COMMON */ =20 /* diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 6cd7777755dd..0da83d9b324a 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -749,125 +749,6 @@ static void tick_nohz_update_jiffies(ktime_t now) touch_softlockup_watchdog_sched(); } =20 -static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now) -{ - u64 *cpustat =3D kcpustat_this_cpu->cpustat; - ktime_t delta; - - if (vtime_generic_enabled_this_cpu()) - return; - - if (WARN_ON_ONCE(!tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE))) - return; - - delta =3D ktime_sub(now, ts->idle_entrytime); - - write_seqcount_begin(&ts->idle_sleeptime_seq); - if (nr_iowait_cpu(smp_processor_id()) > 0) - cpustat[CPUTIME_IOWAIT] =3D ktime_add(cpustat[CPUTIME_IOWAIT], delta); - else - cpustat[CPUTIME_IDLE] =3D ktime_add(cpustat[CPUTIME_IDLE], delta); - - ts->idle_entrytime =3D now; - tick_sched_flag_clear(ts, TS_FLAG_IDLE_ACTIVE); - write_seqcount_end(&ts->idle_sleeptime_seq); - - sched_clock_idle_wakeup_event(); -} - -static void tick_nohz_start_idle(struct tick_sched *ts) -{ - if (vtime_generic_enabled_this_cpu()) - return; - - write_seqcount_begin(&ts->idle_sleeptime_seq); - ts->idle_entrytime =3D ktime_get(); - tick_sched_flag_set(ts, TS_FLAG_IDLE_ACTIVE); - write_seqcount_end(&ts->idle_sleeptime_seq); - sched_clock_idle_sleep_event(); -} - -static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, - bool compute_delta, u64 *last_update_time) -{ - struct tick_sched *ts =3D &per_cpu(tick_cpu_sched, cpu); - u64 *cpustat =3D kcpustat_cpu(cpu).cpustat; - ktime_t now, idle; - unsigned int seq; - - now =3D ktime_get(); - if (last_update_time) - 
*last_update_time =3D ktime_to_us(now); - - if (vtime_generic_enabled_cpu(cpu)) { - idle =3D kcpustat_field(idx, cpu); - return ktime_to_us(idle); - } - - do { - seq =3D read_seqcount_begin(&ts->idle_sleeptime_seq); - - if (tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE) && compute_delta) { - ktime_t delta =3D ktime_sub(now, ts->idle_entrytime); - - idle =3D ktime_add(cpustat[idx], delta); - } else { - idle =3D cpustat[idx]; - } - } while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq)); - - return ktime_to_us(idle); - -} - -/** - * get_cpu_idle_time_us - get the total idle time of a CPU - * @cpu: CPU number to query - * @last_update_time: variable to store update time in. Do not update - * counters if NULL. - * - * Return the cumulative idle time (since boot) for a given - * CPU, in microseconds. Note that this is partially broken due to - * the counter of iowait tasks that can be remotely updated without - * any synchronization. Therefore it is possible to observe backward - * values within two consecutive reads. - * - * This time is measured via accounting rather than sampling, - * and is as accurate as ktime_get() is. - * - * Return: -1 if generic vtime is enabled, else total idle time of the @cpu - */ -u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) -{ - return get_cpu_sleep_time_us(cpu, CPUTIME_IDLE, - !nr_iowait_cpu(cpu), last_update_time); -} -EXPORT_SYMBOL_GPL(get_cpu_idle_time_us); - -/** - * get_cpu_iowait_time_us - get the total iowait time of a CPU - * @cpu: CPU number to query - * @last_update_time: variable to store update time in. Do not update - * counters if NULL. - * - * Return the cumulative iowait time (since boot) for a given - * CPU, in microseconds. Note this is partially broken due to - * the counter of iowait tasks that can be remotely updated without - * any synchronization. Therefore it is possible to observe backward - * values within two consecutive reads. 
- * - * This time is measured via accounting rather than sampling, - * and is as accurate as ktime_get() is. - * - * Return: -1 if generic vtime is enabled, else total iowait time of @cpu - */ -u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time) -{ - return get_cpu_sleep_time_us(cpu, CPUTIME_IOWAIT, - nr_iowait_cpu(cpu), last_update_time); -} -EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us); - static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) { hrtimer_cancel(&ts->sched_timer); @@ -1275,6 +1156,20 @@ void tick_nohz_idle_retain_tick(void) tick_nohz_retain_tick(this_cpu_ptr(&tick_cpu_sched)); } =20 +static void tick_nohz_clock_sleep(struct tick_sched *ts) +{ + tick_sched_flag_set(ts, TS_FLAG_IDLE_ACTIVE); + sched_clock_idle_sleep_event(); +} + +static void tick_nohz_clock_wakeup(struct tick_sched *ts) +{ + if (tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE)) { + tick_sched_flag_clear(ts, TS_FLAG_IDLE_ACTIVE); + sched_clock_idle_wakeup_event(); + } +} + /** * tick_nohz_idle_enter - prepare for entering idle on the current CPU * @@ -1289,12 +1184,11 @@ void tick_nohz_idle_enter(void) local_irq_disable(); =20 ts =3D this_cpu_ptr(&tick_cpu_sched); - WARN_ON_ONCE(ts->timer_expires_base); - tick_sched_flag_set(ts, TS_FLAG_INIDLE); - kcpustat_dyntick_start(); - tick_nohz_start_idle(ts); + ts->idle_entrytime =3D ktime_get(); + kcpustat_dyntick_start(ts->idle_entrytime); + tick_nohz_clock_sleep(ts); =20 local_irq_enable(); } @@ -1322,10 +1216,13 @@ void tick_nohz_irq_exit(void) { struct tick_sched *ts =3D this_cpu_ptr(&tick_cpu_sched); =20 - if (tick_sched_flag_test(ts, TS_FLAG_INIDLE)) - tick_nohz_start_idle(ts); - else + if (tick_sched_flag_test(ts, TS_FLAG_INIDLE)) { + ts->idle_entrytime =3D ktime_get(); + kcpustat_irq_exit(ts->idle_entrytime); + tick_nohz_clock_sleep(ts); + } else { tick_nohz_full_update_tick(ts); + } } =20 /** @@ -1470,11 +1367,11 @@ void tick_nohz_idle_exit(void) now =3D ktime_get(); =20 if (idle_active) - tick_nohz_stop_idle(ts, now); + 
tick_nohz_clock_wakeup(ts); =20 if (tick_stopped) tick_nohz_idle_update_tick(ts, now); - kcpustat_dyntick_stop(); + kcpustat_dyntick_stop(now); =20 local_irq_enable(); } @@ -1530,9 +1427,14 @@ static inline void tick_nohz_irq_enter(void) =20 if (!tick_sched_flag_test(ts, TS_FLAG_STOPPED | TS_FLAG_IDLE_ACTIVE)) return; + now =3D ktime_get(); - if (tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE)) - tick_nohz_stop_idle(ts, now); + + if (tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE)) { + tick_nohz_clock_wakeup(ts); + kcpustat_irq_enter(now); + } + /* * If all CPUs are idle we may need to update a stale jiffies value. * Note nohz_full is a special case: a timekeeper is guaranteed to stay --=20 2.53.0 From nobody Wed Apr 1 09:43:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E1CDC3E4C93; Tue, 31 Mar 2026 13:17:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963060; cv=none; b=fYb0WNmV96zF0H8pMtocv5/aF5TnQzmkQn8ld0L3MHoNLqZ/JwGwZs2KpPyDOlsfFQefGOlDNYjO/cnyTFUUuS93MuJIZtTcL8YpvKTB5sPOxY3fi64QOA93k5jM07W78MbBjQtICejT8tdqZbCW3nfXeIxjcHuYDSvkbgEHv0E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963060; c=relaxed/simple; bh=TjYFIgCHyh+/o6L1KtAM5WwypjR946R5y+AKSoRMF7c=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gdu6H/UlTbPYYsKs8JIYMi6wdGsmfWkBPZBmZNpE2Qm1ARCTmtrSe4deTndZSE6BsuEC93Y0kZ0ES92PJxfq50bHvm7++zN4j6q9O2aG8p65CuO8o7Uh7oFH3TQSVJGsATyr3uzi5oP24uMhhGgOtKuymg8S6Onz4ja9PnFk2qU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=YlxKKPwK; arc=none smtp.client-ip=10.30.226.201 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YlxKKPwK" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B4CBCC2BCB2; Tue, 31 Mar 2026 13:17:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774963059; bh=TjYFIgCHyh+/o6L1KtAM5WwypjR946R5y+AKSoRMF7c=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YlxKKPwKpD/abGgPr/0nLZeecm+jfIIDBUsfwzJRzObTAqgHgyT3EvQVr1sAWC3JT nbYWjM5uSHyojZLhY7d6NkcpgI+MSiDZSYmzSE/sI84vOAr6z9G633U7pFIi2ktoQl z/6z9f4sLl6lS3MsRyYjPI5df9QNRItFO1Pr5T1t1ojEBmP4xekFvaeZj6NZh4GKqh x1ELoJCBQFMez2x7QWmLg2wGaDFZYGnJCHK8Y2KNef6xJCjIEeKAnxg89h3qS6U2gA HyrlrStC8pINuvV+EDY7wTUElJYf1K0/1zqhJZxnbMJ5RTkei7eL4eFSYU88wO+8qf NHWKSq/l6aZzg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "Christophe Leroy (CS GROUP)" , "Rafael J. Wysocki" , Alexander Gordeev , Anna-Maria Behnsen , Ben Segall , Boqun Feng , Christian Borntraeger , Dietmar Eggemann , Heiko Carstens , Ingo Molnar , Jan Kiszka , Joel Fernandes , Juri Lelli , Kieran Bingham , Madhavan Srinivasan , Mel Gorman , Michael Ellerman , Neeraj Upadhyay , Nicholas Piggin , "Paul E . 
McKenney" , Peter Zijlstra , Shrikanth Hegde , Steven Rostedt , Sven Schnelle , Thomas Gleixner , Uladzislau Rezki , Valentin Schneider , Vasily Gorbik , Vincent Guittot , Viresh Kumar , Xin Zhao , linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 09/14] tick/sched: Remove unused fields Date: Tue, 31 Mar 2026 15:16:17 +0200 Message-ID: <20260331131622.30505-10-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Remove fields after the dyntick-idle cputime migration to scheduler code. Signed-off-by: Frederic Weisbecker Tested-by: Shrikanth Hegde --- kernel/time/tick-sched.h | 12 ------------ kernel/time/timer_list.c | 6 +----- scripts/gdb/linux/timerlist.py | 4 ---- 3 files changed, 1 insertion(+), 21 deletions(-) diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h index b4a7822f495d..79b9252047b1 100644 --- a/kernel/time/tick-sched.h +++ b/kernel/time/tick-sched.h @@ -44,9 +44,7 @@ struct tick_device { * to resume the tick timer operation in the timeline * when the CPU returns from nohz sleep. * @next_tick: Next tick to be fired when in dynticks mode. 
- * @idle_jiffies: jiffies at the entry to idle for idle time accounting * @idle_waketime: Time when the idle was interrupted - * @idle_sleeptime_seq: sequence counter for data consistency * @idle_entrytime: Time when the idle call was entered * @last_jiffies: Base jiffies snapshot when next event was last computed * @timer_expires_base: Base time clock monotonic for @timer_expires @@ -55,9 +53,6 @@ struct tick_device { * @idle_expires: Next tick in idle, for debugging purpose only * @idle_calls: Total number of idle calls * @idle_sleeps: Number of idle calls, where the sched tick was stopped - * @idle_exittime: Time when the idle state was left - * @idle_sleeptime: Sum of the time slept in idle with sched tick stopped - * @iowait_sleeptime: Sum of the time slept in idle with sched tick stoppe= d, with IO outstanding * @tick_dep_mask: Tick dependency mask - is set, if someone needs the tick * @check_clocks: Notification mechanism about clocksource changes */ @@ -73,12 +68,10 @@ struct tick_sched { struct hrtimer sched_timer; ktime_t last_tick; ktime_t next_tick; - unsigned long idle_jiffies; ktime_t idle_waketime; unsigned int got_idle_tick; =20 /* Idle entry */ - seqcount_t idle_sleeptime_seq; ktime_t idle_entrytime; =20 /* Tick stop */ @@ -90,11 +83,6 @@ struct tick_sched { unsigned long idle_calls; unsigned long idle_sleeps; =20 - /* Idle exit */ - ktime_t idle_exittime; - ktime_t idle_sleeptime; - ktime_t iowait_sleeptime; - /* Full dynticks handling */ atomic_t tick_dep_mask; =20 diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c index 488e47e96e93..e77b512e8597 100644 --- a/kernel/time/timer_list.c +++ b/kernel/time/timer_list.c @@ -154,14 +154,10 @@ static void print_cpu(struct seq_file *m, int cpu, u6= 4 now) P_flag(highres, TS_FLAG_HIGHRES); P_ns(last_tick); P_flag(tick_stopped, TS_FLAG_STOPPED); - P(idle_jiffies); P(idle_calls); P(idle_sleeps); P_ns(idle_entrytime); P_ns(idle_waketime); - P_ns(idle_exittime); - P_ns(idle_sleeptime); - 
P_ns(iowait_sleeptime); P(last_jiffies); P(next_timer); P_ns(idle_expires); @@ -258,7 +254,7 @@ static void timer_list_show_tickdevices_header(struct s= eq_file *m) =20 static inline void timer_list_header(struct seq_file *m, u64 now) { - SEQ_printf(m, "Timer List Version: v0.10\n"); + SEQ_printf(m, "Timer List Version: v0.11\n"); SEQ_printf(m, "HRTIMER_MAX_CLOCK_BASES: %d\n", HRTIMER_MAX_CLOCK_BASES); SEQ_printf(m, "now at %Ld nsecs\n", (unsigned long long)now); SEQ_printf(m, "\n"); diff --git a/scripts/gdb/linux/timerlist.py b/scripts/gdb/linux/timerlist.py index ccc24d30de80..c14ce55674c9 100644 --- a/scripts/gdb/linux/timerlist.py +++ b/scripts/gdb/linux/timerlist.py @@ -90,14 +90,10 @@ def print_cpu(hrtimer_bases, cpu, max_clock_bases): text +=3D f" .{'nohz':15s}: {int(bool(ts['flags'] & TS_FLAG_N= OHZ))}\n" text +=3D f" .{'last_tick':15s}: {ts['last_tick']}\n" text +=3D f" .{'tick_stopped':15s}: {int(bool(ts['flags'] & T= S_FLAG_STOPPED))}\n" - text +=3D f" .{'idle_jiffies':15s}: {ts['idle_jiffies']}\n" text +=3D f" .{'idle_calls':15s}: {ts['idle_calls']}\n" text +=3D f" .{'idle_sleeps':15s}: {ts['idle_sleeps']}\n" text +=3D f" .{'idle_entrytime':15s}: {ts['idle_entrytime']} = nsecs\n" text +=3D f" .{'idle_waketime':15s}: {ts['idle_waketime']} ns= ecs\n" - text +=3D f" .{'idle_exittime':15s}: {ts['idle_exittime']} ns= ecs\n" - text +=3D f" .{'idle_sleeptime':15s}: {ts['idle_sleeptime']} = nsecs\n" - text +=3D f" .{'iowait_sleeptime':15s}: {ts['iowait_sleeptime= ']} nsecs\n" text +=3D f" .{'last_jiffies':15s}: {ts['last_jiffies']}\n" text +=3D f" .{'next_timer':15s}: {ts['next_timer']}\n" text +=3D f" .{'idle_expires':15s}: {ts['idle_expires']} nsec= s\n" --=20 2.53.0 From nobody Wed Apr 1 09:43:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 
493BE1C3F0C; Tue, 31 Mar 2026 13:17:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963067; cv=none; b=DtarcMzh4SghU5W9ha/4EICzAHgjOsT+3ZNh/VtrRi9NWBg8T2yHKxF2s0cSRT7BFdvTF2YhazKIo9G4UTcvu1L2XS08LdbFG/SDDzk+8t7ENyl0Vub5r3Hz03X7fk/O6owUSYXY634jk9bZKKbjuKzj6OvAJdSMzqWNLLMyTqk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963067; c=relaxed/simple; bh=b0ahYWXnH1OlnVL9k+U+tBmKrl/hr/iIMc+t5ZPZsTU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=B1lWkEvLw76ozNmNMBEXfsilHEDxPpL7SZ3seIWeHAuPpsM7BctHgU5uz0KaVPxfqSaRg50+tiCE+JRKFRG/HvDKdRsgRAMVZBHzAWwGbJWd+W+ZjZrf+E4cuZ3bDyJZeIN/bU69zjhCzy3UnZaU2/wCs9hOhgd533ub6PCBSbE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=g247PlHX; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="g247PlHX" Received: by smtp.kernel.org (Postfix) with ESMTPSA id ED080C2BC9E; Tue, 31 Mar 2026 13:17:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774963066; bh=b0ahYWXnH1OlnVL9k+U+tBmKrl/hr/iIMc+t5ZPZsTU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=g247PlHXKumCwTjBUEPH/OaEm97Ov8XRAiN+XxOLgKeikZwSsUuHEOpgz+lL1Fcll u4tm+fBpVaGBDmc+Hd0er8OJZozmuf1jab1eyiLVQOlghTUrrL9x5mwxzXyM+UX6Qo /r39+MUozV4AJUk4QEu9Y+cGMWBDwbm1O+8f4RsI70Az/uDlGYf6mBmuIFFiQmCxuT 8yPpijKwrB5RY8+Mkph0ycgLsCcaLpueAYAd24Jxu3ZIBVf2PGZU4/zHTjxMxe9eYx OThNyt16b8efCE5VXYwaKeNuCagh3SOO6uMCgpmfLBIEnVFdbN7icAFmqJn/EFfVgw nR8+ZCc/81b1w== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "Christophe Leroy (CS GROUP)" , "Rafael J. 
Wysocki" , Alexander Gordeev , Anna-Maria Behnsen , Ben Segall , Boqun Feng , Christian Borntraeger , Dietmar Eggemann , Heiko Carstens , Ingo Molnar , Jan Kiszka , Joel Fernandes , Juri Lelli , Kieran Bingham , Madhavan Srinivasan , Mel Gorman , Michael Ellerman , Neeraj Upadhyay , Nicholas Piggin , "Paul E . McKenney" , Peter Zijlstra , Shrikanth Hegde , Steven Rostedt , Sven Schnelle , Thomas Gleixner , Uladzislau Rezki , Valentin Schneider , Vasily Gorbik , Vincent Guittot , Viresh Kumar , Xin Zhao , linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 10/14] tick/sched: Account tickless idle cputime only when tick is stopped Date: Tue, 31 Mar 2026 15:16:18 +0200 Message-ID: <20260331131622.30505-11-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There is no real point in switching to dyntick-idle cputime accounting mode if the tick is not actually stopped. This just adds overhead, notably fetching the GTOD, on each idle exit and each idle IRQ entry for no reason during short idle trips. 
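The effect can be pictured with a small userspace model. This is only a sketch under simplifying assumptions: the model_* names are hypothetical, the clock is a plain counter passed in by the caller, and only the idle exit path is modelled, where the clock is fetched solely when the tick had actually been stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state; tick_stopped stands in for TS_FLAG_STOPPED. */
struct model_cpu {
    bool tick_stopped;
    uint64_t entrytime;             /* recorded at idle entry */
    uint64_t idle_ns;               /* idle time accounted in dyntick mode */
    unsigned int exit_clock_reads;  /* clock fetches done on idle exit */
};

static void model_idle_enter(struct model_cpu *c, uint64_t now)
{
    c->entrytime = now;
}

/* Dyntick accounting starts only here, from the recorded entry time. */
static void model_stop_tick(struct model_cpu *c)
{
    c->tick_stopped = true;
}

static void model_idle_exit(struct model_cpu *c, uint64_t clock_now)
{
    /* Short idle trip: the tick kept running, so no clock fetch is needed. */
    if (!c->tick_stopped)
        return;

    assert(clock_now >= c->entrytime);
    c->exit_clock_reads++;
    c->idle_ns += clock_now - c->entrytime;
    c->tick_stopped = false;
}

/* Run some short trips (tick kept) and long trips (tick stopped). */
unsigned int model_exit_clock_reads(unsigned int short_trips,
                                    unsigned int long_trips)
{
    struct model_cpu c = { 0 };
    uint64_t t = 0;
    unsigned int i;

    for (i = 0; i < short_trips; i++) {
        model_idle_enter(&c, t);
        t += 10;
        model_idle_exit(&c, t);
    }
    for (i = 0; i < long_trips; i++) {
        model_idle_enter(&c, t);
        model_stop_tick(&c);
        t += 1000;
        model_idle_exit(&c, t);
    }
    return c.exit_clock_reads;
}
```

With five short trips and two long ones, only the two long trips pay for a clock fetch on exit in this model.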
Signed-off-by: Frederic Weisbecker Tested-by: Shrikanth Hegde --- kernel/time/tick-sched.c | 44 ++++++++++++++++++---------------------- 1 file changed, 20 insertions(+), 24 deletions(-) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 0da83d9b324a..923cae35827a 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -1144,8 +1144,10 @@ void tick_nohz_idle_stop_tick(void) ts->idle_sleeps++; ts->idle_expires =3D expires; =20 - if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) + if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) { + kcpustat_dyntick_start(ts->idle_entrytime); nohz_balance_enter_idle(cpu); + } } else { tick_nohz_retain_tick(ts); } @@ -1187,7 +1189,6 @@ void tick_nohz_idle_enter(void) WARN_ON_ONCE(ts->timer_expires_base); tick_sched_flag_set(ts, TS_FLAG_INIDLE); ts->idle_entrytime =3D ktime_get(); - kcpustat_dyntick_start(ts->idle_entrytime); tick_nohz_clock_sleep(ts); =20 local_irq_enable(); @@ -1217,9 +1218,10 @@ void tick_nohz_irq_exit(void) struct tick_sched *ts =3D this_cpu_ptr(&tick_cpu_sched); =20 if (tick_sched_flag_test(ts, TS_FLAG_INIDLE)) { - ts->idle_entrytime =3D ktime_get(); - kcpustat_irq_exit(ts->idle_entrytime); tick_nohz_clock_sleep(ts); + ts->idle_entrytime =3D ktime_get(); + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) + kcpustat_irq_exit(ts->idle_entrytime); } else { tick_nohz_full_update_tick(ts); } @@ -1320,8 +1322,11 @@ void tick_nohz_idle_restart_tick(void) { struct tick_sched *ts =3D this_cpu_ptr(&tick_cpu_sched); =20 - if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) - tick_nohz_restart_sched_tick(ts, ktime_get()); + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) { + ktime_t now =3D ktime_get(); + kcpustat_dyntick_stop(now); + tick_nohz_restart_sched_tick(ts, now); + } } =20 static void tick_nohz_idle_update_tick(struct tick_sched *ts, ktime_t now) @@ -1351,7 +1356,6 @@ static void tick_nohz_idle_update_tick(struct tick_sc= hed *ts, ktime_t now) void 
tick_nohz_idle_exit(void) { struct tick_sched *ts =3D this_cpu_ptr(&tick_cpu_sched); - bool idle_active, tick_stopped; ktime_t now; =20 local_irq_disable(); @@ -1360,18 +1364,13 @@ void tick_nohz_idle_exit(void) WARN_ON_ONCE(ts->timer_expires_base); =20 tick_sched_flag_clear(ts, TS_FLAG_INIDLE); - idle_active =3D tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE); - tick_stopped =3D tick_sched_flag_test(ts, TS_FLAG_STOPPED); + tick_nohz_clock_wakeup(ts); =20 - if (idle_active || tick_stopped) + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) { now =3D ktime_get(); - - if (idle_active) - tick_nohz_clock_wakeup(ts); - - if (tick_stopped) + kcpustat_dyntick_stop(now); tick_nohz_idle_update_tick(ts, now); - kcpustat_dyntick_stop(now); + } =20 local_irq_enable(); } @@ -1425,15 +1424,13 @@ static inline void tick_nohz_irq_enter(void) struct tick_sched *ts =3D this_cpu_ptr(&tick_cpu_sched); ktime_t now; =20 - if (!tick_sched_flag_test(ts, TS_FLAG_STOPPED | TS_FLAG_IDLE_ACTIVE)) + tick_nohz_clock_wakeup(ts); + + if (!tick_sched_flag_test(ts, TS_FLAG_STOPPED)) return; =20 now =3D ktime_get(); - - if (tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE)) { - tick_nohz_clock_wakeup(ts); - kcpustat_irq_enter(now); - } + kcpustat_irq_enter(now); =20 /* * If all CPUs are idle we may need to update a stale jiffies value. @@ -1442,8 +1439,7 @@ static inline void tick_nohz_irq_enter(void) * rare case (typically stop machine). So we must make sure we have a * last resort. 
*/ - if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) - tick_nohz_update_jiffies(now); + tick_nohz_update_jiffies(now); } =20 #else --=20 2.53.0 From nobody Wed Apr 1 09:43:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4423421257B; Tue, 31 Mar 2026 13:17:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963074; cv=none; b=U/KgY6jEG8CPEaGdeBLAJmw6WKD0U0gImgsSGfhGIVxTyzcrrAX7qW35BT2avAoFs57BQSX8hQYG2OSoIHnoCzZzcXMqUnt7jqXlZU5uM2cbVvVe0NjgzODmITpSDgW7ZndgUgYgEqNKIu4+9Nc7jVqlngT6gNwAR/KZFQxMDq0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963074; c=relaxed/simple; bh=l4UXuPtarigCS8/9y+D4zvM6fzJUWSEh7U0vHQaQuhs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=o2t7Xr4Khy5tu2yN085BOhI6CVpbQdqHKp4VhMHy0+qoucbgt7JRzftK+/AzacsOgnhqFqsBjFunH4CjUHlU1rtiQNECzFtUfDFJuyEr6HJ6b65QXouk9NRndkOZJcLFiYUqzA5Nh0leAnkF7GBlJtJUABqIsErNo0Mmjybuc6I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=K7AeiwWS; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="K7AeiwWS" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4F24FC2BCB1; Tue, 31 Mar 2026 13:17:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774963074; bh=l4UXuPtarigCS8/9y+D4zvM6fzJUWSEh7U0vHQaQuhs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=K7AeiwWShE7xc62Afgw7NwDYnh+ZgmePezlcZrhhBYuGYr1uAQThMppwIj2iv+eVC 
xgli87oaP2q84PRbUKRCTQB05BQiwbFcEj9RqjrBg5MK6wl+Q96JtNH0Z/7EIubhCf WWtkB9eKRqutDAYWQcAkhhWkeZyuLyrR5y9vBgCX4PFrW2ZiHUrdrr8eO45HFlo5ZV rDqMTg/pLaR4+9/ZQozUPVniXsadl7REgmynbEEJ9uNUBd8yY5w34XA++9ztKX1b4q /VkMTn9yDfG/vizNwxNkEX1uNe34WECIL2NHez5QPsSI7Pe9uTCJMvvn68Or6V7u9g AmIr/RlPBw7TQ== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "Christophe Leroy (CS GROUP)" , "Rafael J. Wysocki" , Alexander Gordeev , Anna-Maria Behnsen , Ben Segall , Boqun Feng , Christian Borntraeger , Dietmar Eggemann , Heiko Carstens , Ingo Molnar , Jan Kiszka , Joel Fernandes , Juri Lelli , Kieran Bingham , Madhavan Srinivasan , Mel Gorman , Michael Ellerman , Neeraj Upadhyay , Nicholas Piggin , "Paul E . McKenney" , Peter Zijlstra , Shrikanth Hegde , Steven Rostedt , Sven Schnelle , Thomas Gleixner , Uladzislau Rezki , Valentin Schneider , Vasily Gorbik , Vincent Guittot , Viresh Kumar , Xin Zhao , linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 11/14] tick/sched: Consolidate idle time fetching APIs Date: Tue, 31 Mar 2026 15:16:19 +0200 Message-ID: <20260331131622.30505-12-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Fetching the idle cputime is available through a variety of accessors all over the place depending on the different accounting flavours and needs: - idle vtime generic accounting can be accessed by kcpustat_field(), kcpustat_cpu_fetch(), get_idle/iowait_time() and get_cpu_idle/iowait_time_us() - dynticks-idle accounting can only be accessed by get_idle/iowait_time() or get_cpu_idle/iowait_time_us() - CONFIG_NO_HZ_COMMON=3Dn idle accounting can be accessed by kcpustat_field= () 
kcpustat_cpu_fetch(), or get_idle/iowait_time() but not by get_cpu_idle/iowait_time_us() Moreover get_idle/iowait_time() relies on get_cpu_idle/iowait_time_us() with a non-sensical conversion to microseconds and back to nanoseconds on the way. Start consolidating the APIs with removing get_idle/iowait_time() and make kcpustat_field() and kcpustat_cpu_fetch() work for all cases. Signed-off-by: Frederic Weisbecker Tested-by: Shrikanth Hegde --- fs/proc/stat.c | 40 +++--------------------- fs/proc/uptime.c | 8 ++--- include/linux/kernel_stat.h | 34 ++++++++++++++++++--- kernel/sched/cputime.c | 61 ++++++++++++++++++++++++------------- 4 files changed, 76 insertions(+), 67 deletions(-) diff --git a/fs/proc/stat.c b/fs/proc/stat.c index 8b444e862319..c00468a83f64 100644 --- a/fs/proc/stat.c +++ b/fs/proc/stat.c @@ -22,38 +22,6 @@ #define arch_irq_stat() 0 #endif =20 -u64 get_idle_time(struct kernel_cpustat *kcs, int cpu) -{ - u64 idle, idle_usecs =3D -1ULL; - - if (cpu_online(cpu)) - idle_usecs =3D get_cpu_idle_time_us(cpu, NULL); - - if (idle_usecs =3D=3D -1ULL) - /* !NO_HZ or cpu offline so we can rely on cpustat.idle */ - idle =3D kcs->cpustat[CPUTIME_IDLE]; - else - idle =3D idle_usecs * NSEC_PER_USEC; - - return idle; -} - -static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu) -{ - u64 iowait, iowait_usecs =3D -1ULL; - - if (cpu_online(cpu)) - iowait_usecs =3D get_cpu_iowait_time_us(cpu, NULL); - - if (iowait_usecs =3D=3D -1ULL) - /* !NO_HZ or cpu offline so we can rely on cpustat.iowait */ - iowait =3D kcs->cpustat[CPUTIME_IOWAIT]; - else - iowait =3D iowait_usecs * NSEC_PER_USEC; - - return iowait; -} - static void show_irq_gap(struct seq_file *p, unsigned int gap) { static const char zeros[] =3D " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"; @@ -105,8 +73,8 @@ static int show_stat(struct seq_file *p, void *v) user +=3D cpustat[CPUTIME_USER]; nice +=3D cpustat[CPUTIME_NICE]; system +=3D cpustat[CPUTIME_SYSTEM]; - idle +=3D get_idle_time(&kcpustat, i); - iowait 
+=3D get_iowait_time(&kcpustat, i); + idle +=3D cpustat[CPUTIME_IDLE]; + iowait +=3D cpustat[CPUTIME_IOWAIT]; irq +=3D cpustat[CPUTIME_IRQ]; softirq +=3D cpustat[CPUTIME_SOFTIRQ]; steal +=3D cpustat[CPUTIME_STEAL]; @@ -146,8 +114,8 @@ static int show_stat(struct seq_file *p, void *v) user =3D cpustat[CPUTIME_USER]; nice =3D cpustat[CPUTIME_NICE]; system =3D cpustat[CPUTIME_SYSTEM]; - idle =3D get_idle_time(&kcpustat, i); - iowait =3D get_iowait_time(&kcpustat, i); + idle =3D cpustat[CPUTIME_IDLE]; + iowait =3D cpustat[CPUTIME_IOWAIT]; irq =3D cpustat[CPUTIME_IRQ]; softirq =3D cpustat[CPUTIME_SOFTIRQ]; steal =3D cpustat[CPUTIME_STEAL]; diff --git a/fs/proc/uptime.c b/fs/proc/uptime.c index b5343d209381..433aa947cd57 100644 --- a/fs/proc/uptime.c +++ b/fs/proc/uptime.c @@ -18,12 +18,8 @@ static int uptime_proc_show(struct seq_file *m, void *v) int i; =20 idle_nsec =3D 0; - for_each_possible_cpu(i) { - struct kernel_cpustat kcs; - - kcpustat_cpu_fetch(&kcs, i); - idle_nsec +=3D get_idle_time(&kcs, i); - } + for_each_possible_cpu(i) + idle_nsec +=3D kcpustat_field(CPUTIME_IDLE, i); =20 ktime_get_boottime_ts64(&uptime); timens_add_boottime(&uptime); diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index 9343353ac7a3..3680519d7b2c 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -110,32 +110,59 @@ extern void kcpustat_dyntick_start(u64 now); extern void kcpustat_dyntick_stop(u64 now); extern void kcpustat_irq_enter(u64 now); extern void kcpustat_irq_exit(u64 now); +extern u64 kcpustat_field_idle(int cpu); +extern u64 kcpustat_field_iowait(int cpu); =20 static inline bool kcpustat_idle_dyntick(void) { return __this_cpu_read(kernel_cpustat.idle_dyntick); } #else +static inline u64 kcpustat_field_idle(int cpu) +{ + return kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE]; +} +static inline u64 kcpustat_field_iowait(int cpu) +{ + return kcpustat_cpu(cpu).cpustat[CPUTIME_IOWAIT]; +} + static inline bool kcpustat_idle_dyntick(void) { 
return false; } #endif /* CONFIG_NO_HZ_COMMON */ =20 +/* Fetch cputime values when vtime is disabled on a CPU */ +static inline u64 kcpustat_field_default(enum cpu_usage_stat usage, int cp= u) +{ + if (usage =3D=3D CPUTIME_IDLE) + return kcpustat_field_idle(cpu); + if (usage =3D=3D CPUTIME_IOWAIT) + return kcpustat_field_iowait(cpu); + return kcpustat_cpu(cpu).cpustat[usage]; +} + +static inline void kcpustat_cpu_fetch_default(struct kernel_cpustat *dst, = int cpu) +{ + *dst =3D kcpustat_cpu(cpu); + dst->cpustat[CPUTIME_IDLE] =3D kcpustat_field_idle(cpu); + dst->cpustat[CPUTIME_IOWAIT] =3D kcpustat_field_iowait(cpu); +} + #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN extern u64 kcpustat_field(enum cpu_usage_stat usage, int cpu); extern void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu); #else static inline u64 kcpustat_field(enum cpu_usage_stat usage, int cpu) { - return kcpustat_cpu(cpu).cpustat[usage]; + return kcpustat_field_default(usage, cpu); } =20 static inline void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu) { - *dst =3D kcpustat_cpu(cpu); + kcpustat_cpu_fetch_default(dst, cpu); } - #endif /* !CONFIG_VIRT_CPU_ACCOUNTING_GEN */ =20 extern void account_user_time(struct task_struct *, u64); @@ -145,7 +172,6 @@ extern void account_system_index_time(struct task_struc= t *, u64, enum cpu_usage_stat); extern void account_steal_time(u64); extern void account_idle_time(u64); -extern u64 get_idle_time(struct kernel_cpustat *kcs, int cpu); =20 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE static inline void account_process_tick(struct task_struct *tsk, int user) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 4a259f2700a1..ebb9eee049e0 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -490,24 +490,14 @@ void kcpustat_irq_exit(u64 now) kcpustat_idle_start(kc, now); } =20 -static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, - bool compute_delta, u64 *last_update_time) +static u64 kcpustat_field_dyntick(int 
cpu, enum cpu_usage_stat idx, + bool compute_delta, u64 now) { struct kernel_cpustat *kc =3D &kcpustat_cpu(cpu); u64 *cpustat =3D kc->cpustat; unsigned int seq; - ktime_t now; u64 idle; =20 - now =3D ktime_get(); - if (last_update_time) - *last_update_time =3D ktime_to_us(now); - - if (vtime_generic_enabled_cpu(cpu)) { - idle =3D kcpustat_field(idx, cpu); - goto to_us; - } - do { seq =3D read_seqcount_begin(&kc->idle_sleeptime_seq); =20 @@ -517,12 +507,42 @@ static u64 get_cpu_sleep_time_us(int cpu, enum cpu_us= age_stat idx, idle =3D cpustat[idx]; } while (read_seqcount_retry(&kc->idle_sleeptime_seq, seq)); =20 -to_us: - do_div(idle, NSEC_PER_USEC); - return idle; } =20 +u64 kcpustat_field_idle(int cpu) +{ + return kcpustat_field_dyntick(cpu, CPUTIME_IDLE, + !nr_iowait_cpu(cpu), ktime_get()); +} +EXPORT_SYMBOL_GPL(kcpustat_field_idle); + +u64 kcpustat_field_iowait(int cpu) +{ + return kcpustat_field_dyntick(cpu, CPUTIME_IOWAIT, + nr_iowait_cpu(cpu), ktime_get()); +} +EXPORT_SYMBOL_GPL(kcpustat_field_iowait); + +static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, + bool compute_delta, u64 *last_update_time) +{ + ktime_t now =3D ktime_get(); + u64 res; + + if (vtime_generic_enabled_cpu(cpu)) + res =3D kcpustat_field(idx, cpu); + else + res =3D kcpustat_field_dyntick(cpu, idx, compute_delta, now); + + do_div(res, NSEC_PER_USEC); + + if (last_update_time) + *last_update_time =3D ktime_to_us(now); + + return res; +} + /** * get_cpu_idle_time_us - get the total idle time of a CPU * @cpu: CPU number to query @@ -570,7 +590,6 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_ti= me) nr_iowait_cpu(cpu), last_update_time); } EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us); - #endif /* CONFIG_NO_HZ_COMMON */ =20 /* @@ -1124,8 +1143,8 @@ u64 kcpustat_field(enum cpu_usage_stat usage, int cpu) struct rq *rq; int err; =20 - if (!vtime_accounting_enabled_cpu(cpu)) - return val; + if (!vtime_generic_enabled_cpu(cpu)) + return kcpustat_field_default(usage, cpu); =20 
rq =3D cpu_rq(cpu); =20 @@ -1220,8 +1239,8 @@ void kcpustat_cpu_fetch(struct kernel_cpustat *dst, i= nt cpu) struct rq *rq; int err; =20 - if (!vtime_accounting_enabled_cpu(cpu)) { - *dst =3D *src; + if (!vtime_generic_enabled_cpu(cpu)) { + kcpustat_cpu_fetch_default(dst, cpu); return; } =20 @@ -1234,7 +1253,7 @@ void kcpustat_cpu_fetch(struct kernel_cpustat *dst, i= nt cpu) curr =3D rcu_dereference(rq->curr); if (WARN_ON_ONCE(!curr)) { rcu_read_unlock(); - *dst =3D *src; + kcpustat_cpu_fetch_default(dst, cpu); return; } =20 --=20 2.53.0 From nobody Wed Apr 1 09:43:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 210DA3E51C6; Tue, 31 Mar 2026 13:18:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963082; cv=none; b=hQwvcWbm3JyOO+bnTeIBKD7Ej9qyRa23ksSk77QkCxH5+ivtdxJl59lVyXz6Mb3ydzm7TXN3u2H0fjujYq5jCdyr46VbPO9RyBRFffDa/SJIpNzg5M1Pm6fSsI0ew3P8bGUEAIm4ry/gYSrMRUD6y2jSZnQ4f8rINuK+e1zTp1o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963082; c=relaxed/simple; bh=6XEEW8WwCjaro5TSsa+MlYr1uj+SUPgh5dpCfyCU+rQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=hxmD7jpvvvIrNh6NIpXkCnnNh3D0rFyIQM+gWpoQN2gzRcjl5LPtt9KjV9nKdmFZdb6X+nFAa2QLZFtPeCMn63uHXnleuysnnqXh/dAiQCyLGpIgD0EiRLbXL78aQL/A6mZ8RiNAFnEAbyDnom+uvRq6z0t4ERHWO5Us6U/vbKA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NQ8bprRL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NQ8bprRL" Received: by smtp.kernel.org 
(Postfix) with ESMTPSA id 9007DC2BCB4; Tue, 31 Mar 2026 13:17:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774963081; bh=6XEEW8WwCjaro5TSsa+MlYr1uj+SUPgh5dpCfyCU+rQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NQ8bprRLqMwo/foMLh5eabFQjJhQgi5Hbd4PdmZnuFEA2wIQZmCJD/AtQNN1Lpx0b 06SMhRLJFER9dP93aD+nZpi4B2xPNqNlQolI2KMKJFDEy/hCaV+k/KoXxQe3Ai9BQY ET1EFxl7Zbwn963tlqLhhTNxRcoPJqxThT/TfFRtggrfwzc/XcoExYe6+B/N3ehKj0 IBw43IHDUKF6ztSihzA/xAbsMcHNLQKOISpZQdEu/5JinMj2mazGGmTWNg55gphkHe xiZ6eQ8FSxB7opJFrah3Pg77/09PffUFYTPy7dGbNlNZks0vJ26xp2KLkweWj9c1iq l3lYhjQyl4nhg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "Christophe Leroy (CS GROUP)" , "Rafael J. Wysocki" , Alexander Gordeev , Anna-Maria Behnsen , Ben Segall , Boqun Feng , Christian Borntraeger , Dietmar Eggemann , Heiko Carstens , Ingo Molnar , Jan Kiszka , Joel Fernandes , Juri Lelli , Kieran Bingham , Madhavan Srinivasan , Mel Gorman , Michael Ellerman , Neeraj Upadhyay , Nicholas Piggin , "Paul E . McKenney" , Peter Zijlstra , Shrikanth Hegde , Steven Rostedt , Sven Schnelle , Thomas Gleixner , Uladzislau Rezki , Valentin Schneider , Vasily Gorbik , Vincent Guittot , Viresh Kumar , Xin Zhao , linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 12/14] sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case Date: Tue, 31 Mar 2026 15:16:20 +0200 Message-ID: <20260331131622.30505-13-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The last reason why get_cpu_idle/iowait_time_us() may return -1 now is if the config doesn't support nohz. 
The ad-hoc replacement solution by cpufreq is to compute jiffies minus the whole busy cputime. Although the intent is to provide a coherent low-resolution estimate of the idle and iowait time, the implementation is buggy because jiffies don't start at 0. Instead, just provide a real get_cpu_[idle|iowait]_time_us() off-case. Signed-off-by: Frederic Weisbecker Tested-by: Shrikanth Hegde --- drivers/cpufreq/cpufreq.c | 29 +---------------------------- include/linux/kernel_stat.h | 3 +++ include/linux/tick.h | 4 ---- kernel/sched/cputime.c | 12 +++++++++--- 4 files changed, 13 insertions(+), 35 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 277884d91913..1adffb8ccefd 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -130,38 +130,11 @@ struct kobject *get_governor_parent_kobj(struct cpufr= eq_policy *policy) } EXPORT_SYMBOL_GPL(get_governor_parent_kobj); =20 -static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall) -{ - struct kernel_cpustat kcpustat; - u64 cur_wall_time; - u64 idle_time; - u64 busy_time; - - cur_wall_time =3D jiffies64_to_nsecs(get_jiffies_64()); - - kcpustat_cpu_fetch(&kcpustat, cpu); - - busy_time =3D kcpustat.cpustat[CPUTIME_USER]; - busy_time +=3D kcpustat.cpustat[CPUTIME_SYSTEM]; - busy_time +=3D kcpustat.cpustat[CPUTIME_IRQ]; - busy_time +=3D kcpustat.cpustat[CPUTIME_SOFTIRQ]; - busy_time +=3D kcpustat.cpustat[CPUTIME_STEAL]; - busy_time +=3D kcpustat.cpustat[CPUTIME_NICE]; - - idle_time =3D cur_wall_time - busy_time; - if (wall) - *wall =3D div_u64(cur_wall_time, NSEC_PER_USEC); - - return div_u64(idle_time, NSEC_PER_USEC); -} - u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy) { u64 idle_time =3D get_cpu_idle_time_us(cpu, io_busy ?
wall : NULL); =20 - if (idle_time =3D=3D -1ULL) - return get_cpu_idle_time_jiffy(cpu, wall); - else if (!io_busy) + if (!io_busy) idle_time +=3D get_cpu_iowait_time_us(cpu, wall); =20 return idle_time; diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index 3680519d7b2c..512104b0ff49 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -133,6 +133,9 @@ static inline bool kcpustat_idle_dyntick(void) } #endif /* CONFIG_NO_HZ_COMMON */ =20 +extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time); +extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time); + /* Fetch cputime values when vtime is disabled on a CPU */ static inline u64 kcpustat_field_default(enum cpu_usage_stat usage, int cp= u) { diff --git a/include/linux/tick.h b/include/linux/tick.h index 738007d6f577..1cf4651f09ad 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -139,8 +139,6 @@ extern bool tick_nohz_idle_got_tick(void); extern ktime_t tick_nohz_get_next_hrtimer(void); extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next); extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu); -extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time); -extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time); #else /* !CONFIG_NO_HZ_COMMON */ #define tick_nohz_enabled (0) static inline bool tick_nohz_is_active(void) { return false; } @@ -162,8 +160,6 @@ static inline ktime_t tick_nohz_get_sleep_length(ktime_= t *delta_next) *delta_next =3D TICK_NSEC; return *delta_next; } -static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; } -static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1= ; } #endif /* !CONFIG_NO_HZ_COMMON */ =20 /* diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index ebb9eee049e0..137e6b099fa9 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -523,6 +523,13 @@ u64 kcpustat_field_iowait(int cpu) nr_iowait_cpu(cpu), ktime_get()); 
} EXPORT_SYMBOL_GPL(kcpustat_field_iowait); +#else +static u64 kcpustat_field_dyntick(int cpu, enum cpu_usage_stat idx, + bool compute_delta, ktime_t now) +{ + return kcpustat_cpu(cpu).cpustat[idx]; +} +#endif /* CONFIG_NO_HZ_COMMON */ =20 static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx, bool compute_delta, u64 *last_update_time) @@ -558,7 +565,7 @@ static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usag= e_stat idx, * This time is measured via accounting rather than sampling, * and is as accurate as ktime_get() is. * - * Return: -1 if generic vtime is enabled, else total idle time of the @cpu + * Return: total idle time of the @cpu */ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) { @@ -582,7 +589,7 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us); * This time is measured via accounting rather than sampling, * and is as accurate as ktime_get() is. * - * Return: -1 if generic vtime is enabled, else total iowait time of @cpu + * Return: total iowait time of @cpu */ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time) { @@ -590,7 +597,6 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_ti= me) nr_iowait_cpu(cpu), last_update_time); } EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us); -#endif /* CONFIG_NO_HZ_COMMON */ =20 /* * Use precise platform statistics if available: --=20 2.53.0 From nobody Wed Apr 1 09:43:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 54F4D3DBD62; Tue, 31 Mar 2026 13:18:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963089; cv=none; 
b=Pa+T9QcDX3gU7YvMuqCBWqBXyGL4Wzo2UD1e4Joy4Dch9/0yI/cStQ8LGAIB/IUBXqLOE1jhTyLJF9ega2WYLjKpVFf9sGkGagXZAqZ4Vqu4kSYcqJqMqJ37kwKnAkkF7dwvKL2MDUiRyijAX1vYQSPZ0RzT+9VXRYxgmGtlsys= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963089; c=relaxed/simple; bh=92dsTpBynD1g68/x/29CHbwEUuiIPa/Iwbdac5kZ2Xg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=i4dnYOsKkQxOzQxhA64lsgoNhYOjH15JwPNNivLWJ5EWfNWx0WKA4DzNsFuSJYe3N6mKs4Fy2pOxDFVHwiFe4/SZf4dytX17vytHeI0m7538TJ//sZMAGhku48VdjXk0lJEPrnIv37218gybc1ySMz6MUlJR3Dr3+fIBUIXJlCo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=WeFf0V7Q; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="WeFf0V7Q" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 09445C19423; Tue, 31 Mar 2026 13:18:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774963089; bh=92dsTpBynD1g68/x/29CHbwEUuiIPa/Iwbdac5kZ2Xg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=WeFf0V7Q76pBqWbBm8IGxACxytO3klh4ei3hm4d7O+torfL3SL0J88znMRVk3JZmw omf84HqK58CsG9RlCrTiYc7+bBSeYHg1vRC3OoshW/Au5njEL+lE6PY15qBPm6wcLt 2xWjsIqddQJwLG2HYfTTMM7ykffZVMd5qlLTrON6cT1cGi+j3nmvED7/989mH3RHh3 759W7vogKMpmVJ6QNpAFAaV8mdaMZl2RHtcwjuajIUZT4vsozdYWLQG6ajGRF59nS0 /ceyoU60qKQX7AEOXV+KNEF2biVlKfOj3qUymOffxL4QvLd6ryyQAUT6bAiirxUy+T aqzSnvT3I0SBA== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "Christophe Leroy (CS GROUP)" , "Rafael J. Wysocki" , Alexander Gordeev , Anna-Maria Behnsen , Ben Segall , Boqun Feng , Christian Borntraeger , Dietmar Eggemann , Heiko Carstens , Ingo Molnar , Jan Kiszka , Joel Fernandes , Juri Lelli , Kieran Bingham , Madhavan Srinivasan , Mel Gorman , Michael Ellerman , Neeraj Upadhyay , Nicholas Piggin , "Paul E . 
McKenney" , Peter Zijlstra , Shrikanth Hegde , Steven Rostedt , Sven Schnelle , Thomas Gleixner , Uladzislau Rezki , Valentin Schneider , Vasily Gorbik , Vincent Guittot , Viresh Kumar , Xin Zhao , linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 13/14] sched/cputime: Handle idle irqtime gracefully Date: Tue, 31 Mar 2026 15:16:21 +0200 Message-ID: <20260331131622.30505-14-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The dyntick-idle cputime accounting always assumes that IRQ time accounting is enabled and consequently stops elapsing the idle time during dyntick-idle IRQs. This doesn't mix well with disabled IRQ time accounting because then idle IRQs become a cputime blind spot. Also IRQ time accounting is disabled on most configurations, so the overhead of pausing dyntick-idle accounting while in idle IRQs can be avoided there. Fix the situation by pausing dyntick-idle accounting during idle IRQs only if either native vtime (which does IRQ time accounting) or generic IRQ time accounting is enabled. Also make sure that the accumulated IRQ time is not accidentally subtracted from later accounting.
Signed-off-by: Frederic Weisbecker Tested-by: Shrikanth Hegde --- kernel/sched/cputime.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 137e6b099fa9..a77b6f1dbdca 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -47,7 +47,8 @@ static void irqtime_account_delta(struct irqtime *irqtime= , u64 delta, u64_stats_update_begin(&irqtime->sync); cpustat[idx] +=3D delta; irqtime->total +=3D delta; - irqtime->tick_delta +=3D delta; + if (!kcpustat_idle_dyntick()) + irqtime->tick_delta +=3D delta; u64_stats_update_end(&irqtime->sync); } =20 @@ -478,7 +479,8 @@ void kcpustat_irq_enter(u64 now) { struct kernel_cpustat *kc =3D kcpustat_this_cpu; =20 - if (!vtime_generic_enabled_this_cpu()) + if (!vtime_generic_enabled_this_cpu() && + (irqtime_enabled() || vtime_accounting_enabled_this_cpu())) kcpustat_idle_stop(kc, now); } =20 @@ -486,7 +488,8 @@ void kcpustat_irq_exit(u64 now) { struct kernel_cpustat *kc =3D kcpustat_this_cpu; =20 - if (!vtime_generic_enabled_this_cpu()) + if (!vtime_generic_enabled_this_cpu() && + (irqtime_enabled() || vtime_accounting_enabled_this_cpu())) kcpustat_idle_start(kc, now); } =20 --=20 2.53.0 From nobody Wed Apr 1 09:43:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9D79D1C3F0C; Tue, 31 Mar 2026 13:18:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774963096; cv=none; b=srIgHwUa+2ZOFA1hG6PDoepfYSv7p20QNdXl0o5mksqv3mTKkhVFGArAbqjP0mJi8eTLphh+lyUXq/OD1GpCZr5NcywhB7mLa3D2MnLWBRf+4yvi7dRPWpgY5chfNziOyypQuZOj/Kbud+yqxeu4yhQqFP4z4Iy+M0Kq5DZZrH4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1774963096; c=relaxed/simple; bh=PFIpn22Lqg8nUvYixerHksrQguG7VQqiCjtgoj0+dII=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jRRnYZ2PoPHATITCRmj86LV6Sy7YlAhAhxipPnlok5a/kaXRMm26v4bvKcDpcazrbxd1IAjydC9YUVdviYYlMPE0YNvaBrtHy/zfZFIcs4VS5JJhmnr4KeTvyOrTxkZl0MWhFliHeEnB8KuVxYnoSKuOwOzGdnfS10JHh4dL2eA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QcjY+/i5; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QcjY+/i5" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6B76BC2BC9E; Tue, 31 Mar 2026 13:18:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774963096; bh=PFIpn22Lqg8nUvYixerHksrQguG7VQqiCjtgoj0+dII=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=QcjY+/i5y/kMyQ5zzBs/8HbtBsPy4rODD3TLaBkViHed+LXJHBdR3Hl4Ou6uEQL1/ dh5y2JRw47HtffMZP5vvW6JDLevDFG3rthVqd2dVCUquw82FQw9f/WnKemDfCb6ydo 7zkk1apR5DmFqaq6YaEvuktsM7cxFtBOxTKP33IISuWEowhBJYUNt8qVHw/MRVQyif FtmCHCuJIBkqo/w31e+CcNNUhEY21vWe5DD+2vEPKRTmCD9TZ2Io/CfwC2NK7r5Mqe FYokXoKTBVBio+IyccRpfZkPOshlHMzMTAwv6aXrSbzrq6tzoumbKFgRXvqEa3e2Wn Z0LqDOdVIb2Og== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "Christophe Leroy (CS GROUP)" , "Rafael J. Wysocki" , Alexander Gordeev , Anna-Maria Behnsen , Ben Segall , Boqun Feng , Christian Borntraeger , Dietmar Eggemann , Heiko Carstens , Ingo Molnar , Jan Kiszka , Joel Fernandes , Juri Lelli , Kieran Bingham , Madhavan Srinivasan , Mel Gorman , Michael Ellerman , Neeraj Upadhyay , Nicholas Piggin , "Paul E . 
McKenney" , Peter Zijlstra , Shrikanth Hegde , Steven Rostedt , Sven Schnelle , Thomas Gleixner , Uladzislau Rezki , Valentin Schneider , Vasily Gorbik , Vincent Guittot , Viresh Kumar , Xin Zhao , linux-pm@vger.kernel.org, linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH 14/14] sched/cputime: Handle dyntick-idle steal time correctly Date: Tue, 31 Mar 2026 15:16:22 +0200 Message-ID: <20260331131622.30505-15-frederic@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260331131622.30505-1-frederic@kernel.org> References: <20260331131622.30505-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The dyntick-idle steal time is currently accounted when the tick restarts but the stolen idle time is not subtracted from the idle time that was already accounted. This is to avoid observing the idle time going backward as the dyntick-idle cputime accessors can't reliably know in advance the stolen idle time. In order to maintain a forward-progressing idle cputime while subtracting idle steal time from it, keep track of the previously accounted idle stolen time and subtract it from _later_ idle cputime accounting.
Signed-off-by: Frederic Weisbecker Tested-by: Shrikanth Hegde --- include/linux/kernel_stat.h | 1 + kernel/sched/cputime.c | 21 +++++++++++++++------ 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index 512104b0ff49..24a54a6151ba 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -39,6 +39,7 @@ struct kernel_cpustat { bool idle_elapse; seqcount_t idle_sleeptime_seq; u64 idle_entrytime; + u64 idle_stealtime; #endif u64 cpustat[NR_STATS]; }; diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index a77b6f1dbdca..4ae779b161b5 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -425,19 +425,25 @@ static inline void irqtime_account_process_tick(struc= t task_struct *p, int user_ static void kcpustat_idle_stop(struct kernel_cpustat *kc, u64 now) { u64 *cpustat =3D kc->cpustat; - u64 delta; + u64 delta, steal, steal_delta; =20 if (!kc->idle_elapse) return; =20 delta =3D now - kc->idle_entrytime; + steal =3D steal_account_process_time(delta); =20 write_seqcount_begin(&kc->idle_sleeptime_seq); + steal_delta =3D min_t(u64, kc->idle_stealtime, delta); + delta -=3D steal_delta; + kc->idle_stealtime -=3D steal_delta; + if (nr_iowait_cpu(smp_processor_id()) > 0) cpustat[CPUTIME_IOWAIT] +=3D delta; else cpustat[CPUTIME_IDLE] +=3D delta; =20 + kc->idle_stealtime +=3D steal; kc->idle_entrytime =3D now; kc->idle_elapse =3D false; write_seqcount_end(&kc->idle_sleeptime_seq); @@ -460,7 +466,6 @@ void kcpustat_dyntick_stop(u64 now) kcpustat_idle_stop(kc, now); kc->idle_dyntick =3D false; vtime_dyntick_stop(); - steal_account_process_time(ULONG_MAX); } } =20 @@ -504,10 +509,14 @@ static u64 kcpustat_field_dyntick(int cpu, enum cpu_u= sage_stat idx, do { seq =3D read_seqcount_begin(&kc->idle_sleeptime_seq); =20 - if (kc->idle_elapse && compute_delta) - idle =3D cpustat[idx] + (now - kc->idle_entrytime); - else - idle =3D cpustat[idx]; + idle =3D cpustat[idx]; + + 
if (kc->idle_elapse && compute_delta) { + u64 delta =3D now - kc->idle_entrytime; + + delta -=3D min_t(u64, kc->idle_stealtime, delta); + idle +=3D delta; + } } while (read_seqcount_retry(&kc->idle_sleeptime_seq, seq)); =20 return idle; --=20 2.53.0
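
Illustration only, not part of the series: a minimal userspace sketch of the deferred steal-time deduction that patch 14/14 describes. The struct idle_acct type and the helpers min_u64(), idle_stop() and idle_read() are hypothetical names made up for this note. They loosely mirror kcpustat_idle_stop() and kcpustat_field_dyntick(), with the seqcount, the iowait split and steal_account_process_time() left out, and the steal amount passed in by the caller instead.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel_cpustat fields touched by the patch. */
struct idle_acct {
	uint64_t idle;           /* accumulated idle time, like cpustat[CPUTIME_IDLE] */
	uint64_t idle_stealtime; /* steal time observed but not yet deducted from idle */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Close an idle period of length @delta during which @steal was stolen.
 * Steal carried over from earlier periods is deducted from this period,
 * and the newly observed steal is carried over to the next one, so the
 * accumulated idle time never goes backward.
 */
static void idle_stop(struct idle_acct *a, uint64_t delta, uint64_t steal)
{
	uint64_t steal_delta = min_u64(a->idle_stealtime, delta);

	delta -= steal_delta;
	a->idle_stealtime -= steal_delta;
	a->idle += delta;
	a->idle_stealtime += steal;
}

/*
 * Reader side: value reported while an idle period has been running for
 * @elapsed, with the pending steal deducted from the running delta only.
 */
static uint64_t idle_read(const struct idle_acct *a, uint64_t elapsed)
{
	return a->idle + (elapsed - min_u64(a->idle_stealtime, elapsed));
}
```

With these assumptions, a first idle period of 100 units with 30 units stolen is accounted in full, and the 30 units are deducted from the following idle period, so two periods of 100 and 50 units end up as 120 units of idle time. A reader sampling 20 units into the second period still sees 100, since the pending steal absorbs the running delta.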