From nobody Wed Jul 1 14:37:25 2026
From: Vincent Donnefort
To: peterz@infradead.org, mingo@redhat.com, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, dietmar.eggemann@arm.com,
	valentin.schneider@arm.com, morten.rasmussen@arm.com,
	qperret@google.com, Vincent Donnefort
Subject: [PATCH 1/3] sched/fair: Make cpu_overutilized() EAS dependent
Date: Mon, 20 Dec 2021 12:43:21 +0100
Message-Id: <20211220114323.22811-2-vincent.donnefort@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20211220114323.22811-1-vincent.donnefort@arm.com>
References: <20211220114323.22811-1-vincent.donnefort@arm.com>

On a system with Energy Aware Scheduling (EAS), tasks are placed according
to their estimated energy consumption, and load balancing is disabled so as
not to break that energy-biased placement. If the system becomes
overutilized, i.e.
one of the CPUs has too much utilization, energy-aware placement is
disabled in favor of Capacity-Aware Scheduling (CAS), which includes load
balancing. This is the sole use of rd->overutilized. Hence, there is no
need to raise it on !EAS systems.

Fixes: 2802bf3cd936 ("sched/fair: Add over-utilization/tipping point indicator")
Signed-off-by: Vincent Donnefort
Reviewed-by: Valentin Schneider

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 095b0aa378df..e2f6fa14e5e7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5511,7 +5511,8 @@ static inline void hrtick_update(struct rq *rq)
 #ifdef CONFIG_SMP
 static inline bool cpu_overutilized(int cpu)
 {
-	return !fits_capacity(cpu_util_cfs(cpu), capacity_of(cpu));
+	return sched_energy_enabled() &&
+	       !fits_capacity(cpu_util_cfs(cpu), capacity_of(cpu));
 }
 
 static inline void update_overutilized_status(struct rq *rq)
-- 
2.25.1

From nobody Wed Jul 1 14:37:25 2026
From: Vincent Donnefort
To: peterz@infradead.org, mingo@redhat.com,
	vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, dietmar.eggemann@arm.com,
	valentin.schneider@arm.com, morten.rasmussen@arm.com,
	qperret@google.com, Vincent Donnefort
Subject: [PATCH 2/3] sched/fair: Fix newidle_balance() for overutilized systems
Date: Mon, 20 Dec 2021 12:43:22 +0100
Message-Id: <20211220114323.22811-3-vincent.donnefort@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20211220114323.22811-1-vincent.donnefort@arm.com>
References: <20211220114323.22811-1-vincent.donnefort@arm.com>

On Energy-Aware Scheduling systems, load balancing is disabled in favor of
energy-based placement until one of the CPUs is identified as being
overutilized. Once the overutilization is resolved, two paths can lead to
marking the system as non-overutilized again:

  * load_balance() triggered from newidle_balance().
  * load_balance() triggered from the scheduler tick.

However, each of those paths has a caveat. newidle_balance() needs
rd->overload to be set to run load_balance(), while the load_balance()
triggered by the scheduler tick needs to run from the first idle CPU of the
root domain (see should_we_balance()).

Overutilized can be raised without overload being set (this can happen for
a CPU which had a misfit task but hasn't had its util_avg updated yet). In
that case, only the scheduler tick can reset overutilized, but if most of
the CPUs are idle, it is very unlikely that load_balance() will run on the
only CPU that can reset the flag. This means the root domain can spuriously
remain overutilized for a long period of time.

newidle_balance() therefore needs to proceed with balancing if the system
is overutilized.
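
To illustrate the change in the skip condition, here is a minimal
user-space sketch, not kernel code. struct rd_model and the two
skip_newidle_*() helpers are hypothetical names made up for this example;
only the two root-domain flags and the avg_idle/max_newidle_lb_cost
comparison are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two root-domain flags; not the kernel struct. */
struct rd_model {
	bool overload;		/* mirrors rd->overload */
	bool overutilized;	/* mirrors rd->overutilized */
};

/* Skip condition before the change: only the overload flag is consulted. */
static bool skip_newidle_old(const struct rd_model *rd, bool has_sd,
			     uint64_t avg_idle, uint64_t max_newidle_lb_cost)
{
	return !rd->overload ||
	       (has_sd && avg_idle < max_newidle_lb_cost);
}

/* Skip condition after the change: either flag is enough to keep balancing. */
static bool skip_newidle_new(const struct rd_model *rd, bool has_sd,
			     uint64_t avg_idle, uint64_t max_newidle_lb_cost)
{
	return (!rd->overload && !rd->overutilized) ||
	       (has_sd && avg_idle < max_newidle_lb_cost);
}
```

With overload clear and overutilized set, the old predicate skips
balancing while the new one does not. The avg_idle cost check still
applies in both versions.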

Fixes: 2802bf3cd936 ("sched/fair: Add over-utilization/tipping point indicator")
Signed-off-by: Vincent Donnefort

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e2f6fa14e5e7..51f6f55abb37 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10849,7 +10849,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 	rcu_read_lock();
 	sd = rcu_dereference_check_sched_domain(this_rq->sd);
 
-	if (!READ_ONCE(this_rq->rd->overload) ||
+	if ((!READ_ONCE(this_rq->rd->overload) &&
+	     !READ_ONCE(this_rq->rd->overutilized)) ||
 	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
 
 		if (sd)
-- 
2.25.1

From nobody Wed Jul 1 14:37:25 2026
From: Vincent Donnefort
To: peterz@infradead.org, mingo@redhat.com, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, dietmar.eggemann@arm.com,
	valentin.schneider@arm.com, morten.rasmussen@arm.com,
	qperret@google.com, Vincent Donnefort
Subject: [PATCH 3/3] sched/fair: Do not raise overutilized for idle CPUs
Date: Mon, 20 Dec 2021 12:43:23 +0100
Message-Id: <20211220114323.22811-4-vincent.donnefort@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20211220114323.22811-1-vincent.donnefort@arm.com>
References: <20211220114323.22811-1-vincent.donnefort@arm.com>

During a migration, the lock of the previous runqueue is not taken. Hence,
the task's contribution isn't directly removed from that runqueue's
utilization but is instead temporarily saved until the next PELT signal
update, where it is accounted for. There is then a window in which a CPU
can be idle but nonetheless overutilized.

The load balancer can't do anything to help a sleeping CPU, so raising
overutilized there brings no gain, only the risk of doing so spuriously.

Signed-off-by: Vincent Donnefort

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 51f6f55abb37..37f737c5f0b8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8641,26 +8641,28 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 		nr_running = rq->nr_running;
 		sgs->sum_nr_running += nr_running;
-
-		if (nr_running > 1)
-			*sg_status |= SG_OVERLOAD;
-
-		if (cpu_overutilized(i))
-			*sg_status |= SG_OVERUTILIZED;
-
 #ifdef CONFIG_NUMA_BALANCING
 		sgs->nr_numa_running += rq->nr_numa_running;
 		sgs->nr_preferred_running += rq->nr_preferred_running;
 #endif
+		if (nr_running > 1)
+			*sg_status |= SG_OVERLOAD;
+
 		/*
 		 * No need to call idle_cpu() if nr_running is not 0
 		 */
 		if (!nr_running && idle_cpu(i)) {
 			sgs->idle_cpus++;
-			/* Idle cpu can't have misfit task */
+			/*
+			 * Idle cpu can neither be overutilized nor have a
+			 * misfit task.
+			 */
 			continue;
 		}
 
+		if (cpu_overutilized(i))
+			*sg_status |= SG_OVERUTILIZED;
+
 		if (local_group)
 			continue;
 
-- 
2.25.1
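
For readers who want to experiment with the resulting ordering outside the
kernel, the following is a rough user-space model of the per-CPU loop after
this series. It is only a sketch under stated assumptions: struct cpu_model
and the *_model() helpers are hypothetical names, "idle" is simplified to
nr_running == 0 (the kernel's idle_cpu() checks more than that), the flag
values are illustrative, and the 1280/1024 margin in fits_capacity_model()
is assumed to match the kernel's fits_capacity() at the time of this
series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values for this model only. */
#define SG_OVERLOAD	0x1
#define SG_OVERUTILIZED	0x2

/* Hypothetical per-CPU snapshot; not a kernel structure. */
struct cpu_model {
	unsigned int nr_running;	/* runnable tasks on this CPU */
	unsigned long util;		/* stand-in for cpu_util_cfs() */
	unsigned long capacity;		/* stand-in for capacity_of() */
};

/* Assumed margin: utilization must stay below roughly 80% of capacity. */
static bool fits_capacity_model(unsigned long util, unsigned long capacity)
{
	return util * 1280 < capacity * 1024;
}

/* Patch 1/3: overutilization is only reported when EAS is enabled. */
static bool cpu_overutilized_model(const struct cpu_model *c, bool eas_enabled)
{
	return eas_enabled && !fits_capacity_model(c->util, c->capacity);
}

/* Patch 3/3: idle CPUs are skipped before the overutilized check. */
static int group_status_model(const struct cpu_model *cpus, size_t n,
			      bool eas_enabled)
{
	int status = 0;

	for (size_t i = 0; i < n; i++) {
		const struct cpu_model *c = &cpus[i];

		if (c->nr_running > 1)
			status |= SG_OVERLOAD;

		/* Simplified idle test: stale utilization is ignored here. */
		if (c->nr_running == 0)
			continue;

		if (cpu_overutilized_model(c, eas_enabled))
			status |= SG_OVERUTILIZED;
	}

	return status;
}
```

In this model, an idle CPU that still carries utilization from a task
which has just migrated away no longer sets SG_OVERUTILIZED, while a busy
CPU with the same utilization still does when EAS is enabled.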