From: Qais Yousef
To: Ingo Molnar, "Peter Zijlstra (Intel)", Vincent Guittot, Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org, Xuewen Yan, Lukasz Luba, Wei Wang,
    Jonathan JMChen, Hank, Qais Yousef
Subject: [PATCH v2 1/9] sched/uclamp: Fix relationship between uclamp and migration margin
Date: Thu, 4 Aug 2022 15:36:01 +0100
Message-Id: <20220804143609.515789-2-qais.yousef@arm.com>

fits_capacity() verifies that a util is within a 20% margin of the
capacity of a CPU, which is an attempt to speed up upmigration.

But when uclamp is used, this 20% margin is problematic because if
a task is boosted to 1024, for example, it will not fit on any CPU
according to fits_capacity() logic. Or if a task is boosted to
capacity_orig_of(medium_cpu), the task will end up on a big CPU
instead of on the desired medium CPU.

Similar corner cases exist for uclamp and the usage of capacity_of().
The slightest irq pressure on the biggest CPU, for example, will make
a 1024-boosted task look like it can't fit.

What we really want is for uclamp comparisons to ignore the migration
margin and capacity pressure, yet retain them when checking the
_actual_ util signal. For example, task p:

	p->util_avg = 300
	p->uclamp[UCLAMP_MIN] = 1024

will fit a big CPU. But

	p->util_avg = 900
	p->uclamp[UCLAMP_MIN] = 1024

will not; this should trigger the overutilized state because the big
CPU is now *actually* being saturated.

Similar reasoning applies to capping tasks with UCLAMP_MAX. For
example:

	p->util_avg = 1024
	p->uclamp[UCLAMP_MAX] = capacity_orig_of(medium_cpu)

should fit the task on medium CPUs without triggering the overutilized
state.

Inline comments expand on the desired behavior in more scenarios.

Introduce a new util_fits_cpu() function which encapsulates the new
logic. The new function is not used anywhere yet; later patches will
convert the various users of fits_capacity() to it.
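
For reference, the margin check under discussion is a one-line macro in
kernel/sched/fair.c. The definition below is quoted from a mainline tree
of roughly this era as a sketch; the exact constant may differ between
versions:

/*
 * util "fits" only while it stays below ~80% of capacity:
 * cap * 1280 < max * 1024  <=>  cap < max * 0.8
 */
#define fits_capacity(cap, max)	((cap) * 1280 < (max) * 1024)

Plugging in a task boosted to uclamp_min = 1024 on the biggest CPU
(capacity 1024) gives 1024 * 1280 < 1024 * 1024, which is false, so
under the old rule such a task fits nowhere.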
Fixes: af24bde8df202 ("sched/uclamp: Add uclamp support to energy_compute()")
Signed-off-by: Qais Yousef
---
 kernel/sched/fair.c | 123 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 52dc9d6f811e..00c2de800685 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4254,6 +4254,129 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	trace_sched_util_est_se_tp(&p->se);
 }
 
+static inline int util_fits_cpu(unsigned long util,
+				unsigned long uclamp_min,
+				unsigned long uclamp_max,
+				int cpu)
+{
+	unsigned long capacity_orig, capacity_orig_thermal;
+	unsigned long capacity = capacity_of(cpu);
+	bool fits, uclamp_max_fits;
+
+	/*
+	 * Check if the real util fits without any uclamp boost/cap applied.
+	 */
+	fits = fits_capacity(util, capacity);
+
+	if (!uclamp_is_used())
+		return fits;
+
+	/*
+	 * We must use capacity_orig_of() for comparing against uclamp_min and
+	 * uclamp_max. We only care about capacity pressure (by using
+	 * capacity_of()) for comparing against the real util.
+	 *
+	 * If a task is boosted to 1024 for example, we don't want a tiny
+	 * pressure to skew the check whether it fits a CPU or not.
+	 *
+	 * Similarly if a task is capped to capacity_orig_of(little_cpu), it
+	 * should fit a little cpu even if there's some pressure.
+	 *
+	 * Only exception is for thermal pressure since it has a direct impact
+	 * on available OPP of the system.
+	 *
+	 * We honour it for uclamp_min only as a drop in performance level
+	 * could result in not getting the requested minimum performance level.
+	 *
+	 * For uclamp_max, we can tolerate a drop in performance level as the
+	 * goal is to cap the task. So it's okay if it's getting less.
+	 *
+	 * In case of capacity inversion, which is not handled yet, we should
+	 * honour the inverted capacity for both uclamp_min and uclamp_max all
+	 * the time.
+	 */
+	capacity_orig = capacity_orig_of(cpu);
+	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
+
+	/*
+	 * We want to force a task to fit a cpu as implied by uclamp_max.
+	 * But we do have some corner cases to cater for..
+	 *
+	 *
+	 *                                 C=z
+	 *   |                             ___
+	 *   |                  C=y       |   |
+	 *   |_ _ _ _ _ _ _ _ _ ___ _ _ _ | _ | _ _ _ _ _  uclamp_max
+	 *   |      C=x        |   |      |   |
+	 *   |      ___        |   |      |   |
+	 *   |     |   |       |   |      |   |    (util somewhere in this region)
+	 *   |     |   |       |   |      |   |
+	 *   |     |   |       |   |      |   |
+	 *   +----------------------------------------
+	 *         cpu0        cpu1       cpu2
+	 *
+	 * In the above example if a task is capped to a specific performance
+	 * point, y, then when:
+	 *
+	 *   * util = 80% of x then it does not fit on cpu0 and should migrate
+	 *     to cpu1
+	 *   * util = 80% of y then it is forced to fit on cpu1 to honour
+	 *     uclamp_max request.
+	 *
+	 * which is what we're enforcing here. A task always fits if
+	 * uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
+	 * the normal upmigration rules should withhold still.
+	 *
+	 * Only exception is when we are on max capacity, then we need to be
+	 * careful not to block overutilized state. This is so because:
+	 *
+	 *   1. There's no concept of capping at max_capacity! We can't go
+	 *      beyond this performance level anyway.
+	 *   2. The system is being saturated when we're operating near
+	 *      max capacity, it doesn't make sense to block overutilized.
+	 */
+	uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
+	uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
+	fits = fits || uclamp_max_fits;
+
+	/*
+	 *
+	 *                                 C=z
+	 *   |                             ___       (region a, capped, util >= uclamp_max)
+	 *   |                  C=y       |   |
+	 *   |_ _ _ _ _ _ _ _ _ ___ _ _ _ | _ | _ _ _ _ _  uclamp_max
+	 *   |      C=x        |   |      |   |
+	 *   |      ___        |   |      |   |      (region b, uclamp_min <= util <= uclamp_max)
+	 *   |_ _ _|_ _|_ _ _ _| _ | _ _ _| _ | _ _ _ _ _  uclamp_min
+	 *   |     |   |       |   |      |   |
+	 *   |     |   |       |   |      |   |      (region c, boosted, util < uclamp_min)
+	 *   +----------------------------------------
+	 *         cpu0        cpu1       cpu2
+	 *
+	 * a) If util > uclamp_max, then we're capped, we don't care about
+	 *    actual fitness value here. We only care if uclamp_max fits
+	 *    capacity without taking margin/pressure into account.
+	 *    See comment above.
+	 *
+	 * b) If uclamp_min <= util <= uclamp_max, then the normal
+	 *    fits_capacity() rules apply. Except we need to ensure that we
+	 *    enforce we remain within uclamp_max, see comment above.
+	 *
+	 * c) If util < uclamp_min, then we are boosted. Same as (b) but we
+	 *    need to take into account the boosted value fits the CPU without
+	 *    taking margin/pressure into account.
+	 *
+	 * Cases (a) and (b) are handled in the 'fits' variable already. We
+	 * just need to consider an extra check for case (c) after ensuring we
+	 * handle the case uclamp_min > uclamp_max.
+	 */
+	uclamp_min = min(uclamp_min, uclamp_max);
+	if (util < uclamp_min && capacity_orig != SCHED_CAPACITY_SCALE)
+		fits = fits && (uclamp_min <= capacity_orig_thermal);
+
+	return fits;
+}
+
 static inline int task_fits_capacity(struct task_struct *p,
 				     unsigned long capacity)
 {
-- 
2.25.1

From: Qais Yousef
To: Ingo Molnar, "Peter Zijlstra (Intel)", Vincent Guittot, Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org, Xuewen Yan, Lukasz Luba, Wei Wang,
    Jonathan JMChen, Hank, Qais Yousef
Subject: [PATCH v2 2/9] sched/uclamp: Make task_fits_capacity() use util_fits_cpu()
Date: Thu, 4 Aug 2022 15:36:02 +0100
Message-Id: <20220804143609.515789-3-qais.yousef@arm.com>

So that the new uclamp rules in regard to migration margin and capacity
pressure are taken into account correctly.

Fixes: a7008c07a568 ("sched/fair: Make task_fits_capacity() consider uclamp restrictions")
Co-developed-by: Vincent Guittot
Signed-off-by: Qais Yousef
---
 kernel/sched/fair.c  | 26 ++++++++++++++++----------
 kernel/sched/sched.h |  9 +++++++++
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 00c2de800685..78feb9ca1e41 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4377,10 +4377,12 @@ static inline int util_fits_cpu(unsigned long util,
 	return fits;
 }
 
-static inline int task_fits_capacity(struct task_struct *p,
-				     unsigned long capacity)
+static inline int task_fits_cpu(struct task_struct *p, int cpu)
 {
-	return fits_capacity(uclamp_task_util(p), capacity);
+	unsigned long uclamp_min = uclamp_eff_value(p, UCLAMP_MIN);
+	unsigned long uclamp_max = uclamp_eff_value(p, UCLAMP_MAX);
+	unsigned long util = task_util_est(p);
+	return util_fits_cpu(util, uclamp_min, uclamp_max, cpu);
 }
 
 static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
@@ -4393,7 +4395,7 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
 		return;
 	}
 
-	if (task_fits_capacity(p, capacity_of(cpu_of(rq)))) {
+	if (task_fits_cpu(p, cpu_of(rq))) {
 		rq->misfit_task_load = 0;
 		return;
 	}
@@ -8234,7 +8236,7 @@ static int detach_tasks(struct lb_env *env)
 
 		case migrate_misfit:
 			/* This is not a misfit task */
-			if (task_fits_capacity(p, capacity_of(env->src_cpu)))
+			if (task_fits_cpu(p, env->src_cpu))
 				goto next;
 
 			env->imbalance = 0;
@@ -9239,6 +9241,10 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 
 	memset(sgs, 0, sizeof(*sgs));
 
+	/* Assume that task can't fit any CPU of the group */
+	if (sd->flags & SD_ASYM_CPUCAPACITY)
+		sgs->group_misfit_task_load = 1;
+
 	for_each_cpu(i, sched_group_span(group)) {
 		struct rq *rq = cpu_rq(i);
 		unsigned int local;
@@ -9258,12 +9264,12 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 		if (!nr_running && idle_cpu_without(i, p))
 			sgs->idle_cpus++;
 
-	}
+		/* Check if task fits in the CPU */
+		if (sd->flags & SD_ASYM_CPUCAPACITY &&
+		    sgs->group_misfit_task_load &&
+		    task_fits_cpu(p, i))
+			sgs->group_misfit_task_load = 0;
 
-	/* Check if task fits in the group */
-	if (sd->flags & SD_ASYM_CPUCAPACITY &&
-	    !task_fits_capacity(p, group->sgc->max_capacity)) {
-		sgs->group_misfit_task_load = 1;
 	}
 
 	sgs->group_capacity = group->sgc->capacity;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3ccd35c22f0f..eec1cac3eef4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3047,6 +3047,15 @@ static inline bool uclamp_is_used(void)
 	return static_branch_likely(&sched_uclamp_used);
 }
 #else /* CONFIG_UCLAMP_TASK */
+static inline unsigned long uclamp_eff_value(struct task_struct *p,
+					     enum uclamp_id clamp_id)
+{
+	if (clamp_id == UCLAMP_MIN)
+		return 0;
+
+	return SCHED_CAPACITY_SCALE;
+}
+
 static inline
 unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
 				  struct task_struct *p)
-- 
2.25.1
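
One consequence of the sched.h hunk above that the changelog leaves
implicit (my reading of the code, not a claim from the author): with
CONFIG_UCLAMP_TASK=n the new uclamp_eff_value() stub returns the full
clamp range, and uclamp_is_used() is already stubbed to return false,
so task_fits_cpu() collapses back to the old margin-only check:

/* CONFIG_UCLAMP_TASK=n call chain, sketched from the hunks above:
 *
 *   task_fits_cpu(p, cpu)
 *     -> util_fits_cpu(task_util_est(p), 0, SCHED_CAPACITY_SCALE, cpu)
 *          fits = fits_capacity(util, capacity_of(cpu));
 *          if (!uclamp_is_used())	// stub is false -> branch taken
 *                  return fits;	// old behaviour preserved
 */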
From: Qais Yousef
To: Ingo Molnar, "Peter Zijlstra (Intel)", Vincent Guittot, Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org, Xuewen Yan, Lukasz Luba, Wei Wang,
    Jonathan JMChen, Hank, Qais Yousef, Yun Hsiang
Subject: [PATCH v2 3/9] sched/uclamp: Fix fits_capacity() check in feec()
Date: Thu, 4 Aug 2022 15:36:03 +0100
Message-Id: <20220804143609.515789-4-qais.yousef@arm.com>

As reported by Yun Hsiang [1], if a task has its uclamp_min >= 0.8 *
1024, it'll always pick the previous CPU because fits_capacity() will
always return false in this case.

The new util_fits_cpu() logic should handle this correctly for us,
besides other corner cases where similar failures could occur, like
when using UCLAMP_MAX.

We open code uclamp_rq_util_with() except for the clamp() part, as
util_fits_cpu() needs the 'raw' values to be passed to it.

Also introduce uclamp_rq_{set,get}() shorthand accessors for the rq's
uclamp values. This makes the code more readable and ensures the right
rules (READ_ONCE/WRITE_ONCE) are respected transparently.

[1] https://lists.linaro.org/pipermail/eas-dev/2020-July/001488.html

Fixes: 1d42509e475c ("sched/fair: Make EAS wakeup placement consider uclamp restrictions")
Reported-by: Yun Hsiang
Signed-off-by: Qais Yousef
---
 kernel/sched/core.c  | 10 +++++-----
 kernel/sched/fair.c  | 26 ++++++++++++++++++++++++--
 kernel/sched/sched.h | 42 +++++++++++++++++++++++++++++++++++++++---
 3 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 64c08993221b..ea66c525d3ef 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1398,7 +1398,7 @@ static inline void uclamp_idle_reset(struct rq *rq, enum uclamp_id clamp_id,
 	if (!(rq->uclamp_flags & UCLAMP_FLAG_IDLE))
 		return;
 
-	WRITE_ONCE(rq->uclamp[clamp_id].value, clamp_value);
+	uclamp_rq_set(rq, clamp_id, clamp_value);
 }
 
 static inline
@@ -1549,8 +1549,8 @@ static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p,
 	if (bucket->tasks == 1 || uc_se->value > bucket->value)
 		bucket->value = uc_se->value;
 
-	if (uc_se->value > READ_ONCE(uc_rq->value))
-		WRITE_ONCE(uc_rq->value, uc_se->value);
+	if (uc_se->value > uclamp_rq_get(rq, clamp_id))
+		uclamp_rq_set(rq, clamp_id, uc_se->value);
 }
 
 /*
@@ -1616,7 +1616,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 	if (likely(bucket->tasks))
 		return;
 
-	rq_clamp = READ_ONCE(uc_rq->value);
+	rq_clamp = uclamp_rq_get(rq, clamp_id);
 	/*
 	 * Defensive programming: this should never happen. If it happens,
 	 * e.g. due to future modification, warn and fixup the expected value.
@@ -1624,7 +1624,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 	SCHED_WARN_ON(bucket->value > rq_clamp);
 	if (bucket->value >= rq_clamp) {
 		bkt_clamp = uclamp_rq_max_value(rq, clamp_id, uc_se->value);
-		WRITE_ONCE(uc_rq->value, bkt_clamp);
+		uclamp_rq_set(rq, clamp_id, bkt_clamp);
 	}
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 78feb9ca1e41..ea02c64cd933 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6993,6 +6993,8 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
 	unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
+	unsigned long p_util_min = uclamp_is_used() ? uclamp_eff_value(p, UCLAMP_MIN) : 0;
+	unsigned long p_util_max = uclamp_is_used() ? uclamp_eff_value(p, UCLAMP_MAX) : 1024;
 	struct root_domain *rd = this_rq()->rd;
 	int cpu, best_energy_cpu, target = -1;
 	struct sched_domain *sd;
@@ -7025,6 +7027,8 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 	for (; pd; pd = pd->next) {
 		unsigned long cpu_cap, cpu_thermal_cap, util;
 		unsigned long cur_delta, max_spare_cap = 0;
+		unsigned long rq_util_min, rq_util_max;
+		unsigned long util_min, util_max;
 		bool compute_prev_delta = false;
 		int max_spare_cap_cpu = -1;
 		unsigned long base_energy;
@@ -7061,8 +7065,26 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 			 * much capacity we can get out of the CPU; this is
 			 * aligned with sched_cpu_util().
 			 */
-			util = uclamp_rq_util_with(cpu_rq(cpu), util, p);
-			if (!fits_capacity(util, cpu_cap))
+			if (uclamp_is_used()) {
+				if (uclamp_rq_is_idle(cpu_rq(cpu))) {
+					util_min = p_util_min;
+					util_max = p_util_max;
+				} else {
+					/*
+					 * Open code uclamp_rq_util_with() except for
+					 * the clamp() part. Ie: apply max aggregation
+					 * only. util_fits_cpu() logic requires to
+					 * operate on non clamped util but must use the
+					 * max-aggregated uclamp_{min, max}.
+					 */
+					rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
+					rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
+
+					util_min = max(rq_util_min, p_util_min);
+					util_max = max(rq_util_max, p_util_max);
+				}
+			}
+			if (!util_fits_cpu(util, util_min, util_max, cpu))
 				continue;
 
 			lsub_positive(&cpu_cap, util);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index eec1cac3eef4..caf017f7def6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2966,6 +2966,23 @@ static inline unsigned long cpu_util_rt(struct rq *rq)
 #ifdef CONFIG_UCLAMP_TASK
 unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
 
+static inline unsigned long uclamp_rq_get(struct rq *rq,
+					  enum uclamp_id clamp_id)
+{
+	return READ_ONCE(rq->uclamp[clamp_id].value);
+}
+
+static inline void uclamp_rq_set(struct rq *rq, enum uclamp_id clamp_id,
+				 unsigned int value)
+{
+	WRITE_ONCE(rq->uclamp[clamp_id].value, value);
+}
+
+static inline bool uclamp_rq_is_idle(struct rq *rq)
+{
+	return rq->uclamp_flags & UCLAMP_FLAG_IDLE;
+}
+
 /**
  * uclamp_rq_util_with - clamp @util with @rq and @p effective uclamp values.
  * @rq: The rq to clamp against. Must not be NULL.
@@ -3001,12 +3018,12 @@ unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
 		 * Ignore last runnable task's max clamp, as this task will
 		 * reset it. Similarly, no need to read the rq's min clamp.
 		 */
-		if (rq->uclamp_flags & UCLAMP_FLAG_IDLE)
+		if (uclamp_rq_is_idle(rq))
 			goto out;
 	}
 
-	min_util = max_t(unsigned long, min_util, READ_ONCE(rq->uclamp[UCLAMP_MIN].value));
-	max_util = max_t(unsigned long, max_util, READ_ONCE(rq->uclamp[UCLAMP_MAX].value));
+	min_util = max_t(unsigned long, min_util, uclamp_rq_get(rq, UCLAMP_MIN));
+	max_util = max_t(unsigned long, max_util, uclamp_rq_get(rq, UCLAMP_MAX));
 out:
 	/*
 	 * Since CPU's {min,max}_util clamps are MAX aggregated considering
@@ -3069,6 +3086,25 @@ static inline bool uclamp_is_used(void)
 {
 	return false;
 }
+
+static inline unsigned long uclamp_rq_get(struct rq *rq,
+					  enum uclamp_id clamp_id)
+{
+	if (clamp_id == UCLAMP_MIN)
+		return 0;
+
+	return SCHED_CAPACITY_SCALE;
+}
+
+static inline void uclamp_rq_set(struct rq *rq, enum uclamp_id clamp_id,
+				 unsigned int value)
+{
+}
+
+static inline bool uclamp_rq_is_idle(struct rq *rq)
+{
+	return false;
+}
 #endif /* CONFIG_UCLAMP_TASK */
 
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
-- 
2.25.1
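
To make the open-coded max aggregation concrete, here is a small worked
example with hypothetical values (not taken from the patch):

/* Suppose the rq's runnable tasks currently hold rq-wide clamps of
 * uclamp_min = 512, uclamp_max = 1024, and the waking task p requests
 * uclamp_min = 300, uclamp_max = 800. Then, per the hunk above:
 *
 *   util_min = max(512, 300);	// -> 512
 *   util_max = max(1024, 800);	// -> 1024
 *
 * util itself is passed to util_fits_cpu() unclamped; only the
 * boundaries are max-aggregated. uclamp_rq_util_with() would have
 * additionally done clamp(util, util_min, util_max), which is
 * exactly the part util_fits_cpu() must not see.
 */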
From: Qais Yousef
To: Ingo Molnar, "Peter Zijlstra (Intel)", Vincent Guittot, Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org, Xuewen Yan, Lukasz Luba, Wei Wang,
    Jonathan JMChen, Hank, Qais Yousef
Subject: [PATCH v2 4/9] sched/uclamp: Make select_idle_capacity() use util_fits_cpu()
Date: Thu, 4 Aug 2022 15:36:04 +0100
Message-Id: <20220804143609.515789-5-qais.yousef@arm.com>

Use the new util_fits_cpu() to ensure migration margin and capacity
pressure are taken into account correctly when uclamp is being used;
otherwise we will fail to consider CPUs as fitting in scenarios where
they should.

Fixes: b4c9c9f15649 ("sched/fair: Prefer prev cpu in asymmetric wakeup path")
Signed-off-by: Qais Yousef
---
 kernel/sched/fair.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ea02c64cd933..3079ca867f2c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6603,21 +6603,23 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 static int
 select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
 {
-	unsigned long task_util, best_cap = 0;
+	unsigned long task_util, util_min, util_max, best_cap = 0;
 	int cpu, best_cpu = -1;
 	struct cpumask *cpus;
 
 	cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
-	task_util = uclamp_task_util(p);
+	task_util = task_util_est(p);
+	util_min = uclamp_eff_value(p, UCLAMP_MIN);
+	util_max = uclamp_eff_value(p, UCLAMP_MAX);
 
 	for_each_cpu_wrap(cpu, cpus, target) {
 		unsigned long cpu_cap = capacity_of(cpu);
 
 		if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
 			continue;
-		if (fits_capacity(task_util, cpu_cap))
+		if (util_fits_cpu(task_util, util_min, util_max, cpu))
 			return cpu;
 
 		if (cpu_cap > best_cap) {
-- 
2.25.1
From: Qais Yousef
To: Ingo Molnar, "Peter Zijlstra (Intel)", Vincent Guittot, Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org, Xuewen Yan, Lukasz Luba, Wei Wang,
    Jonathan JMChen, Hank, Qais Yousef
Subject: [PATCH v2 5/9] sched/uclamp: Make asym_fits_capacity() use util_fits_cpu()
Date: Thu, 4 Aug 2022 15:36:05 +0100
Message-Id: <20220804143609.515789-6-qais.yousef@arm.com>

Use the new util_fits_cpu() to ensure migration margin and capacity
pressure are taken into account correctly when uclamp is being used;
otherwise we will fail to consider CPUs as fitting in scenarios where
they should.

s/asym_fits_capacity/asym_fits_cpu/ to better reflect what it does now.

Fixes: b4c9c9f15649 ("sched/fair: Prefer prev cpu in asymmetric wakeup path")
Signed-off-by: Qais Yousef
---
 kernel/sched/fair.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3079ca867f2c..38cf56ade66d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6631,10 +6631,13 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
 	return best_cpu;
 }
 
-static inline bool asym_fits_capacity(unsigned long task_util, int cpu)
+static inline bool asym_fits_cpu(unsigned long util,
+				 unsigned long util_min,
+				 unsigned long util_max,
+				 int cpu)
 {
 	if (sched_asym_cpucap_active())
-		return fits_capacity(task_util, capacity_of(cpu));
+		return util_fits_cpu(util, util_min, util_max, cpu);
 
 	return true;
 }
@@ -6646,7 +6649,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 {
 	bool has_idle_core = false;
 	struct sched_domain *sd;
-	unsigned long task_util;
+	unsigned long task_util, util_min, util_max;
 	int i, recent_used_cpu;
 
 	/*
@@ -6655,7 +6658,9 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	 */
 	if (sched_asym_cpucap_active()) {
 		sync_entity_load_avg(&p->se);
-		task_util = uclamp_task_util(p);
+		task_util = task_util_est(p);
+		util_min = uclamp_eff_value(p, UCLAMP_MIN);
+		util_max = uclamp_eff_value(p, UCLAMP_MAX);
 	}
 
 	/*
@@ -6664,7 +6669,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	lockdep_assert_irqs_disabled();
 
 	if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
-	    asym_fits_capacity(task_util, target))
+	    asym_fits_cpu(task_util, util_min, util_max, target))
 		return target;
 
 	/*
@@ -6672,7 +6677,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	 */
 	if (prev != target && cpus_share_cache(prev, target) &&
 	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
-	    asym_fits_capacity(task_util, prev))
+	    asym_fits_cpu(task_util, util_min, util_max, prev))
 		return prev;
 
 	/*
@@ -6687,7 +6692,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	    in_task() &&
 	    prev == smp_processor_id() &&
 	    this_rq()->nr_running <= 1 &&
-	    asym_fits_capacity(task_util, prev)) {
+	    asym_fits_cpu(task_util, util_min, util_max, prev)) {
 		return prev;
 	}
 
@@ -6699,7 +6704,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	    cpus_share_cache(recent_used_cpu, target) &&
 	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
 	    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr) &&
-	    asym_fits_capacity(task_util, recent_used_cpu)) {
+	    asym_fits_cpu(task_util, util_min, util_max, recent_used_cpu)) {
 		return recent_used_cpu;
 	}
 
-- 
2.25.1

From: Qais Yousef
To: Ingo Molnar, "Peter Zijlstra (Intel)", Vincent Guittot, Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org, Xuewen Yan, Lukasz Luba, Wei Wang,
    Jonathan JMChen, Hank, Qais Yousef
Subject: [PATCH v2 6/9] sched/uclamp: Make cpu_overutilized() use util_fits_cpu()
Date: Thu, 4 Aug 2022 15:36:06 +0100
Message-Id: <20220804143609.515789-7-qais.yousef@arm.com>

So that it is now uclamp aware.

This fixes a major problem of busy tasks capped with UCLAMP_MAX keeping
the system in the overutilized state, which disables EAS and leads to
wasted energy in the long run.

Without this patch, running a busy background activity like JIT
compilation on Pixel 6 causes the system to be in the overutilized
state 74.5% of the time. With this patch, this goes down to 9.79%.

It also fixes another problem with long-running tasks whose UCLAMP_MIN
is changed while running such that they need to upmigrate to honour the
new UCLAMP_MIN value. The upmigration doesn't get triggered because the
overutilized state never gets set in this case, hence misfit migration
never happens at tick until the task wakes up again.
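
For readers who want to reproduce the capped-background-task scenario,
per-task clamps can be set from userspace with sched_setattr(2). The
sketch below is illustrative (the 512 cap is an arbitrary choice and
error handling is omitted); struct sched_attr is declared locally since
glibc ships no wrapper:

#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Subset of the kernel ABI; see include/uapi/linux/sched/types.h */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime, sched_deadline, sched_period;
	uint32_t sched_util_min, sched_util_max;
};

#define SCHED_FLAG_KEEP_ALL		0x18	/* keep policy and params */
#define SCHED_FLAG_UTIL_CLAMP_MAX	0x40

int main(void)
{
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_flags	= SCHED_FLAG_KEEP_ALL |
				  SCHED_FLAG_UTIL_CLAMP_MAX,
		.sched_util_max	= 512,	/* cap this task at ~half capacity */
	};

	syscall(SYS_sched_setattr, 0, &attr, 0);	/* pid 0 == self */

	for (;;)
		;	/* busy loop, now capped by UCLAMP_MAX */
}

With cpu_overutilized() made uclamp aware, the rq-wide uclamp_max lets
such a capped busy loop be treated as fitting its CPU, so EAS stays
enabled.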

Fixes: af24bde8df202 ("sched/uclamp: Add uclamp support to energy_compute()")
Signed-off-by: Qais Yousef
---
 kernel/sched/fair.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 38cf56ade66d..4c3a5240d7e3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5807,7 +5807,10 @@ static inline void hrtick_update(struct rq *rq)
 #ifdef CONFIG_SMP
 static inline bool cpu_overutilized(int cpu)
 {
-	return !fits_capacity(cpu_util_cfs(cpu), capacity_of(cpu));
+	unsigned long rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
+	unsigned long rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
+
+	return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);
 }
 
 static inline void update_overutilized_status(struct rq *rq)
-- 
2.25.1

From: Qais Yousef
To: Ingo Molnar, "Peter Zijlstra (Intel)", Vincent Guittot, Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org, Xuewen Yan, Lukasz Luba, Wei Wang,
    Jonathan JMChen, Hank, Qais Yousef
Subject: [PATCH v2 7/9] sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition
Date: Thu, 4 Aug 2022 15:36:07 +0100
Message-Id: <20220804143609.515789-8-qais.yousef@arm.com>

If the utilization of the woken up task is 0, we skip the energy
calculation because it has no impact. But if the task is boosted
(uclamp_min != 0), it will have an impact on task placement and
frequency selection. Only skip if the util is truly 0 after applying
uclamp values.

Change uclamp_task_util() signature to avoid unnecessary additional
calls to uclamp_eff_get(). feec() is the only user now.
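
A quick numeric illustration of why the early exit must look at uclamp
(values are hypothetical): a long-sleeping but boosted task can present
task_util_est() == 0 at wakeup.

/* feec() early exit, before vs. after this patch, for a task with
 * task_util_est(p) == 0, uclamp_min == 512, uclamp_max == 1024:
 *
 *   before: !task_util_est(p)		// !0 -> true
 *           -> goto unlock		// EAS placement skipped
 *
 *   after:  !uclamp_task_util(p, 512, 1024)
 *           == !clamp(0, 512, 1024)
 *           == !512 -> false		// EAS placement runs
 */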

Fixes: 732cd75b8c920 ("sched/fair: Select an energy-efficient CPU on task wake-up")
Signed-off-by: Qais Yousef
---
 kernel/sched/fair.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4c3a5240d7e3..59ba7106ddc6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4108,14 +4108,16 @@ static inline unsigned long task_util_est(struct task_struct *p)
 }
 
 #ifdef CONFIG_UCLAMP_TASK
-static inline unsigned long uclamp_task_util(struct task_struct *p)
+static inline unsigned long uclamp_task_util(struct task_struct *p,
+					     unsigned long uclamp_min,
+					     unsigned long uclamp_max)
 {
-	return clamp(task_util_est(p),
-		     uclamp_eff_value(p, UCLAMP_MIN),
-		     uclamp_eff_value(p, UCLAMP_MAX));
+	return clamp(task_util_est(p), uclamp_min, uclamp_max);
 }
 #else
-static inline unsigned long uclamp_task_util(struct task_struct *p)
+static inline unsigned long uclamp_task_util(struct task_struct *p,
+					     unsigned long uclamp_min,
+					     unsigned long uclamp_max)
 {
 	return task_util_est(p);
 }
@@ -7029,7 +7031,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 		target = prev_cpu;
 
 	sync_entity_load_avg(&p->se);
-	if (!task_util_est(p))
+	if (!uclamp_task_util(p, p_util_min, p_util_max))
 		goto unlock;
 
 	eenv_task_busy_time(&eenv, p, prev_cpu);
-- 
2.25.1

From: Qais Yousef
To: Ingo Molnar, "Peter Zijlstra (Intel)", Vincent Guittot, Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org, Xuewen Yan, Lukasz Luba, Wei Wang,
    Jonathan JMChen, Hank, Qais Yousef
Subject: [PATCH v2 8/9] sched/fair: Detect capacity inversion
Date: Thu, 4 Aug 2022 15:36:08 +0100
Message-Id: <20220804143609.515789-9-qais.yousef@arm.com>

Check each performance domain to see if thermal pressure is causing its
capacity to be lower than that of another performance domain.
We assume that each performance domain has CPUs with the same
capacities, which is similar to an assumption made in energy_model.c.

We also assume that thermal pressure impacts all CPUs in a performance
domain equally.

If there are multiple performance domains with the same capacity_orig,
we will trigger a capacity inversion if the domain is under thermal
pressure.

The new cpu_in_capacity_inversion() should help users to know when
information about capacity_orig is not reliable, so they can opt in to
use the inverted capacity as the 'actual' capacity_orig.

Signed-off-by: Qais Yousef
---
 kernel/sched/fair.c  | 63 +++++++++++++++++++++++++++++++++++++++++---
 kernel/sched/sched.h | 19 +++++++++++++
 2 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 59ba7106ddc6..cb32dc9a057f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8659,16 +8659,73 @@ static unsigned long scale_rt_capacity(int cpu)
 
 static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 {
+	unsigned long capacity_orig = arch_scale_cpu_capacity(cpu);
 	unsigned long capacity = scale_rt_capacity(cpu);
 	struct sched_group *sdg = sd->groups;
+	struct rq *rq = cpu_rq(cpu);
 
-	cpu_rq(cpu)->cpu_capacity_orig = arch_scale_cpu_capacity(cpu);
+	rq->cpu_capacity_orig = capacity_orig;
 
 	if (!capacity)
 		capacity = 1;
 
-	cpu_rq(cpu)->cpu_capacity = capacity;
-	trace_sched_cpu_capacity_tp(cpu_rq(cpu));
+	rq->cpu_capacity = capacity;
+
+	/*
+	 * Detect if the performance domain is in capacity inversion state.
+	 *
+	 * Capacity inversion happens when another perf domain with equal or
+	 * lower capacity_orig_of() ends up having higher capacity than this
+	 * domain after subtracting thermal pressure.
+	 *
+	 * We only take into account thermal pressure in this detection as it's
+	 * the only metric that actually results in *real* reduction of
+	 * capacity due to performance points (OPPs) being dropped/become
+	 * unreachable due to thermal throttling.
+	 *
+	 * We assume:
+	 *   * That all cpus in a perf domain have the same capacity_orig
+	 *     (same uArch).
+	 *   * Thermal pressure will impact all cpus in this perf domain
+	 *     equally.
+	 */
+	if (static_branch_unlikely(&sched_asym_cpucapacity)) {
+		unsigned long inv_cap = capacity_orig - thermal_load_avg(rq);
+		struct perf_domain *pd = rcu_dereference(rq->rd->pd);
+
+		rq->cpu_capacity_inverted = 0;
+
+		for (; pd; pd = pd->next) {
+			struct cpumask *pd_span = perf_domain_span(pd);
+			unsigned long pd_cap_orig, pd_cap;
+
+			cpu = cpumask_any(pd_span);
+			pd_cap_orig = arch_scale_cpu_capacity(cpu);
+
+			if (capacity_orig < pd_cap_orig)
+				continue;
+
+			/*
+			 * handle the case of multiple perf domains have the
+			 * same capacity_orig but one of them is under higher
+			 * thermal pressure. We record it as capacity
+			 * inversion.
+			 */
+			if (capacity_orig == pd_cap_orig) {
+				pd_cap = pd_cap_orig - thermal_load_avg(cpu_rq(cpu));
+
+				if (pd_cap > inv_cap) {
+					rq->cpu_capacity_inverted = inv_cap;
+					break;
+				}
+			} else if (pd_cap_orig > inv_cap) {
+				rq->cpu_capacity_inverted = inv_cap;
+				break;
+			}
+		}
+	}
+
+	trace_sched_cpu_capacity_tp(rq);
 
 	sdg->sgc->capacity = capacity;
 	sdg->sgc->min_capacity = capacity;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index caf017f7def6..541a70fa55b3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1033,6 +1033,7 @@ struct rq {
 
 	unsigned long		cpu_capacity;
 	unsigned long		cpu_capacity_orig;
+	unsigned long		cpu_capacity_inverted;
 
 	struct callback_head	*balance_callback;
 
@@ -2865,6 +2866,24 @@ static inline unsigned long capacity_orig_of(int cpu)
 	return cpu_rq(cpu)->cpu_capacity_orig;
 }
 
+/*
+ * Returns inverted capacity if the CPU is in capacity inversion state.
+ * 0 otherwise.
+ *
+ * Capacity inversion detection only considers thermal impact where actual
+ * performance points (OPPs) gets dropped.
+ *
+ * Capacity inversion state happens when another performance domain that has
+ * equal or lower capacity_orig_of() becomes effectively larger than the perf
+ * domain this CPU belongs to due to thermal pressure throttling it hard.
+ *
+ * See comment in update_cpu_capacity().
+ */
+static inline unsigned long cpu_in_capacity_inversion(int cpu)
+{
+	return cpu_rq(cpu)->cpu_capacity_inverted;
+}
+
 /**
  * enum cpu_util_type - CPU utilization type
  * @FREQUENCY_UTIL: Utilization used to select frequency
-- 
2.25.1

From: Qais Yousef
To: Ingo Molnar, "Peter Zijlstra (Intel)", Vincent Guittot, Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org, Xuewen Yan, Lukasz Luba, Wei Wang,
    Jonathan JMChen, Hank, Qais Yousef
Subject: [PATCH v2 9/9] sched/fair: Consider capacity inversion in util_fits_cpu()
Date: Thu, 4 Aug 2022 15:36:09 +0100
Message-Id: <20220804143609.515789-10-qais.yousef@arm.com>

We do consider thermal pressure in util_fits_cpu() for
uclamp_min only, with the exception of the biggest cores, which by
definition are the max performance point of the system and on which all
tasks by definition should fit.

Even under thermal pressure, the capacity of the biggest CPU is the
highest in the system and should still fit every task. Except when it
reaches the capacity inversion point; then this is no longer true.

We can handle this by using the inverted capacity as capacity_orig in
util_fits_cpu(), which not only addresses the problem above but also
ensures uclamp_max now considers the inverted capacity. Force fitting
a task when a CPU is in this adverse state will contribute to making
the thermal throttling last longer.

Signed-off-by: Qais Yousef
---
 kernel/sched/fair.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cb32dc9a057f..77ae343e32a3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4293,12 +4293,16 @@ static inline int util_fits_cpu(unsigned long util,
 	 * For uclamp_max, we can tolerate a drop in performance level as the
 	 * goal is to cap the task. So it's okay if it's getting less.
 	 *
-	 * In case of capacity inversion, which is not handled yet, we should
-	 * honour the inverted capacity for both uclamp_min and uclamp_max all
-	 * the time.
+	 * In case of capacity inversion we should honour the inverted capacity
+	 * for both uclamp_min and uclamp_max all the time.
 	 */
-	capacity_orig = capacity_orig_of(cpu);
-	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
+	capacity_orig = cpu_in_capacity_inversion(cpu);
+	if (capacity_orig) {
+		capacity_orig_thermal = capacity_orig;
+	} else {
+		capacity_orig = capacity_orig_of(cpu);
+		capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
+	}
 
 	/*
 	 * We want to force a task to fit a cpu as implied by uclamp_max.
-- 
2.25.1
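
To close the series out, a worked example of how patches 8 and 9
combine (numbers are illustrative, not measurements from the
changelogs):

/* Assume a big CPU with capacity_orig = 1024 under heavy throttling,
 * thermal_load_avg = 350, next to a medium perf domain with
 * capacity_orig = 768 and no thermal pressure:
 *
 *   inv_cap = 1024 - 350;	// -> 674
 *
 * In update_cpu_capacity(), the medium domain's pd_cap_orig (768) is
 * greater than inv_cap (674), so the big CPU is marked inverted:
 *
 *   rq->cpu_capacity_inverted = 674;
 *
 * util_fits_cpu() on the big CPU then uses 674 for both capacity_orig
 * and capacity_orig_thermal, so a task boosted to uclamp_min = 700 no
 * longer force-fits the throttled big CPU (700 > 674), while it can
 * still fit a medium CPU (700 <= 768), subject to the normal util
 * checks.
 */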