From nobody Sun Feb 8 19:47:15 2026
From: Hongyan Xia
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider
Cc: Qais Yousef, Morten Rasmussen, Lukasz Luba, Christian Loehle,
	linux-kernel@vger.kernel.org, David Dai, Saravana Kannan
Subject: [RFC PATCH v2 5/7] sched/fair: Massively simplify util_fits_cpu()
Date: Thu, 1 Feb 2024 13:12:01 +0000
X-Mailer: git-send-email 2.34.1

Currently there is no way to distinguish between 1) a CPU that is
genuinely maxed out at its highest frequency and 2) one that is
throttled by UCLAMP_MAX, because both report a util_avg of 1024. This
is problematic: when we pick a CPU for a task to run on, we would like
to give 2) a chance, or at least prefer 2) over 1).

Current upstream gives every CPU with 0 spare capacity a chance to
queue more tasks, because the 0 spare capacity might be due to
UCLAMP_MAX. However, this creates further problems: energy
calculations become bogus when spare capacity is already 0, and tasks
tend to pile up on one CPU.

Fix this by using util_avg_uclamp in util_fits_cpu(). Case 1) then
keeps its utilization at 1024, whereas 2) shows spare capacity
whenever the sum of util_avg_uclamp values is still below the CPU
capacity.

Under sum aggregation, checking whether a task fits a CPU becomes much
simpler: a plain fits_capacity() is enough, and no extra code is
needed to handle all the uclamp corner cases. This lets
util_fits_cpu() return plain true/false instead of a tri-state value,
removing a huge amount of code.
[1]: https://lore.kernel.org/all/20230205224318.2035646-2-qyousef@layalina.io/

Signed-off-by: Hongyan Xia
---
 kernel/sched/fair.c | 253 ++++----------------------------------------
 1 file changed, 23 insertions(+), 230 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b92739e1c52f..49997f1f58fb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4974,135 +4974,19 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	trace_sched_util_est_se_tp(&p->se);
 }
 
-static inline int util_fits_cpu(unsigned long util,
-				unsigned long uclamp_min,
-				unsigned long uclamp_max,
-				int cpu)
+/* util must be the uclamp'ed value (i.e. from util_avg_uclamp). */
+static inline int util_fits_cpu(unsigned long util, int cpu)
 {
-	unsigned long capacity_orig, capacity_orig_thermal;
 	unsigned long capacity = capacity_of(cpu);
-	bool fits, uclamp_max_fits;
 
-	/*
-	 * Check if the real util fits without any uclamp boost/cap applied.
-	 */
-	fits = fits_capacity(util, capacity);
-
-	if (!uclamp_is_used())
-		return fits;
-
-	/*
-	 * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and
-	 * uclamp_max. We only care about capacity pressure (by using
-	 * capacity_of()) for comparing against the real util.
-	 *
-	 * If a task is boosted to 1024 for example, we don't want a tiny
-	 * pressure to skew the check whether it fits a CPU or not.
-	 *
-	 * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it
-	 * should fit a little cpu even if there's some pressure.
-	 *
-	 * Only exception is for thermal pressure since it has a direct impact
-	 * on available OPP of the system.
-	 *
-	 * We honour it for uclamp_min only as a drop in performance level
-	 * could result in not getting the requested minimum performance level.
-	 *
-	 * For uclamp_max, we can tolerate a drop in performance level as the
-	 * goal is to cap the task. So it's okay if it's getting less.
-	 */
-	capacity_orig = arch_scale_cpu_capacity(cpu);
-	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
-
-	/*
-	 * We want to force a task to fit a cpu as implied by uclamp_max.
-	 * But we do have some corner cases to cater for..
-	 *
-	 *                                 C=z
-	 *   |                             ___
-	 *   |         C=y                |   |
-	 *   |_ _ _ _ _ _ _ _ _ ___ _ _ _ | _ | _ _ _ _ _  uclamp_max
-	 *   |      C=x        |   |      |   |
-	 *   |      ___        |   |      |   |
-	 *   |     |   |       |   |      |   |    (util somewhere in this region)
-	 *   |     |   |       |   |      |   |
-	 *   |     |   |       |   |      |   |
-	 *   +----------------------------------------
-	 *         cpu0        cpu1       cpu2
-	 *
-	 * In the above example if a task is capped to a specific performance
-	 * point, y, then when:
-	 *
-	 *   * util = 80% of x then it does not fit on cpu0 and should migrate
-	 *     to cpu1
-	 *   * util = 80% of y then it is forced to fit on cpu1 to honour
-	 *     uclamp_max request.
-	 *
-	 * which is what we're enforcing here. A task always fits if
-	 * uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
-	 * the normal upmigration rules should withhold still.
-	 *
-	 * Only exception is when we are on max capacity, then we need to be
-	 * careful not to block overutilized state. This is so because:
-	 *
-	 *   1. There's no concept of capping at max_capacity! We can't go
-	 *      beyond this performance level anyway.
-	 *   2. The system is being saturated when we're operating near
-	 *      max capacity, it doesn't make sense to block overutilized.
-	 */
-	uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
-	uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
-	fits = fits || uclamp_max_fits;
-
-	/*
-	 *
-	 *                                 C=z
-	 *   |                             ___       (region a, capped, util >= uclamp_max)
-	 *   |         C=y                |   |
-	 *   |_ _ _ _ _ _ _ _ _ ___ _ _ _ | _ | _ _ _ _ _  uclamp_max
-	 *   |      C=x        |   |      |   |
-	 *   |      ___        |   |      |   |      (region b, uclamp_min <= util <= uclamp_max)
-	 *   |_ _ _|_ _|_ _ _ _| _ |_ _ _ | _ | _ _ _ _ _  uclamp_min
-	 *   |     |   |       |   |      |   |
-	 *   |     |   |       |   |      |   |      (region c, boosted, util < uclamp_min)
-	 *   +----------------------------------------
-	 *         cpu0        cpu1       cpu2
-	 *
-	 * a) If util > uclamp_max, then we're capped, we don't care about
-	 *    actual fitness value here. We only care if uclamp_max fits
-	 *    capacity without taking margin/pressure into account.
-	 *    See comment above.
-	 *
-	 * b) If uclamp_min <= util <= uclamp_max, then the normal
-	 *    fits_capacity() rules apply. Except we need to ensure that we
-	 *    enforce we remain within uclamp_max, see comment above.
-	 *
-	 * c) If util < uclamp_min, then we are boosted. Same as (b) but we
-	 *    need to take into account the boosted value fits the CPU without
-	 *    taking margin/pressure into account.
-	 *
-	 * Cases (a) and (b) are handled in the 'fits' variable already. We
-	 * just need to consider an extra check for case (c) after ensuring we
-	 * handle the case uclamp_min > uclamp_max.
-	 */
-	uclamp_min = min(uclamp_min, uclamp_max);
-	if (fits && (util < uclamp_min) && (uclamp_min > capacity_orig_thermal))
-		return -1;
-
-	return fits;
+	return fits_capacity(util, capacity);
 }
 
 static inline int task_fits_cpu(struct task_struct *p, int cpu)
 {
-	unsigned long uclamp_min = uclamp_eff_value(p, UCLAMP_MIN);
-	unsigned long uclamp_max = uclamp_eff_value(p, UCLAMP_MAX);
 	unsigned long util = task_util_est(p);
-	/*
-	 * Return true only if the cpu fully fits the task requirements, which
-	 * include the utilization but also the performance hints.
-	 */
-	return (util_fits_cpu(util, uclamp_min, uclamp_max, cpu) > 0);
+
+	return util_fits_cpu(util, cpu);
 }
 
 static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
@@ -6678,11 +6562,8 @@ static inline void hrtick_update(struct rq *rq)
 #ifdef CONFIG_SMP
 static inline bool cpu_overutilized(int cpu)
 {
-	unsigned long rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
-	unsigned long rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
-
 	/* Return true only if the utilization doesn't fit CPU's capacity */
-	return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);
+	return !util_fits_cpu(cpu_util_cfs(cpu), cpu);
 }
 
 static inline void update_overutilized_status(struct rq *rq)
@@ -7463,8 +7344,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 static int
 select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
 {
-	unsigned long task_util, util_min, util_max, best_cap = 0;
-	int fits, best_fits = 0;
+	unsigned long task_util, best_cap = 0;
 	int cpu, best_cpu = -1;
 	struct cpumask *cpus;
 
@@ -7472,8 +7352,6 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
 	task_util = task_util_est(p);
-	util_min = uclamp_eff_value(p, UCLAMP_MIN);
-	util_max = uclamp_eff_value(p, UCLAMP_MAX);
 
 	for_each_cpu_wrap(cpu, cpus, target) {
 		unsigned long cpu_cap = capacity_of(cpu);
 
@@ -7481,44 +7359,22 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
 		if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
 			continue;
 
-		fits = util_fits_cpu(task_util, util_min, util_max, cpu);
-
-		/* This CPU fits with all requirements */
-		if (fits > 0)
+		if (util_fits_cpu(task_util, cpu))
 			return cpu;
-		/*
-		 * Only the min performance hint (i.e. uclamp_min) doesn't fit.
-		 * Look for the CPU with best capacity.
-		 */
-		else if (fits < 0)
-			cpu_cap = arch_scale_cpu_capacity(cpu) - thermal_load_avg(cpu_rq(cpu));
 
-		/*
-		 * First, select CPU which fits better (-1 being better than 0).
-		 * Then, select the one with best capacity at same level.
-		 */
-		if ((fits < best_fits) ||
-		    ((fits == best_fits) && (cpu_cap > best_cap))) {
+		if (cpu_cap > best_cap) {
 			best_cap = cpu_cap;
 			best_cpu = cpu;
-			best_fits = fits;
 		}
 	}
 
 	return best_cpu;
 }
 
-static inline bool asym_fits_cpu(unsigned long util,
-				 unsigned long util_min,
-				 unsigned long util_max,
-				 int cpu)
+static inline bool asym_fits_cpu(unsigned long util, int cpu)
 {
 	if (sched_asym_cpucap_active())
-		/*
-		 * Return true only if the cpu fully fits the task requirements
-		 * which include the utilization and the performance hints.
-		 */
-		return (util_fits_cpu(util, util_min, util_max, cpu) > 0);
+		return util_fits_cpu(util, cpu);
 
 	return true;
 }
@@ -7530,7 +7386,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 {
 	bool has_idle_core = false;
 	struct sched_domain *sd;
-	unsigned long task_util, util_min, util_max;
+	unsigned long task_util;
 	int i, recent_used_cpu, prev_aff = -1;
 
 	/*
@@ -7540,8 +7396,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (sched_asym_cpucap_active()) {
 		sync_entity_load_avg(&p->se);
 		task_util = task_util_est(p);
-		util_min = uclamp_eff_value(p, UCLAMP_MIN);
-		util_max = uclamp_eff_value(p, UCLAMP_MAX);
 	}
 
 	/*
@@ -7550,7 +7404,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	lockdep_assert_irqs_disabled();
 
 	if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
-	    asym_fits_cpu(task_util, util_min, util_max, target))
+	    asym_fits_cpu(task_util, target))
 		return target;
 
 	/*
@@ -7558,7 +7412,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	 */
 	if (prev != target && cpus_share_cache(prev, target) &&
 	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
-	    asym_fits_cpu(task_util, util_min, util_max, prev)) {
+	    asym_fits_cpu(task_util, prev)) {
 
 		if (!static_branch_unlikely(&sched_cluster_active) ||
 		    cpus_share_resources(prev, target))
@@ -7579,7 +7433,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	    in_task() &&
 	    prev == smp_processor_id() &&
 	    this_rq()->nr_running <= 1 &&
-	    asym_fits_cpu(task_util, util_min, util_max, prev)) {
+	    asym_fits_cpu(task_util, prev)) {
 		return prev;
 	}
 
@@ -7591,7 +7445,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	    cpus_share_cache(recent_used_cpu, target) &&
 	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
 	    cpumask_test_cpu(recent_used_cpu, p->cpus_ptr) &&
-	    asym_fits_cpu(task_util, util_min, util_max, recent_used_cpu)) {
+	    asym_fits_cpu(task_util, recent_used_cpu)) {
 
 		if (!static_branch_unlikely(&sched_cluster_active) ||
 		    cpus_share_resources(recent_used_cpu, target))
@@ -7966,13 +7820,8 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
 	unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
-	unsigned long p_util_min = uclamp_is_used() ? uclamp_eff_value(p, UCLAMP_MIN) : 0;
-	unsigned long p_util_max = uclamp_is_used() ? uclamp_eff_value(p, UCLAMP_MAX) : 1024;
 	struct root_domain *rd = this_rq()->rd;
 	int cpu, best_energy_cpu, target = -1;
-	int prev_fits = -1, best_fits = -1;
-	unsigned long best_thermal_cap = 0;
-	unsigned long prev_thermal_cap = 0;
 	struct sched_domain *sd;
 	struct perf_domain *pd;
 	struct energy_env eenv;
@@ -8001,14 +7850,11 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 	eenv_task_busy_time(&eenv, p, prev_cpu);
 
 	for (; pd; pd = pd->next) {
-		unsigned long util_min = p_util_min, util_max = p_util_max;
 		unsigned long cpu_cap, cpu_thermal_cap, util;
 		unsigned long cur_delta, max_spare_cap = 0;
-		unsigned long rq_util_min, rq_util_max;
 		unsigned long prev_spare_cap = 0;
 		int max_spare_cap_cpu = -1;
 		unsigned long base_energy;
-		int fits, max_fits = -1;
 
 		cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
 
@@ -8024,8 +7870,6 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 		eenv.pd_cap = 0;
 
 		for_each_cpu(cpu, cpus) {
-			struct rq *rq = cpu_rq(cpu);
-
 			eenv.pd_cap += cpu_thermal_cap;
 
 			if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
@@ -8036,31 +7880,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 
 			util = cpu_util(cpu, p, cpu, 0);
 			cpu_cap = capacity_of(cpu);
-
-			/*
-			 * Skip CPUs that cannot satisfy the capacity request.
-			 * IOW, placing the task there would make the CPU
-			 * overutilized. Take uclamp into account to see how
-			 * much capacity we can get out of the CPU; this is
-			 * aligned with sched_cpu_util().
-			 */
-			if (uclamp_is_used() && !uclamp_rq_is_idle(rq)) {
-				/*
-				 * Open code uclamp_rq_util_with() except for
-				 * the clamp() part. Ie: apply max aggregation
-				 * only. util_fits_cpu() logic requires to
-				 * operate on non clamped util but must use the
-				 * max-aggregated uclamp_{min, max}.
-				 */
-				rq_util_min = uclamp_rq_get(rq, UCLAMP_MIN);
-				rq_util_max = uclamp_rq_get(rq, UCLAMP_MAX);
-
-				util_min = max(rq_util_min, p_util_min);
-				util_max = max(rq_util_max, p_util_max);
-			}
-
-			fits = util_fits_cpu(util, util_min, util_max, cpu);
-			if (!fits)
+			if (!util_fits_cpu(util, cpu))
 				continue;
 
 			lsub_positive(&cpu_cap, util);
 
@@ -8068,9 +7888,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 			if (cpu == prev_cpu) {
 				/* Always use prev_cpu as a candidate. */
 				prev_spare_cap = cpu_cap;
-				prev_fits = fits;
-			} else if ((fits > max_fits) ||
-				   ((fits == max_fits) && (cpu_cap > max_spare_cap))) {
+			} else if (cpu_cap > max_spare_cap) {
 				/*
 				 * Find the CPU with the maximum spare capacity
 				 * among the remaining CPUs in the performance
@@ -8078,7 +7896,6 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 				 */
 				max_spare_cap = cpu_cap;
 				max_spare_cap_cpu = cpu;
-				max_fits = fits;
 			}
 		}
 
@@ -8097,50 +7914,26 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 			if (prev_delta < base_energy)
 				goto unlock;
 			prev_delta -= base_energy;
-			prev_thermal_cap = cpu_thermal_cap;
 			best_delta = min(best_delta, prev_delta);
 		}
 
 		/* Evaluate the energy impact of using max_spare_cap_cpu. */
 		if (max_spare_cap_cpu >= 0 && max_spare_cap > prev_spare_cap) {
-			/* Current best energy cpu fits better */
-			if (max_fits < best_fits)
-				continue;
-
-			/*
-			 * Both don't fit performance hint (i.e. uclamp_min)
-			 * but best energy cpu has better capacity.
-			 */
-			if ((max_fits < 0) &&
-			    (cpu_thermal_cap <= best_thermal_cap))
-				continue;
-
 			cur_delta = compute_energy(&eenv, pd, cpus, p,
 						   max_spare_cap_cpu);
 			/* CPU utilization has changed */
 			if (cur_delta < base_energy)
 				goto unlock;
 			cur_delta -= base_energy;
-
-			/*
-			 * Both fit for the task but best energy cpu has lower
-			 * energy impact.
-			 */
-			if ((max_fits > 0) && (best_fits > 0) &&
-			    (cur_delta >= best_delta))
-				continue;
-
-			best_delta = cur_delta;
-			best_energy_cpu = max_spare_cap_cpu;
-			best_fits = max_fits;
-			best_thermal_cap = cpu_thermal_cap;
+			if (cur_delta < best_delta) {
+				best_delta = cur_delta;
+				best_energy_cpu = max_spare_cap_cpu;
+			}
 		}
 	}
 	rcu_read_unlock();
 
-	if ((best_fits > prev_fits) ||
-	    ((best_fits > 0) && (best_delta < prev_delta)) ||
-	    ((best_fits < 0) && (best_thermal_cap > prev_thermal_cap)))
+	if (best_delta < prev_delta)
 		target = best_energy_cpu;
 
 	return target;
-- 
2.34.1