[v2] Fix relationship between uclamp and fits_capacity()

[PATCH v2 1/9] sched/uclamp: Fix relationship between uclamp and migration margin

Posted by Qais Yousef 3 years, 8 months ago

fits_capacity() verifies that a util leaves a 20% margin below the
capacity of a CPU, which is an attempt to speed up upmigration.

But when uclamp is used, this 20% margin is problematic. For example,
if a task is boosted to 1024, then it will not fit on any CPU
according to fits_capacity() logic.

Or if a task is boosted to capacity_orig_of(medium_cpu), the task will
end up on a big CPU instead of the desired medium CPU.

Similar corner cases exist for uclamp and usage of capacity_of().
The slightest IRQ pressure on the biggest CPU, for example, will make a
1024 boosted task look like it can't fit.
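
As a rough illustration of the corner cases above, here is a minimal
userspace sketch of the margin check. The capacity values for the big and
medium CPUs are hypothetical, and the helper names other than
fits_capacity() are made up for this example; they are not part of the
patch.

```c
#include <stdbool.h>

/*
 * Userspace copy of the kernel's margin check: "cap" fits only while it
 * stays below roughly 80% of "max" (1024 / 1280 = 0.8).
 */
static inline bool fits_capacity(unsigned long cap, unsigned long max)
{
	return cap * 1280 < max * 1024;
}

/* Hypothetical capacities, chosen only for illustration. */
#define BIG_CAPACITY	1024UL
#define MEDIUM_CAPACITY	768UL

/* A task boosted to 1024 is compared as if its util were 1024. */
static inline bool boosted_to_max_fits_big(void)
{
	return fits_capacity(1024, BIG_CAPACITY);
}

/* A task boosted to the medium CPU's capacity, checked against medium. */
static inline bool boosted_to_medium_fits_medium(void)
{
	return fits_capacity(MEDIUM_CAPACITY, MEDIUM_CAPACITY);
}

/* The same boosted task, checked against the big CPU. */
static inline bool boosted_to_medium_fits_big(void)
{
	return fits_capacity(MEDIUM_CAPACITY, BIG_CAPACITY);
}
```

With these numbers the first two helpers return false and the third
returns true, which is the "ends up on big" behavior described above.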

What we really want is for uclamp comparisons to ignore the migration
margin and capacity pressure, yet retain them for when checking the
_actual_ util signal.

For example, task p:

	p->util_avg = 300
	p->uclamp[UCLAMP_MIN] = 1024

will fit a big CPU. But

	p->util_avg = 900
	p->uclamp[UCLAMP_MIN] = 1024

will not; this should trigger the overutilized state because the big CPU
is now *actually* being saturated.

Similar reasoning applies to capping tasks with UCLAMP_MAX. For example:

	p->util_avg = 1024
	p->uclamp[UCLAMP_MAX] = capacity_orig_of(medium_cpu)

should fit on medium CPUs without triggering the overutilized
state.
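
The three examples above can be replayed with a small userspace model of
the logic this patch adds. This is only a sketch: struct cpu_model, the
model_util_fits_cpu() name, and the big/medium capacity values are
hypothetical stand-ins for what the kernel looks up per CPU, uclamp is
assumed to be in use, and UCLAMP_MAX is assumed to be 1024 wherever the
examples do not state it.

```c
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE	1024UL

static inline bool fits_capacity(unsigned long cap, unsigned long max)
{
	return cap * 1280 < max * 1024;
}

/*
 * Hypothetical stand-in for per-CPU state:
 *   capacity         ~ capacity_of(cpu)
 *   capacity_orig    ~ capacity_orig_of(cpu)
 *   thermal_pressure ~ arch_scale_thermal_pressure(cpu)
 */
struct cpu_model {
	unsigned long capacity;
	unsigned long capacity_orig;
	unsigned long thermal_pressure;
};

static inline bool model_util_fits_cpu(unsigned long util,
				       unsigned long uclamp_min,
				       unsigned long uclamp_max,
				       const struct cpu_model *cpu)
{
	bool fits, uclamp_max_fits;

	/* Real util keeps the margin and sees capacity pressure. */
	fits = fits_capacity(util, cpu->capacity);

	/* A cap forces a fit, except for "capped at max on a max CPU". */
	uclamp_max_fits = (cpu->capacity_orig == SCHED_CAPACITY_SCALE) &&
			  (uclamp_max == SCHED_CAPACITY_SCALE);
	uclamp_max_fits = !uclamp_max_fits &&
			  (uclamp_max <= cpu->capacity_orig);
	fits = fits || uclamp_max_fits;

	/* A boost must fit capacity_orig minus thermal pressure only. */
	if (uclamp_min > uclamp_max)
		uclamp_min = uclamp_max;
	if (util < uclamp_min && cpu->capacity_orig != SCHED_CAPACITY_SCALE)
		fits = fits && (uclamp_min <=
				cpu->capacity_orig - cpu->thermal_pressure);

	return fits;
}

/* Hypothetical CPUs with no pressure applied. */
static const struct cpu_model big = { 1024, 1024, 0 };
static const struct cpu_model medium = { 768, 768, 0 };
```

Calling model_util_fits_cpu(300, 1024, 1024, &big) reports a fit,
model_util_fits_cpu(900, 1024, 1024, &big) does not, and
model_util_fits_cpu(1024, 0, 768, &medium) reports a fit on the medium
CPU.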

Inline comments expand on the desired behavior in more scenarios.

Introduce a new util_fits_cpu() function which encapsulates the new logic.
The new function is not used anywhere yet, but will be used to update
various users of fits_capacity() in later patches.

Fixes: af24bde8df202 ("sched/uclamp: Add uclamp support to energy_compute()")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---
 kernel/sched/fair.c | 123 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 52dc9d6f811e..00c2de800685 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4254,6 +4254,129 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	trace_sched_util_est_se_tp(&p->se);
 }
 
+static inline int util_fits_cpu(unsigned long util,
+				unsigned long uclamp_min,
+				unsigned long uclamp_max,
+				int cpu)
+{
+	unsigned long capacity_orig, capacity_orig_thermal;
+	unsigned long capacity = capacity_of(cpu);
+	bool fits, uclamp_max_fits;
+
+	/*
+	 * Check if the real util fits without any uclamp boost/cap applied.
+	 */
+	fits = fits_capacity(util, capacity);
+
+	if (!uclamp_is_used())
+		return fits;
+
+	/*
+	 * We must use capacity_orig_of() for comparing against uclamp_min and
+	 * uclamp_max. We only care about capacity pressure (by using
+	 * capacity_of()) for comparing against the real util.
+	 *
+	 * If a task is boosted to 1024 for example, we don't want a tiny
+	 * pressure to skew the check whether it fits a CPU or not.
+	 *
+	 * Similarly if a task is capped to capacity_orig_of(little_cpu), it
+	 * should fit a little cpu even if there's some pressure.
+	 *
+	 * Only exception is for thermal pressure since it has a direct impact
+	 * on available OPP of the system.
+	 *
+	 * We honour it for uclamp_min only as a drop in performance level
+	 * could result in not getting the requested minimum performance level.
+	 *
+	 * For uclamp_max, we can tolerate a drop in performance level as the
+	 * goal is to cap the task. So it's okay if it's getting less.
+	 *
+	 * In case of capacity inversion, which is not handled yet, we should
+	 * honour the inverted capacity for both uclamp_min and uclamp_max all
+	 * the time.
+	 */
+	capacity_orig = capacity_orig_of(cpu);
+	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
+
+	/*
+	 * We want to force a task to fit a cpu as implied by uclamp_max.
+	 * But we do have some corner cases to cater for..
+	 *
+	 *
+	 *                                 C=z
+	 *   |                             ___
+	 *   |                  C=y       |   |
+	 *   |_ _ _ _ _ _ _ _ _ ___ _ _ _ | _ | _ _ _ _ _  uclamp_max
+	 *   |      C=x        |   |      |   |
+	 *   |      ___        |   |      |   |
+	 *   |     |   |       |   |      |   |    (util somewhere in this region)
+	 *   |     |   |       |   |      |   |
+	 *   |     |   |       |   |      |   |
+	 *   +----------------------------------------
+	 *         cpu0        cpu1       cpu2
+	 *
+	 *   In the above example if a task is capped to a specific performance
+	 *   point, y, then when:
+	 *
+	 *   * util = 80% of x then it does not fit on cpu0 and should migrate
+	 *     to cpu1
+	 *   * util = 80% of y then it is forced to fit on cpu1 to honour
+	 *     uclamp_max request.
+	 *
+	 *   which is what we're enforcing here. A task always fits if
+	 *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
+	 *   the normal upmigration rules should still hold.
+	 *
+	 *   Only exception is when we are on max capacity, then we need to be
+	 *   careful not to block overutilized state. This is so because:
+	 *
+	 *     1. There's no concept of capping at max_capacity! We can't go
+	 *        beyond this performance level anyway.
+	 *     2. The system is being saturated when we're operating near
+	 *        max capacity, it doesn't make sense to block overutilized.
+	 */
+	uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
+	uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
+	fits = fits || uclamp_max_fits;
+
+	/*
+	 *
+	 *                                 C=z
+	 *   |                             ___       (region a, capped, util >= uclamp_max)
+	 *   |                  C=y       |   |
+	 *   |_ _ _ _ _ _ _ _ _ ___ _ _ _ | _ | _ _ _ _ _ uclamp_max
+	 *   |      C=x        |   |      |   |
+	 *   |      ___        |   |      |   |      (region b, uclamp_min <= util <= uclamp_max)
+	 *   |_ _ _|_ _|_ _ _ _| _ | _ _ _| _ | _ _ _ _ _ uclamp_min
+	 *   |     |   |       |   |      |   |
+	 *   |     |   |       |   |      |   |      (region c, boosted, util < uclamp_min)
+	 *   +----------------------------------------
+	 *         cpu0        cpu1       cpu2
+	 *
+	 * a) If util > uclamp_max, then we're capped, we don't care about
+	 *    actual fitness value here. We only care if uclamp_max fits
+	 *    capacity without taking margin/pressure into account.
+	 *    See comment above.
+	 *
+	 * b) If uclamp_min <= util <= uclamp_max, then the normal
+	 *    fits_capacity() rules apply. Except we need to ensure that we
+	 *    enforce we remain within uclamp_max, see comment above.
+	 *
+	 * c) If util < uclamp_min, then we are boosted. Same as (b) but we
+	 *    need to take into account the boosted value fits the CPU without
+	 *    taking margin/pressure into account.
+	 *
+	 * Cases (a) and (b) are handled in the 'fits' variable already. We
+	 * just need to consider an extra check for case (c) after ensuring we
+	 * handle the case uclamp_min > uclamp_max.
+	 */
+	uclamp_min = min(uclamp_min, uclamp_max);
+	if (util < uclamp_min && capacity_orig != SCHED_CAPACITY_SCALE)
+		fits = fits && (uclamp_min <= capacity_orig_thermal);
+
+	return fits;
+}
+
 static inline int task_fits_capacity(struct task_struct *p,
 				     unsigned long capacity)
 {
-- 
2.25.1

Re: [PATCH v2 1/9] sched/uclamp: Fix relationship between uclamp and migration margin

Posted by Dietmar Eggemann 3 years, 5 months ago

On 04/08/2022 16:36, Qais Yousef wrote:

[...]

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 52dc9d6f811e..00c2de800685 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4254,6 +4254,129 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
>  	trace_sched_util_est_se_tp(&p->se);
>  }
>  
> +static inline int util_fits_cpu(unsigned long util,
> +				unsigned long uclamp_min,
> +				unsigned long uclamp_max,
> +				int cpu)
> +{
> +	unsigned long capacity_orig, capacity_orig_thermal;
> +	unsigned long capacity = capacity_of(cpu);
> +	bool fits, uclamp_max_fits;
> +
> +	/*
> +	 * Check if the real util fits without any uclamp boost/cap applied.
> +	 */
> +	fits = fits_capacity(util, capacity);
> +
> +	if (!uclamp_is_used())
> +		return fits;
> +
> +	/*
> +	 * We must use capacity_orig_of() for comparing against uclamp_min and
> +	 * uclamp_max. We only care about capacity pressure (by using
> +	 * capacity_of()) for comparing against the real util.
> +	 *
> +	 * If a task is boosted to 1024 for example, we don't want a tiny
> +	 * pressure to skew the check whether it fits a CPU or not.
> +	 *
> +	 * Similarly if a task is capped to capacity_orig_of(little_cpu), it
> +	 * should fit a little cpu even if there's some pressure.
> +	 *
> +	 * Only exception is for thermal pressure since it has a direct impact
> +	 * on available OPP of the system.
> +	 *
> +	 * We honour it for uclamp_min only as a drop in performance level
> +	 * could result in not getting the requested minimum performance level.
> +	 *
> +	 * For uclamp_max, we can tolerate a drop in performance level as the
> +	 * goal is to cap the task. So it's okay if it's getting less.
> +	 *
> +	 * In case of capacity inversion, which is not handled yet, we should
> +	 * honour the inverted capacity for both uclamp_min and uclamp_max all
> +	 * the time.
> +	 */
> +	capacity_orig = capacity_orig_of(cpu);
> +	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);

Did you and Vincent agree in the end to use `cap_orig - _instantaneous_
thermal pressure` (1) here?

Last email in v1 from Vincent on this one was
https://lkml.kernel.org/r/20220722151300.GA30193@vingu-book :

"Using capacity_orig_of(cpu) - thermal_load_avg(rq_of(cpu)) seems like
a simple solution to cover thermal mitigation".

And there is no Acked-By/Reviewed-By so far.

We use (1) in feec() to cater for thermal throttling (thermal
restricting policy->max), which schedutil takes into account immediately
when asking for frequency (performance). EAS and schedutil should see the
same thing.

Do you want to use the same in util_fits_cpu() since it's used in feec()?

[...]

Re: [PATCH v2 1/9] sched/uclamp: Fix relationship between uclamp and migration margin

Posted by Valentin Schneider 3 years, 5 months ago

I'm only seeing this now that it has hit tip/sched/core and I've had a
stroll through fair.c, apologies for this being late.

On 04/08/22 15:36, Qais Yousef wrote:
> +static inline int util_fits_cpu(unsigned long util,
> +				unsigned long uclamp_min,
> +				unsigned long uclamp_max,
> +				int cpu)
> +{
> +	unsigned long capacity_orig, capacity_orig_thermal;
> +	unsigned long capacity = capacity_of(cpu);
> +	bool fits, uclamp_max_fits;
> +
> +	/*
> +	 * Check if the real util fits without any uclamp boost/cap applied.
> +	 */
> +	fits = fits_capacity(util, capacity);
> +
> +	if (!uclamp_is_used())
> +		return fits;
> +
> +	/*
> +	 * We must use capacity_orig_of() for comparing against uclamp_min and
> +	 * uclamp_max. We only care about capacity pressure (by using
> +	 * capacity_of()) for comparing against the real util.
> +	 *
> +	 * If a task is boosted to 1024 for example, we don't want a tiny
> +	 * pressure to skew the check whether it fits a CPU or not.
> +	 *
> +	 * Similarly if a task is capped to capacity_orig_of(little_cpu), it
> +	 * should fit a little cpu even if there's some pressure.
> +	 *
> +	 * Only exception is for thermal pressure since it has a direct impact
> +	 * on available OPP of the system.
> +	 *
> +	 * We honour it for uclamp_min only as a drop in performance level
> +	 * could result in not getting the requested minimum performance level.
> +	 *

Why specifically care about OPPs here? Per our CPU capacity model, a task
alone on a CPUx throttled to f=fmax/2 and a task coscheduled on a CPUy with
RT/DL tasks and/or IRQs such that cpu_capacity(CPUy) = 50% are both getting
(roughly) the same performance level.

Re: [PATCH v2 1/9] sched/uclamp: Fix relationship between uclamp and migration margin

Posted by Qais Yousef 3 years, 5 months ago

On 11/04/22 17:35, Valentin Schneider wrote:

> > +	/*
> > +	 * We must use capacity_orig_of() for comparing against uclamp_min and
> > +	 * uclamp_max. We only care about capacity pressure (by using
> > +	 * capacity_of()) for comparing against the real util.
> > +	 *
> > +	 * If a task is boosted to 1024 for example, we don't want a tiny
> > +	 * pressure to skew the check whether it fits a CPU or not.
> > +	 *
> > +	 * Similarly if a task is capped to capacity_orig_of(little_cpu), it
> > +	 * should fit a little cpu even if there's some pressure.
> > +	 *
> > +	 * Only exception is for thermal pressure since it has a direct impact
> > +	 * on available OPP of the system.
> > +	 *
> > +	 * We honour it for uclamp_min only as a drop in performance level
> > +	 * could result in not getting the requested minimum performance level.
> > +	 *
> 
> Why specifically care about OPPs here? Per our CPU capacity model, a task
> alone on a CPUx throttled to f=fmax/2 and a task coscheduled on a CPUy with
> RT/DL tasks and/or IRQs such that cpu_capacity(CPUy) = 50% are both getting
> (roughly) the same performance level.

Depends how you define performance level. What you call performance level,
I think is better called bandwidth. Uclamp is a performance and not a bandwidth
hint.

If a 10% task:

	p->util_avg = 10% * 1024

is requesting max performance level

	p->uclamp_min = 1024

This will translate to running at the highest frequency and, in the case of
a big.LITTLE system, on the biggest CPU too.

RT/DL pressure has no impact on the task being able to achieve this; that is,
running at max frequency and on the biggest CPU.

If the cpu has no bandwidth to fit this task, then our usual comparison of
util_avg with capacity_of() should fail as usual.

In the example above, the RT/DL pressure has to be pretty high for the 10% task
not to fit from a bandwidth point of view, which has nothing to do with
uclamp_min. Only thermal pressure, which drops OPPs, can actually affect the
uclamp_min hint/request.

That is, when the task runs it will run at maximum frequency regardless of the
RT/DL pressure. The fact that the bandwidth of the CPU can be stolen has
nothing to do with uclamp_min hint.
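
To make the bandwidth versus performance distinction concrete, here is a
minimal userspace sketch of only the boost (uclamp_min) half of the check.
It deliberately ignores uclamp_max. The boost_fits() name and all numbers
are hypothetical, and a medium CPU with capacity_orig = 768 is used because
the patch skips the uclamp_min check when capacity_orig is
SCHED_CAPACITY_SCALE.

```c
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE	1024UL

static inline bool fits_capacity(unsigned long cap, unsigned long max)
{
	return cap * 1280 < max * 1024;
}

/*
 * capacity         ~ capacity_of(cpu): what is left after pressure
 * capacity_orig    ~ capacity_orig_of(cpu)
 * thermal_pressure ~ arch_scale_thermal_pressure(cpu)
 */
static inline bool boost_fits(unsigned long util, unsigned long uclamp_min,
			      unsigned long capacity,
			      unsigned long capacity_orig,
			      unsigned long thermal_pressure)
{
	/* Bandwidth: does the real util fit what is left of the CPU? */
	bool fits = fits_capacity(util, capacity);

	/* Performance: can the CPU still reach the requested level? */
	if (util < uclamp_min && capacity_orig != SCHED_CAPACITY_SCALE)
		fits = fits && (uclamp_min <= capacity_orig - thermal_pressure);

	return fits;
}
```

For a task with util = 102 boosted to 768, boost_fits(102, 768, 400, 768, 0)
still reports a fit even though RT/DL pressure has left only 400 of
capacity, while boost_fits(102, 768, 668, 768, 100) does not, because the
thermal pressure of 100 puts the requested performance level out of reach.
Only when the remaining capacity drops so low that the real util no longer
fits, as in boost_fits(102, 768, 100, 768, 0), does the bandwidth check
fail.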

Thanks!

--
Qais Yousef

Re: [PATCH v2 1/9] sched/uclamp: Fix relationship between uclamp and migration margin

Posted by Valentin Schneider 3 years, 5 months ago

On 05/11/22 19:24, Qais Yousef wrote:
> On 11/04/22 17:35, Valentin Schneider wrote:
>
>> > +	/*
>> > +	 * We must use capacity_orig_of() for comparing against uclamp_min and
>> > +	 * uclamp_max. We only care about capacity pressure (by using
>> > +	 * capacity_of()) for comparing against the real util.
>> > +	 *
>> > +	 * If a task is boosted to 1024 for example, we don't want a tiny
>> > +	 * pressure to skew the check whether it fits a CPU or not.
>> > +	 *
>> > +	 * Similarly if a task is capped to capacity_orig_of(little_cpu), it
>> > +	 * should fit a little cpu even if there's some pressure.
>> > +	 *
>> > +	 * Only exception is for thermal pressure since it has a direct impact
>> > +	 * on available OPP of the system.
>> > +	 *
>> > +	 * We honour it for uclamp_min only as a drop in performance level
>> > +	 * could result in not getting the requested minimum performance level.
>> > +	 *
>>
>> Why specifically care about OPPs here? Per our CPU capacity model, a task
>> alone on a CPUx throttled to f=fmax/2 and a task coscheduled on a CPUy with
>> RT/DL tasks and/or IRQs such that cpu_capacity(CPUy) = 50% are both getting
>> (roughly) the same performance level.
>
> Depends how you define performance level. What you call performance level,
> I think is better called bandwidth. Uclamp is a performance and not a bandwidth
> hint.
>
> If a 10% task:
>
>       p->util_avg = 10% * 1024
>
> is requesting max performance level
>
>       p->uclamp_min = 1024
>
> This will translate to running at highest frequency and in case of big.LITTLE
> system, the biggest CPU too.
>
> RT/DL pressure has no impact in the task being able to achieve this; that is
> running at max frequency and biggest cpu.
>
> If the cpu has no bandwidth to fit this task, then our usual comparison of
> util_avg with capacity_of() should fail as usual.
>

Ok so we *do* have this with how the fitting criteria are combined (I
didn't get that when I first scanned through the code); thanks for
elaborating on that.

> In the example above, the RT/DL pressure has to be pretty high for the 10% task
> not to fit from bandwidth point of view. Which has nothing to do with
> uclamp_min.  Only thermal pressure which drops OPPs can actually affect the
> uclamp_min hint/request.
>
> That is, when the task runs it will run at maximum frequency regardless of the
> RT/DL pressure. The fact that the bandwidth of the CPU can be stolen has
> nothing to do with uclamp_min hint.
>
>
> Thanks!
>
> --
> Qais Yousef

Re: [PATCH v2 1/9] sched/uclamp: Fix relationship between uclamp and migration margin

Posted by Qais Yousef 3 years, 5 months ago

On 11/07/22 18:58, Valentin Schneider wrote:
> On 05/11/22 19:24, Qais Yousef wrote:
> > On 11/04/22 17:35, Valentin Schneider wrote:
> >
> >> > +	/*
> >> > +	 * We must use capacity_orig_of() for comparing against uclamp_min and
> >> > +	 * uclamp_max. We only care about capacity pressure (by using
> >> > +	 * capacity_of()) for comparing against the real util.
> >> > +	 *
> >> > +	 * If a task is boosted to 1024 for example, we don't want a tiny
> >> > +	 * pressure to skew the check whether it fits a CPU or not.
> >> > +	 *
> >> > +	 * Similarly if a task is capped to capacity_orig_of(little_cpu), it
> >> > +	 * should fit a little cpu even if there's some pressure.
> >> > +	 *
> >> > +	 * Only exception is for thermal pressure since it has a direct impact
> >> > +	 * on available OPP of the system.
> >> > +	 *
> >> > +	 * We honour it for uclamp_min only as a drop in performance level
> >> > +	 * could result in not getting the requested minimum performance level.
> >> > +	 *
> >>
> >> Why specifically care about OPPs here? Per our CPU capacity model, a task
> >> alone on a CPUx throttled to f=fmax/2 and a task coscheduled on a CPUy with
> >> RT/DL tasks and/or IRQs such that cpu_capacity(CPUy) = 50% are both getting
> >> (roughly) the same performance level.
> >
> > Depends how you define performance level. What you call performance level,
> > I think is better called bandwidth. Uclamp is a performance and not a bandwidth
> > hint.
> >
> > If a 10% task:
> >
> >       p->util_avg = 10% * 1024
> >
> > is requesting max performance level
> >
> >       p->uclamp_min = 1024
> >
> > This will translate to running at highest frequency and in case of big.LITTLE
> > system, the biggest CPU too.
> >
> > RT/DL pressure has no impact in the task being able to achieve this; that is
> > running at max frequency and biggest cpu.
> >
> > If the cpu has no bandwidth to fit this task, then our usual comparison of
> > util_avg with capacity_of() should fail as usual.
> >
> 
> Ok so we *do* have this with how the fitting criteria are combined (I
> didn't get that when I first scanned through the code); thanks for
> elaborating on that.

Oh yeah, this hasn't changed.

> 
> > In the example above, the RT/DL pressure has to be pretty high for the 10% task
> > not to fit from bandwidth point of view. Which has nothing to do with
> > uclamp_min.  Only thermal pressure which drops OPPs can actually affect the
> > uclamp_min hint/request.
> >
> > That is, when the task runs it will run at maximum frequency regardless of the
> > RT/DL pressure. The fact that the bandwidth of the CPU can be stolen has
> > nothing to do with uclamp_min hint.
> >
> >
> > Thanks!
> >
> > --
> > Qais Yousef
>

[tip: sched/core] sched/uclamp: Fix relationship between uclamp and migration margin

Posted by tip-bot2 for Qais Yousef 3 years, 5 months ago

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     48d5e9daa8b767e75ed9421665b037a49ce4bc04
Gitweb:        https://git.kernel.org/tip/48d5e9daa8b767e75ed9421665b037a49ce4bc04
Author:        Qais Yousef <qais.yousef@arm.com>
AuthorDate:    Thu, 04 Aug 2022 15:36:01 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 27 Oct 2022 11:01:17 +02:00

sched/uclamp: Fix relationship between uclamp and migration margin

fits_capacity() verifies that a util leaves a 20% margin below the
capacity of a CPU, which is an attempt to speed up upmigration.

But when uclamp is used, this 20% margin is problematic. For example,
if a task is boosted to 1024, then it will not fit on any CPU
according to fits_capacity() logic.

Or if a task is boosted to capacity_orig_of(medium_cpu), the task will
end up on a big CPU instead of the desired medium CPU.

Similar corner cases exist for uclamp and usage of capacity_of().
The slightest IRQ pressure on the biggest CPU, for example, will make a
1024 boosted task look like it can't fit.

What we really want is for uclamp comparisons to ignore the migration
margin and capacity pressure, yet retain them for when checking the
_actual_ util signal.

For example, task p:

	p->util_avg = 300
	p->uclamp[UCLAMP_MIN] = 1024

will fit a big CPU. But

	p->util_avg = 900
	p->uclamp[UCLAMP_MIN] = 1024

will not; this should trigger the overutilized state because the big CPU
is now *actually* being saturated.

Similar reasoning applies to capping tasks with UCLAMP_MAX. For example:

	p->util_avg = 1024
	p->uclamp[UCLAMP_MAX] = capacity_orig_of(medium_cpu)

should fit on medium CPUs without triggering the overutilized
state.

Inline comments expand on the desired behavior in more scenarios.

Introduce a new util_fits_cpu() function which encapsulates the new logic.
The new function is not used anywhere yet, but will be used to update
various users of fits_capacity() in later patches.

Fixes: af24bde8df202 ("sched/uclamp: Add uclamp support to energy_compute()")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-2-qais.yousef@arm.com
---
 kernel/sched/fair.c | 123 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 123 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e4a0b8b..0d193ef 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4426,6 +4426,129 @@ done:
 	trace_sched_util_est_se_tp(&p->se);
 }
 
+static inline int util_fits_cpu(unsigned long util,
+				unsigned long uclamp_min,
+				unsigned long uclamp_max,
+				int cpu)
+{
+	unsigned long capacity_orig, capacity_orig_thermal;
+	unsigned long capacity = capacity_of(cpu);
+	bool fits, uclamp_max_fits;
+
+	/*
+	 * Check if the real util fits without any uclamp boost/cap applied.
+	 */
+	fits = fits_capacity(util, capacity);
+
+	if (!uclamp_is_used())
+		return fits;
+
+	/*
+	 * We must use capacity_orig_of() for comparing against uclamp_min and
+	 * uclamp_max. We only care about capacity pressure (by using
+	 * capacity_of()) for comparing against the real util.
+	 *
+	 * If a task is boosted to 1024 for example, we don't want a tiny
+	 * pressure to skew the check whether it fits a CPU or not.
+	 *
+	 * Similarly if a task is capped to capacity_orig_of(little_cpu), it
+	 * should fit a little cpu even if there's some pressure.
+	 *
+	 * Only exception is for thermal pressure since it has a direct impact
+	 * on available OPP of the system.
+	 *
+	 * We honour it for uclamp_min only as a drop in performance level
+	 * could result in not getting the requested minimum performance level.
+	 *
+	 * For uclamp_max, we can tolerate a drop in performance level as the
+	 * goal is to cap the task. So it's okay if it's getting less.
+	 *
+	 * In case of capacity inversion, which is not handled yet, we should
+	 * honour the inverted capacity for both uclamp_min and uclamp_max all
+	 * the time.
+	 */
+	capacity_orig = capacity_orig_of(cpu);
+	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
+
+	/*
+	 * We want to force a task to fit a cpu as implied by uclamp_max.
+	 * But we do have some corner cases to cater for..
+	 *
+	 *
+	 *                                 C=z
+	 *   |                             ___
+	 *   |                  C=y       |   |
+	 *   |_ _ _ _ _ _ _ _ _ ___ _ _ _ | _ | _ _ _ _ _  uclamp_max
+	 *   |      C=x        |   |      |   |
+	 *   |      ___        |   |      |   |
+	 *   |     |   |       |   |      |   |    (util somewhere in this region)
+	 *   |     |   |       |   |      |   |
+	 *   |     |   |       |   |      |   |
+	 *   +----------------------------------------
+	 *         cpu0        cpu1       cpu2
+	 *
+	 *   In the above example if a task is capped to a specific performance
+	 *   point, y, then when:
+	 *
+	 *   * util = 80% of x then it does not fit on cpu0 and should migrate
+	 *     to cpu1
+	 *   * util = 80% of y then it is forced to fit on cpu1 to honour
+	 *     uclamp_max request.
+	 *
+	 *   which is what we're enforcing here. A task always fits if
+	 *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
+	 *   the normal upmigration rules should still hold.
+	 *
+	 *   Only exception is when we are on max capacity, then we need to be
+	 *   careful not to block overutilized state. This is so because:
+	 *
+	 *     1. There's no concept of capping at max_capacity! We can't go
+	 *        beyond this performance level anyway.
+	 *     2. The system is being saturated when we're operating near
+	 *        max capacity, it doesn't make sense to block overutilized.
+	 */
+	uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
+	uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
+	fits = fits || uclamp_max_fits;
+
+	/*
+	 *
+	 *                                 C=z
+	 *   |                             ___       (region a, capped, util >= uclamp_max)
+	 *   |                  C=y       |   |
+	 *   |_ _ _ _ _ _ _ _ _ ___ _ _ _ | _ | _ _ _ _ _ uclamp_max
+	 *   |      C=x        |   |      |   |
+	 *   |      ___        |   |      |   |      (region b, uclamp_min <= util <= uclamp_max)
+	 *   |_ _ _|_ _|_ _ _ _| _ | _ _ _| _ | _ _ _ _ _ uclamp_min
+	 *   |     |   |       |   |      |   |
+	 *   |     |   |       |   |      |   |      (region c, boosted, util < uclamp_min)
+	 *   +----------------------------------------
+	 *         cpu0        cpu1       cpu2
+	 *
+	 * a) If util > uclamp_max, then we're capped, we don't care about
+	 *    actual fitness value here. We only care if uclamp_max fits
+	 *    capacity without taking margin/pressure into account.
+	 *    See comment above.
+	 *
+	 * b) If uclamp_min <= util <= uclamp_max, then the normal
+	 *    fits_capacity() rules apply. Except we need to ensure that we
+	 *    enforce we remain within uclamp_max, see comment above.
+	 *
+	 * c) If util < uclamp_min, then we are boosted. Same as (b) but we
+	 *    need to take into account the boosted value fits the CPU without
+	 *    taking margin/pressure into account.
+	 *
+	 * Cases (a) and (b) are handled in the 'fits' variable already. We
+	 * just need to consider an extra check for case (c) after ensuring we
+	 * handle the case uclamp_min > uclamp_max.
+	 */
+	uclamp_min = min(uclamp_min, uclamp_max);
+	if (util < uclamp_min && capacity_orig != SCHED_CAPACITY_SCALE)
+		fits = fits && (uclamp_min <= capacity_orig_thermal);
+
+	return fits;
+}
+
 static inline int task_fits_capacity(struct task_struct *p,
 				     unsigned long capacity)
 {