[v2] Fix relationship between uclamp and fits_capacity()

[PATCH v2 9/9] sched/fair: Consider capacity inversion in util_fits_cpu()

Posted by Qais Yousef 3 years, 8 months ago

We do consider thermal pressure in util_fits_cpu() for uclamp_min only.
With the exception of the biggest cores which by definition are the max
performance point of the system and all tasks by definition should fit.

Even under thermal pressure, the capacity of the biggest CPU is the
highest in the system and should still fit every task. Except when it
reaches capacity inversion point, then this is no longer true.

We can handle this by using the inverted capacity as capacity_orig in
util_fits_cpu(). Which not only addresses the problem above, but also
ensures uclamp_max now considers the inverted capacity. Force fitting
a task when a CPU is in this adverse state will contribute to making the
thermal throttling last longer.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---
 kernel/sched/fair.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cb32dc9a057f..77ae343e32a3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4293,12 +4293,16 @@ static inline int util_fits_cpu(unsigned long util,
 	 * For uclamp_max, we can tolerate a drop in performance level as the
 	 * goal is to cap the task. So it's okay if it's getting less.
 	 *
-	 * In case of capacity inversion, which is not handled yet, we should
-	 * honour the inverted capacity for both uclamp_min and uclamp_max all
-	 * the time.
+	 * In case of capacity inversion we should honour the inverted capacity
+	 * for both uclamp_min and uclamp_max all the time.
 	 */
-	capacity_orig = capacity_orig_of(cpu);
-	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
+	capacity_orig = cpu_in_capacity_inversion(cpu);
+	if (capacity_orig) {
+		capacity_orig_thermal = capacity_orig;
+	} else {
+		capacity_orig = capacity_orig_of(cpu);
+		capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
+	}
 
 	/*
 	 * We want to force a task to fit a cpu as implied by uclamp_max.
-- 
2.25.1
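
For readers following the hunk above, here is a minimal standalone sketch of
the branch it adds. It is not kernel code: struct cpu_caps, struct fit_caps
and select_fit_caps() are hypothetical names, and the three fields merely
stand in for the values returned by capacity_orig_of(),
arch_scale_thermal_pressure() and cpu_in_capacity_inversion().

```c
/*
 * Hypothetical per-CPU snapshot. The field names are illustrative only;
 * in the kernel these values come from helper functions, not a struct.
 */
struct cpu_caps {
	unsigned long cap_orig;     /* stands in for capacity_orig_of(cpu) */
	unsigned long thermal_inst; /* stands in for arch_scale_thermal_pressure(cpu) */
	unsigned long cap_inverted; /* stands in for cpu_in_capacity_inversion(cpu), 0 if not inverted */
};

/* The two values util_fits_cpu() goes on to compare against. */
struct fit_caps {
	unsigned long capacity_orig;
	unsigned long capacity_orig_thermal;
};

/* Mirrors the if/else introduced by the patch. */
static struct fit_caps select_fit_caps(const struct cpu_caps *c)
{
	struct fit_caps f;

	f.capacity_orig = c->cap_inverted;
	if (f.capacity_orig) {
		/* Inverted: the inverted capacity is used for both values. */
		f.capacity_orig_thermal = f.capacity_orig;
	} else {
		/* Not inverted: same computation as before the patch. */
		f.capacity_orig = c->cap_orig;
		f.capacity_orig_thermal = c->cap_orig - c->thermal_inst;
	}
	return f;
}
```

With cap_orig = 1024, thermal_inst = 100 and cap_inverted = 0 the sketch
yields 1024 and 924. With cap_inverted = 700 both values become 700, which is
how the uclamp_max check ends up seeing the inverted capacity too.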

Re: [PATCH v2 9/9] sched/fair: Consider capacity inversion in util_fits_cpu()

Posted by Valentin Schneider 3 years, 5 months ago

On 04/08/22 15:36, Qais Yousef wrote:
> We do consider thermal pressure in util_fits_cpu() for uclamp_min only.
> With the exception of the biggest cores which by definition are the max
> performance point of the system and all tasks by definition should fit.
>
> Even under thermal pressure, the capacity of the biggest CPU is the
> highest in the system and should still fit every task. Except when it
> reaches capacity inversion point, then this is no longer true.
>
> We can handle this by using the inverted capacity as capacity_orig in
> util_fits_cpu(). Which not only addresses the problem above, but also
> ensure uclamp_max now considers the inverted capacity. Force fitting
> a task when a CPU is in this adverse state will contribute to making the
> thermal throttling last longer.
>
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> ---
>  kernel/sched/fair.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index cb32dc9a057f..77ae343e32a3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4293,12 +4293,16 @@ static inline int util_fits_cpu(unsigned long util,
>        * For uclamp_max, we can tolerate a drop in performance level as the
>        * goal is to cap the task. So it's okay if it's getting less.
>        *
> -	 * In case of capacity inversion, which is not handled yet, we should
> -	 * honour the inverted capacity for both uclamp_min and uclamp_max all
> -	 * the time.
> +	 * In case of capacity inversion we should honour the inverted capacity
> +	 * for both uclamp_min and uclamp_max all the time.
>        */
> -	capacity_orig = capacity_orig_of(cpu);
> -	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
> +	capacity_orig = cpu_in_capacity_inversion(cpu);
> +	if (capacity_orig) {
> +		capacity_orig_thermal = capacity_orig;
> +	} else {
> +		capacity_orig = capacity_orig_of(cpu);
> +		capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
> +	}
>

IIUC the rq->cpu_capacity_inverted computation in update_cpu_capacity() can be
summarised as:

- If there is a PD with equal cap_orig, but higher effective (orig - thermal)
  capacity
  OR
  there is a PD with pd_cap_orig > cpu_effective_cap:
  rq->cpu_capacity_inverted = capacity_orig - thermal_load_avg(rq)

- Else:
  rq->cpu_capacity_inverted = 0

Then, the code above uses either rq->cpu_capacity_inverted if it is
non-zero, otherwise:

  capacity_orig - arch_scale_thermal_pressure(cpu);

Why use average thermal pressure in one case, and use instantaneous
thermal pressure in the other?

Can't we get rid of rq->cpu_capacity_inverted and replace this whole thing
with an unconditional

  capacity_orig_thermal = capacity_orig_of(cpu) - thermal_load_avg(cpu_rq(cpu));

?

>       /*
>        * We want to force a task to fit a cpu as implied by uclamp_max.
> --
> 2.25.1
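
The summary of the rq->cpu_capacity_inverted computation given in this message
can be sketched as below. This models the summary only, not the actual
update_cpu_capacity() source. struct perf_domain_caps and capacity_inverted()
are hypothetical names. The sketch also assumes that the second condition is
meant for PDs whose original capacity is lower than the CPU's, since a PD that
is bigger by design is not an inversion, and that thermal_avg never exceeds
cap_orig.

```c
#include <stddef.h>

/* Hypothetical view of one performance domain (PD). */
struct perf_domain_caps {
	unsigned long cap_orig;    /* original capacity of the PD's CPUs */
	unsigned long thermal_avg; /* stands in for thermal_load_avg() of that PD */
};

/*
 * Returns the inverted capacity (cap_orig - thermal_avg) if some other PD
 * now beats this CPU, or 0 if the CPU is not in capacity inversion.
 */
static unsigned long capacity_inverted(unsigned long cap_orig,
				       unsigned long thermal_avg,
				       const struct perf_domain_caps *pds,
				       size_t nr_pds)
{
	unsigned long inv_cap = cap_orig - thermal_avg;
	size_t i;

	for (i = 0; i < nr_pds; i++) {
		unsigned long pd_cap_orig = pds[i].cap_orig;

		if (pd_cap_orig == cap_orig) {
			/* Equal original capacity: compare effective capacities. */
			if (pd_cap_orig - pds[i].thermal_avg > inv_cap)
				return inv_cap;
		} else if (pd_cap_orig < cap_orig && pd_cap_orig > inv_cap) {
			/* A nominally smaller PD is now above this CPU. */
			return inv_cap;
		}
	}
	return 0;
}
```

For example, a big CPU with cap_orig = 1024 and an average thermal pressure of
400 has an effective capacity of 624. A medium PD with cap_orig = 768 then
triggers inversion and the function returns 624. With a pressure of only 100
it returns 0.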

Re: [PATCH v2 9/9] sched/fair: Consider capacity inversion in util_fits_cpu()

Posted by Qais Yousef 3 years, 5 months ago

On 11/04/22 17:35, Valentin Schneider wrote:
> On 04/08/22 15:36, Qais Yousef wrote:
> > We do consider thermal pressure in util_fits_cpu() for uclamp_min only.
> > With the exception of the biggest cores which by definition are the max
> > performance point of the system and all tasks by definition should fit.
> >
> > Even under thermal pressure, the capacity of the biggest CPU is the
> > highest in the system and should still fit every task. Except when it
> > reaches capacity inversion point, then this is no longer true.
> >
> > We can handle this by using the inverted capacity as capacity_orig in
> > util_fits_cpu(). Which not only addresses the problem above, but also
> > ensure uclamp_max now considers the inverted capacity. Force fitting
> > a task when a CPU is in this adverse state will contribute to making the
> > thermal throttling last longer.
> >
> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > ---
> >  kernel/sched/fair.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index cb32dc9a057f..77ae343e32a3 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4293,12 +4293,16 @@ static inline int util_fits_cpu(unsigned long util,
> >        * For uclamp_max, we can tolerate a drop in performance level as the
> >        * goal is to cap the task. So it's okay if it's getting less.
> >        *
> > -	 * In case of capacity inversion, which is not handled yet, we should
> > -	 * honour the inverted capacity for both uclamp_min and uclamp_max all
> > -	 * the time.
> > +	 * In case of capacity inversion we should honour the inverted capacity
> > +	 * for both uclamp_min and uclamp_max all the time.
> >        */
> > -	capacity_orig = capacity_orig_of(cpu);
> > -	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
> > +	capacity_orig = cpu_in_capacity_inversion(cpu);
> > +	if (capacity_orig) {
> > +		capacity_orig_thermal = capacity_orig;
> > +	} else {
> > +		capacity_orig = capacity_orig_of(cpu);
> > +		capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
> > +	}
> >
> 
> IIUC the rq->cpu_capacity_inverted computation in update_cpu_capacity() can be
> summarised as:
> 
> - If there is a PD with equal cap_orig, but higher effective (orig - thermal)
>   capacity
>   OR
>   there is a PD with pd_cap_orig > cpu_effective_cap:
>   rq->cpu_capacity_inverted = capacity_orig - thermal_load_avg(rq)
> 
> - Else:
>   rq->cpu_capacity_inverted = 0
> 
> Then, the code above uses either rq->cpu_capacity_inverted if it is
> non-zero, otherwise:
> 
>   capacity_orig - arch_scale_thermal_pressure(cpu);
> 
> Why use average thermal pressure in one case, and use instantaneous
> thermal pressure in the other?

There was a big debate on [1] about using avg vs instantaneous.

I used avg for detecting inversion to be consistent with using average in
scale_rt_capacity(). I didn't want the inversion state to be flipping too
quickly either.

I used the instantaneous in the other check based on that discussion. It seemed
using the average is hurtful when for example the medium drops an OPP and by
not reacting quickly at wake up we lose the chance to place it on a big; which
if my memory didn't fail me is what Xuewen was seeing.

[1] https://lore.kernel.org/lkml/24631a27-42d9-229f-d9b0-040ac993b749@arm.com/

> 
> Can't we get rid of rq->cpu_capacity_inverted and replace this whole thing
> with an unconditional
> 
>   capacity_orig_thermal = capacity_orig_of(cpu) - thermal_load_avg(cpu_rq(cpu));
> 
> ?

I can't see how we end up with equivalent behavior then. Or address the
concerns raised by Xuewen and Lukasz on the RT thread in regards to avg vs
instantaneous.

Specifically, if we don't use the new rq->cpu_capacity_inverted we can't handle
the case where the task is requesting to run at maximum performance but a small
drop in capacity due to thermal pressure means it won't fit anywhere. That PD is
the best fit until it hits an inversion.

Originally I wanted to defer handling thermal pressure into a different series.
But Vincent thought it's better to handle it now. We want more data points from
more systems tbh. But I think what we have now is still a good improvement over
what we had before.

Lukasz had a patch [2] which could allow making thermal_load_avg() more
acceptable for systems that care about faster response times.

[2] https://lore.kernel.org/lkml/20220429091245.12423-1-lukasz.luba@arm.com/


Thanks

--
Qais Yousef
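
To illustrate the avg vs instantaneous trade-off discussed above, here is
a small sketch of how a decaying average lags behind a step change in thermal
pressure. It is a deliberately simplified stand-in: ewma_update() uses a fixed
3/4 weight and is NOT the kernel's PELT-based thermal_load_avg(), and
ticks_until_below() is a hypothetical helper that exists only for this
example.

```c
/*
 * Simple exponentially weighted average in integer arithmetic:
 * new = 3/4 * old + 1/4 * sample. Illustrative only, not PELT.
 */
static unsigned long ewma_update(unsigned long avg, unsigned long sample)
{
	return (avg * 3 + sample) / 4;
}

/*
 * Starting from an average of 0, apply a constant 'pressure' and count
 * how many updates it takes until cap_orig - avg drops below 'threshold'.
 * Returns 0 if that does not happen within max_ticks updates.
 * Assumes pressure <= cap_orig.
 */
static unsigned int ticks_until_below(unsigned long cap_orig,
				      unsigned long pressure,
				      unsigned long threshold,
				      unsigned int max_ticks)
{
	unsigned long avg = 0;
	unsigned int t;

	for (t = 1; t <= max_ticks; t++) {
		avg = ewma_update(avg, pressure);
		if (cap_orig - avg < threshold)
			return t;
	}
	return 0;
}
```

With cap_orig = 1024, a pressure of 400 and a threshold of 768, the
instantaneous value 1024 - 400 = 624 is below the threshold right away, while
this averaged signal needs four updates to get there. In the other direction,
ewma_update(400, 0) still reports 300 after the pressure is gone, so an
averaged signal both reacts later and recovers later.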

Re: [PATCH v2 9/9] sched/fair: Consider capacity inversion in util_fits_cpu()

Posted by Dietmar Eggemann 3 years, 5 months ago

- Qais Yousef <qais.yousef@arm.com>

On 05/11/2022 21:41, Qais Yousef wrote:
> On 11/04/22 17:35, Valentin Schneider wrote:
>> On 04/08/22 15:36, Qais Yousef wrote:

[...]

>> IIUC the rq->cpu_capacity_inverted computation in update_cpu_capacity() can be
>> summarised as:
>>
>> - If there is a PD with equal cap_orig, but higher effective (orig - thermal)
>>   capacity
>>   OR
>>   there is a PD with pd_cap_orig > cpu_effective_cap:
>>   rq->cpu_capacity_inverted = capacity_orig - thermal_load_avg(rq)
>>
>> - Else:
>>   rq->cpu_capacity_inverted = 0
>>
>> Then, the code above uses either rq->cpu_capacity_inverted if it is
>> non-zero, otherwise:
>>
>>   capacity_orig - arch_scale_thermal_pressure(cpu);
>>
>> Why use average thermal pressure in one case, and use instantaneous
>> thermal pressure in the other?
> 
> There was a big debate on [1] about using avg vs instantaneous.
> 
> I used avg for detecting inversion to be consistent with using average in in
> scale_rt_capacity(). I didn't want the inversion state to be flipping too
> quickly too.
> 
> I used the instantaneous in the other check based on that discussion. It seemed
> using the average is hurtful when for example the medium drops an OPP and by
> not reacting quickly at wake up we lose the chance to place it on a big; which
> if my memory didn't fail me is what Xuewen was seeing.
> 
> [1] https://lore.kernel.org/lkml/24631a27-42d9-229f-d9b0-040ac993b749@arm.com/
> 
>>
>> Can't we get rid of rq->cpu_capacity_inverted and replace this whole thing
>> with an unconditional
>>
>>   capacity_orig_thermal = capacity_orig_of(cpu) - thermal_load_avg(cpu_rq(cpu));
>>
>> ?
> 
> I can't see how we end up with equivalent behavior then. Or address the
> concerns raised by Xuewen and Lukasz on the RT thread in regards to avg vs
> instantaneous.
> 
> Specifically, if we don't use the new rq->cpu_capacity_inverted we can't handle
> the case where the task is requesting to run at maximum performance but a small
> drop in thermal pressure means it won't fit anywhere. That PD is the best fit
> until it hits an inversion.
> 
> Originally I wanted to defer handling thermal pressure into a different series.
> But Vincent thought it's better to handle it now. We want more data points from
> more systems tbh. But I think what we have now is still a good improvement over
> what we had before.

I can't see the rationale in using:

!inversion: `cap_orig - instantaneous thermal pressure`

 inversion: `cap_orig - PELT thermal pressure`

I can see that there was a lot of discussion on this topic but hardly
any agreement IMHO.

AFAICS, the 2 capacity inversion patches just appeared in v2 and haven't
seen any review yet I'm afraid.


[...]

Re: [PATCH v2 9/9] sched/fair: Consider capacity inversion in util_fits_cpu()

Posted by Valentin Schneider 3 years, 5 months ago

On 05/11/22 20:41, Qais Yousef wrote:
> On 11/04/22 17:35, Valentin Schneider wrote:
>> On 04/08/22 15:36, Qais Yousef wrote:
>> > We do consider thermal pressure in util_fits_cpu() for uclamp_min only.
>> > With the exception of the biggest cores which by definition are the max
>> > performance point of the system and all tasks by definition should fit.
>> >
>> > Even under thermal pressure, the capacity of the biggest CPU is the
>> > highest in the system and should still fit every task. Except when it
>> > reaches capacity inversion point, then this is no longer true.
>> >
>> > We can handle this by using the inverted capacity as capacity_orig in
>> > util_fits_cpu(). Which not only addresses the problem above, but also
>> > ensure uclamp_max now considers the inverted capacity. Force fitting
>> > a task when a CPU is in this adverse state will contribute to making the
>> > thermal throttling last longer.
>> >
>> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
>> > ---
>> >  kernel/sched/fair.c | 14 +++++++++-----
>> >  1 file changed, 9 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index cb32dc9a057f..77ae343e32a3 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -4293,12 +4293,16 @@ static inline int util_fits_cpu(unsigned long util,
>> >        * For uclamp_max, we can tolerate a drop in performance level as the
>> >        * goal is to cap the task. So it's okay if it's getting less.
>> >        *
>> > -	 * In case of capacity inversion, which is not handled yet, we should
>> > -	 * honour the inverted capacity for both uclamp_min and uclamp_max all
>> > -	 * the time.
>> > +	 * In case of capacity inversion we should honour the inverted capacity
>> > +	 * for both uclamp_min and uclamp_max all the time.
>> >        */
>> > -	capacity_orig = capacity_orig_of(cpu);
>> > -	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
>> > +	capacity_orig = cpu_in_capacity_inversion(cpu);
>> > +	if (capacity_orig) {
>> > +		capacity_orig_thermal = capacity_orig;
>> > +	} else {
>> > +		capacity_orig = capacity_orig_of(cpu);
>> > +		capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
>> > +	}
>> >
>>
>> IIUC the rq->cpu_capacity_inverted computation in update_cpu_capacity() can be
>> summarised as:
>>
>> - If there is a PD with equal cap_orig, but higher effective (orig - thermal)
>>   capacity
>>   OR
>>   there is a PD with pd_cap_orig > cpu_effective_cap:
>>   rq->cpu_capacity_inverted = capacity_orig - thermal_load_avg(rq)
>>
>> - Else:
>>   rq->cpu_capacity_inverted = 0
>>
>> Then, the code above uses either rq->cpu_capacity_inverted if it is
>> non-zero, otherwise:
>>
>>   capacity_orig - arch_scale_thermal_pressure(cpu);
>>
>> Why use average thermal pressure in one case, and use instantaneous
>> thermal pressure in the other?
>
> There was a big debate on [1] about using avg vs instantaneous.
>

Interesting thread, thanks for the link!

> I used avg for detecting inversion to be consistent with using average in in
> scale_rt_capacity(). I didn't want the inversion state to be flipping too
> quickly too.
>
> I used the instantaneous in the other check based on that discussion. It seemed
> using the average is hurtful when for example the medium drops an OPP and by
> not reacting quickly at wake up we lose the chance to place it on a big; which
> if my memory didn't fail me is what Xuewen was seeing.
>

OK, so IIUC by using the inst. pressure you start excluding CPUs sooner, and
with the avg pressure you keep those CPUs out (if the pressure remained
long enough).

> [1] https://lore.kernel.org/lkml/24631a27-42d9-229f-d9b0-040ac993b749@arm.com/
>

>>
>> Can't we get rid of rq->cpu_capacity_inverted and replace this whole thing
>> with an unconditional
>>
>>   capacity_orig_thermal = capacity_orig_of(cpu) - thermal_load_avg(cpu_rq(cpu));
>>
>> ?
>
> I can't see how we end up with equivalent behavior then. Or address the
> concerns raised by Xuewen and Lukasz on the RT thread in regards to avg vs
> instantaneous.
>
> Specifically, if we don't use the new rq->cpu_capacity_inverted we can't handle
> the case where the task is requesting to run at maximum performance but a small
> drop in thermal pressure means it won't fit anywhere. That PD is the best fit
> until it hits an inversion.
>
> Originally I wanted to defer handling thermal pressure into a different series.
> But Vincent thought it's better to handle it now. We want more data points from
> more systems tbh. But I think what we have now is still a good improvement over
> what we had before.
>
> Lukasz had a patch [2] which could allow making thermal_load_avg() more
> acceptable for systems that care about faster response times.
>
> [2] https://lore.kernel.org/lkml/20220429091245.12423-1-lukasz.luba@arm.com/
>
>
> Thanks
>
> --
> Qais Yousef

Re: [PATCH v2 9/9] sched/fair: Consider capacity inversion in util_fits_cpu()

Posted by Qais Yousef 3 years, 5 months ago

On 11/07/22 18:58, Valentin Schneider wrote:
> On 05/11/22 20:41, Qais Yousef wrote:
> > On 11/04/22 17:35, Valentin Schneider wrote:
> >> On 04/08/22 15:36, Qais Yousef wrote:
> >> > We do consider thermal pressure in util_fits_cpu() for uclamp_min only.
> >> > With the exception of the biggest cores which by definition are the max
> >> > performance point of the system and all tasks by definition should fit.
> >> >
> >> > Even under thermal pressure, the capacity of the biggest CPU is the
> >> > highest in the system and should still fit every task. Except when it
> >> > reaches capacity inversion point, then this is no longer true.
> >> >
> >> > We can handle this by using the inverted capacity as capacity_orig in
> >> > util_fits_cpu(). Which not only addresses the problem above, but also
> >> > ensure uclamp_max now considers the inverted capacity. Force fitting
> >> > a task when a CPU is in this adverse state will contribute to making the
> >> > thermal throttling last longer.
> >> >
> >> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> >> > ---
> >> >  kernel/sched/fair.c | 14 +++++++++-----
> >> >  1 file changed, 9 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> > index cb32dc9a057f..77ae343e32a3 100644
> >> > --- a/kernel/sched/fair.c
> >> > +++ b/kernel/sched/fair.c
> >> > @@ -4293,12 +4293,16 @@ static inline int util_fits_cpu(unsigned long util,
> >> >        * For uclamp_max, we can tolerate a drop in performance level as the
> >> >        * goal is to cap the task. So it's okay if it's getting less.
> >> >        *
> >> > -	 * In case of capacity inversion, which is not handled yet, we should
> >> > -	 * honour the inverted capacity for both uclamp_min and uclamp_max all
> >> > -	 * the time.
> >> > +	 * In case of capacity inversion we should honour the inverted capacity
> >> > +	 * for both uclamp_min and uclamp_max all the time.
> >> >        */
> >> > -	capacity_orig = capacity_orig_of(cpu);
> >> > -	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
> >> > +	capacity_orig = cpu_in_capacity_inversion(cpu);
> >> > +	if (capacity_orig) {
> >> > +		capacity_orig_thermal = capacity_orig;
> >> > +	} else {
> >> > +		capacity_orig = capacity_orig_of(cpu);
> >> > +		capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
> >> > +	}
> >> >
> >>
> >> IIUC the rq->cpu_capacity_inverted computation in update_cpu_capacity() can be
> >> summarised as:
> >>
> >> - If there is a PD with equal cap_orig, but higher effective (orig - thermal)
> >>   capacity
> >>   OR
> >>   there is a PD with pd_cap_orig > cpu_effective_cap:
> >>   rq->cpu_capacity_inverted = capacity_orig - thermal_load_avg(rq)
> >>
> >> - Else:
> >>   rq->cpu_capacity_inverted = 0
> >>
> >> Then, the code above uses either rq->cpu_capacity_inverted if it is
> >> non-zero, otherwise:
> >>
> >>   capacity_orig - arch_scale_thermal_pressure(cpu);
> >>
> >> Why use average thermal pressure in one case, and use instantaneous
> >> thermal pressure in the other?
> >
> > There was a big debate on [1] about using avg vs instantaneous.
> >
> 
> Interesting thread, thanks for the link!
> 
> > I used avg for detecting inversion to be consistent with using average in in
> > scale_rt_capacity(). I didn't want the inversion state to be flipping too
> > quickly too.
> >
> > I used the instantaneous in the other check based on that discussion. It seemed
> > using the average is hurtful when for example the medium drops an OPP and by
> > not reacting quickly at wake up we lose the chance to place it on a big; which
> > if my memory didn't fail me is what Xuewen was seeing.
> >
> 
> OK So IIUC by using the inst. pressure you start excluding CPUs sooner, and
> with the avg pressure you keep those CPUs out (if the pressure remained
> long enough).

Yes. I hope the discussion on avg vs instantaneous will be continued and we can
unify the usages.

> 
> > [1] https://lore.kernel.org/lkml/24631a27-42d9-229f-d9b0-040ac993b749@arm.com/
> >
> 
> >>
> >> Can't we get rid of rq->cpu_capacity_inverted and replace this whole thing
> >> with an unconditional
> >>
> >>   capacity_orig_thermal = capacity_orig_of(cpu) - thermal_load_avg(cpu_rq(cpu));
> >>
> >> ?
> >
> > I can't see how we end up with equivalent behavior then. Or address the
> > concerns raised by Xuewen and Lukasz on the RT thread in regards to avg vs
> > instantaneous.
> >
> > Specifically, if we don't use the new rq->cpu_capacity_inverted we can't handle
> > the case where the task is requesting to run at maximum performance but a small
> > drop in thermal pressure means it won't fit anywhere. That PD is the best fit
> > until it hits an inversion.
> >
> > Originally I wanted to defer handling thermal pressure into a different series.
> > But Vincent thought it's better to handle it now. We want more data points from
> > more systems tbh. But I think what we have now is still a good improvement over
> > what we had before.
> >
> > Lukasz had a patch [2] which could allow making thermal_load_avg() more
> > acceptable for systems that care about faster response times.
> >
> > [2] https://lore.kernel.org/lkml/20220429091245.12423-1-lukasz.luba@arm.com/
> >
> >
> > Thanks
> >
> > --
> > Qais Yousef
>

[tip: sched/core] sched/fair: Consider capacity inversion in util_fits_cpu()

Posted by tip-bot2 for Qais Yousef 3 years, 5 months ago

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     aa69c36f31aadc1669bfa8a3de6a47b5e6c98ee8
Gitweb:        https://git.kernel.org/tip/aa69c36f31aadc1669bfa8a3de6a47b5e6c98ee8
Author:        Qais Yousef <qais.yousef@arm.com>
AuthorDate:    Thu, 04 Aug 2022 15:36:09 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 27 Oct 2022 11:01:20 +02:00

sched/fair: Consider capacity inversion in util_fits_cpu()

We do consider thermal pressure in util_fits_cpu() for uclamp_min only.
With the exception of the biggest cores which by definition are the max
performance point of the system and all tasks by definition should fit.

Even under thermal pressure, the capacity of the biggest CPU is the
highest in the system and should still fit every task. Except when it
reaches capacity inversion point, then this is no longer true.

We can handle this by using the inverted capacity as capacity_orig in
util_fits_cpu(). Which not only addresses the problem above, but also
ensure uclamp_max now considers the inverted capacity. Force fitting
a task when a CPU is in this adverse state will contribute to making the
thermal throttling last longer.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-10-qais.yousef@arm.com
---
 kernel/sched/fair.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4c4ea47..919d016 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4465,12 +4465,16 @@ static inline int util_fits_cpu(unsigned long util,
 	 * For uclamp_max, we can tolerate a drop in performance level as the
 	 * goal is to cap the task. So it's okay if it's getting less.
 	 *
-	 * In case of capacity inversion, which is not handled yet, we should
-	 * honour the inverted capacity for both uclamp_min and uclamp_max all
-	 * the time.
+	 * In case of capacity inversion we should honour the inverted capacity
+	 * for both uclamp_min and uclamp_max all the time.
 	 */
-	capacity_orig = capacity_orig_of(cpu);
-	capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
+	capacity_orig = cpu_in_capacity_inversion(cpu);
+	if (capacity_orig) {
+		capacity_orig_thermal = capacity_orig;
+	} else {
+		capacity_orig = capacity_orig_of(cpu);
+		capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
+	}
 
 	/*
 	 * We want to force a task to fit a cpu as implied by uclamp_max.