sched/fair: Some improvements to feec()

[PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()

Posted by Xuewen Yan 1 year, 7 months ago

Commit f1f8d0a22422 ("sched/cpufreq: Take cpufreq feedback into account")
introduced get_actual_cpu_capacity(), and it had aggregated the
different pressures applied on the capacity of CPUs.
And in util_fits_cpu(), it would return true when uclamp_max
is smaller than SCHED_CAPACITY_SCALE, althought the uclamp_max
is bigger than actual_cpu_capacity.

So use actual_cpu_capacity everywhere in util_fits_cpu() to
cover all cases.

Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
---
 kernel/sched/fair.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5ca6396ef0b7..9c16ae192217 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4980,7 +4980,7 @@ static inline int util_fits_cpu(unsigned long util,
 				int cpu)
 {
 	unsigned long capacity = capacity_of(cpu);
-	unsigned long capacity_orig;
+	unsigned long capacity_actual;
 	bool fits, uclamp_max_fits;
 
 	/*
@@ -4992,15 +4992,15 @@ static inline int util_fits_cpu(unsigned long util,
 		return fits;
 
 	/*
-	 * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and
+	 * We must use actual_cpu_capacity() for comparing against uclamp_min and
 	 * uclamp_max. We only care about capacity pressure (by using
 	 * capacity_of()) for comparing against the real util.
 	 *
 	 * If a task is boosted to 1024 for example, we don't want a tiny
 	 * pressure to skew the check whether it fits a CPU or not.
 	 *
-	 * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it
-	 * should fit a little cpu even if there's some pressure.
+	 * Similarly if a task is capped to actual_cpu_capacity, it should fit
+	 * the cpu even if there's some pressure.
 	 *
 	 * Only exception is for HW or cpufreq pressure since it has a direct impact
 	 * on available OPP of the system.
@@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
 	 * For uclamp_max, we can tolerate a drop in performance level as the
 	 * goal is to cap the task. So it's okay if it's getting less.
 	 */
-	capacity_orig = arch_scale_cpu_capacity(cpu);
+	capacity_actual = get_actual_cpu_capacity(cpu);
 
 	/*
 	 * We want to force a task to fit a cpu as implied by uclamp_max.
@@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
 	 *     uclamp_max request.
 	 *
 	 *   which is what we're enforcing here. A task always fits if
-	 *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
+	 *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
 	 *   the normal upmigration rules should withhold still.
 	 *
 	 *   Only exception is when we are on max capacity, then we need to be
@@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
 	 *     2. The system is being saturated when we're operating near
 	 *        max capacity, it doesn't make sense to block overutilized.
 	 */
-	uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
-	uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
+	uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
+	uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_actual);
 	fits = fits || uclamp_max_fits;
 
 	/*
@@ -5086,8 +5086,7 @@ static inline int util_fits_cpu(unsigned long util,
 	 * handle the case uclamp_min > uclamp_max.
 	 */
 	uclamp_min = min(uclamp_min, uclamp_max);
-	if (fits && (util < uclamp_min) &&
-	    (uclamp_min > get_actual_cpu_capacity(cpu)))
+	if (fits && (util < uclamp_min) && (uclamp_min > capacity_actual))
 		return -1;
 
 	return fits;
-- 
2.25.1

Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()

Posted by Qais Yousef 1 year, 7 months ago

On 06/24/24 16:20, Xuewen Yan wrote:
> Commit f1f8d0a22422 ("sched/cpufreq: Take cpufreq feedback into account")
> introduced get_actual_cpu_capacity(), and it had aggregated the
> different pressures applied on the capacity of CPUs.
> And in util_fits_cpu(), it would return true when uclamp_max
> is smaller than SCHED_CAPACITY_SCALE, althought the uclamp_max
> is bigger than actual_cpu_capacity.
> 
> So use actual_cpu_capacity everywhere in util_fits_cpu() to
> cover all cases.
> 
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> ---
>  kernel/sched/fair.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5ca6396ef0b7..9c16ae192217 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4980,7 +4980,7 @@ static inline int util_fits_cpu(unsigned long util,
>  				int cpu)
>  {
>  	unsigned long capacity = capacity_of(cpu);
> -	unsigned long capacity_orig;
> +	unsigned long capacity_actual;
>  	bool fits, uclamp_max_fits;
>  
>  	/*
> @@ -4992,15 +4992,15 @@ static inline int util_fits_cpu(unsigned long util,
>  		return fits;
>  
>  	/*
> -	 * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and
> +	 * We must use actual_cpu_capacity() for comparing against uclamp_min and
>  	 * uclamp_max. We only care about capacity pressure (by using
>  	 * capacity_of()) for comparing against the real util.
>  	 *
>  	 * If a task is boosted to 1024 for example, we don't want a tiny
>  	 * pressure to skew the check whether it fits a CPU or not.
>  	 *
> -	 * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it
> -	 * should fit a little cpu even if there's some pressure.
> +	 * Similarly if a task is capped to actual_cpu_capacity, it should fit
> +	 * the cpu even if there's some pressure.

This statement is not clear now. We need to be specific since
actual_cpu_capacity() includes thermal pressure and cpufreq limits.

/even if there's some pressure/even if there is non OPP based pressure ie: RT,
DL or irq/?

>  	 *
>  	 * Only exception is for HW or cpufreq pressure since it has a direct impact
>  	 * on available OPP of the system.
> @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
>  	 * For uclamp_max, we can tolerate a drop in performance level as the
>  	 * goal is to cap the task. So it's okay if it's getting less.
>  	 */
> -	capacity_orig = arch_scale_cpu_capacity(cpu);
> +	capacity_actual = get_actual_cpu_capacity(cpu);
>  
>  	/*
>  	 * We want to force a task to fit a cpu as implied by uclamp_max.
> @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
>  	 *     uclamp_max request.
>  	 *
>  	 *   which is what we're enforcing here. A task always fits if
> -	 *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> +	 *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
>  	 *   the normal upmigration rules should withhold still.
>  	 *
>  	 *   Only exception is when we are on max capacity, then we need to be
> @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
>  	 *     2. The system is being saturated when we're operating near
>  	 *        max capacity, it doesn't make sense to block overutilized.
>  	 */
> -	uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> -	uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> +	uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);

We should use capacity_orig here. We are checking if the CPU is the max
capacity CPU.

> +	uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_actual);
>  	fits = fits || uclamp_max_fits;
>  
>  	/*
> @@ -5086,8 +5086,7 @@ static inline int util_fits_cpu(unsigned long util,
>  	 * handle the case uclamp_min > uclamp_max.
>  	 */
>  	uclamp_min = min(uclamp_min, uclamp_max);
> -	if (fits && (util < uclamp_min) &&
> -	    (uclamp_min > get_actual_cpu_capacity(cpu)))
> +	if (fits && (util < uclamp_min) && (uclamp_min > capacity_actual))
>  		return -1;
>  
>  	return fits;
> -- 
> 2.25.1
>

Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()

Posted by Vincent Guittot 1 year, 7 months ago

On Fri, 28 Jun 2024 at 03:28, Qais Yousef <qyousef@layalina.io> wrote:
>
> On 06/24/24 16:20, Xuewen Yan wrote:
> > Commit f1f8d0a22422 ("sched/cpufreq: Take cpufreq feedback into account")
> > introduced get_actual_cpu_capacity(), and it had aggregated the
> > different pressures applied on the capacity of CPUs.
> > And in util_fits_cpu(), it would return true when uclamp_max
> > is smaller than SCHED_CAPACITY_SCALE, althought the uclamp_max
> > is bigger than actual_cpu_capacity.
> >
> > So use actual_cpu_capacity everywhere in util_fits_cpu() to
> > cover all cases.
> >
> > Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> > ---
> >  kernel/sched/fair.c | 19 +++++++++----------
> >  1 file changed, 9 insertions(+), 10 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 5ca6396ef0b7..9c16ae192217 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4980,7 +4980,7 @@ static inline int util_fits_cpu(unsigned long util,
> >                               int cpu)
> >  {
> >       unsigned long capacity = capacity_of(cpu);
> > -     unsigned long capacity_orig;
> > +     unsigned long capacity_actual;
> >       bool fits, uclamp_max_fits;
> >
> >       /*
> > @@ -4992,15 +4992,15 @@ static inline int util_fits_cpu(unsigned long util,
> >               return fits;
> >
> >       /*
> > -      * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and
> > +      * We must use actual_cpu_capacity() for comparing against uclamp_min and
> >        * uclamp_max. We only care about capacity pressure (by using
> >        * capacity_of()) for comparing against the real util.
> >        *
> >        * If a task is boosted to 1024 for example, we don't want a tiny
> >        * pressure to skew the check whether it fits a CPU or not.
> >        *
> > -      * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it
> > -      * should fit a little cpu even if there's some pressure.
> > +      * Similarly if a task is capped to actual_cpu_capacity, it should fit
> > +      * the cpu even if there's some pressure.
>
> This statement is not clear now. We need to be specific since
> actual_cpu_capacity() includes thermal pressure and cpufreq limits.
>
> /even if there's some pressure/even if there is non OPP based pressure ie: RT,
> DL or irq/?
>
> >        *
> >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> >        * on available OPP of the system.
> > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        * For uclamp_max, we can tolerate a drop in performance level as the
> >        * goal is to cap the task. So it's okay if it's getting less.
> >        */
> > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > +     capacity_actual = get_actual_cpu_capacity(cpu);
> >
> >       /*
> >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        *     uclamp_max request.
> >        *
> >        *   which is what we're enforcing here. A task always fits if
> > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> >        *   the normal upmigration rules should withhold still.
> >        *
> >        *   Only exception is when we are on max capacity, then we need to be
> > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> >        *     2. The system is being saturated when we're operating near
> >        *        max capacity, it doesn't make sense to block overutilized.
> >        */
> > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
>
> We should use capacity_orig here. We are checking if the CPU is the max
> capacity CPU.

I was also wondering what would be the best choice there. If we
consider that we have only one performance domain with all max
capacity cpus then I agree that we should keep capacity_orig as we
can't find a better cpu that would fit. But is it always true that all
max cpu are tied to the same perf domain ?

>
> > +     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_actual);
> >       fits = fits || uclamp_max_fits;
> >
> >       /*
> > @@ -5086,8 +5086,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        * handle the case uclamp_min > uclamp_max.
> >        */
> >       uclamp_min = min(uclamp_min, uclamp_max);
> > -     if (fits && (util < uclamp_min) &&
> > -         (uclamp_min > get_actual_cpu_capacity(cpu)))
> > +     if (fits && (util < uclamp_min) && (uclamp_min > capacity_actual))
> >               return -1;
> >
> >       return fits;
> > --
> > 2.25.1
> >

Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()

Posted by Qais Yousef 1 year, 7 months ago

On 07/02/24 15:25, Vincent Guittot wrote:

> > >        *
> > >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> > >        * on available OPP of the system.
> > > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> > >        * For uclamp_max, we can tolerate a drop in performance level as the
> > >        * goal is to cap the task. So it's okay if it's getting less.
> > >        */
> > > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > > +     capacity_actual = get_actual_cpu_capacity(cpu);
> > >
> > >       /*
> > >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> > >        *     uclamp_max request.
> > >        *
> > >        *   which is what we're enforcing here. A task always fits if
> > > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> > >        *   the normal upmigration rules should withhold still.
> > >        *
> > >        *   Only exception is when we are on max capacity, then we need to be
> > > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> > >        *     2. The system is being saturated when we're operating near
> > >        *        max capacity, it doesn't make sense to block overutilized.
> > >        */
> > > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> >
> > We should use capacity_orig here. We are checking if the CPU is the max
> > capacity CPU.
> 
> I was also wondering what would be the best choice there. If we
> consider that we have only one performance domain with all max
> capacity cpus then I agree that we should keep capacity_orig as we
> can't find a better cpu that would fit. But is it always true that all
> max cpu are tied to the same perf domain ?

Hmm I could be not thinking straight today. But the purpose of this check is to
ensure overutilized can trigger for the common case where a task will always
fit the max capacity cpu (whether it's on a single pd or multiple ones). For
that corner case fits_capacity() should be the only fitness check otherwise
overutilized will never trigger by default.

Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()

Posted by Vincent Guittot 1 year, 7 months ago

On Wed, 3 Jul 2024 at 13:54, Qais Yousef <qyousef@layalina.io> wrote:
>
> On 07/02/24 15:25, Vincent Guittot wrote:
>
> > > >        *
> > > >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> > > >        * on available OPP of the system.
> > > > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > >        * For uclamp_max, we can tolerate a drop in performance level as the
> > > >        * goal is to cap the task. So it's okay if it's getting less.
> > > >        */
> > > > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > > > +     capacity_actual = get_actual_cpu_capacity(cpu);
> > > >
> > > >       /*
> > > >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > > > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > >        *     uclamp_max request.
> > > >        *
> > > >        *   which is what we're enforcing here. A task always fits if
> > > > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > > > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> > > >        *   the normal upmigration rules should withhold still.
> > > >        *
> > > >        *   Only exception is when we are on max capacity, then we need to be
> > > > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> > > >        *     2. The system is being saturated when we're operating near
> > > >        *        max capacity, it doesn't make sense to block overutilized.
> > > >        */
> > > > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > > > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > >
> > > We should use capacity_orig here. We are checking if the CPU is the max
> > > capacity CPU.
> >
> > I was also wondering what would be the best choice there. If we
> > consider that we have only one performance domain with all max
> > capacity cpus then I agree that we should keep capacity_orig as we
> > can't find a better cpu that would fit. But is it always true that all
> > max cpu are tied to the same perf domain ?
>
> Hmm I could be not thinking straight today. But the purpose of this check is to
> ensure overutilized can trigger for the common case where a task will always
> fit the max capacity cpu (whether it's on a single pd or multiple ones). For
> that corner case fits_capacity() should be the only fitness check otherwise
> overutilized will never trigger by default.

Ok, so I messed up several use cases but in fact both are similar ...

if capacity_actual != SCHED_CAPACITY_SCALE and uclamp_max ==
SCHED_CAPACITY_SCALE
then uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) &&
(uclamp_max == SCHED_CAPACITY_SCALE) is false
and uclamp_max_fits = !uclamp_max_fits && (uclamp_max <=
capacity_actual); is also false because (uclamp_max <=
capacity_actual) is always false

if capacity_actual == SCHED_CAPACITY_SCALE and uclamp_max ==
SCHED_CAPACITY_SCALE
then we are the same as with capacity_orig

Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()

Posted by Qais Yousef 1 year, 7 months ago

On 07/03/24 16:54, Vincent Guittot wrote:
> On Wed, 3 Jul 2024 at 13:54, Qais Yousef <qyousef@layalina.io> wrote:
> >
> > On 07/02/24 15:25, Vincent Guittot wrote:
> >
> > > > >        *
> > > > >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> > > > >        * on available OPP of the system.
> > > > > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > > >        * For uclamp_max, we can tolerate a drop in performance level as the
> > > > >        * goal is to cap the task. So it's okay if it's getting less.
> > > > >        */
> > > > > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > > > > +     capacity_actual = get_actual_cpu_capacity(cpu);
> > > > >
> > > > >       /*
> > > > >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > > > > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > > >        *     uclamp_max request.
> > > > >        *
> > > > >        *   which is what we're enforcing here. A task always fits if
> > > > > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > > > > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> > > > >        *   the normal upmigration rules should withhold still.
> > > > >        *
> > > > >        *   Only exception is when we are on max capacity, then we need to be
> > > > > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> > > > >        *     2. The system is being saturated when we're operating near
> > > > >        *        max capacity, it doesn't make sense to block overutilized.
> > > > >        */
> > > > > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > > > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > > > > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > >
> > > > We should use capacity_orig here. We are checking if the CPU is the max
> > > > capacity CPU.
> > >
> > > I was also wondering what would be the best choice there. If we
> > > consider that we have only one performance domain with all max
> > > capacity cpus then I agree that we should keep capacity_orig as we
> > > can't find a better cpu that would fit. But is it always true that all
> > > max cpu are tied to the same perf domain ?
> >
> > Hmm I could be not thinking straight today. But the purpose of this check is to
> > ensure overutilized can trigger for the common case where a task will always
> > fit the max capacity cpu (whether it's on a single pd or multiple ones). For
> > that corner case fits_capacity() should be the only fitness check otherwise
> > overutilized will never trigger by default.
> 
> Ok, so I messed up several use cases but in fact both are similar ...
> 
> if capacity_actual != SCHED_CAPACITY_SCALE and uclamp_max ==
> SCHED_CAPACITY_SCALE
> then uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) &&
> (uclamp_max == SCHED_CAPACITY_SCALE) is false
> and uclamp_max_fits = !uclamp_max_fits && (uclamp_max <=
> capacity_actual); is also false because (uclamp_max <=
> capacity_actual) is always false
> 
> if capacity_actual == SCHED_CAPACITY_SCALE and uclamp_max ==
> SCHED_CAPACITY_SCALE
> then we are the same as with capacity_orig

Right. The condition is becoming less readable, but you're right it doesn't
change functionality.

Xuewen, could you put something in the commit message please to remind us in
the future that we thought about this and it is fine?

Thanks!

--
Qais Yousef

Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()

Posted by Xuewen Yan 1 year, 6 months ago

On Fri, Jul 5, 2024 at 7:56 AM Qais Yousef <qyousef@layalina.io> wrote:
>
> On 07/03/24 16:54, Vincent Guittot wrote:
> > On Wed, 3 Jul 2024 at 13:54, Qais Yousef <qyousef@layalina.io> wrote:
> > >
> > > On 07/02/24 15:25, Vincent Guittot wrote:
> > >
> > > > > >        *
> > > > > >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> > > > > >        * on available OPP of the system.
> > > > > > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > > > >        * For uclamp_max, we can tolerate a drop in performance level as the
> > > > > >        * goal is to cap the task. So it's okay if it's getting less.
> > > > > >        */
> > > > > > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > > > > > +     capacity_actual = get_actual_cpu_capacity(cpu);
> > > > > >
> > > > > >       /*
> > > > > >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > > > > > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > > > >        *     uclamp_max request.
> > > > > >        *
> > > > > >        *   which is what we're enforcing here. A task always fits if
> > > > > > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > > > > > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> > > > > >        *   the normal upmigration rules should withhold still.
> > > > > >        *
> > > > > >        *   Only exception is when we are on max capacity, then we need to be
> > > > > > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> > > > > >        *     2. The system is being saturated when we're operating near
> > > > > >        *        max capacity, it doesn't make sense to block overutilized.
> > > > > >        */
> > > > > > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > > > > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > > > > > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > > >
> > > > > We should use capacity_orig here. We are checking if the CPU is the max
> > > > > capacity CPU.
> > > >
> > > > I was also wondering what would be the best choice there. If we
> > > > consider that we have only one performance domain with all max
> > > > capacity cpus then I agree that we should keep capacity_orig as we
> > > > can't find a better cpu that would fit. But is it always true that all
> > > > max cpu are tied to the same perf domain ?
> > >
> > > Hmm I could be not thinking straight today. But the purpose of this check is to
> > > ensure overutilized can trigger for the common case where a task will always
> > > fit the max capacity cpu (whether it's on a single pd or multiple ones). For
> > > that corner case fits_capacity() should be the only fitness check otherwise
> > > overutilized will never trigger by default.
> >
> > Ok, so I messed up several use cases but in fact both are similar ...
> >
> > if capacity_actual != SCHED_CAPACITY_SCALE and uclamp_max ==
> > SCHED_CAPACITY_SCALE
> > then uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) &&
> > (uclamp_max == SCHED_CAPACITY_SCALE) is false
> > and uclamp_max_fits = !uclamp_max_fits && (uclamp_max <=
> > capacity_actual); is also false because (uclamp_max <=
> > capacity_actual) is always false
> >
> > if capacity_actual == SCHED_CAPACITY_SCALE and uclamp_max ==
> > SCHED_CAPACITY_SCALE
> > then we are the same as with capacity_orig
>
> Right. The condition is becoming less readable, but you're right it doesn't
> change functionality.
>
> Xuewen, could you put something in the commit message please to remind us in
> the future that we thought about this and it is fine?

Sorry for the late reply. I will read all the emails carefully.
I am also studying the definition and usage of this function carefully.
I will send patch v3 as soon as I fully understand it.

Thanks!

>
> Thanks!
>
> --
> Qais Yousef

Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()

Posted by Dietmar Eggemann 1 year, 7 months ago

On 05/07/2024 01:56, Qais Yousef wrote:
> On 07/03/24 16:54, Vincent Guittot wrote:
>> On Wed, 3 Jul 2024 at 13:54, Qais Yousef <qyousef@layalina.io> wrote:
>>>
>>> On 07/02/24 15:25, Vincent Guittot wrote:
>>>
>>>>>>        *
>>>>>>        * Only exception is for HW or cpufreq pressure since it has a direct impact
>>>>>>        * on available OPP of the system.
>>>>>> @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
>>>>>>        * For uclamp_max, we can tolerate a drop in performance level as the
>>>>>>        * goal is to cap the task. So it's okay if it's getting less.
>>>>>>        */
>>>>>> -     capacity_orig = arch_scale_cpu_capacity(cpu);
>>>>>> +     capacity_actual = get_actual_cpu_capacity(cpu);
>>>>>>
>>>>>>       /*
>>>>>>        * We want to force a task to fit a cpu as implied by uclamp_max.
>>>>>> @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
>>>>>>        *     uclamp_max request.
>>>>>>        *
>>>>>>        *   which is what we're enforcing here. A task always fits if
>>>>>> -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
>>>>>> +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
>>>>>>        *   the normal upmigration rules should withhold still.
>>>>>>        *
>>>>>>        *   Only exception is when we are on max capacity, then we need to be
>>>>>> @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
>>>>>>        *     2. The system is being saturated when we're operating near
>>>>>>        *        max capacity, it doesn't make sense to block overutilized.
>>>>>>        */
>>>>>> -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
>>>>>> -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
>>>>>> +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
>>>>>
>>>>> We should use capacity_orig here. We are checking if the CPU is the max
>>>>> capacity CPU.
>>>>
>>>> I was also wondering what would be the best choice there. If we
>>>> consider that we have only one performance domain with all max
>>>> capacity cpus then I agree that we should keep capacity_orig as we
>>>> can't find a better cpu that would fit. But is it always true that all
>>>> max cpu are tied to the same perf domain ?

This should be the case. Perf domains follow Frequency Domains (FD). So
even if we have the most powerful CPUs in 2 different FDs, only the CPUs
in the FD with the higher max OPP get SCHED_CAPACITY_SCALE assigned,
i.e. only they become big CPUs.

>>> Hmm I could be not thinking straight today. But the purpose of this check is to
>>> ensure overutilized can trigger for the common case where a task will always
>>> fit the max capacity cpu (whether it's on a single pd or multiple ones). For
>>> that corner case fits_capacity() should be the only fitness check otherwise
>>> overutilized will never trigger by default.
>>
>> Ok, so I messed up several use cases but in fact both are similar ...
>>
>> if capacity_actual != SCHED_CAPACITY_SCALE and uclamp_max ==
>> SCHED_CAPACITY_SCALE
>> then uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) &&
>> (uclamp_max == SCHED_CAPACITY_SCALE) is false
>> and uclamp_max_fits = !uclamp_max_fits && (uclamp_max <=
>> capacity_actual); is also false because (uclamp_max <=
>> capacity_actual) is always false
>>
>> if capacity_actual == SCHED_CAPACITY_SCALE and uclamp_max ==
>> SCHED_CAPACITY_SCALE
>> then we are the same as with capacity_orig
> 
> Right. The condition is becoming less readable, but you're right it doesn't
> change functionality.

+1. 'capacity_actual < SCHED_CAPACITY_SCALE' on big CPUs just make them
non-big CPUs.

> Xuewen, could you put something in the commit message please to remind us in
> the future that we thought about this and it is fine?
> 
> Thanks!
> 
> --
> Qais Yousef

Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()

Posted by Xuewen Yan 1 year, 7 months ago

On Fri, Jun 28, 2024 at 9:28 AM Qais Yousef <qyousef@layalina.io> wrote:
>
> On 06/24/24 16:20, Xuewen Yan wrote:
> > Commit f1f8d0a22422 ("sched/cpufreq: Take cpufreq feedback into account")
> > introduced get_actual_cpu_capacity(), and it had aggregated the
> > different pressures applied on the capacity of CPUs.
> > And in util_fits_cpu(), it would return true when uclamp_max
> > is smaller than SCHED_CAPACITY_SCALE, althought the uclamp_max
> > is bigger than actual_cpu_capacity.
> >
> > So use actual_cpu_capacity everywhere in util_fits_cpu() to
> > cover all cases.
> >
> > Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> > ---
> >  kernel/sched/fair.c | 19 +++++++++----------
> >  1 file changed, 9 insertions(+), 10 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 5ca6396ef0b7..9c16ae192217 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4980,7 +4980,7 @@ static inline int util_fits_cpu(unsigned long util,
> >                               int cpu)
> >  {
> >       unsigned long capacity = capacity_of(cpu);
> > -     unsigned long capacity_orig;
> > +     unsigned long capacity_actual;
> >       bool fits, uclamp_max_fits;
> >
> >       /*
> > @@ -4992,15 +4992,15 @@ static inline int util_fits_cpu(unsigned long util,
> >               return fits;
> >
> >       /*
> > -      * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and
> > +      * We must use actual_cpu_capacity() for comparing against uclamp_min and
> >        * uclamp_max. We only care about capacity pressure (by using
> >        * capacity_of()) for comparing against the real util.
> >        *
> >        * If a task is boosted to 1024 for example, we don't want a tiny
> >        * pressure to skew the check whether it fits a CPU or not.
> >        *
> > -      * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it
> > -      * should fit a little cpu even if there's some pressure.
> > +      * Similarly if a task is capped to actual_cpu_capacity, it should fit
> > +      * the cpu even if there's some pressure.
>
> This statement is not clear now. We need to be specific since
> actual_cpu_capacity() includes thermal pressure and cpufreq limits.
>
> /even if there's some pressure/even if there is non OPP based pressure ie: RT,
> DL or irq/?
>
> >        *
> >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> >        * on available OPP of the system.
> > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        * For uclamp_max, we can tolerate a drop in performance level as the
> >        * goal is to cap the task. So it's okay if it's getting less.
> >        */
> > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > +     capacity_actual = get_actual_cpu_capacity(cpu);
> >
> >       /*
> >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        *     uclamp_max request.
> >        *
> >        *   which is what we're enforcing here. A task always fits if
> > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> >        *   the normal upmigration rules should withhold still.
> >        *
> >        *   Only exception is when we are on max capacity, then we need to be
> > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> >        *     2. The system is being saturated when we're operating near
> >        *        max capacity, it doesn't make sense to block overutilized.
> >        */
> > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
>
> We should use capacity_orig here. We are checking if the CPU is the max
> capacity CPU.

Maybe we could remove the uclamp_max_fits = (capacity_orig ==
SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
and just judge the uclamp_max <= capacity_actual?

-     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) &&
(uclamp_max == SCHED_CAPACITY_SCALE);
-     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
+     uclamp_max_fits =  (uclamp_max <= capacity_actual);


>
> > +     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_actual);
> >       fits = fits || uclamp_max_fits;
> >
> >       /*
> > @@ -5086,8 +5086,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        * handle the case uclamp_min > uclamp_max.
> >        */
> >       uclamp_min = min(uclamp_min, uclamp_max);
> > -     if (fits && (util < uclamp_min) &&
> > -         (uclamp_min > get_actual_cpu_capacity(cpu)))
> > +     if (fits && (util < uclamp_min) && (uclamp_min > capacity_actual))
> >               return -1;
> >
> >       return fits;
> > --
> > 2.25.1
> >

Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()

Posted by Qais Yousef 1 year, 7 months ago

On 07/01/24 20:13, Xuewen Yan wrote:

> > >        *
> > >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> > >        * on available OPP of the system.
> > > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> > >        * For uclamp_max, we can tolerate a drop in performance level as the
> > >        * goal is to cap the task. So it's okay if it's getting less.
> > >        */
> > > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > > +     capacity_actual = get_actual_cpu_capacity(cpu);
> > >
> > >       /*
> > >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> > >        *     uclamp_max request.
> > >        *
> > >        *   which is what we're enforcing here. A task always fits if
> > > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> > >        *   the normal upmigration rules should withhold still.
> > >        *
> > >        *   Only exception is when we are on max capacity, then we need to be
> > > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> > >        *     2. The system is being saturated when we're operating near
> > >        *        max capacity, it doesn't make sense to block overutilized.
> > >        */
> > > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> >
> > We should use capacity_orig here. We are checking if the CPU is the max
> > capacity CPU.
> 
> Maybe we could remove the uclamp_max_fits = (capacity_orig ==
> SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> and just judge the uclamp_max <= capacity_actual?
> 
> -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) &&
> (uclamp_max == SCHED_CAPACITY_SCALE);
> -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> +     uclamp_max_fits =  (uclamp_max <= capacity_actual);

If capacity_orig = 1024, capacity_actual = 1024, uclamp_max = 1024 (which is
the common case), then overutilized will never trigger for big CPU, no?

We can't 'force' fit a task on the biggest core, and fits_capacity() should be
our sole judge here if we fit or not.

Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()

Posted by Vincent Guittot 1 year, 7 months ago

On Mon, 24 Jun 2024 at 10:21, Xuewen Yan <xuewen.yan@unisoc.com> wrote:
>
> Commit f1f8d0a22422 ("sched/cpufreq: Take cpufreq feedback into account")
> introduced get_actual_cpu_capacity(), and it had aggregated the
> different pressures applied on the capacity of CPUs.
> And in util_fits_cpu(), it would return true when uclamp_max
> is smaller than SCHED_CAPACITY_SCALE, althought the uclamp_max

s/althought/although/

> is bigger than actual_cpu_capacity.
>
> So use actual_cpu_capacity everywhere in util_fits_cpu() to
> cover all cases.
>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> ---
>  kernel/sched/fair.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5ca6396ef0b7..9c16ae192217 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4980,7 +4980,7 @@ static inline int util_fits_cpu(unsigned long util,
>                                 int cpu)
>  {
>         unsigned long capacity = capacity_of(cpu);
> -       unsigned long capacity_orig;
> +       unsigned long capacity_actual;
>         bool fits, uclamp_max_fits;
>
>         /*
> @@ -4992,15 +4992,15 @@ static inline int util_fits_cpu(unsigned long util,
>                 return fits;
>
>         /*
> -        * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and
> +        * We must use actual_cpu_capacity() for comparing against uclamp_min and
>          * uclamp_max. We only care about capacity pressure (by using
>          * capacity_of()) for comparing against the real util.

actual_cpu_capacity() includes capacity pressure so this comment needs
to be changed.

>          *
>          * If a task is boosted to 1024 for example, we don't want a tiny
>          * pressure to skew the check whether it fits a CPU or not.

In think that the pressure in this case is about DL, RT and Irq but
not OPP capping because of thermal pressure for example

Also, the returned value is not a boolean: fit or not fit but a
ternary for the case where it doesn't fit the uclamp min. In the later
case, the CPU is still considered as a candidate

>          *
> -        * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it
> -        * should fit a little cpu even if there's some pressure.
> +        * Similarly if a task is capped to actual_cpu_capacity, it should fit
> +        * the cpu even if there's some pressure.
>          *
>          * Only exception is for HW or cpufreq pressure since it has a direct impact
>          * on available OPP of the system.
> @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
>          * For uclamp_max, we can tolerate a drop in performance level as the
>          * goal is to cap the task. So it's okay if it's getting less.
>          */
> -       capacity_orig = arch_scale_cpu_capacity(cpu);
> +       capacity_actual = get_actual_cpu_capacity(cpu);
>
>         /*
>          * We want to force a task to fit a cpu as implied by uclamp_max.
> @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
>          *     uclamp_max request.
>          *
>          *   which is what we're enforcing here. A task always fits if
> -        *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> +        *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
>          *   the normal upmigration rules should withhold still.
>          *
>          *   Only exception is when we are on max capacity, then we need to be
> @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
>          *     2. The system is being saturated when we're operating near
>          *        max capacity, it doesn't make sense to block overutilized.
>          */
> -       uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> -       uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> +       uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> +       uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_actual);
>         fits = fits || uclamp_max_fits;
>
>         /*
> @@ -5086,8 +5086,7 @@ static inline int util_fits_cpu(unsigned long util,
>          * handle the case uclamp_min > uclamp_max.
>          */
>         uclamp_min = min(uclamp_min, uclamp_max);
> -       if (fits && (util < uclamp_min) &&
> -           (uclamp_min > get_actual_cpu_capacity(cpu)))
> +       if (fits && (util < uclamp_min) && (uclamp_min > capacity_actual))
>                 return -1;
>
>         return fits;
> --
> 2.25.1
>

[PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
[PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()