[RFC][PATCH v0.1 5/6] sched/topology: Allow .setpolicy() cpufreq drivers to enable EAS

Rafael J. Wysocki posted 1 patch 2 weeks, 1 day ago
kernel/sched/topology.c |    6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
[RFC][PATCH v0.1 5/6] sched/topology: Allow .setpolicy() cpufreq drivers to enable EAS
Posted by Rafael J. Wysocki 2 weeks, 1 day ago
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Some cpufreq drivers, like intel_pstate, have built-in governors that
are used instead of regular cpufreq governors, schedutil in particular,
but they can work with EAS just fine, so allow EAS to be used with
those drivers.

Also update the debug message printed when the cpufreq governor in
use is not schedutil and the related comment, to better match the
code after the change.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

I'm not sure how much value there is in refusing to enable EAS without
schedutil in general.  For instance, if there are no crossover points
between the cost curves for different perf domains, EAS may as well be
used with the performance and powersave governors AFAICS.

---
 kernel/sched/topology.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-pm/kernel/sched/topology.c
===================================================================
--- linux-pm.orig/kernel/sched/topology.c
+++ linux-pm/kernel/sched/topology.c
@@ -251,7 +251,7 @@ static bool sched_is_eas_possible(const
 		return false;
 	}
 
-	/* Do not attempt EAS if schedutil is not being used. */
+	/* Do not attempt EAS with a cpufreq governor other than schedutil. */
 	for_each_cpu(i, cpu_mask) {
 		policy = cpufreq_cpu_get(i);
 		if (!policy) {
@@ -263,9 +263,9 @@ static bool sched_is_eas_possible(const
 		}
 		gov = policy->governor;
 		cpufreq_cpu_put(policy);
-		if (gov != &schedutil_gov) {
+		if (gov && gov != &schedutil_gov) {
 			if (sched_debug()) {
-				pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n",
+				pr_info("rd %*pbl: Checking EAS, cpufreq governor is not schedutil\n",
 					cpumask_pr_args(cpu_mask));
 			}
 			return false;
Re: [RFC][PATCH v0.1 5/6] sched/topology: Allow .setpolicy() cpufreq drivers to enable EAS
Posted by Christian Loehle 1 week, 5 days ago
On 11/8/24 16:41, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Some cpufreq drivers, like intel_pstate, have built-in governors that
> are used instead of regular cpufreq governors, schedutil in particular,
> but they can work with EAS just fine, so allow EAS to be used with
> those drivers.
> 
> Also update the debug message printed when the cpufreq governor in
> use is not schedutil and the related comment, to better match the
> code after the change.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> I'm not sure how much value there is in refusing to enable EAS without
> schedutil in general.  For instance, if there are no crossover points
> between the cost curves for different perf domains, EAS may as well be
> used with the performance and powersave governors AFAICS.
 
Agreed, but having no cross-over points or no DVFS at all should be the
only instances, right?
For plain (non-intel_pstate) powersave and performance we could replace
sugov_effective_cpu_perf()
that determines the OPP of the perf-domain by the OPP they will be
choosing, but for the rest?
Also there is the entire uclamp thing, not sure what the best
solution is there.
Will intel_pstate just always ignore it? Might be better then to
depend on !intel_pstate?

> ---
>  kernel/sched/topology.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> Index: linux-pm/kernel/sched/topology.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/topology.c
> +++ linux-pm/kernel/sched/topology.c
> @@ -251,7 +251,7 @@ static bool sched_is_eas_possible(const
>  		return false;
>  	}
>  
> -	/* Do not attempt EAS if schedutil is not being used. */
> +	/* Do not attempt EAS with a cpufreq governor other than schedutil. */
>  	for_each_cpu(i, cpu_mask) {
>  		policy = cpufreq_cpu_get(i);
>  		if (!policy) {
> @@ -263,9 +263,9 @@ static bool sched_is_eas_possible(const
>  		}
>  		gov = policy->governor;
>  		cpufreq_cpu_put(policy);
> -		if (gov != &schedutil_gov) {
> +		if (gov && gov != &schedutil_gov) {
>  			if (sched_debug()) {
> -				pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n",
> +				pr_info("rd %*pbl: Checking EAS, cpufreq governor is not schedutil\n",
>  					cpumask_pr_args(cpu_mask));
>  			}
>  			return false;
> 
> 
> 
>
Re: [RFC][PATCH v0.1 5/6] sched/topology: Allow .setpolicy() cpufreq drivers to enable EAS
Posted by Rafael J. Wysocki 1 week, 5 days ago
On Mon, Nov 11, 2024 at 12:54 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 11/8/24 16:41, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Some cpufreq drivers, like intel_pstate, have built-in governors that
> > are used instead of regular cpufreq governors, schedutil in particular,
> > but they can work with EAS just fine, so allow EAS to be used with
> > those drivers.
> >
> > Also update the debug message printed when the cpufreq governor in
> > use is not schedutil and the related comment, to better match the
> > code after the change.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > I'm not sure how much value there is in refusing to enable EAS without
> > schedutil in general.  For instance, if there are no crossover points
> > between the cost curves for different perf domains, EAS may as well be
> > used with the performance and powersave governors AFAICS.
>
> Agreed, but having no cross-over points or no DVFS at all should be the
> only instances, right?

Not really.  This is the most obvious case, but there are other less
obvious ones.

Say there are two cross-over points: The  "performance" and
"powersave" governors should still be fine with EAS in that case.

Or what if somebody has a governor in user space that generally
behaves like schedutil?

Or what about ondemand?  Is it alway completely broken with EAS?

> For plain (non-intel_pstate) powersave and performance we could replace
> sugov_effective_cpu_perf()
> that determines the OPP of the perf-domain by the OPP they will be
> choosing, but for the rest?

I generally think that depending on schedutil for EAS is a mistake.

I would just print a warning that results may be suboptimal or
generally not as expected if the cpufreq governor is not schedutil
instead of preventing EAS from running at all.

> Also there is the entire uclamp thing, not sure what the best
> solution is there.
> Will intel_pstate just always ignore it? Might be better then to
> depend on !intel_pstate?

Well, it can be made dependent on policy->policy ==
CPUFREQ_POLICY_POWERSAVE if gov is NULL or similar, but honestly why
bother?

> > ---
> >  kernel/sched/topology.c |    6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > Index: linux-pm/kernel/sched/topology.c
> > ===================================================================
> > --- linux-pm.orig/kernel/sched/topology.c
> > +++ linux-pm/kernel/sched/topology.c
> > @@ -251,7 +251,7 @@ static bool sched_is_eas_possible(const
> >               return false;
> >       }
> >
> > -     /* Do not attempt EAS if schedutil is not being used. */
> > +     /* Do not attempt EAS with a cpufreq governor other than schedutil. */
> >       for_each_cpu(i, cpu_mask) {
> >               policy = cpufreq_cpu_get(i);
> >               if (!policy) {
> > @@ -263,9 +263,9 @@ static bool sched_is_eas_possible(const
> >               }
> >               gov = policy->governor;
> >               cpufreq_cpu_put(policy);
> > -             if (gov != &schedutil_gov) {
> > +             if (gov && gov != &schedutil_gov) {
> >                       if (sched_debug()) {
> > -                             pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n",
> > +                             pr_info("rd %*pbl: Checking EAS, cpufreq governor is not schedutil\n",
> >                                       cpumask_pr_args(cpu_mask));
> >                       }
> >                       return false;
> >
> >
> >
> >
>
>
Re: [RFC][PATCH v0.1 5/6] sched/topology: Allow .setpolicy() cpufreq drivers to enable EAS
Posted by Peter Zijlstra 4 days, 4 hours ago
On Mon, Nov 11, 2024 at 02:54:43PM +0100, Rafael J. Wysocki wrote:

> Or what about ondemand?  Is it alway completely broken with EAS?

I thought that thing was mostly considered broken anyway :-)

> > For plain (non-intel_pstate) powersave and performance we could replace
> > sugov_effective_cpu_perf()
> > that determines the OPP of the perf-domain by the OPP they will be
> > choosing, but for the rest?
> 
> I generally think that depending on schedutil for EAS is a mistake.

Well, the thinking was that we wanted to move to a single governor, and
not proliferate things.
Re: [RFC][PATCH v0.1 5/6] sched/topology: Allow .setpolicy() cpufreq drivers to enable EAS
Posted by Rafael J. Wysocki 4 days, 2 hours ago
On Tue, Nov 19, 2024 at 6:37 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Nov 11, 2024 at 02:54:43PM +0100, Rafael J. Wysocki wrote:
>
> > Or what about ondemand?  Is it alway completely broken with EAS?
>
> I thought that thing was mostly considered broken anyway :-)

Well, it's still there in the tree, although honestly I don't know how
many users of it there are.

> > > For plain (non-intel_pstate) powersave and performance we could replace
> > > sugov_effective_cpu_perf()
> > > that determines the OPP of the perf-domain by the OPP they will be
> > > choosing, but for the rest?
> >
> > I generally think that depending on schedutil for EAS is a mistake.
>
> Well, the thinking was that we wanted to move to a single governor, and
> not proliferate things.

Thing is, intel_pstate in its default configuration doesn't use a
separate cpufreq governor at all.  It allows P-code to select
P-states, but on the new HW the result of this isn't really that much
different from what schedutil would do.

The cpufreq governor check needs to be adjusted at least for this
case, but overall it should be done in cpufreq because it refers to
cpufreq internals.
Re: [RFC][PATCH v0.1 5/6] sched/topology: Allow .setpolicy() cpufreq drivers to enable EAS
Posted by Vincent Guittot 4 days, 6 hours ago
On Mon, 11 Nov 2024 at 14:54, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Nov 11, 2024 at 12:54 PM Christian Loehle
> <christian.loehle@arm.com> wrote:
> >
> > On 11/8/24 16:41, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > Some cpufreq drivers, like intel_pstate, have built-in governors that
> > > are used instead of regular cpufreq governors, schedutil in particular,
> > > but they can work with EAS just fine, so allow EAS to be used with
> > > those drivers.
> > >
> > > Also update the debug message printed when the cpufreq governor in
> > > use is not schedutil and the related comment, to better match the
> > > code after the change.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >
> > > I'm not sure how much value there is in refusing to enable EAS without
> > > schedutil in general.  For instance, if there are no crossover points
> > > between the cost curves for different perf domains, EAS may as well be
> > > used with the performance and powersave governors AFAICS.
> >
> > Agreed, but having no cross-over points or no DVFS at all should be the
> > only instances, right?
>
> Not really.  This is the most obvious case, but there are other less
> obvious ones.
>
> Say there are two cross-over points: The  "performance" and
> "powersave" governors should still be fine with EAS in that case.
>
> Or what if somebody has a governor in user space that generally
> behaves like schedutil?
>
> Or what about ondemand?  Is it alway completely broken with EAS?

The only requirement from EAS is to know which OPP and its cost will
be selected by cpufreq gov for an utilization level of the CPU.
sched_util provides it with sugov_effective_cpu_perf(). Any other gov
that can provide such estimate of the OPP and associated cost should
be ok

powersave and perf should be pretty obvious not so sure for ondemand

>
> > For plain (non-intel_pstate) powersave and performance we could replace
> > sugov_effective_cpu_perf()
> > that determines the OPP of the perf-domain by the OPP they will be
> > choosing, but for the rest?
>
> I generally think that depending on schedutil for EAS is a mistake.
>
> I would just print a warning that results may be suboptimal or
> generally not as expected if the cpufreq governor is not schedutil
> instead of preventing EAS from running at all.
>
> > Also there is the entire uclamp thing, not sure what the best
> > solution is there.
> > Will intel_pstate just always ignore it? Might be better then to
> > depend on !intel_pstate?
>
> Well, it can be made dependent on policy->policy ==
> CPUFREQ_POLICY_POWERSAVE if gov is NULL or similar, but honestly why
> bother?
>
> > > ---
> > >  kernel/sched/topology.c |    6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > Index: linux-pm/kernel/sched/topology.c
> > > ===================================================================
> > > --- linux-pm.orig/kernel/sched/topology.c
> > > +++ linux-pm/kernel/sched/topology.c
> > > @@ -251,7 +251,7 @@ static bool sched_is_eas_possible(const
> > >               return false;
> > >       }
> > >
> > > -     /* Do not attempt EAS if schedutil is not being used. */
> > > +     /* Do not attempt EAS with a cpufreq governor other than schedutil. */
> > >       for_each_cpu(i, cpu_mask) {
> > >               policy = cpufreq_cpu_get(i);
> > >               if (!policy) {
> > > @@ -263,9 +263,9 @@ static bool sched_is_eas_possible(const
> > >               }
> > >               gov = policy->governor;
> > >               cpufreq_cpu_put(policy);
> > > -             if (gov != &schedutil_gov) {
> > > +             if (gov && gov != &schedutil_gov) {
> > >                       if (sched_debug()) {
> > > -                             pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n",
> > > +                             pr_info("rd %*pbl: Checking EAS, cpufreq governor is not schedutil\n",
> > >                                       cpumask_pr_args(cpu_mask));
> > >                       }
> > >                       return false;
> > >
> > >
> > >
> > >
> >
> >