[RFC][PATCH v0.1 3/6] PM: EM: Add special case to em_dev_register_perf_domain()

Posted by Rafael J. Wysocki 1 year, 1 month ago
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Allow em_dev_register_perf_domain() to register a cost-only stub
perf domain with one-element states table if the .active_power()
callback is not provided.

Subsequently, this will be used by the intel_pstate driver to register
stub perf domains for CPUs on hybrid systems.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 kernel/power/energy_model.c |   26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)
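
For illustration only (not part of the patch): a minimal user-space sketch
of the argument checks added to em_dev_register_perf_domain() below.
`struct em_cb_model`, `em_check_args_model()` and `flat_cost()` are
hypothetical stand-ins with simplified signatures; the kernel code operates
on `struct em_data_callback` and also takes a device pointer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct em_data_callback. */
struct em_cb_model {
	int (*active_power)(unsigned long *power, unsigned long *freq);
	int (*get_cost)(unsigned long freq, unsigned long *cost);
};

/*
 * Models the checks from the patch: returns 0 and sets *stub_pd, or
 * -EINVAL for an invalid combination of arguments.
 */
int em_check_args_model(const struct em_cb_model *cb, unsigned int nr_states,
			bool microwatts, bool *stub_pd)
{
	*stub_pd = false;

	if (!nr_states || !cb)
		return -EINVAL;

	if (!cb->active_power) {
		/* A stub domain needs a cost callback and exactly one state. */
		if (!cb->get_cost || nr_states > 1 || microwatts)
			return -EINVAL;

		*stub_pd = true;
	}

	return 0;
}

/* Hypothetical cost callback: one constant cost for every frequency. */
int flat_cost(unsigned long freq, unsigned long *cost)
{
	(void)freq;
	*cost = 1;
	return 0;
}
```

With only `.get_cost` set, one state and `microwatts == false`, the model
accepts the arguments and flags a stub domain; any other combination without
`.active_power` is rejected.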

Index: linux-pm/kernel/power/energy_model.c
===================================================================
--- linux-pm.orig/kernel/power/energy_model.c
+++ linux-pm/kernel/power/energy_model.c
@@ -426,9 +426,11 @@ static int em_create_pd(struct device *d
 	if (!em_table)
 		goto free_pd;
 
-	ret = em_create_perf_table(dev, pd, em_table->state, cb, flags);
-	if (ret)
-		goto free_pd_table;
+	if (cb->active_power) {
+		ret = em_create_perf_table(dev, pd, em_table->state, cb, flags);
+		if (ret)
+			goto free_pd_table;
+	}
 
 	ret = em_compute_costs(dev, em_table->state, cb, nr_states, flags);
 	if (ret)
@@ -561,11 +563,20 @@ int em_dev_register_perf_domain(struct d
 {
 	unsigned long cap, prev_cap = 0;
 	unsigned long flags = 0;
+	bool stub_pd = false;
 	int cpu, ret;
 
 	if (!dev || !nr_states || !cb)
 		return -EINVAL;
 
+	if (!cb->active_power) {
+		if (!cb->get_cost || nr_states > 1 || microwatts)
+			return -EINVAL;
+
+		/* Special case: a stub perf domain. */
+		stub_pd = true;
+	}
+
 	/*
 	 * Use a mutex to serialize the registration of performance domains and
 	 * let the driver-defined callback functions sleep.
@@ -590,6 +601,15 @@ int em_dev_register_perf_domain(struct d
 				ret = -EEXIST;
 				goto unlock;
 			}
+
+			/*
+			 * The capacity need not be the same for all CPUs in a
+			 * stub perf domain, so long as the average cost of
+			 * running on each of them is approximately the same.
+			 */
+			if (stub_pd)
+				continue;
+
 			/*
 			 * All CPUs of a domain must have the same
 			 * micro-architecture since they all share the same
Re: [RFC][PATCH v0.1 3/6] PM: EM: Add special case to em_dev_register_perf_domain()
Posted by Hongyan Xia 1 year ago
On 08/11/2024 16:38, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Allow em_dev_register_perf_domain() to register a cost-only stub
> perf domain with one-element states table if the .active_power()
> callback is not provided.
> 
> Subsequently, this will be used by the intel_pstate driver to register
> stub perf domains for CPUs on hybrid systems.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>   kernel/power/energy_model.c |   26 +++++++++++++++++++++++---
>   1 file changed, 23 insertions(+), 3 deletions(-)
> 
> Index: linux-pm/kernel/power/energy_model.c
> ===================================================================
> --- linux-pm.orig/kernel/power/energy_model.c
> +++ linux-pm/kernel/power/energy_model.c
> @@ -426,9 +426,11 @@ static int em_create_pd(struct device *d
>   	if (!em_table)
>   		goto free_pd;
>   
> -	ret = em_create_perf_table(dev, pd, em_table->state, cb, flags);
> -	if (ret)
> -		goto free_pd_table;
> +	if (cb->active_power) {
> +		ret = em_create_perf_table(dev, pd, em_table->state, cb, flags);
> +		if (ret)
> +			goto free_pd_table;
> +	}
>   
>   	ret = em_compute_costs(dev, em_table->state, cb, nr_states, flags);
>   	if (ret)
> @@ -561,11 +563,20 @@ int em_dev_register_perf_domain(struct d
>   {
>   	unsigned long cap, prev_cap = 0;
>   	unsigned long flags = 0;
> +	bool stub_pd = false;
>   	int cpu, ret;
>   
>   	if (!dev || !nr_states || !cb)
>   		return -EINVAL;
>   
> +	if (!cb->active_power) {
> +		if (!cb->get_cost || nr_states > 1 || microwatts)
> +			return -EINVAL;
> +
> +		/* Special case: a stub perf domain. */
> +		stub_pd = true;
> +	}
> +

I wonder if the only purpose of stub_pd is to just skip the capacity 
check below, which doesn't look very nice.

I may be echoing Dietmar's comments here. What's the problem with just
having 3 domains?

Or could you just specify the same capacities so that the same-capacity
check won't fail, and use hardware load or CPU pressure to model the
slight differences in real capacities? That way you'd reuse a lot of
existing infrastructure.

>   	/*
>   	 * Use a mutex to serialize the registration of performance domains and
>   	 * let the driver-defined callback functions sleep.
> @@ -590,6 +601,15 @@ int em_dev_register_perf_domain(struct d
>   				ret = -EEXIST;
>   				goto unlock;
>   			}
> +
> +			/*
> +			 * The capacity need not be the same for all CPUs in a
> +			 * stub perf domain, so long as the average cost of
> +			 * running on each of them is approximately the same.
> +			 */
> +			if (stub_pd)
> +				continue;
> +
>   			/*
>   			 * All CPUs of a domain must have the same
>   			 * micro-architecture since they all share the same
> 
> 
>
Re: [RFC][PATCH v0.1 3/6] PM: EM: Add special case to em_dev_register_perf_domain()
Posted by Rafael J. Wysocki 1 year ago
On Mon, Nov 18, 2024 at 4:25 PM Hongyan Xia <hongyan.xia2@arm.com> wrote:
>
> On 08/11/2024 16:38, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Allow em_dev_register_perf_domain() to register a cost-only stub
> > perf domain with one-element states table if the .active_power()
> > callback is not provided.
> >
> > Subsequently, this will be used by the intel_pstate driver to register
> > stub perf domains for CPUs on hybrid systems.
> >
> > No intentional functional impact.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >   kernel/power/energy_model.c |   26 +++++++++++++++++++++++---
> >   1 file changed, 23 insertions(+), 3 deletions(-)
> >
> > Index: linux-pm/kernel/power/energy_model.c
> > ===================================================================
> > --- linux-pm.orig/kernel/power/energy_model.c
> > +++ linux-pm/kernel/power/energy_model.c
> > @@ -426,9 +426,11 @@ static int em_create_pd(struct device *d
> >       if (!em_table)
> >               goto free_pd;
> >
> > -     ret = em_create_perf_table(dev, pd, em_table->state, cb, flags);
> > -     if (ret)
> > -             goto free_pd_table;
> > +     if (cb->active_power) {
> > +             ret = em_create_perf_table(dev, pd, em_table->state, cb, flags);
> > +             if (ret)
> > +                     goto free_pd_table;
> > +     }
> >
> >       ret = em_compute_costs(dev, em_table->state, cb, nr_states, flags);
> >       if (ret)
> > @@ -561,11 +563,20 @@ int em_dev_register_perf_domain(struct d
> >   {
> >       unsigned long cap, prev_cap = 0;
> >       unsigned long flags = 0;
> > +     bool stub_pd = false;
> >       int cpu, ret;
> >
> >       if (!dev || !nr_states || !cb)
> >               return -EINVAL;
> >
> > +     if (!cb->active_power) {
> > +             if (!cb->get_cost || nr_states > 1 || microwatts)
> > +                     return -EINVAL;
> > +
> > +             /* Special case: a stub perf domain. */
> > +             stub_pd = true;
> > +     }
> > +
>
> I wonder if the only purpose of stub_pd is to just skip the capacity
> check below, which doesn't look very nice.

It is.

I guess I could just skip it if nr_states == 1 because that case means
the same cost for all frequency values.
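
A rough user-space sketch of that alternative, as an illustration rather
than actual kernel code: `em_check_caps_model()` is a hypothetical name and
`caps[]` stands in for the values that arch_scale_cpu_capacity() would
return for each CPU in the mask.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of the same-capacity check, keyed on nr_states
 * instead of a separate stub_pd flag.
 */
int em_check_caps_model(const unsigned long *caps, size_t nr_cpus,
			unsigned int nr_states)
{
	unsigned long prev_cap = 0;
	size_t i;

	for (i = 0; i < nr_cpus; i++) {
		/*
		 * One state means one cost for all frequencies, so
		 * differing capacities are tolerated.
		 */
		if (nr_states == 1)
			continue;

		if (prev_cap && prev_cap != caps[i])
			return -EINVAL;

		prev_cap = caps[i];
	}

	return 0;
}
```

With more than one state, the model behaves like the existing check and
rejects a CPU mask with mixed capacities.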

>
> I may be echoing Dietmar's comments here. What's the problem with just
> having 3 domains?

The energy-efficiency of a CPU is not strictly related to its capacity.

This is about cases where some special CPUs can turbo up higher, but
there is no other difference between them and the other CPUs in the
domain.

> Or could you just specify the same capacities so that the same-capacity
> check won't fail, and use hardware load or CPU pressure to model the
> slight differences in real capacities? That way you'd reuse a lot of
> existing infrastructure.

That would have been confusing though, so thanks, but no thanks.

> >       /*
> >        * Use a mutex to serialize the registration of performance domains and
> >        * let the driver-defined callback functions sleep.
> > @@ -590,6 +601,15 @@ int em_dev_register_perf_domain(struct d
> >                               ret = -EEXIST;
> >                               goto unlock;
> >                       }
> > +
> > +                     /*
> > +                      * The capacity need not be the same for all CPUs in a
> > +                      * stub perf domain, so long as the average cost of
> > +                      * running on each of them is approximately the same.
> > +                      */
> > +                     if (stub_pd)
> > +                             continue;
> > +
> >                       /*
> >                        * All CPUs of a domain must have the same
> >                        * micro-architecture since they all share the same
> >
> >
> >
>
>
Re: [RFC][PATCH v0.1 3/6] PM: EM: Add special case to em_dev_register_perf_domain()
Posted by Dietmar Eggemann 1 year, 1 month ago
On 08/11/2024 17:38, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Allow em_dev_register_perf_domain() to register a cost-only stub
> perf domain with one-element states table if the .active_power()
> callback is not provided.
> 
> Subsequently, this will be used by the intel_pstate driver to register
> stub perf domains for CPUs on hybrid systems.

Looks like a 'stub' PD only distinguishes itself from a normal PD by not
checking that all CPUs in that PD have the same CPU capacity value?

I assume you do this since the Performance Cores (CPUs) can have
different CPU capacity values due to slightly different 'itmt prio' values?

So strictly speaking, such an Intel hybrid machine would be a tri-gear
system to fit the definition of a PD.
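
To make the 'tri-gear' point concrete, here is a small user-space sketch
that counts how many regular PDs a set of CPUs would need if each PD may
only contain CPUs of equal capacity. `count_capacity_classes()` is a
hypothetical helper, and any capacity values passed to it are made-up
inputs, not measurements from a real machine.

```c
#include <stddef.h>

/*
 * Count the distinct capacity values in caps[]; under the same-capacity
 * rule, each distinct value would need its own perf domain.
 */
size_t count_capacity_classes(const unsigned long *caps, size_t nr_cpus)
{
	size_t classes = 0, i, j;

	for (i = 0; i < nr_cpus; i++) {
		/* Look for an earlier CPU with the same capacity. */
		for (j = 0; j < i; j++)
			if (caps[j] == caps[i])
				break;

		/* None found: this CPU starts a new capacity class. */
		if (j == i)
			classes++;
	}

	return classes;
}
```

For a made-up layout with two slightly different P-core capacities plus one
E-core capacity, this yields three classes, i.e. three PDs.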

On first reading the word 'stub', I thought that you only set up a
part of the default EM infrastructure.

[...]