[PATCH v2 3/3] cpufreq: intel_pstate: hybrid: Adjust energy model rules

Posted by Rafael J. Wysocki 2 months ago
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Instead of using HWP-to-frequency scaling factors for computing cost
coefficients in the energy model used on hybrid systems, which is
fragile, rely on CPU type information that is easily accessible now and
the information on whether or not L3 cache is present for this purpose.

This also allows the cost coefficients for P-cores to be adjusted so
that they start to be populated somewhat earlier (that is, before
E-cores are loaded up to their full capacity).

In addition to the above, replace an inaccurate comment regarding the
reason why the freq value is added to the cost in hybrid_get_cost().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/intel_pstate.c |   37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -933,11 +933,8 @@ static int hybrid_active_power(struct de
 			       unsigned long *freq)
 {
 	/*
-	 * Create "utilization bins" of 0-40%, 40%-60%, 60%-80%, and 80%-100%
-	 * of the maximum capacity such that two CPUs of the same type will be
-	 * regarded as equally attractive if the utilization of each of them
-	 * falls into the same bin, which should prevent tasks from being
-	 * migrated between them too often.
+	 * Create four "states" corresponding to 40%, 60%, 80%, and 100% of the
+	 * full capacity.
 	 *
 	 * For this purpose, return the "frequency" of 2 for the first
 	 * performance level and otherwise leave the value set by the caller.
@@ -970,26 +967,22 @@ static bool hybrid_has_l3(unsigned int c
 static int hybrid_get_cost(struct device *dev, unsigned long freq,
 			   unsigned long *cost)
 {
-	struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate;
-
+	/* Facilitate load balancing between CPUs of the same type. */
+	*cost = freq;
 	/*
-	 * The smaller the perf-to-frequency scaling factor, the larger the IPC
-	 * ratio between the given CPU and the least capable CPU in the system.
-	 * Regard that IPC ratio as the primary cost component and assume that
-	 * the scaling factors for different CPU types will differ by at least
-	 * 5% and they will not be above INTEL_PSTATE_CORE_SCALING.
+	 * Adjust the cost depending on CPU type.
 	 *
-	 * Add the freq value to the cost, so that the cost of running on CPUs
-	 * of the same type in different "utilization bins" is different.
-	 */
-	*cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq;
-	/*
-	 * Increase the cost slightly for CPUs able to access L3 to avoid
-	 * touching it in case some other CPUs of the same type can do the work
-	 * without it.
+	 * The idea is to start loading up LPE-cores before E-cores and start
+	 * to populate E-cores when LPE-cores are utilized above 60% of the
+	 * capacity.  Similarly, P-cores start to be populated when E-cores are
+	 * utilized above 60% of the capacity.
 	 */
-	if (hybrid_has_l3(dev->id))
-		*cost += 2;
+	if (hybrid_get_cpu_type(dev->id) == INTEL_CPU_TYPE_ATOM) {
+		if (hybrid_has_l3(dev->id)) /* E-core */
+			*cost += 2;
+	} else { /* P-core */
+		*cost += 4;
+	}
 
 	return 0;
 }
Re: [PATCH v2 3/3] cpufreq: intel_pstate: hybrid: Adjust energy model rules
Posted by Yaxiong Tian 2 months ago
On 2025/10/15 21:48, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Instead of using HWP-to-frequency scaling factors for computing cost
> coefficients in the energy model used on hybrid systems, which is
> fragile, rely on CPU type information that is easily accessible now and
> the information on whether or not L3 cache is present for this purpose.
> 
> This also allows the cost coefficients for P-cores to be adjusted so
> that they start to be populated somewhat earlier (that is, before
> E-cores are loaded up to their full capacity).
> 
> In addition to the above, replace an inaccurate comment regarding the
> reason why the freq value is added to the cost in hybrid_get_cost().
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>   drivers/cpufreq/intel_pstate.c |   37 +++++++++++++++----------------------
>   1 file changed, 15 insertions(+), 22 deletions(-)
> 
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -933,11 +933,8 @@ static int hybrid_active_power(struct de
>   			       unsigned long *freq)
>   {
>   	/*
> -	 * Create "utilization bins" of 0-40%, 40%-60%, 60%-80%, and 80%-100%
> -	 * of the maximum capacity such that two CPUs of the same type will be
> -	 * regarded as equally attractive if the utilization of each of them
> -	 * falls into the same bin, which should prevent tasks from being
> -	 * migrated between them too often.
> +	 * Create four "states" corresponding to 40%, 60%, 80%, and 100% of the
> +	 * full capacity.
>   	 *
>   	 * For this purpose, return the "frequency" of 2 for the first
>   	 * performance level and otherwise leave the value set by the caller.
> @@ -970,26 +967,22 @@ static bool hybrid_has_l3(unsigned int c
>   static int hybrid_get_cost(struct device *dev, unsigned long freq,
>   			   unsigned long *cost)
>   {
> -	struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate;
> -
> +	/* Facilitate load balancing between CPUs of the same type. */
> +	*cost = freq;
>   	/*
> -	 * The smaller the perf-to-frequency scaling factor, the larger the IPC
> -	 * ratio between the given CPU and the least capable CPU in the system.
> -	 * Regard that IPC ratio as the primary cost component and assume that
> -	 * the scaling factors for different CPU types will differ by at least
> -	 * 5% and they will not be above INTEL_PSTATE_CORE_SCALING.
> +	 * Adjust the cost depending on CPU type.
>   	 *
> -	 * Add the freq value to the cost, so that the cost of running on CPUs
> -	 * of the same type in different "utilization bins" is different.
> -	 */
> -	*cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq;
> -	/*
> -	 * Increase the cost slightly for CPUs able to access L3 to avoid
> -	 * touching it in case some other CPUs of the same type can do the work
> -	 * without it.
> +	 * The idea is to start loading up LPE-cores before E-cores and start
> +	 * to populate E-cores when LPE-cores are utilized above 60% of the
> +	 * capacity.  Similarly, P-cores start to be populated when E-cores are
> +	 * utilized above 60% of the capacity.
>   	 */
> -	if (hybrid_has_l3(dev->id))
> -		*cost += 2;
> +	if (hybrid_get_cpu_type(dev->id) == INTEL_CPU_TYPE_ATOM) {
> +		if (hybrid_has_l3(dev->id)) /* E-core */
> +			*cost += 2;
> +	} else { /* P-core */
> +		*cost += 4;
> +	}
>   
>   	return 0;
>   }

Hi Rafael J. Wysocki:

Should the cost increment between different CPU types be one instead
of two?

Cost with an increment of 2:
          0~40%  40%~60%  60%~80%  80%~100%
LPE-core    2       3        4        5
E-core      4       5        6        7
P-core      6       7        8        9

With these values, tasks only start being allocated to more powerful
CPUs when utilization exceeds 80%, but by that point the system is
already in an overloaded state.

Cost with an increment of 1:
          0~40%  40%~60%  60%~80%  80%~100%
LPE-core    2       3        4        5
E-core      3       4        5        6
P-core      4       5        6        7

This matches the description in your patch.

The idea of this patch looks good to me.
Re: [PATCH v2 3/3] cpufreq: intel_pstate: hybrid: Adjust energy model rules
Posted by Christian Loehle 2 months ago
On 10/16/25 08:48, Yaxiong Tian wrote:
> On 2025/10/15 21:48, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Instead of using HWP-to-frequency scaling factors for computing cost
>> coefficients in the energy model used on hybrid systems, which is
>> fragile, rely on CPU type information that is easily accessible now and
>> the information on whether or not L3 cache is present for this purpose.
>>
>> This also allows the cost coefficients for P-cores to be adjusted so
>> that they start to be populated somewhat earlier (that is, before
>> E-cores are loaded up to their full capacity).
>>
>> In addition to the above, replace an inaccurate comment regarding the
>> reason why the freq value is added to the cost in hybrid_get_cost().
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>>   drivers/cpufreq/intel_pstate.c |   37 +++++++++++++++----------------------
>>   1 file changed, 15 insertions(+), 22 deletions(-)
>>
>> --- a/drivers/cpufreq/intel_pstate.c
>> +++ b/drivers/cpufreq/intel_pstate.c
>> @@ -933,11 +933,8 @@ static int hybrid_active_power(struct de
>>   			       unsigned long *freq)
>>   {
>>   	/*
>> -	 * Create "utilization bins" of 0-40%, 40%-60%, 60%-80%, and 80%-100%
>> -	 * of the maximum capacity such that two CPUs of the same type will be
>> -	 * regarded as equally attractive if the utilization of each of them
>> -	 * falls into the same bin, which should prevent tasks from being
>> -	 * migrated between them too often.
>> +	 * Create four "states" corresponding to 40%, 60%, 80%, and 100% of the
>> +	 * full capacity.
>>   	 *
>>   	 * For this purpose, return the "frequency" of 2 for the first
>>   	 * performance level and otherwise leave the value set by the caller.
>> @@ -970,26 +967,22 @@ static bool hybrid_has_l3(unsigned int c
>>   static int hybrid_get_cost(struct device *dev, unsigned long freq,
>>   			   unsigned long *cost)
>>   {
>> -	struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate;
>> -
>> +	/* Facilitate load balancing between CPUs of the same type. */
>> +	*cost = freq;
>>   	/*
>> -	 * The smaller the perf-to-frequency scaling factor, the larger the IPC
>> -	 * ratio between the given CPU and the least capable CPU in the system.
>> -	 * Regard that IPC ratio as the primary cost component and assume that
>> -	 * the scaling factors for different CPU types will differ by at least
>> -	 * 5% and they will not be above INTEL_PSTATE_CORE_SCALING.
>> +	 * Adjust the cost depending on CPU type.
>>   	 *
>> -	 * Add the freq value to the cost, so that the cost of running on CPUs
>> -	 * of the same type in different "utilization bins" is different.
>> -	 */
>> -	*cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq;
>> -	/*
>> -	 * Increase the cost slightly for CPUs able to access L3 to avoid
>> -	 * touching it in case some other CPUs of the same type can do the work
>> -	 * without it.
>> +	 * The idea is to start loading up LPE-cores before E-cores and start
>> +	 * to populate E-cores when LPE-cores are utilized above 60% of the
>> +	 * capacity.  Similarly, P-cores start to be populated when E-cores are
>> +	 * utilized above 60% of the capacity.
>>   	 */
>> -	if (hybrid_has_l3(dev->id))
>> -		*cost += 2;
>> +	if (hybrid_get_cpu_type(dev->id) == INTEL_CPU_TYPE_ATOM) {
>> +		if (hybrid_has_l3(dev->id)) /* E-core */
>> +			*cost += 2;
>> +	} else { /* P-core */
>> +		*cost += 4;
>> +	}
>>   
>>   	return 0;
>>   }
> 
> Hi Rafael J. Wysocki:
> 
> Should the cost increment between different CPU types be one instead
> of two?
> 
> Cost with an increment of 2:
>           0~40%  40%~60%  60%~80%  80%~100%
> LPE-core    2       3        4        5
> E-core      4       5        6        7
> P-core      6       7        8        9
> 
> With these values, tasks only start being allocated to more powerful
> CPUs when utilization exceeds 80%, but by that point the system is
> already in an overloaded state.
> 
> Cost with an increment of 1:
>           0~40%  40%~60%  60%~80%  80%~100%
> LPE-core    2       3        4        5
> E-core      3       4        5        6
> P-core      4       5        6        7
> 
> This matches the description in your patch.
> 
> The idea of this patch looks good to me.

Agreed: if you want the threshold to be 60% for both, it should be +1
for CPUs with L3 (E-cores) and +2 for P-cores.
Good catch!

Re: [PATCH v2 3/3] cpufreq: intel_pstate: hybrid: Adjust energy model rules
Posted by Rafael J. Wysocki 2 months ago
On Thu, Oct 16, 2025 at 11:00 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 10/16/25 08:48, Yaxiong Tian wrote:
> > On 2025/10/15 21:48, Rafael J. Wysocki wrote:
> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>
> >> Instead of using HWP-to-frequency scaling factors for computing cost
> >> coefficients in the energy model used on hybrid systems, which is
> >> fragile, rely on CPU type information that is easily accessible now and
> >> the information on whether or not L3 cache is present for this purpose.
> >>
> >> This also allows the cost coefficients for P-cores to be adjusted so
> >> that they start to be populated somewhat earlier (that is, before
> >> E-cores are loaded up to their full capacity).
> >>
> >> In addition to the above, replace an inaccurate comment regarding the
> >> reason why the freq value is added to the cost in hybrid_get_cost().
> >>
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> ---
> >>   drivers/cpufreq/intel_pstate.c |   37 +++++++++++++++----------------------
> >>   1 file changed, 15 insertions(+), 22 deletions(-)
> >>
> >> --- a/drivers/cpufreq/intel_pstate.c
> >> +++ b/drivers/cpufreq/intel_pstate.c
> >> @@ -933,11 +933,8 @@ static int hybrid_active_power(struct de
> >>                             unsigned long *freq)
> >>   {
> >>      /*
> >> -     * Create "utilization bins" of 0-40%, 40%-60%, 60%-80%, and 80%-100%
> >> -     * of the maximum capacity such that two CPUs of the same type will be
> >> -     * regarded as equally attractive if the utilization of each of them
> >> -     * falls into the same bin, which should prevent tasks from being
> >> -     * migrated between them too often.
> >> +     * Create four "states" corresponding to 40%, 60%, 80%, and 100% of the
> >> +     * full capacity.
> >>       *
> >>       * For this purpose, return the "frequency" of 2 for the first
> >>       * performance level and otherwise leave the value set by the caller.
> >> @@ -970,26 +967,22 @@ static bool hybrid_has_l3(unsigned int c
> >>   static int hybrid_get_cost(struct device *dev, unsigned long freq,
> >>                         unsigned long *cost)
> >>   {
> >> -    struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate;
> >> -
> >> +    /* Facilitate load balancing between CPUs of the same type. */
> >> +    *cost = freq;
> >>      /*
> >> -     * The smaller the perf-to-frequency scaling factor, the larger the IPC
> >> -     * ratio between the given CPU and the least capable CPU in the system.
> >> -     * Regard that IPC ratio as the primary cost component and assume that
> >> -     * the scaling factors for different CPU types will differ by at least
> >> -     * 5% and they will not be above INTEL_PSTATE_CORE_SCALING.
> >> +     * Adjust the cost depending on CPU type.
> >>       *
> >> -     * Add the freq value to the cost, so that the cost of running on CPUs
> >> -     * of the same type in different "utilization bins" is different.
> >> -     */
> >> -    *cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq;
> >> -    /*
> >> -     * Increase the cost slightly for CPUs able to access L3 to avoid
> >> -     * touching it in case some other CPUs of the same type can do the work
> >> -     * without it.
> >> +     * The idea is to start loading up LPE-cores before E-cores and start
> >> +     * to populate E-cores when LPE-cores are utilized above 60% of the
> >> +     * capacity.  Similarly, P-cores start to be populated when E-cores are
> >> +     * utilized above 60% of the capacity.
> >>       */
> >> -    if (hybrid_has_l3(dev->id))
> >> -            *cost += 2;
> >> +    if (hybrid_get_cpu_type(dev->id) == INTEL_CPU_TYPE_ATOM) {
> >> +            if (hybrid_has_l3(dev->id)) /* E-core */
> >> +                    *cost += 2;
> >> +    } else { /* P-core */
> >> +            *cost += 4;
> >> +    }
> >>
> >>      return 0;
> >>   }
> >
> > Hi Rafael J. Wysocki:
> >
> > Should the cost increment between different CPU types be one instead
> > of two?
> >
> > Cost with an increment of 2:
> >           0~40%  40%~60%  60%~80%  80%~100%
> > LPE-core    2       3        4        5
> > E-core      4       5        6        7
> > P-core      6       7        8        9
> >
> > With these values, tasks only start being allocated to more powerful
> > CPUs when utilization exceeds 80%, but by that point the system is
> > already in an overloaded state.
> >
> > Cost with an increment of 1:
> >           0~40%  40%~60%  60%~80%  80%~100%
> > LPE-core    2       3        4        5
> > E-core      3       4        5        6
> > P-core      4       5        6        7
> >
> > This matches the description in your patch.
> >
> > The idea of this patch looks good to me.
>
> Agreed: if you want the threshold to be 60% for both, it should be +1
> for CPUs with L3 (E-cores) and +2 for P-cores.
> Good catch!

OK, I'll send an update of this patch then, but I won't resend the
first two patches in the series.  I don't think that they are
objectionable.
Re: [PATCH v2 3/3] cpufreq: intel_pstate: hybrid: Adjust energy model rules
Posted by Dietmar Eggemann 2 months ago
On 16.10.25 11:52, Rafael J. Wysocki wrote:
> On Thu, Oct 16, 2025 at 11:00 AM Christian Loehle
> <christian.loehle@arm.com> wrote:
>>
>> On 10/16/25 08:48, Yaxiong Tian wrote:
>>> On 2025/10/15 21:48, Rafael J. Wysocki wrote:
>>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

[...]

>>> Hi Rafael J. Wysocki:
>>>
>>> Should the cost increment between different CPU types be one instead
>>> of two?
>>>
>>> Cost with an increment of 2:
>>>           0~40%  40%~60%  60%~80%  80%~100%
>>> LPE-core    2       3        4        5
>>> E-core      4       5        6        7
>>> P-core      6       7        8        9
>>>
>>> With these values, tasks only start being allocated to more powerful
>>> CPUs when utilization exceeds 80%, but by that point the system is
>>> already in an overloaded state.
>>>
>>> Cost with an increment of 1:
>>>           0~40%  40%~60%  60%~80%  80%~100%
>>> LPE-core    2       3        4        5
>>> E-core      3       4        5        6
>>> P-core      4       5        6        7
>>>
>>> This matches the description in your patch.
>>>
>>> The idea of this patch looks good to me.
>>
>> Agreed: if you want the threshold to be 60% for both, it should be +1
>> for CPUs with L3 (E-cores) and +2 for P-cores.
>> Good catch!
> 
> OK, I'll send an update of this patch then, but I won't resend the
> first two patches in the series.  I don't think that they are
> objectionable.

FWIW: LGTM: tested on i7-13700K (nosmt):

$ cat /sys/devices/system/cpu/cpu{0,8,16}/cpu_capacity
1009
1024
623

root@gulliver:~# cat /sys/kernel/debug/energy_model/cpu0/ps:*/cost
6
7
8
9
root@gulliver:~# cat /sys/kernel/debug/energy_model/cpu8/ps:*/cost
6
7
8
9
root@gulliver:~# cat /sys/kernel/debug/energy_model/cpu16/ps:*/cost
4
5
6
7

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>




Re: [PATCH v2 3/3] cpufreq: intel_pstate: hybrid: Adjust energy model rules
Posted by Christian Loehle 2 months ago
On 10/15/25 14:48, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Instead of using HWP-to-frequency scaling factors for computing cost
> coefficients in the energy model used on hybrid systems, which is
> fragile, rely on CPU type information that is easily accessible now and
> the information on whether or not L3 cache is present for this purpose.
> 
> This also allows the cost coefficients for P-cores to be adjusted so
> that they start to be populated somewhat earlier (that is, before
> E-cores are loaded up to their full capacity).
> 
> In addition to the above, replace an inaccurate comment regarding the
> reason why the freq value is added to the cost in hybrid_get_cost().
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpufreq/intel_pstate.c |   37 +++++++++++++++----------------------
>  1 file changed, 15 insertions(+), 22 deletions(-)
> 
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -933,11 +933,8 @@ static int hybrid_active_power(struct de
>  			       unsigned long *freq)
>  {
>  	/*
> -	 * Create "utilization bins" of 0-40%, 40%-60%, 60%-80%, and 80%-100%
> -	 * of the maximum capacity such that two CPUs of the same type will be
> -	 * regarded as equally attractive if the utilization of each of them
> -	 * falls into the same bin, which should prevent tasks from being
> -	 * migrated between them too often.
> +	 * Create four "states" corresponding to 40%, 60%, 80%, and 100% of the
> +	 * full capacity.
>  	 *
>  	 * For this purpose, return the "frequency" of 2 for the first
>  	 * performance level and otherwise leave the value set by the caller.
> @@ -970,26 +967,22 @@ static bool hybrid_has_l3(unsigned int c
>  static int hybrid_get_cost(struct device *dev, unsigned long freq,
>  			   unsigned long *cost)
>  {
> -	struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate;
> -
> +	/* Facilitate load balancing between CPUs of the same type. */
> +	*cost = freq;
>  	/*
> -	 * The smaller the perf-to-frequency scaling factor, the larger the IPC
> -	 * ratio between the given CPU and the least capable CPU in the system.
> -	 * Regard that IPC ratio as the primary cost component and assume that
> -	 * the scaling factors for different CPU types will differ by at least
> -	 * 5% and they will not be above INTEL_PSTATE_CORE_SCALING.
> +	 * Adjust the cost depending on CPU type.
>  	 *
> -	 * Add the freq value to the cost, so that the cost of running on CPUs
> -	 * of the same type in different "utilization bins" is different.
> -	 */
> -	*cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq;
> -	/*
> -	 * Increase the cost slightly for CPUs able to access L3 to avoid
> -	 * touching it in case some other CPUs of the same type can do the work
> -	 * without it.
> +	 * The idea is to start loading up LPE-cores before E-cores and start
> +	 * to populate E-cores when LPE-cores are utilized above 60% of the
> +	 * capacity.  Similarly, P-cores start to be populated when E-cores are
> +	 * utilized above 60% of the capacity.
>  	 */
> -	if (hybrid_has_l3(dev->id))
> -		*cost += 2;
> +	if (hybrid_get_cpu_type(dev->id) == INTEL_CPU_TYPE_ATOM) {
> +		if (hybrid_has_l3(dev->id)) /* E-core */
> +			*cost += 2;
> +	} else { /* P-core */
> +		*cost += 4;
> +	}

Interesting, is there any reason in particular why you're looking to change this?
Is it just performance because of the extra headroom? (I recall that your E-cores
are always more efficient than your P-cores at comparable computing power).

How long does it take to trigger overutilized for you?
I still have the overutilized (OU) patches based on 'last idle time'
observation lying around, although I haven't found the time to test more
extensively whether they regress some platform / workload combination.
I will dust them off soon, although I'm not sure if they would help your case.
Happy to try though if you had a particular workload in mind!
Re: [PATCH v2 3/3] cpufreq: intel_pstate: hybrid: Adjust energy model rules
Posted by Rafael J. Wysocki 2 months ago
On Wed, Oct 15, 2025 at 5:13 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 10/15/25 14:48, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Instead of using HWP-to-frequency scaling factors for computing cost
> > coefficients in the energy model used on hybrid systems, which is
> > fragile, rely on CPU type information that is easily accessible now and
> > the information on whether or not L3 cache is present for this purpose.
> >
> > This also allows the cost coefficients for P-cores to be adjusted so
> > that they start to be populated somewhat earlier (that is, before
> > E-cores are loaded up to their full capacity).
> >
> > In addition to the above, replace an inaccurate comment regarding the
> > reason why the freq value is added to the cost in hybrid_get_cost().
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/cpufreq/intel_pstate.c |   37 +++++++++++++++----------------------
> >  1 file changed, 15 insertions(+), 22 deletions(-)
> >
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -933,11 +933,8 @@ static int hybrid_active_power(struct de
> >                              unsigned long *freq)
> >  {
> >       /*
> > -      * Create "utilization bins" of 0-40%, 40%-60%, 60%-80%, and 80%-100%
> > -      * of the maximum capacity such that two CPUs of the same type will be
> > -      * regarded as equally attractive if the utilization of each of them
> > -      * falls into the same bin, which should prevent tasks from being
> > -      * migrated between them too often.
> > +      * Create four "states" corresponding to 40%, 60%, 80%, and 100% of the
> > +      * full capacity.
> >        *
> >        * For this purpose, return the "frequency" of 2 for the first
> >        * performance level and otherwise leave the value set by the caller.
> > @@ -970,26 +967,22 @@ static bool hybrid_has_l3(unsigned int c
> >  static int hybrid_get_cost(struct device *dev, unsigned long freq,
> >                          unsigned long *cost)
> >  {
> > -     struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate;
> > -
> > +     /* Facilitate load balancing between CPUs of the same type. */
> > +     *cost = freq;
> >       /*
> > -      * The smaller the perf-to-frequency scaling factor, the larger the IPC
> > -      * ratio between the given CPU and the least capable CPU in the system.
> > -      * Regard that IPC ratio as the primary cost component and assume that
> > -      * the scaling factors for different CPU types will differ by at least
> > -      * 5% and they will not be above INTEL_PSTATE_CORE_SCALING.
> > +      * Adjust the cost depending on CPU type.
> >        *
> > -      * Add the freq value to the cost, so that the cost of running on CPUs
> > -      * of the same type in different "utilization bins" is different.
> > -      */
> > -     *cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq;
> > -     /*
> > -      * Increase the cost slightly for CPUs able to access L3 to avoid
> > -      * touching it in case some other CPUs of the same type can do the work
> > -      * without it.
> > +      * The idea is to start loading up LPE-cores before E-cores and start
> > +      * to populate E-cores when LPE-cores are utilized above 60% of the
> > +      * capacity.  Similarly, P-cores start to be populated when E-cores are
> > +      * utilized above 60% of the capacity.
> >        */
> > -     if (hybrid_has_l3(dev->id))
> > -             *cost += 2;
> > +     if (hybrid_get_cpu_type(dev->id) == INTEL_CPU_TYPE_ATOM) {
> > +             if (hybrid_has_l3(dev->id)) /* E-core */
> > +                     *cost += 2;
> > +     } else { /* P-core */
> > +             *cost += 4;
> > +     }
>
> Interesting, is there any reason in particular why you're looking to change this?
> Is it just performance because of the extra headroom? (I recall that your E-cores
> are always more efficient than your P-cores at comparable computing power).

Yes, it is performance mostly.

> How long does it take to trigger overutilized for you?

It depends, but sometimes it triggers really quickly due to LPE-core
load spikes.

> I still have the overutilized (OU) patches based on 'last idle time'
> observation lying around, although I haven't found the time to test more
> extensively whether they regress some platform / workload combination.
> I will dust them off soon, although I'm not sure if they would help your case.
> Happy to try though if you had a particular workload in mind!

OK, good to know, thanks!
[PATCH v3 3/3] cpufreq: intel_pstate: hybrid: Adjust energy model rules
Posted by Rafael J. Wysocki 2 months ago
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Instead of using HWP-to-frequency scaling factors for computing cost
coefficients in the energy model used on hybrid systems, which is
fragile, rely on CPU type information that is easily accessible now and
the information on whether or not L3 cache is present for this purpose.

This also allows the cost coefficients for P-cores to be adjusted so
that they start to be populated somewhat earlier (that is, before
E-cores are loaded up to their full capacity).

In addition to the above, replace an inaccurate comment regarding the
reason why the freq value is added to the cost in hybrid_get_cost().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---

v2 -> v3:
   * Reduce cost differences between CPU types by 1 (Tian, Christian)
   * Add a tag from Dietmar

Note: The other patches in the series have not changed.

v1 -> v2: New patch

---
 drivers/cpufreq/intel_pstate.c |   35 ++++++++++++++---------------------
 1 file changed, 14 insertions(+), 21 deletions(-)

--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -933,11 +933,8 @@ static int hybrid_active_power(struct de
 			       unsigned long *freq)
 {
 	/*
-	 * Create "utilization bins" of 0-40%, 40%-60%, 60%-80%, and 80%-100%
-	 * of the maximum capacity such that two CPUs of the same type will be
-	 * regarded as equally attractive if the utilization of each of them
-	 * falls into the same bin, which should prevent tasks from being
-	 * migrated between them too often.
+	 * Create four "states" corresponding to 40%, 60%, 80%, and 100% of the
+	 * full capacity.
 	 *
 	 * For this purpose, return the "frequency" of 2 for the first
 	 * performance level and otherwise leave the value set by the caller.
@@ -970,26 +967,22 @@ static bool hybrid_has_l3(unsigned int c
 static int hybrid_get_cost(struct device *dev, unsigned long freq,
 			   unsigned long *cost)
 {
-	struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate;
-
+	/* Facilitate load balancing between CPUs of the same type. */
+	*cost = freq;
 	/*
-	 * The smaller the perf-to-frequency scaling factor, the larger the IPC
-	 * ratio between the given CPU and the least capable CPU in the system.
-	 * Regard that IPC ratio as the primary cost component and assume that
-	 * the scaling factors for different CPU types will differ by at least
-	 * 5% and they will not be above INTEL_PSTATE_CORE_SCALING.
+	 * Adjust the cost depending on CPU type.
 	 *
-	 * Add the freq value to the cost, so that the cost of running on CPUs
-	 * of the same type in different "utilization bins" is different.
-	 */
-	*cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq;
-	/*
-	 * Increase the cost slightly for CPUs able to access L3 to avoid
-	 * touching it in case some other CPUs of the same type can do the work
-	 * without it.
+	 * The idea is to start loading up LPE-cores before E-cores and start
+	 * to populate E-cores when LPE-cores are utilized above 60% of the
+	 * capacity.  Similarly, P-cores start to be populated when E-cores are
+	 * utilized above 60% of the capacity.
 	 */
-	if (hybrid_has_l3(dev->id))
+	if (hybrid_get_cpu_type(dev->id) == INTEL_CPU_TYPE_ATOM) {
+		if (hybrid_has_l3(dev->id)) /* E-core */
+			*cost += 1;
+	} else { /* P-core */
 		*cost += 2;
+	}
 
 	return 0;
 }
Re: [PATCH v3 3/3] cpufreq: intel_pstate: hybrid: Adjust energy model rules
Posted by Yaxiong Tian 2 months ago
On 2025/10/17 00:22, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Instead of using HWP-to-frequency scaling factors for computing cost
> coefficients in the energy model used on hybrid systems, which is
> fragile, rely on CPU type information that is easily accessible now and
> the information on whether or not L3 cache is present for this purpose.
>
> This also allows the cost coefficients for P-cores to be adjusted so
> that they start to be populated somewhat earlier (that is, before
> E-cores are loaded up to their full capacity).
>
> In addition to the above, replace an inaccurate comment regarding the
> reason why the freq value is added to the cost in hybrid_get_cost().
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>
> v2 -> v3:
>     * Reduce cost differences between CPU types by 1 (Tian, Christian)
>     * Add a tag from Dietmar
>
> Note: The other patches in the series have not changed.
>
> v1 -> v2: New patch
>
> ---
>   drivers/cpufreq/intel_pstate.c |   35 ++++++++++++++---------------------
>   1 file changed, 14 insertions(+), 21 deletions(-)
>
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -933,11 +933,8 @@ static int hybrid_active_power(struct de
>   			       unsigned long *freq)
>   {
>   	/*
> -	 * Create "utilization bins" of 0-40%, 40%-60%, 60%-80%, and 80%-100%
> -	 * of the maximum capacity such that two CPUs of the same type will be
> -	 * regarded as equally attractive if the utilization of each of them
> -	 * falls into the same bin, which should prevent tasks from being
> -	 * migrated between them too often.
> +	 * Create four "states" corresponding to 40%, 60%, 80%, and 100% of the
> +	 * full capacity.
>   	 *
>   	 * For this purpose, return the "frequency" of 2 for the first
>   	 * performance level and otherwise leave the value set by the caller.
> @@ -970,26 +967,22 @@ static bool hybrid_has_l3(unsigned int c
>   static int hybrid_get_cost(struct device *dev, unsigned long freq,
>   			   unsigned long *cost)
>   {
> -	struct pstate_data *pstate = &all_cpu_data[dev->id]->pstate;
> -
> +	/* Facilitate load balancing between CPUs of the same type. */
> +	*cost = freq;
>   	/*
> -	 * The smaller the perf-to-frequency scaling factor, the larger the IPC
> -	 * ratio between the given CPU and the least capable CPU in the system.
> -	 * Regard that IPC ratio as the primary cost component and assume that
> -	 * the scaling factors for different CPU types will differ by at least
> -	 * 5% and they will not be above INTEL_PSTATE_CORE_SCALING.
> +	 * Adjust the cost depending on CPU type.
>   	 *
> -	 * Add the freq value to the cost, so that the cost of running on CPUs
> -	 * of the same type in different "utilization bins" is different.
> -	 */
> -	*cost = div_u64(100ULL * INTEL_PSTATE_CORE_SCALING, pstate->scaling) + freq;
> -	/*
> -	 * Increase the cost slightly for CPUs able to access L3 to avoid
> -	 * touching it in case some other CPUs of the same type can do the work
> -	 * without it.
> +	 * The idea is to start loading up LPE-cores before E-cores and start
> +	 * to populate E-cores when LPE-cores are utilized above 60% of the
> +	 * capacity.  Similarly, P-cores start to be populated when E-cores are
> +	 * utilized above 60% of the capacity.
>   	 */
> -	if (hybrid_has_l3(dev->id))
> +	if (hybrid_get_cpu_type(dev->id) == INTEL_CPU_TYPE_ATOM) {
> +		if (hybrid_has_l3(dev->id)) /* E-core */
> +			*cost += 1;
> +	} else { /* P-core */
>   		*cost += 2;
> +	}
>   
>   	return 0;
>   }

LGTM
Reviewed-by: Yaxiong Tian <tianyaxiong@kylinos.cn>