[PATCH v3 2/2] cpuidle: governors: teo: Refine intercepts-based idle state lookup

Rafael J. Wysocki posted 1 patch 1 week, 1 day ago
drivers/cpuidle/governors/teo.c |   50 ++++++++++++++++++++++++++++++++++------
1 file changed, 43 insertions(+), 7 deletions(-)
[PATCH v3 2/2] cpuidle: governors: teo: Refine intercepts-based idle state lookup
Posted by Rafael J. Wysocki 1 week, 1 day ago
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

There are cases in which decisions made by the teo governor are
arguably overly conservative.

For instance, suppose that there are 4 idle states and the values of
the intercepts metric for the first 3 of them are 400, 250, and 251,
respectively.  If the total sum computed in teo_update() is 1000, the
governor will select idle state 1 (provided that all idle states are
enabled and the scheduler tick has not been stopped) although arguably
idle state 0 would be a better choice because the likelihood of getting
an idle duration below the target residency of idle state 1 is greater
than the likelihood of getting an idle duration between the target
residency of idle state 1 and the target residency of idle state 2.

To address this, refine the candidate idle state lookup based on
intercepts to start at the state with the maximum intercepts metric,
below the deepest enabled one, to avoid the cases in which the search
may stop before reaching that state.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v2 -> v3: No changes

v1 -> v2:
   * Multiple fixes related to the handling of cases in which some states
     are disabled.
   * Fixes in new comments (there was some confusion in those comments
     regarding the direction of idle states table traversal).
   * Fixed typos in new comments.

---
 drivers/cpuidle/governors/teo.c |   50 ++++++++++++++++++++++++++++++++++------
 1 file changed, 43 insertions(+), 7 deletions(-)

--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -74,12 +74,17 @@
  *      than the candidate one (it represents the cases in which the CPU was
  *      likely woken up by a non-timer wakeup source).
  *
+ *    Also find the idle state with the maximum intercepts metric (if there are
+ *    multiple states with the maximum intercetps metric, choose the one with
+ *    the highest index).
+ *
  * 2. If the second sum computed in step 1 is greater than a half of the sum of
  *    both metrics for the candidate state bin and all subsequent bins (if any),
  *    a shallower idle state is likely to be more suitable, so look for it.
  *
  *    - Traverse the enabled idle states shallower than the candidate one in the
- *      descending order.
+ *      descending order, starting at the state with the maximum intercepts
+ *      metric found in step 1.
  *
  *    - For each of them compute the sum of the "intercepts" metrics over all
  *      of the idle states between it and the candidate one (including the
@@ -308,8 +313,10 @@ static int teo_select(struct cpuidle_dri
 	ktime_t delta_tick = TICK_NSEC / 2;
 	unsigned int idx_intercept_sum = 0;
 	unsigned int intercept_sum = 0;
+	unsigned int intercept_max = 0;
 	unsigned int idx_hit_sum = 0;
 	unsigned int hit_sum = 0;
+	int intercept_max_idx = -1;
 	int constraint_idx = 0;
 	int idx0 = 0, idx = -1;
 	s64 duration_ns;
@@ -340,17 +347,32 @@ static int teo_select(struct cpuidle_dri
 	if (!dev->states_usage[0].disable)
 		idx = 0;
 
-	/* Compute the sums of metrics for early wakeup pattern detection. */
+	/*
+	 * Compute the sums of metrics for early wakeup pattern detection and
+	 * look for the state bin with the maximum intercepts metric below the
+	 * deepest enabled one (if there are multiple states with the maximum
+	 * intercepts metric, choose the one with the highest index).
+	 */
 	for (i = 1; i < drv->state_count; i++) {
 		struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
+		unsigned int prev_intercepts = prev_bin->intercepts;
 		struct cpuidle_state *s = &drv->states[i];
 
 		/*
 		 * Update the sums of idle state metrics for all of the states
 		 * shallower than the current one.
 		 */
-		intercept_sum += prev_bin->intercepts;
 		hit_sum += prev_bin->hits;
+		intercept_sum += prev_intercepts;
+		/*
+		 * Check if this is the bin with the maximum number of
+		 * intercepts so far and in that case update the index of
+		 * the state with the maximum intercetps metric.
+		 */
+		if (prev_intercepts >= intercept_max) {
+			intercept_max = prev_intercepts;
+			intercept_max_idx = i - 1;
+		}
 
 		if (dev->states_usage[i].disable)
 			continue;
@@ -414,9 +436,22 @@ static int teo_select(struct cpuidle_dri
 		}
 
 		/*
-		 * Look for the deepest idle state whose target residency had
-		 * not exceeded the idle duration in over a half of the relevant
-		 * cases in the past.
+		 * If the minimum state index is greater than or equal to the
+		 * index of the state with the maximum intercepts metric and
+		 * the corresponding state is enabled, there is no need to look
+		 * at the deeper states.
+		 */
+		if (min_idx >= intercept_max_idx &&
+		    !dev->states_usage[min_idx].disable) {
+			idx = min_idx;
+			goto constraint;
+		}
+
+		/*
+		 * Look for the deepest enabled idle state, at most as deep as
+		 * the one with the maximum intercetps metric, whose target
+		 * residency had not been greater than the idle duration in over
+		 * a half of the relevant cases in the past.
 		 *
 		 * Take the possible duration limitation present if the tick
 		 * has been stopped already into account.
@@ -428,7 +463,8 @@ static int teo_select(struct cpuidle_dri
 				continue;
 
 			idx = i;
-			if (2 * intercept_sum > idx_intercept_sum)
+			if (2 * intercept_sum > idx_intercept_sum &&
+			    i <= intercept_max_idx)
 				break;
 		}
 	}
Re: [PATCH v3 2/2] cpuidle: governors: teo: Refine intercepts-based idle state lookup
Posted by Christian Loehle 1 week ago
On 1/29/26 20:51, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> There are cases in which decisions made by the teo governor are
> arguably overly conservative.
> 
> For instance, suppose that there are 4 idle states and the values of
> the intercepts metric for the first 3 of them are 400, 250, and 251,
> respectively.  If the total sum computed in teo_update() is 1000, the
> governor will select idle state 1 (provided that all idle states are
> enabled and the scheduler tick has not been stopped) although arguably
> idle state 0 would be a better choice because the likelihood of getting
> an idle duration below the target residency of idle state 1 is greater
> than the likelihood of getting an idle duration between the target
> residency of idle state 1 and the target residency of idle state 2.

FWIW the algorithm would be self-correcting long-term here because
selecting state1 would then no longer accumulate intercepts for >state1
and soon enough it will select state0.
Anyway I think this is an optimisation nonetheless.

> 
> To address this, refine the candidate idle state lookup based on
> intercepts to start at the state with the maximum intercepts metric,
> below the deepest enabled one, to avoid the cases in which the search
> may stop before reaching that state.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v2 -> v3: No changes
> 
> v1 -> v2:
>    * Multiple fixes related to the handling of cases in which some states
>      are disabled.
>    * Fixes in new comments (there was some confusion in those comments
>      regarding the direction of idle states table traversal).
>    * Fixed typos in new comments.
> 
> ---
>  drivers/cpuidle/governors/teo.c |   50 ++++++++++++++++++++++++++++++++++------
>  1 file changed, 43 insertions(+), 7 deletions(-)
> 
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -74,12 +74,17 @@
>   *      than the candidate one (it represents the cases in which the CPU was
>   *      likely woken up by a non-timer wakeup source).
>   *
> + *    Also find the idle state with the maximum intercepts metric (if there are
> + *    multiple states with the maximum intercetps metric, choose the one with

s/intercetps/intercepts

> + *    the highest index).
> + *
>   * 2. If the second sum computed in step 1 is greater than a half of the sum of
>   *    both metrics for the candidate state bin and all subsequent bins (if any),
>   *    a shallower idle state is likely to be more suitable, so look for it.
>   *
>   *    - Traverse the enabled idle states shallower than the candidate one in the
> - *      descending order.
> + *      descending order, starting at the state with the maximum intercepts
> + *      metric found in step 1.
>   *
>   *    - For each of them compute the sum of the "intercepts" metrics over all
>   *      of the idle states between it and the candidate one (including the
> @@ -308,8 +313,10 @@ static int teo_select(struct cpuidle_dri
>  	ktime_t delta_tick = TICK_NSEC / 2;
>  	unsigned int idx_intercept_sum = 0;
>  	unsigned int intercept_sum = 0;
> +	unsigned int intercept_max = 0;
>  	unsigned int idx_hit_sum = 0;
>  	unsigned int hit_sum = 0;
> +	int intercept_max_idx = -1;
>  	int constraint_idx = 0;
>  	int idx0 = 0, idx = -1;
>  	s64 duration_ns;
> @@ -340,17 +347,32 @@ static int teo_select(struct cpuidle_dri
>  	if (!dev->states_usage[0].disable)
>  		idx = 0;
>  
> -	/* Compute the sums of metrics for early wakeup pattern detection. */
> +	/*
> +	 * Compute the sums of metrics for early wakeup pattern detection and
> +	 * look for the state bin with the maximum intercepts metric below the
> +	 * deepest enabled one (if there are multiple states with the maximum
> +	 * intercepts metric, choose the one with the highest index).
> +	 */
>  	for (i = 1; i < drv->state_count; i++) {
>  		struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
> +		unsigned int prev_intercepts = prev_bin->intercepts;
>  		struct cpuidle_state *s = &drv->states[i];
>  
>  		/*
>  		 * Update the sums of idle state metrics for all of the states
>  		 * shallower than the current one.
>  		 */
> -		intercept_sum += prev_bin->intercepts;
>  		hit_sum += prev_bin->hits;
> +		intercept_sum += prev_intercepts;
> +		/*
> +		 * Check if this is the bin with the maximum number of
> +		 * intercepts so far and in that case update the index of
> +		 * the state with the maximum intercetps metric.

s/intercetps/intercepts

> +		 */
> +		if (prev_intercepts >= intercept_max) {
> +			intercept_max = prev_intercepts;
> +			intercept_max_idx = i - 1;
> +		}
>  
>  		if (dev->states_usage[i].disable)
>  			continue;
> @@ -414,9 +436,22 @@ static int teo_select(struct cpuidle_dri
>  		}
>  
>  		/*
> -		 * Look for the deepest idle state whose target residency had
> -		 * not exceeded the idle duration in over a half of the relevant
> -		 * cases in the past.
> +		 * If the minimum state index is greater than or equal to the
> +		 * index of the state with the maximum intercepts metric and
> +		 * the corresponding state is enabled, there is no need to look
> +		 * at the deeper states.
> +		 */
> +		if (min_idx >= intercept_max_idx &&
> +		    !dev->states_usage[min_idx].disable) {
> +			idx = min_idx;
> +			goto constraint;
> +		}
> +
> +		/*
> +		 * Look for the deepest enabled idle state, at most as deep as
> +		 * the one with the maximum intercetps metric, whose target

s/intercetps/intercepts

> +		 * residency had not been greater than the idle duration in over
> +		 * a half of the relevant cases in the past.
>  		 *
>  		 * Take the possible duration limitation present if the tick
>  		 * has been stopped already into account.
> @@ -428,7 +463,8 @@ static int teo_select(struct cpuidle_dri
>  				continue;
>  
>  			idx = i;
> -			if (2 * intercept_sum > idx_intercept_sum)
> +			if (2 * intercept_sum > idx_intercept_sum &&
> +			    i <= intercept_max_idx)


Ah right, we don't wanna miss the intercept 'peak' here (and we're iterating idle
states in descending order).
All good then!
Reviewed-by: Christian Loehle <christian.loehle@arm.com>

>  				break;
>  		}
>  	}
> 
> 
>
Re: [PATCH v3 2/2] cpuidle: governors: teo: Refine intercepts-based idle state lookup
Posted by Rafael J. Wysocki 1 week ago
On Fri, Jan 30, 2026 at 12:04 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 1/29/26 20:51, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > There are cases in which decisions made by the teo governor are
> > arguably overly conservative.
> >
> > For instance, suppose that there are 4 idle states and the values of
> > the intercepts metric for the first 3 of them are 400, 250, and 251,
> > respectively.  If the total sum computed in teo_update() is 1000, the
> > governor will select idle state 1 (provided that all idle states are
> > enabled and the scheduler tick has not been stopped) although arguably
> > idle state 0 would be a better choice because the likelihood of getting
> > an idle duration below the target residency of idle state 1 is greater
> > than the likelihood of getting an idle duration between the target
> > residency of idle state 1 and the target residency of idle state 2.
>
> FWIW the algorithm would be self-correcting long-term here because
> selecting state1 would then no longer accumulate intercepts for >state1
> and soon enough it will select state0.
> Anyway I think this is an optimisation nonetheless.
>
> >
> > To address this, refine the candidate idle state lookup based on
> > intercepts to start at the state with the maximum intercepts metric,
> > below the deepest enabled one, to avoid the cases in which the search
> > may stop before reaching that state.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > v2 -> v3: No changes
> >
> > v1 -> v2:
> >    * Multiple fixes related to the handling of cases in which some states
> >      are disabled.
> >    * Fixes in new comments (there was some confusion in those comments
> >      regarding the direction of idle states table traversal).
> >    * Fixed typos in new comments.
> >
> > ---
> >  drivers/cpuidle/governors/teo.c |   50 ++++++++++++++++++++++++++++++++++------
> >  1 file changed, 43 insertions(+), 7 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -74,12 +74,17 @@
> >   *      than the candidate one (it represents the cases in which the CPU was
> >   *      likely woken up by a non-timer wakeup source).
> >   *
> > + *    Also find the idle state with the maximum intercepts metric (if there are
> > + *    multiple states with the maximum intercetps metric, choose the one with
>
> s/intercetps/intercepts
>
> > + *    the highest index).
> > + *
> >   * 2. If the second sum computed in step 1 is greater than a half of the sum of
> >   *    both metrics for the candidate state bin and all subsequent bins (if any),
> >   *    a shallower idle state is likely to be more suitable, so look for it.
> >   *
> >   *    - Traverse the enabled idle states shallower than the candidate one in the
> > - *      descending order.
> > + *      descending order, starting at the state with the maximum intercepts
> > + *      metric found in step 1.
> >   *
> >   *    - For each of them compute the sum of the "intercepts" metrics over all
> >   *      of the idle states between it and the candidate one (including the
> > @@ -308,8 +313,10 @@ static int teo_select(struct cpuidle_dri
> >       ktime_t delta_tick = TICK_NSEC / 2;
> >       unsigned int idx_intercept_sum = 0;
> >       unsigned int intercept_sum = 0;
> > +     unsigned int intercept_max = 0;
> >       unsigned int idx_hit_sum = 0;
> >       unsigned int hit_sum = 0;
> > +     int intercept_max_idx = -1;
> >       int constraint_idx = 0;
> >       int idx0 = 0, idx = -1;
> >       s64 duration_ns;
> > @@ -340,17 +347,32 @@ static int teo_select(struct cpuidle_dri
> >       if (!dev->states_usage[0].disable)
> >               idx = 0;
> >
> > -     /* Compute the sums of metrics for early wakeup pattern detection. */
> > +     /*
> > +      * Compute the sums of metrics for early wakeup pattern detection and
> > +      * look for the state bin with the maximum intercepts metric below the
> > +      * deepest enabled one (if there are multiple states with the maximum
> > +      * intercepts metric, choose the one with the highest index).
> > +      */
> >       for (i = 1; i < drv->state_count; i++) {
> >               struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
> > +             unsigned int prev_intercepts = prev_bin->intercepts;
> >               struct cpuidle_state *s = &drv->states[i];
> >
> >               /*
> >                * Update the sums of idle state metrics for all of the states
> >                * shallower than the current one.
> >                */
> > -             intercept_sum += prev_bin->intercepts;
> >               hit_sum += prev_bin->hits;
> > +             intercept_sum += prev_intercepts;
> > +             /*
> > +              * Check if this is the bin with the maximum number of
> > +              * intercepts so far and in that case update the index of
> > +              * the state with the maximum intercetps metric.
>
> s/intercetps/intercepts
>
> > +              */
> > +             if (prev_intercepts >= intercept_max) {
> > +                     intercept_max = prev_intercepts;
> > +                     intercept_max_idx = i - 1;
> > +             }
> >
> >               if (dev->states_usage[i].disable)
> >                       continue;
> > @@ -414,9 +436,22 @@ static int teo_select(struct cpuidle_dri
> >               }
> >
> >               /*
> > -              * Look for the deepest idle state whose target residency had
> > -              * not exceeded the idle duration in over a half of the relevant
> > -              * cases in the past.
> > +              * If the minimum state index is greater than or equal to the
> > +              * index of the state with the maximum intercepts metric and
> > +              * the corresponding state is enabled, there is no need to look
> > +              * at the deeper states.
> > +              */
> > +             if (min_idx >= intercept_max_idx &&
> > +                 !dev->states_usage[min_idx].disable) {
> > +                     idx = min_idx;
> > +                     goto constraint;
> > +             }
> > +
> > +             /*
> > +              * Look for the deepest enabled idle state, at most as deep as
> > +              * the one with the maximum intercetps metric, whose target
>
> s/intercetps/intercepts

Yeah, I've noticed this and the above, but thanks for pointing them out!

I'll fix them when applying the patch.

> > +              * residency had not been greater than the idle duration in over
> > +              * a half of the relevant cases in the past.
> >                *
> >                * Take the possible duration limitation present if the tick
> >                * has been stopped already into account.
> > @@ -428,7 +463,8 @@ static int teo_select(struct cpuidle_dri
> >                               continue;
> >
> >                       idx = i;
> > -                     if (2 * intercept_sum > idx_intercept_sum)
> > +                     if (2 * intercept_sum > idx_intercept_sum &&
> > +                         i <= intercept_max_idx)
>
>
> Ah right, we don't wanna miss the intercept 'peak' here (and we're iterating idle
> states in descending order).
> All good then!
> Reviewed-by: Christian Loehle <christian.loehle@arm.com>

Thanks!

> >                               break;
> >               }
> >       }