drivers/cpuidle/governors/teo.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It has been observed that the recent teo governor update which concluded
with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
for low latency constraints") caused the max-jOPS score of the SPECjbb
2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
While it may be argued that this is not a significant increase, the
previous score can be restored by tweaking the inequality used by teo
to decide whether or not to preselect the deepest enabled idle state.
That change also causes the critical-jOPS score of SPECjbb to increase
by around 2%.
Namely, the likelihood of selecting the deepest enabled idle state in
teo on the platform in question has increased after commit 13ed5c4a6d9c
("cpuidle: teo: Skip getting the sleep length if wakeups are very
frequent") because some timer wakeups were previously counted as non-
timer ones and they were effectively added to the left-hand side of the
inequality deciding whether or not to preselect the deepest idle state.
Many of them are now (accurately) counted as timer wakeups, so the left-
hand side of that inequality is now effectively smaller in some cases,
especially when timer wakeups often occur in the range below the target
residency of the deepest enabled idle state and idle states with target
residencies below CPUIDLE_FLAG_POLLING are often selected, but the
majority of recent idle intervals are still above that value most of
the time. As a result, the deepest enabled idle state may be selected
more often than it used to be selected in some cases.
To counter that effect, add the sum of the hits metric for all of the
idle states below the candidate one (which is the deepest enabled idle
state at that point) to the left-hand side of the inequality mentioned
above. This will cause it to be more balanced because, in principle,
putting both timer and non-timer wakeups on both sides of it is more
consistent than only taking into account the timer wakeups in the range
above the target residency of the deepest enabled idle state.
Link: https://www.spec.org/jbb2015/
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpuidle/governors/teo.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -349,13 +349,13 @@
}
/*
- * If the sum of the intercepts metric for all of the idle states
- * shallower than the current candidate one (idx) is greater than the
+ * If the sum of the intercepts and hits metric for all of the idle
+ * states below the current candidate one (idx) is greater than the
* sum of the intercepts and hits metrics for the candidate state and
* all of the deeper states, a shallower idle state is likely to be a
* better choice.
*/
- if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
+ if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
int first_suitable_idx = idx;
/*
On 2/4/25 20:58, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> It has been observed that the recent teo governor update which concluded
> with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
> for low latency constraints") caused the max-jOPS score of the SPECjbb
> 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
> While it may be argued that this is not a significant increase, the
> previous score can be restored by tweaking the inequality used by teo
> to decide whether or not to preselect the deepest enabled idle state.
> That change also causes the critical-jOPS score of SPECjbb to increase
> by around 2%.
>
> Namely, the likelihood of selecting the deepest enabled idle state in
> teo on the platform in question has increased after commit 13ed5c4a6d9c
> ("cpuidle: teo: Skip getting the sleep length if wakeups are very
> frequent") because some timer wakeups were previously counted as non-
> timer ones and they were effectively added to the left-hand side of the
> inequality deciding whether or not to preselect the deepest idle state.
>
> Many of them are now (accurately) counted as timer wakeups, so the left-
> hand side of that inequality is now effectively smaller in some cases,
> especially when timer wakeups often occur in the range below the target
> residency of the deepest enabled idle state and idle states with target
> residencies below CPUIDLE_FLAG_POLLING are often selected, but the
> majority of recent idle intervals are still above that value most of
> the time. As a result, the deepest enabled idle state may be selected
> more often than it used to be selected in some cases.
>
> To counter that effect, add the sum of the hits metric for all of the
> idle states below the candidate one (which is the deepest enabled idle
> state at that point) to the left-hand side of the inequality mentioned
> above. This will cause it to be more balanced because, in principle,
> putting both timer and non-timer wakeups on both sides of it is more
> consistent than only taking into account the timer wakeups in the range
> above the target residency of the deepest enabled idle state.
>
> Link: https://www.spec.org/jbb2015/
> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/cpuidle/governors/teo.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -349,13 +349,13 @@
> }
>
> /*
> - * If the sum of the intercepts metric for all of the idle states
> - * shallower than the current candidate one (idx) is greater than the
> + * If the sum of the intercepts and hits metric for all of the idle
> + * states below the current candidate one (idx) is greater than the
> * sum of the intercepts and hits metrics for the candidate state and
> * all of the deeper states, a shallower idle state is likely to be a
> * better choice.
> */
> - if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> + if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
> int first_suitable_idx = idx;
>
> /*
>
>
>
I'm curious, are Doug's numbers reproducible?
Or could you share the idle state usage numbers? Is that explainable?
Seems like a lot and it does worry me that I can't reproduce anything
as drastic.
I did a bit of x86 as well and got for Raptor Lake (I won't post the
non-x86 numbers now, but teo-tweak performs comparable to teo mainline.)
Idle 5 min:
device gov iter Joules idles idle_misses idle_miss_ratio belows aboves
teo 0 170.02 12690 646 0.051 371 275
teo 1 123.17 8361 517 0.062 281 236
teo 2 122.76 7741 347 0.045 262 85
teo 3 118.5 8699 668 0.077 307 361
teo 4 80.46 8113 443 0.055 264 179
teo-tweak 0 115.05 10223 803 0.079 323 480
teo-tweak 1 164.41 8523 631 0.074 263 368
teo-tweak 2 163.91 8409 711 0.085 256 455
teo-tweak 3 137.22 8581 721 0.084 261 460
teo-tweak 4 174.95 8703 675 0.078 261 414
teo 0 164.34 8443 516 0.061 303 213
teo 1 167.85 8767 492 0.056 256 236
teo 2 166.25 7835 406 0.052 263 143
teo 3 189.77 8865 493 0.056 276 217
teo 4 136.97 9185 467 0.051 286 181
At least in the idle case you can see an increase in 'above' idle_misses.
Firefox Youtube 4K video playback 2 min:
device gov iter Joules idles idle_misses idle_miss_ratio belows aboves
teo 0 260.09 67404 11189 0.166 1899 9290
teo 1 273.71 76649 12155 0.159 2233 9922
teo 2 231.45 59559 10344 0.174 1747 8597
teo 3 202.61 58223 10641 0.183 1748 8893
teo 4 217.56 61411 10744 0.175 1809 8935
teo-tweak 0 227.99 61209 11251 0.184 2110 9141
teo-tweak 1 222.44 61959 10323 0.167 1474 8849
teo-tweak 2 218.1 64380 11080 0.172 1845 9235
teo-tweak 3 207.4 60183 11267 0.187 1929 9338
teo-tweak 4 217.46 61253 10381 0.169 1620 8761
menu 0 225.72 87871 26032 0.296 25412 620
menu 1 200.36 86577 24712 0.285 24486 226
menu 2 214.79 84885 24750 0.292 24556 194
menu 3 206.07 88007 25938 0.295 25683 255
menu 4 216.48 88700 26504 0.299 26302 202
(Idle numbers aren't really reflective in energy used -> dominated by
active power.)
On Thu, Feb 13, 2025 at 3:08 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 2/4/25 20:58, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > It has been observed that the recent teo governor update which concluded
> > with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
> > for low latency constraints") caused the max-jOPS score of the SPECjbb
> > 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
> > While it may be argued that this is not a significant increase, the
> > previous score can be restored by tweaking the inequality used by teo
> > to decide whether or not to preselect the deepest enabled idle state.
> > That change also causes the critical-jOPS score of SPECjbb to increase
> > by around 2%.
> >
> > Namely, the likelihood of selecting the deepest enabled idle state in
> > teo on the platform in question has increased after commit 13ed5c4a6d9c
> > ("cpuidle: teo: Skip getting the sleep length if wakeups are very
> > frequent") because some timer wakeups were previously counted as non-
> > timer ones and they were effectively added to the left-hand side of the
> > inequality deciding whether or not to preselect the deepest idle state.
> >
> > Many of them are now (accurately) counted as timer wakeups, so the left-
> > hand side of that inequality is now effectively smaller in some cases,
> > especially when timer wakeups often occur in the range below the target
> > residency of the deepest enabled idle state and idle states with target
> > residencies below CPUIDLE_FLAG_POLLING are often selected, but the
> > majority of recent idle intervals are still above that value most of
> > the time. As a result, the deepest enabled idle state may be selected
> > more often than it used to be selected in some cases.
> >
> > To counter that effect, add the sum of the hits metric for all of the
> > idle states below the candidate one (which is the deepest enabled idle
> > state at that point) to the left-hand side of the inequality mentioned
> > above. This will cause it to be more balanced because, in principle,
> > putting both timer and non-timer wakeups on both sides of it is more
> > consistent than only taking into account the timer wakeups in the range
> > above the target residency of the deepest enabled idle state.
> >
> > Link: https://www.spec.org/jbb2015/
> > Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> > drivers/cpuidle/governors/teo.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -349,13 +349,13 @@
> > }
> >
> > /*
> > - * If the sum of the intercepts metric for all of the idle states
> > - * shallower than the current candidate one (idx) is greater than the
> > + * If the sum of the intercepts and hits metric for all of the idle
> > + * states below the current candidate one (idx) is greater than the
> > * sum of the intercepts and hits metrics for the candidate state and
> > * all of the deeper states, a shallower idle state is likely to be a
> > * better choice.
> > */
> > - if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> > + if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
> > int first_suitable_idx = idx;
> >
> > /*
> >
> >
> >
>
> I'm curious, are Doug's numbers reproducible?
> Or could you share the idle state usage numbers? Is that explainable?
> Seems like a lot and it does worry me that I can't reproduce anything
> as drastic.
Well, it may not be drastic, but the results below pretty much confirm
that this particular change isn't going in the right direction IMV.
> I did a bit of x86 as well and got for Raptor Lake (I won't post the
> non-x86 numbers now, but teo-tweak performs comparable to teo mainline.)
>
> Idle 5 min:
> device gov iter Joules idles idle_misses idle_miss_ratio belows aboves
> teo 0 170.02 12690 646 0.051 371 275
> teo 1 123.17 8361 517 0.062 281 236
> teo 2 122.76 7741 347 0.045 262 85
> teo 3 118.5 8699 668 0.077 307 361
> teo 4 80.46 8113 443 0.055 264 179
> teo-tweak 0 115.05 10223 803 0.079 323 480
> teo-tweak 1 164.41 8523 631 0.074 263 368
> teo-tweak 2 163.91 8409 711 0.085 256 455
> teo-tweak 3 137.22 8581 721 0.084 261 460
> teo-tweak 4 174.95 8703 675 0.078 261 414
So basically the energy usage goes up, idle misses go up, idle misses
ratio goes up and the "above" misses go way up. Not so good as far as
I'm concerned.
> teo 0 164.34 8443 516 0.061 303 213
> teo 1 167.85 8767 492 0.056 256 236
> teo 2 166.25 7835 406 0.052 263 143
> teo 3 189.77 8865 493 0.056 276 217
> teo 4 136.97 9185 467 0.051 286 181
The above is menu I think?
> At least in the idle case you can see an increase in 'above' idle_misses.
>
> Firefox Youtube 4K video playback 2 min:
> device gov iter Joules idles idle_misses idle_miss_ratio belows aboves
> teo 0 260.09 67404 11189 0.166 1899 9290
> teo 1 273.71 76649 12155 0.159 2233 9922
> teo 2 231.45 59559 10344 0.174 1747 8597
> teo 3 202.61 58223 10641 0.183 1748 8893
> teo 4 217.56 61411 10744 0.175 1809 8935
> teo-tweak 0 227.99 61209 11251 0.184 2110 9141
> teo-tweak 1 222.44 61959 10323 0.167 1474 8849
> teo-tweak 2 218.1 64380 11080 0.172 1845 9235
> teo-tweak 3 207.4 60183 11267 0.187 1929 9338
> teo-tweak 4 217.46 61253 10381 0.169 1620 8761
And it doesn't improve things drastically here, although on average it
does reduce energy usage.
> menu 0 225.72 87871 26032 0.296 25412 620
> menu 1 200.36 86577 24712 0.285 24486 226
> menu 2 214.79 84885 24750 0.292 24556 194
> menu 3 206.07 88007 25938 0.295 25683 255
> menu 4 216.48 88700 26504 0.299 26302 202
>
> (Idle numbers aren't really reflective in energy used -> dominated by
> active power.)
>
On 2/14/25 21:34, Rafael J. Wysocki wrote:
> On Thu, Feb 13, 2025 at 3:08 PM Christian Loehle
> <christian.loehle@arm.com> wrote:
>>
>> On 2/4/25 20:58, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> It has been observed that the recent teo governor update which concluded
>>> with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
>>> for low latency constraints") caused the max-jOPS score of the SPECjbb
>>> 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
>>> While it may be argued that this is not a significant increase, the
>>> previous score can be restored by tweaking the inequality used by teo
>>> to decide whether or not to preselect the deepest enabled idle state.
>>> That change also causes the critical-jOPS score of SPECjbb to increase
>>> by around 2%.
>>>
>>> Namely, the likelihood of selecting the deepest enabled idle state in
>>> teo on the platform in question has increased after commit 13ed5c4a6d9c
>>> ("cpuidle: teo: Skip getting the sleep length if wakeups are very
>>> frequent") because some timer wakeups were previously counted as non-
>>> timer ones and they were effectively added to the left-hand side of the
>>> inequality deciding whether or not to preselect the deepest idle state.
>>>
>>> Many of them are now (accurately) counted as timer wakeups, so the left-
>>> hand side of that inequality is now effectively smaller in some cases,
>>> especially when timer wakeups often occur in the range below the target
>>> residency of the deepest enabled idle state and idle states with target
>>> residencies below CPUIDLE_FLAG_POLLING are often selected, but the
>>> majority of recent idle intervals are still above that value most of
>>> the time. As a result, the deepest enabled idle state may be selected
>>> more often than it used to be selected in some cases.
>>>
>>> To counter that effect, add the sum of the hits metric for all of the
>>> idle states below the candidate one (which is the deepest enabled idle
>>> state at that point) to the left-hand side of the inequality mentioned
>>> above. This will cause it to be more balanced because, in principle,
>>> putting both timer and non-timer wakeups on both sides of it is more
>>> consistent than only taking into account the timer wakeups in the range
>>> above the target residency of the deepest enabled idle state.
>>>
>>> Link: https://www.spec.org/jbb2015/
>>> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> ---
>>> drivers/cpuidle/governors/teo.c | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> --- a/drivers/cpuidle/governors/teo.c
>>> +++ b/drivers/cpuidle/governors/teo.c
>>> @@ -349,13 +349,13 @@
>>> }
>>>
>>> /*
>>> - * If the sum of the intercepts metric for all of the idle states
>>> - * shallower than the current candidate one (idx) is greater than the
>>> + * If the sum of the intercepts and hits metric for all of the idle
>>> + * states below the current candidate one (idx) is greater than the
>>> * sum of the intercepts and hits metrics for the candidate state and
>>> * all of the deeper states, a shallower idle state is likely to be a
>>> * better choice.
>>> */
>>> - if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
>>> + if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
>>> int first_suitable_idx = idx;
>>>
>>> /*
>>>
>>>
>>>
>>
>> I'm curious, are Doug's numbers reproducible?
>> Or could you share the idle state usage numbers? Is that explainable?
>> Seems like a lot and it does worry me that I can't reproduce anything
>> as drastic.
>
> Well, it may not be drastic, but the results below pretty much confirm
> that this particular change isn't going in the right direction IMV.
Agreed, I'd still be eager to pick up something like Doug reported with
my tests too :(
>
>> I did a bit of x86 as well and got for Raptor Lake (I won't post the
>> non-x86 numbers now, but teo-tweak performs comparable to teo mainline.)
>>
>> Idle 5 min:
>> device gov iter Joules idles idle_misses idle_miss_ratio belows aboves
>> teo 0 170.02 12690 646 0.051 371 275
>> teo 1 123.17 8361 517 0.062 281 236
>> teo 2 122.76 7741 347 0.045 262 85
>> teo 3 118.5 8699 668 0.077 307 361
>> teo 4 80.46 8113 443 0.055 264 179
>> teo-tweak 0 115.05 10223 803 0.079 323 480
>> teo-tweak 1 164.41 8523 631 0.074 263 368
>> teo-tweak 2 163.91 8409 711 0.085 256 455
>> teo-tweak 3 137.22 8581 721 0.084 261 460
>> teo-tweak 4 174.95 8703 675 0.078 261 414
>
> So basically the energy usage goes up, idle misses go up, idle misses
> ratio goes up and the "above" misses go way up. Not so good as far as
> I'm concerned.
>
>> teo 0 164.34 8443 516 0.061 303 213
>> teo 1 167.85 8767 492 0.056 256 236
>> teo 2 166.25 7835 406 0.052 263 143
>> teo 3 189.77 8865 493 0.056 276 217
>> teo 4 136.97 9185 467 0.051 286 181
>
> The above is menu I think?
No this is teo again, just wanted to include it because the variance is
quite large, (not unusual for idle).
The full table (with menu (mainline))
teo 0 170.02 12690 646 0.051 371 275
teo 1 123.17 8361 517 0.062 281 236
teo 2 122.76 7741 347 0.045 262 85
teo 3 118.5 8699 668 0.077 307 361
teo 4 80.46 8113 443 0.055 264 179
teo-tweak 0 115.05 10223 803 0.079 323 480
teo-tweak 1 164.41 8523 631 0.074 263 368
teo-tweak 2 163.91 8409 711 0.085 256 455
teo-tweak 3 137.22 8581 721 0.084 261 460
teo-tweak 4 174.95 8703 675 0.078 261 414
teo 0 164.34 8443 516 0.061 303 213
teo 1 167.85 8767 492 0.056 256 236
teo 2 166.25 7835 406 0.052 263 143
teo 3 189.77 8865 493 0.056 276 217
teo 4 136.97 9185 467 0.051 286 181
menu 0 180.13 8925 343 0.038 303 40
menu 1 208.49 8717 345 0.040 312 33
menu 2 168.38 8451 321 0.038 274 47
menu 3 139.48 7853 310 0.039 289 21
menu 4 166.61 7769 339 0.044 322 17
>
>> At least in the idle case you can see an increase in 'above' idle_misses.
Agreed, idle_misses clearly go up as mentioned.
>>
>> Firefox Youtube 4K video playback 2 min:
>> device gov iter Joules idles idle_misses idle_miss_ratio belows aboves
>> teo 0 260.09 67404 11189 0.166 1899 9290
>> teo 1 273.71 76649 12155 0.159 2233 9922
>> teo 2 231.45 59559 10344 0.174 1747 8597
>> teo 3 202.61 58223 10641 0.183 1748 8893
>> teo 4 217.56 61411 10744 0.175 1809 8935
>> teo-tweak 0 227.99 61209 11251 0.184 2110 9141
>> teo-tweak 1 222.44 61959 10323 0.167 1474 8849
>> teo-tweak 2 218.1 64380 11080 0.172 1845 9235
>> teo-tweak 3 207.4 60183 11267 0.187 1929 9338
>> teo-tweak 4 217.46 61253 10381 0.169 1620 8761
>
> And it doesn't improve things drastically here, although on average it
> does reduce energy usage.
Hi Christian, Thank you for trying to repeat my idle test results. On 2025.02.13 06:07 Christian Loehle wrote: > I'm curious, are Doug's numbers reproducible? > Or could you share the idle state usage numbers? Is that explainable? > Seems like a lot and it does worry me that I can't reproduce anything > as drastic. While I am having some severe repeatability issues with my testing, not for this test. Please recall my test conditions because the CPU frequency scaling governor does matter. I was using "performance". The power comes from the high amount of time in idle state 1. I verified the idle state 1 power use by disabling all other idle states. I also have HWP disabled, but do not know if it matters. If I use the "powersave" governor (driver is intel_pstate, not intel_cpufreq) then idle power is < 2 watts. Anyway, my data: http://smythies.com/~doug/linux/idle/teo-6.14/idle/perf/ > (Idle numbers aren't really reflective in energy used -> dominated by > active power.) Well, it depends on idle time verses active time on the computer. I also measured the difference in the mains power at 20%, from 43.2 watts to 51.4 watts. I am about to send a long email with all of my test results. Since I can not seem to function without making graphs, it has a lot of links to graphs.
Hi Rafael,
On 2025.02.04 12:58 Rafael J. Wysocki wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> It has been observed that the recent teo governor update which concluded
> with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
> for low latency constraints") caused the max-jOPS score of the SPECjbb
> 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
> While it may be argued that this is not a significant increase, the
> previous score can be restored by tweaking the inequality used by teo
> to decide whether or not to preselect the deepest enabled idle state.
> That change also causes the critical-jOPS score of SPECjbb to increase
> by around 2%.
>
> Namely, the likelihood of selecting the deepest enabled idle state in
> teo on the platform in question has increased after commit 13ed5c4a6d9c
> ("cpuidle: teo: Skip getting the sleep length if wakeups are very
> frequent") because some timer wakeups were previously counted as non-
> timer ones and they were effectively added to the left-hand side of the
> inequality deciding whether or not to preselect the deepest idle state.
>
> Many of them are now (accurately) counted as timer wakeups, so the left-
> hand side of that inequality is now effectively smaller in some cases,
> especially when timer wakeups often occur in the range below the target
> residency of the deepest enabled idle state and idle states with target
> residencies below CPUIDLE_FLAG_POLLING are often selected, but the
> majority of recent idle intervals are still above that value most of
> the time. As a result, the deepest enabled idle state may be selected
> more often than it used to be selected in some cases.
>
> To counter that effect, add the sum of the hits metric for all of the
> idle states below the candidate one (which is the deepest enabled idle
> state at that point) to the left-hand side of the inequality mentioned
> above. This will cause it to be more balanced because, in principle,
> putting both timer and non-timer wakeups on both sides of it is more
> consistent than only taking into account the timer wakeups in the range
> above the target residency of the deepest enabled idle state.
>
> Link: https://www.spec.org/jbb2015/
> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/cpuidle/governors/teo.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -349,13 +349,13 @@
> }
>
> /*
> - * If the sum of the intercepts metric for all of the idle states
> - * shallower than the current candidate one (idx) is greater than the
> + * If the sum of the intercepts and hits metric for all of the idle
> + * states below the current candidate one (idx) is greater than the
> * sum of the intercepts and hits metrics for the candidate state and
> * all of the deeper states, a shallower idle state is likely to be a
> * better choice.
> */
> - if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> + if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
> int first_suitable_idx = idx;
>
> /*
I have only just started testing the recent idle governor changes,
and have not gotten very far yet.
There is a significant increase in processor package power during idle
with this patch, about 5 times increase (400%).
My processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
Distro: Ubuntu 24.04.1, server, no desktop GUI.
CPU frequency scaling driver: intel_pstate
HWP: disabled.
CPU frequency scaling governor: performance
Idle states:
$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3_ACPI
Test durations were >= 45 minutes each.
Kernel 6.14-rc1: Includes cpuidle: teo: Cleanups and very frequent wakeups handling update
Average Idle Power: teo governor: 2.199 watts (+25.51%)
Average Idle power: menu governor: 1.873 watts (+6.91%)
Kernel 6.14-rc1-p: Added this patch for teo and "cpuidle: menu: Avoid discarding useful information when processing recent idle intervals" for menu
Average Idle Power: teo governor: 9.401 watts (+436.6%)
Only 69.61% idle is in the deepest idle state. More typically it would be 98% to 99%.
28.6531% idle time is in state 1. More typically it would be 0.03%
Average Idle Power: menu governor: 1.820 watts (+3.9%)
Kernel 6.13: before "cpuidle: teo: Cleanups and very frequent wakeups handling update"
Average Idle Power: teo governor: 1.752 watts (reference: 0.0%)
Average Idle power: menu governor: 1.909 watts (+9.0%)
... Doug
On Fri, 2025-02-07 at 15:40 -0800, Doug Smythies wrote: > I have only just started testing the recent idle governor changes, > and have not gotten very far yet. Hi Dough, there is the menu governor patch too. While it helps the server, I did not test it on a client system. May be you would to check it too? Subject is: [RFT][PATCH v1 0/5] cpuidle: menu: Avoid discarding useful information when processing recent idle intervals Thanks!
Hi Artem, On 2025.02.09 01:15 : Artem Bityutskiy wrote: > On Fri, 2025-02-07 at 15:40 -0800, Doug Smythies wrote: >> I have only just started testing the recent idle governor changes, >> and have not gotten very far yet. > > Hi Dough, > > there is the menu governor patch too. While it helps the server, I did not test > it on a client system. May be you would to check it too? Subject is: > > [RFT][PATCH v1 0/5] cpuidle: menu: Avoid discarding useful information when processing recent idle intervals Yes, I have included that patch set in my kernel 6.14-rc1 plus patches under test. Other than the idle data, I don't have any other data to present yet as I have been acquiring baseline data.
Hi Doug,
On Sat, Feb 8, 2025 at 12:40 AM Doug Smythies <dsmythies@telus.net> wrote:
>
> Hi Rafael,
>
> On 2025.02.04 12:58 Rafael J. Wysocki wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > It has been observed that the recent teo governor update which concluded
> > with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
> > for low latency constraints") caused the max-jOPS score of the SPECjbb
> > 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
> > While it may be argued that this is not a significant increase, the
> > previous score can be restored by tweaking the inequality used by teo
> > to decide whether or not to preselect the deepest enabled idle state.
> > That change also causes the critical-jOPS score of SPECjbb to increase
> > by around 2%.
> >
> > Namely, the likelihood of selecting the deepest enabled idle state in
> > teo on the platform in question has increased after commit 13ed5c4a6d9c
> > ("cpuidle: teo: Skip getting the sleep length if wakeups are very
> > frequent") because some timer wakeups were previously counted as non-
> > timer ones and they were effectively added to the left-hand side of the
> > inequality deciding whether or not to preselect the deepest idle state.
> >
> > Many of them are now (accurately) counted as timer wakeups, so the left-
> > hand side of that inequality is now effectively smaller in some cases,
> > especially when timer wakeups often occur in the range below the target
> > residency of the deepest enabled idle state and idle states with target
> > residencies below CPUIDLE_FLAG_POLLING are often selected, but the
> > majority of recent idle intervals are still above that value most of
> > the time. As a result, the deepest enabled idle state may be selected
> > more often than it used to be selected in some cases.
> >
> > To counter that effect, add the sum of the hits metric for all of the
> > idle states below the candidate one (which is the deepest enabled idle
> > state at that point) to the left-hand side of the inequality mentioned
> > above. This will cause it to be more balanced because, in principle,
> > putting both timer and non-timer wakeups on both sides of it is more
> > consistent than only taking into account the timer wakeups in the range
> > above the target residency of the deepest enabled idle state.
> >
> > Link: https://www.spec.org/jbb2015/
> > Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> > drivers/cpuidle/governors/teo.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -349,13 +349,13 @@
> > }
> >
> > /*
> > - * If the sum of the intercepts metric for all of the idle states
> > - * shallower than the current candidate one (idx) is greater than the
> > + * If the sum of the intercepts and hits metric for all of the idle
> > + * states below the current candidate one (idx) is greater than the
> > * sum of the intercepts and hits metrics for the candidate state and
> > * all of the deeper states, a shallower idle state is likely to be a
> > * better choice.
> > */
> > - if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> > + if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
> > int first_suitable_idx = idx;
> >
> > /*
>
> I have only just started testing the recent idle governor changes,
> and have not gotten very far yet.
>
> There is a significant increase in processor package power during idle
> with this patch, about 5 times increase (400%).
Thanks for this data point!
The restoration of the 1.4% benchmark score is not worth this cost IMV.
> My processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> Distro: Ubuntu 24.04.1, server, no desktop GUI.
> CPU frequency scaling driver: intel_pstate
> HWP: disabled.
> CPU frequency scaling governor: performance
>
> Idle states:
> $ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
> /sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
> /sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
> /sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2_ACPI
> /sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3_ACPI
>
> Test durations were >= 45 minutes each.
>
> Kernel 6.14-rc1: Includes cpuidle: teo: Cleanups and very frequent wakeups handling update
> Average Idle Power: teo governor: 2.199 watts (+25.51%)
> Average Idle power: menu governor: 1.873 watts (+6.91%)
menu hasn't changed in 6.14-rc1, so this would be variation between
runs I suppose.
> Kernel 6.14-rc1-p: Added this patch for teo and "cpuidle: menu: Avoid discarding useful information when processing recent idle intervals" for menu
> Average Idle Power: teo governor: 9.401 watts (+436.6%)
> Only 69.61% idle is in the deepest idle state. More typically it would be 98% to 99%.
Ah, not good.
OK, this clearly doesn't go in the right direction.
> 28.6531% idle time is in state 1. More typically it would be 0.03%
> Average Idle Power: menu governor: 1.820 watts (+3.9%)
>
> Kernel 6.13: before "cpuidle: teo: Cleanups and very frequent wakeups handling update"
> Average Idle Power: teo governor: 1.752 watts (reference: 0.0%)
> Average Idle power: menu governor: 1.909 watts (+9.0%)
Thanks, I'm just going to drop this patch then.
If you don't mind, I'll have a couple more teo updates for testing.
On 2025.02.08 03:25 Rafael J. Wysocki wrote: >On Sat, Feb 8, 2025 at 12:40 AM Doug Smythies <dsmythies@telus.net> wrote: >> On 2025.02.04 12:58 Rafael J. Wysocki wrote: >>> >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> ... snip ... >> >> Test durations were >= 45 minutes each. >> >> Kernel 6.14-rc1: Includes cpuidle: teo: Cleanups and very frequent wakeups handling update >> Average Idle Power: teo governor: 2.199 watts (+25.51%) >> Average Idle power: menu governor: 1.873 watts (+6.91%) > > menu hasn't changed in 6.14-rc1, so this would be variation between >runs I suppose. Perhaps the way I presented my data wasn't the best method. All the % were relative to kernel 6.13 and the teo idle governor. Relative to kernel 6.13 menu governor that would be: -1.9%, certainly within the noise floor. >> Kernel 6.14-rc1-p: Added this patch for teo and "cpuidle: menu: Avoid discarding useful information when processing recent idle intervals" for menu >> Average Idle Power: teo governor: 9.401 watts (+436.6%) >> Only 69.61% idle is in the deepest idle state. More typically it would be 98% to 99%. > > Ah, not good. > > OK, this clearly doesn't go in the right direction. > >> 28.6531% idle time is in state 1. More typically it would be 0.03% >> Average Idle Power: menu governor: 1.820 watts (+3.9%) >> >> Kernel 6.13: before "cpuidle: teo: Cleanups and very frequent wakeups handling update" >> Average Idle Power: teo governor: 1.752 watts (reference: 0.0%) >> Average Idle power: menu governor: 1.909 watts (+9.0%) > > Thanks, I'm just going to drop this patch then. > > If you don't mind, I'll have a couple more teo updates for testing. O.K. I'll watch for the patches. Plenty to do in the meantime, acquiring baseline data and continuing with the: "cpuidle: menu: Avoid discarding useful information when processing recent idle intervals" patch set portion of kernel 6.14-rc1-p
On Tue, Feb 4, 2025 at 9:58 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> It has been observed that the recent teo governor update which concluded
> with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
> for low latency constraints") caused the max-jOPS score of the SPECjbb
> 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
> While it may be argued that this is not a significant increase, the
> previous score can be restored by tweaking the inequality used by teo
> to decide whether or not to preselect the deepest enabled idle state.
> That change also causes the critical-jOPS score of SPECjbb to increase
> by around 2%.
>
> Namely, the likelihood of selecting the deepest enabled idle state in
> teo on the platform in question has increased after commit 13ed5c4a6d9c
> ("cpuidle: teo: Skip getting the sleep length if wakeups are very
> frequent") because some timer wakeups were previously counted as non-
> timer ones and they were effectively added to the left-hand side of the
> inequality deciding whether or not to preselect the deepest idle state.
>
> Many of them are now (accurately) counted as timer wakeups, so the left-
> hand side of that inequality is now effectively smaller in some cases,
> especially when timer wakeups often occur in the range below the target
> residency of the deepest enabled idle state and idle states with target
> residencies below CPUIDLE_FLAG_POLLING are often selected, but the
> majority of recent idle intervals are still above that value most of
> the time. As a result, the deepest enabled idle state may be selected
> more often than it used to be selected in some cases.
>
> To counter that effect, add the sum of the hits metric for all of the
> idle states below the candidate one (which is the deepest enabled idle
> state at that point) to the left-hand side of the inequality mentioned
> above. This will cause it to be more balanced because, in principle,
> putting both timer and non-timer wakeups on both sides of it is more
> consistent than only taking into account the timer wakeups in the range
> above the target residency of the deepest enabled idle state.
>
> Link: https://www.spec.org/jbb2015/
> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/cpuidle/governors/teo.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -349,13 +349,13 @@
> }
>
> /*
> - * If the sum of the intercepts metric for all of the idle states
> - * shallower than the current candidate one (idx) is greater than the
> + * If the sum of the intercepts and hits metric for all of the idle
> + * states below the current candidate one (idx) is greater than the
> * sum of the intercepts and hits metrics for the candidate state and
> * all of the deeper states, a shallower idle state is likely to be a
> * better choice.
> */
> - if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> + if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
> int first_suitable_idx = idx;
>
> /*
For easier reference/testing this has been exposed in the git branch at
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=experimental/teo-tweak
on top of the cpuidle material that went into 6.14-rc1.
© 2016 - 2025 Red Hat, Inc.