drivers/cpuidle/governors/menu.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
Hi Doug,
On Thursday, October 23, 2025 5:05:44 AM CEST Doug Smythies wrote:
> Hi Rafael,
>
> Recent email communications about other patches had me
> looking at this one again.
>
> On 2025.08.13 03:26 Rafael wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> ... snip...
>
> > However, after the above change, latency_req cannot take the predicted_ns
> > value any more, which takes place after commit 38f83090f515 ("cpuidle:
> > menu: Remove iowait influence"), because it may cause a polling state
> > to be returned prematurely.
> >
> > In the context of the previous example say that predicted_ns is 3000 and
> > the PM QoS latency limit is still 20 us. Additionally, say that idle
> > state 0 is a polling one. Moving the exit_latency_ns check before the
> > target_residency_ns one causes the loop to terminate in the second
> > iteration, before the target_residency_ns check, so idle state 0 will be
> > returned even though previously state 1 would be returned if there were
> > no imminent timers.
> >
> > For this reason, remove the assignment of the predicted_ns value to
> > latency_req from the code.
>
> Which is okay for timer-based workflow,
> but what about non-timer based, or interrupt driven, workflow?
>
> Under conditions where idle state 0, or Polling, would be used a lot,
> I am observing about a 11 % throughput regression with this patch
> And idle state 0, polling, usage going from 20% to 0%.
>
> From my testing of kernels 6.17-rc1, rc2,rc3 in August and September
> and again now. I missed this in August/September:
>
> 779b1a1cb13a cpuidle: governors: menu: Avoid selecting states with too much latency - v6.17-rc3
> fa3fa55de0d6 cpuidle: governors: menu: Avoid using invalid recent intervals data - v6.17-rc2
> baseline reference: v6.17-rc1
>
> teo was included also. As far as I can recall its response has always been similar to rc3. At least, recently.
>
> Three graphs are attached:
> Sampling data once per 20 seconds, the test is started after the first idle sample,
> and at least one sample is taken after the system returns to idle after the test.
> The faster the test runs the better.
>
> Test computer:
> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> Distro: Ubuntu 24.04.3, server, no desktop GUI.
> CPU frequency scaling driver: intel_pstate
> HWP: disabled.
> CPU frequency scaling governor: performance
> Ilde driver: intel_idle
> Idle governor: menu (except teo for one compare test run)
> Idle states: 4: name : description:
> state0/name:POLL desc:CPUIDLE CORE POLL IDLE
> state1/name:C1_ACPI desc:ACPI FFH MWAIT 0x0
> state2/name:C2_ACPI desc:ACPI FFH MWAIT 0x30
> state3/name:C3_ACPI desc:ACPI FFH MWAIT 0x60
OK, so since the exit residency of an idle state cannot exceed its target
residency, the appended change (on top of 6.18-rc2) should make the throughput
regression go away.
---
drivers/cpuidle/governors/menu.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -321,10 +321,13 @@ static int menu_select(struct cpuidle_dr
/*
* Use a physical idle state, not busy polling, unless a timer
- * is going to trigger soon enough.
+ * is going to trigger soon enough or the exit latency of the
+ * idle state in question is greater than the predicted idle
+ * duration.
*/
if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
- s->target_residency_ns <= data->next_timer_ns) {
+ s->target_residency_ns <= data->next_timer_ns &&
+ s->exit_latency_ns <= predicted_ns) {
predicted_ns = s->target_residency_ns;
idx = i;
break;
On 2025.10.23 07:51 Rafael wrote:
> Hi Doug,
>
> On Thursday, October 23, 2025 5:05:44 AM CEST Doug Smythies wrote:
>> Hi Rafael,
>>
>> Recent email communications about other patches had me
>> looking at this one again.
>>
>> On 2025.08.13 03:26 Rafael wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>> ... snip...
>>
>>> However, after the above change, latency_req cannot take the predicted_ns
>>> value any more, which takes place after commit 38f83090f515 ("cpuidle:
>>> menu: Remove iowait influence"), because it may cause a polling state
>>> to be returned prematurely.
>>>
>>> In the context of the previous example say that predicted_ns is 3000 and
>>> the PM QoS latency limit is still 20 us. Additionally, say that idle
>>> state 0 is a polling one. Moving the exit_latency_ns check before the
>>> target_residency_ns one causes the loop to terminate in the second
>>> iteration, before the target_residency_ns check, so idle state 0 will be
>>> returned even though previously state 1 would be returned if there were
>>> no imminent timers.
>>>
>>> For this reason, remove the assignment of the predicted_ns value to
>>> latency_req from the code.
>>
>> Which is okay for timer-based workflow,
>> but what about non-timer based, or interrupt driven, workflow?
>>
>> Under conditions where idle state 0, or Polling, would be used a lot,
>> I am observing about a 11 % throughput regression with this patch
>> And idle state 0, polling, usage going from 20% to 0%.
>>
>> From my testing of kernels 6.17-rc1, rc2,rc3 in August and September
>> and again now. I missed this in August/September:
>>
>> 779b1a1cb13a cpuidle: governors: menu: Avoid selecting states with too much latency - v6.17-rc3
>> fa3fa55de0d6 cpuidle: governors: menu: Avoid using invalid recent intervals data - v6.17-rc2
>> baseline reference: v6.17-rc1
>>
>> teo was included also. As far as I can recall its response has always been similar to rc3. At least, recently.
>>
>> Three graphs are attached:
>> Sampling data once per 20 seconds, the test is started after the first idle sample,
>> and at least one sample is taken after the system returns to idle after the test.
>> The faster the test runs the better.
>>
>> Test computer:
>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
>> Distro: Ubuntu 24.04.3, server, no desktop GUI.
>> CPU frequency scaling driver: intel_pstate
>> HWP: disabled.
>> CPU frequency scaling governor: performance
>> Ilde driver: intel_idle
>> Idle governor: menu (except teo for one compare test run)
>> Idle states: 4: name : description:
>> state0/name:POLL desc:CPUIDLE CORE POLL IDLE
>> state1/name:C1_ACPI desc:ACPI FFH MWAIT 0x0
>> state2/name:C2_ACPI desc:ACPI FFH MWAIT 0x30
>> state3/name:C3_ACPI desc:ACPI FFH MWAIT 0x60
>
> OK, so since the exit residency of an idle state cannot exceed its target
> residency, the appended change (on top of 6.18-rc2) should make the throughput
> regression go away.
Indeed, the patch you appended below did make the
throughput regression go away.
Thank you.
>
> ---
> drivers/cpuidle/governors/menu.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -321,10 +321,13 @@ static int menu_select(struct cpuidle_dr
>
> /*
> * Use a physical idle state, not busy polling, unless a timer
> - * is going to trigger soon enough.
> + * is going to trigger soon enough or the exit latency of the
> + * idle state in question is greater than the predicted idle
> + * duration.
> */
> if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
> - s->target_residency_ns <= data->next_timer_ns) {
> + s->target_residency_ns <= data->next_timer_ns &&
> + s->exit_latency_ns <= predicted_ns) {
> predicted_ns = s->target_residency_ns;
> idx = i;
> break;
On Thu, Oct 23, 2025 at 6:03 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> On 2025.10.23 07:51 Rafael wrote:
>
> > Hi Doug,
> >
> > On Thursday, October 23, 2025 5:05:44 AM CEST Doug Smythies wrote:
> >> Hi Rafael,
> >>
> >> Recent email communications about other patches had me
> >> looking at this one again.
> >>
> >> On 2025.08.13 03:26 Rafael wrote:
> >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>>
> >> ... snip...
> >>
> >>> However, after the above change, latency_req cannot take the predicted_ns
> >>> value any more, which takes place after commit 38f83090f515 ("cpuidle:
> >>> menu: Remove iowait influence"), because it may cause a polling state
> >>> to be returned prematurely.
> >>>
> >>> In the context of the previous example say that predicted_ns is 3000 and
> >>> the PM QoS latency limit is still 20 us. Additionally, say that idle
> >>> state 0 is a polling one. Moving the exit_latency_ns check before the
> >>> target_residency_ns one causes the loop to terminate in the second
> >>> iteration, before the target_residency_ns check, so idle state 0 will be
> >>> returned even though previously state 1 would be returned if there were
> >>> no imminent timers.
> >>>
> >>> For this reason, remove the assignment of the predicted_ns value to
> >>> latency_req from the code.
> >>
> >> Which is okay for timer-based workflow,
> >> but what about non-timer based, or interrupt driven, workflow?
> >>
> >> Under conditions where idle state 0, or Polling, would be used a lot,
> >> I am observing about a 11 % throughput regression with this patch
> >> And idle state 0, polling, usage going from 20% to 0%.
> >>
> >> From my testing of kernels 6.17-rc1, rc2,rc3 in August and September
> >> and again now. I missed this in August/September:
> >>
> >> 779b1a1cb13a cpuidle: governors: menu: Avoid selecting states with too much latency - v6.17-rc3
> >> fa3fa55de0d6 cpuidle: governors: menu: Avoid using invalid recent intervals data - v6.17-rc2
> >> baseline reference: v6.17-rc1
> >>
> >> teo was included also. As far as I can recall its response has always been similar to rc3. At least, recently.
> >>
> >> Three graphs are attached:
> >> Sampling data once per 20 seconds, the test is started after the first idle sample,
> >> and at least one sample is taken after the system returns to idle after the test.
> >> The faster the test runs the better.
> >>
> >> Test computer:
> >> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> >> Distro: Ubuntu 24.04.3, server, no desktop GUI.
> >> CPU frequency scaling driver: intel_pstate
> >> HWP: disabled.
> >> CPU frequency scaling governor: performance
> >> Ilde driver: intel_idle
> >> Idle governor: menu (except teo for one compare test run)
> >> Idle states: 4: name : description:
> >> state0/name:POLL desc:CPUIDLE CORE POLL IDLE
> >> state1/name:C1_ACPI desc:ACPI FFH MWAIT 0x0
> >> state2/name:C2_ACPI desc:ACPI FFH MWAIT 0x30
> >> state3/name:C3_ACPI desc:ACPI FFH MWAIT 0x60
> >
> > OK, so since the exit residency of an idle state cannot exceed its target
> > residency, the appended change (on top of 6.18-rc2) should make the throughput
> > regression go away.
>
> Indeed, the patch you appended below did make the
> throughput regression go away.
>
> Thank you.
OK, this is not an unreasonable change, so let me add a changelog to
it and submit it.
© 2016 - 2026 Red Hat, Inc.