Re: [PATCH v1 1/3] cpuidle: governors: menu: Avoid selecting states with too much latency

Rafael J. Wysocki posted 1 patch 3 months, 2 weeks ago
drivers/cpuidle/governors/menu.c |    7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Re: [PATCH v1 1/3] cpuidle: governors: menu: Avoid selecting states with too much latency
Posted by Rafael J. Wysocki 3 months, 2 weeks ago
Hi Doug,

On Thursday, October 23, 2025 5:05:44 AM CEST Doug Smythies wrote:
> Hi Rafael,
> 
> Recent email communications about other patches had me
> looking at this one again. 
> 
> On 2025.08.13 03:26 Rafael wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> ... snip...
> 
> > However, after the above change, latency_req cannot take the predicted_ns
> > value any more, which takes place after commit 38f83090f515 ("cpuidle:
> > menu: Remove iowait influence"), because it may cause a polling state
> > to be returned prematurely.
> >
> > In the context of the previous example say that predicted_ns is 3000 and
> > the PM QoS latency limit is still 20 us.  Additionally, say that idle
> > state 0 is a polling one.  Moving the exit_latency_ns check before the
> > target_residency_ns one causes the loop to terminate in the second
> > iteration, before the target_residency_ns check, so idle state 0 will be
> > returned even though previously state 1 would be returned if there were
> > no imminent timers.
> >
> > For this reason, remove the assignment of the predicted_ns value to
> > latency_req from the code.
> 
> Which is okay for timer-based workflow,
> but what about non-timer based, or interrupt driven, workflow?
> 
> Under conditions where idle state 0, or Polling, would be used a lot,
> I am observing about a 11 % throughput regression with this patch
> And idle state 0, polling, usage going from 20% to 0%. 
> 
> From my testing of kernels 6.17-rc1, rc2,rc3 in August and September
> and again now. I missed this in August/September:
> 
> 779b1a1cb13a cpuidle: governors: menu: Avoid selecting states with too much latency - v6.17-rc3
> fa3fa55de0d6 cpuidle: governors: menu: Avoid using invalid recent intervals data - v6.17-rc2
> baseline reference: v6.17-rc1
> 
> teo was included also. As far as I can recall its response has always been similar to rc3. At least, recently.
> 
> Three graphs are attached:
> Sampling data once per 20 seconds, the test is started after the first idle sample,
> and at least one sample is taken after the system returns to idle after the test.
> The faster the test runs the better.
> 
> Test computer:
> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> Distro: Ubuntu 24.04.3, server, no desktop GUI.
> CPU frequency scaling driver: intel_pstate
> HWP: disabled.
> CPU frequency scaling governor: performance
> Ilde driver: intel_idle
> Idle governor: menu (except teo for one compare test run)
> Idle states: 4: name : description:
>   state0/name:POLL                desc:CPUIDLE CORE POLL IDLE
>   state1/name:C1_ACPI          desc:ACPI FFH MWAIT 0x0
>   state2/name:C2_ACPI          desc:ACPI FFH MWAIT 0x30
>   state3/name:C3_ACPI          desc:ACPI FFH MWAIT 0x60

OK, so since the exit residency of an idle state cannot exceed its target
residency, the appended change (on top of 6.18-rc2) should make the throughput
regression go away.

---
 drivers/cpuidle/governors/menu.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -321,10 +321,13 @@ static int menu_select(struct cpuidle_dr
 
 		/*
 		 * Use a physical idle state, not busy polling, unless a timer
-		 * is going to trigger soon enough.
+		 * is going to trigger soon enough or the exit latency of the
+		 * idle state in question is greater than the predicted idle
+		 * duration.
 		 */
 		if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
-		    s->target_residency_ns <= data->next_timer_ns) {
+		    s->target_residency_ns <= data->next_timer_ns &&
+		    s->exit_latency_ns <= predicted_ns) {
 			predicted_ns = s->target_residency_ns;
 			idx = i;
 			break;
RE: [PATCH v1 1/3] cpuidle: governors: menu: Avoid selecting states with too much latency
Posted by Doug Smythies 3 months, 2 weeks ago
On 2025.10.23 07:51 Rafael wrote:

> Hi Doug,
>
> On Thursday, October 23, 2025 5:05:44 AM CEST Doug Smythies wrote:
>> Hi Rafael,
>> 
>> Recent email communications about other patches had me
>> looking at this one again. 
>> 
>> On 2025.08.13 03:26 Rafael wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>> ... snip...
>> 
>>> However, after the above change, latency_req cannot take the predicted_ns
>>> value any more, which takes place after commit 38f83090f515 ("cpuidle:
>>> menu: Remove iowait influence"), because it may cause a polling state
>>> to be returned prematurely.
>>>
>>> In the context of the previous example say that predicted_ns is 3000 and
>>> the PM QoS latency limit is still 20 us.  Additionally, say that idle
>>> state 0 is a polling one.  Moving the exit_latency_ns check before the
>>> target_residency_ns one causes the loop to terminate in the second
>>> iteration, before the target_residency_ns check, so idle state 0 will be
>>> returned even though previously state 1 would be returned if there were
>>> no imminent timers.
>>>
>>> For this reason, remove the assignment of the predicted_ns value to
>>> latency_req from the code.
>> 
>> Which is okay for timer-based workflow,
>> but what about non-timer based, or interrupt driven, workflow?
>> 
>> Under conditions where idle state 0, or Polling, would be used a lot,
>> I am observing about a 11 % throughput regression with this patch
>> And idle state 0, polling, usage going from 20% to 0%. 
>> 
>> From my testing of kernels 6.17-rc1, rc2,rc3 in August and September
>> and again now. I missed this in August/September:
>> 
>> 779b1a1cb13a cpuidle: governors: menu: Avoid selecting states with too much latency - v6.17-rc3
>> fa3fa55de0d6 cpuidle: governors: menu: Avoid using invalid recent intervals data - v6.17-rc2
>> baseline reference: v6.17-rc1
>> 
>> teo was included also. As far as I can recall its response has always been similar to rc3. At least, recently.
>> 
>> Three graphs are attached:
>> Sampling data once per 20 seconds, the test is started after the first idle sample,
>> and at least one sample is taken after the system returns to idle after the test.
>> The faster the test runs the better.
>> 
>> Test computer:
>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
>> Distro: Ubuntu 24.04.3, server, no desktop GUI.
>> CPU frequency scaling driver: intel_pstate
>> HWP: disabled.
>> CPU frequency scaling governor: performance
>> Ilde driver: intel_idle
>> Idle governor: menu (except teo for one compare test run)
>> Idle states: 4: name : description:
>>   state0/name:POLL                desc:CPUIDLE CORE POLL IDLE
>>   state1/name:C1_ACPI          desc:ACPI FFH MWAIT 0x0
>>   state2/name:C2_ACPI          desc:ACPI FFH MWAIT 0x30
>>   state3/name:C3_ACPI          desc:ACPI FFH MWAIT 0x60
>
> OK, so since the exit residency of an idle state cannot exceed its target
> residency, the appended change (on top of 6.18-rc2) should make the throughput
> regression go away.

Indeed, the patch you appended below did make the
throughput regression go away.

Thank you.

>
> ---
> drivers/cpuidle/governors/menu.c |    7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -321,10 +321,13 @@ static int menu_select(struct cpuidle_dr
> 
> 		/*
> 		 * Use a physical idle state, not busy polling, unless a timer
> -		 * is going to trigger soon enough.
> +		 * is going to trigger soon enough or the exit latency of the
> +		 * idle state in question is greater than the predicted idle
> +		 * duration.
> 		 */
> 		if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
> -		    s->target_residency_ns <= data->next_timer_ns) {
> +		    s->target_residency_ns <= data->next_timer_ns &&
> +		    s->exit_latency_ns <= predicted_ns) {
> 			predicted_ns = s->target_residency_ns;
> 			idx = i;
> 			break;
Re: [PATCH v1 1/3] cpuidle: governors: menu: Avoid selecting states with too much latency
Posted by Rafael J. Wysocki 3 months, 2 weeks ago
On Thu, Oct 23, 2025 at 6:03 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> On 2025.10.23 07:51 Rafael wrote:
>
> > Hi Doug,
> >
> > On Thursday, October 23, 2025 5:05:44 AM CEST Doug Smythies wrote:
> >> Hi Rafael,
> >>
> >> Recent email communications about other patches had me
> >> looking at this one again.
> >>
> >> On 2025.08.13 03:26 Rafael wrote:
> >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>>
> >> ... snip...
> >>
> >>> However, after the above change, latency_req cannot take the predicted_ns
> >>> value any more, which takes place after commit 38f83090f515 ("cpuidle:
> >>> menu: Remove iowait influence"), because it may cause a polling state
> >>> to be returned prematurely.
> >>>
> >>> In the context of the previous example say that predicted_ns is 3000 and
> >>> the PM QoS latency limit is still 20 us.  Additionally, say that idle
> >>> state 0 is a polling one.  Moving the exit_latency_ns check before the
> >>> target_residency_ns one causes the loop to terminate in the second
> >>> iteration, before the target_residency_ns check, so idle state 0 will be
> >>> returned even though previously state 1 would be returned if there were
> >>> no imminent timers.
> >>>
> >>> For this reason, remove the assignment of the predicted_ns value to
> >>> latency_req from the code.
> >>
> >> Which is okay for timer-based workflow,
> >> but what about non-timer based, or interrupt driven, workflow?
> >>
> >> Under conditions where idle state 0, or Polling, would be used a lot,
> >> I am observing about a 11 % throughput regression with this patch
> >> And idle state 0, polling, usage going from 20% to 0%.
> >>
> >> From my testing of kernels 6.17-rc1, rc2,rc3 in August and September
> >> and again now. I missed this in August/September:
> >>
> >> 779b1a1cb13a cpuidle: governors: menu: Avoid selecting states with too much latency - v6.17-rc3
> >> fa3fa55de0d6 cpuidle: governors: menu: Avoid using invalid recent intervals data - v6.17-rc2
> >> baseline reference: v6.17-rc1
> >>
> >> teo was included also. As far as I can recall its response has always been similar to rc3. At least, recently.
> >>
> >> Three graphs are attached:
> >> Sampling data once per 20 seconds, the test is started after the first idle sample,
> >> and at least one sample is taken after the system returns to idle after the test.
> >> The faster the test runs the better.
> >>
> >> Test computer:
> >> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> >> Distro: Ubuntu 24.04.3, server, no desktop GUI.
> >> CPU frequency scaling driver: intel_pstate
> >> HWP: disabled.
> >> CPU frequency scaling governor: performance
> >> Ilde driver: intel_idle
> >> Idle governor: menu (except teo for one compare test run)
> >> Idle states: 4: name : description:
> >>   state0/name:POLL                desc:CPUIDLE CORE POLL IDLE
> >>   state1/name:C1_ACPI          desc:ACPI FFH MWAIT 0x0
> >>   state2/name:C2_ACPI          desc:ACPI FFH MWAIT 0x30
> >>   state3/name:C3_ACPI          desc:ACPI FFH MWAIT 0x60
> >
> > OK, so since the exit residency of an idle state cannot exceed its target
> > residency, the appended change (on top of 6.18-rc2) should make the throughput
> > regression go away.
>
> Indeed, the patch you appended below did make the
> throughput regression go away.
>
> Thank you.

OK, this is not an unreasonable change, so let me add a changelog to
it and submit it.