[RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information

Rafael J. Wysocki posted 1 patch 1 year ago
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

When giving up on making a high-confidence prediction,
get_typical_interval() always returns UINT_MAX which means that the
next idle interval prediction will be based entirely on the time till
the next timer.  However, the information represented by the most
recent intervals may not be completely useless in those cases.

Namely, the largest recent idle interval is an upper bound on the
recently observed idle duration, so it is reasonable to assume that
the next idle duration is unlikely to exceed it.  Moreover, this is
still true after eliminating the suspected outliers if the sample
set still under consideration is at least as large as 50% of the
maximum sample set size.

Accordingly, make get_typical_interval() return the current maximum
recent interval value in that case instead of UINT_MAX.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/governors/menu.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -190,8 +190,19 @@
 	 * This can deal with workloads that have long pauses interspersed
 	 * with sporadic activity with a bunch of short pauses.
 	 */
-	if ((divisor * 4) <= INTERVALS * 3)
+	if (divisor * 4 <= INTERVALS * 3) {
+		/*
+		 * If there are sufficiently many data points still under
+		 * consideration after the outliers have been eliminated,
+		 * returning without a prediction would be a mistake because it
+		 * is likely that the next interval will not exceed the current
+		 * maximum, so return the latter in that case.
+		 */
+		if (divisor >= INTERVALS / 2)
+			return max;
+
 		return UINT_MAX;
+	}
 
 	/* Update the thresholds for the next round. */
 	if (avg - min > max - avg)
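For reference, the decision the hunk above adds can be condensed as follows (an illustrative sketch, not the kernel code; the helper name predict() is made up, and with INTERVALS == 8 the low-confidence branch is taken when at most 6 samples survive outlier elimination, while max is returned instead of UINT_MAX when at least 4 of them do):

```c
#include <assert.h>
#include <limits.h>

#define INTERVALS 8	/* size of the menu governor's recent-intervals buffer */

/*
 * Condensed sketch of the return-value selection in the patch:
 * "divisor" is the number of samples surviving outlier elimination and
 * "max" is the largest surviving interval.
 */
static unsigned int predict(unsigned int divisor, unsigned int max)
{
	if (divisor * 4 <= INTERVALS * 3) {
		if (divisor >= INTERVALS / 2)
			return max;	/* >= 50% of samples left: cap at max */

		return UINT_MAX;	/* too few samples: no prediction */
	}

	return 0;	/* high confidence: the real code returns the average here */
}
```

So for divisor values 4..6 the governor now gets an upper bound instead of no prediction at all; only 3 or fewer surviving samples still yield UINT_MAX.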
Re: [RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information
Posted by Marc Zyngier 6 months ago
[+ Thomas, Mark]

On Thu, 06 Feb 2025 14:29:05 +0000,
"Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:
> 
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> When giving up on making a high-confidence prediction,
> get_typical_interval() always returns UINT_MAX which means that the
> next idle interval prediction will be based entirely on the time till
> the next timer.  However, the information represented by the most
> recent intervals may not be completely useless in those cases.
> 
> Namely, the largest recent idle interval is an upper bound on the
> recently observed idle duration, so it is reasonable to assume that
> the next idle duration is unlikely to exceed it.  Moreover, this is
> still true after eliminating the suspected outliers if the sample
> set still under consideration is at least as large as 50% of the
> maximum sample set size.
> 
> Accordingly, make get_typical_interval() return the current maximum
> recent interval value in that case instead of UINT_MAX.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpuidle/governors/menu.c |   13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -190,8 +190,19 @@
>  	 * This can deal with workloads that have long pauses interspersed
>  	 * with sporadic activity with a bunch of short pauses.
>  	 */
> -	if ((divisor * 4) <= INTERVALS * 3)
> +	if (divisor * 4 <= INTERVALS * 3) {
> +		/*
> +		 * If there are sufficiently many data points still under
> +		 * consideration after the outliers have been eliminated,
> +		 * returning without a prediction would be a mistake because it
> +		 * is likely that the next interval will not exceed the current
> +		 * maximum, so return the latter in that case.
> +		 */
> +		if (divisor >= INTERVALS / 2)
> +			return max;
> +
>  		return UINT_MAX;
> +	}
>  
>  	/* Update the thresholds for the next round. */
>  	if (avg - min > max - avg)

It appears that this patch, which made it in 6.15, results in *a lot*
of extra interrupts on one of my arm64 test machines.

* Without this patch:

maz@big-leg-emma:~$ vmstat -y 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 65370828  29244 106088    0    0     0     0   66   26  0  0 100  0  0
 1  0      0 65370828  29244 106088    0    0     0     0  103   66  0  0 100  0  0
 1  0      0 65370828  29244 106088    0    0     0     0   34   12  0  0 100  0  0
 1  0      0 65370828  29244 106088    0    0     0     0   25   12  0  0 100  0  0
 1  0      0 65370828  29244 106088    0    0     0     0   28   14  0  0 100  0  0

we're idling at only a few interrupts per second, which isn't bad for
a 24 CPU toy.

* With this patch:

maz@big-leg-emma:~$ vmstat -y 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 65361024  28420 105388    0    0     0     0 3710   27  0  0 100  0  0
 1  0      0 65361024  28420 105388    0    0     0     0 3399   20  0  0 100  0  0
 1  0      0 65361024  28420 105388    0    0     0     0 4439   78  0  0 100  0  0
 1  0      0 65361024  28420 105388    0    0     0     0 5634   14  0  0 100  0  0
 1  0      0 65361024  28420 105388    0    0     0     0 5575   14  0  0 100  0  0

we're idling at anywhere between 3k and 6k interrupts per second. Not
exactly what you want. This appears to be caused by the broadcast
timer IPI.

Reverting this patch on top of 6.16 restores sanity on this machine.

I suspect that we're entering some deep idle state in a much more
aggressive way, leading to a global timer firing as a wake-up
mechanism, and the broadcast IPI being used to kick everybody else
back. This is further confirmed by seeing the broadcast IPI almost
disappearing completely if I load the system a bit.

Daniel, you should be able to reproduce this on a Synquacer box (this
is what I used here).

I'm happy to test things that could help restore some sanity.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
Re: [RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information
Posted by Rafael J. Wysocki 6 months ago
On Mon, Aug 4, 2025 at 6:54 PM Marc Zyngier <maz@kernel.org> wrote:
>
> [+ Thomas, Mark]
>
> On Thu, 06 Feb 2025 14:29:05 +0000,
> "Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > When giving up on making a high-confidence prediction,
> > get_typical_interval() always returns UINT_MAX which means that the
> > next idle interval prediction will be based entirely on the time till
> > the next timer.  However, the information represented by the most
> > recent intervals may not be completely useless in those cases.
> >
> > Namely, the largest recent idle interval is an upper bound on the
> > recently observed idle duration, so it is reasonable to assume that
> > the next idle duration is unlikely to exceed it.  Moreover, this is
> > still true after eliminating the suspected outliers if the sample
> > set still under consideration is at least as large as 50% of the
> > maximum sample set size.
> >
> > Accordingly, make get_typical_interval() return the current maximum
> > recent interval value in that case instead of UINT_MAX.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/cpuidle/governors/menu.c |   13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > --- a/drivers/cpuidle/governors/menu.c
> > +++ b/drivers/cpuidle/governors/menu.c
> > @@ -190,8 +190,19 @@
> >        * This can deal with workloads that have long pauses interspersed
> >        * with sporadic activity with a bunch of short pauses.
> >        */
> > -     if ((divisor * 4) <= INTERVALS * 3)
> > +     if (divisor * 4 <= INTERVALS * 3) {
> > +             /*
> > +              * If there are sufficiently many data points still under
> > +              * consideration after the outliers have been eliminated,
> > +              * returning without a prediction would be a mistake because it
> > +              * is likely that the next interval will not exceed the current
> > +              * maximum, so return the latter in that case.
> > +              */
> > +             if (divisor >= INTERVALS / 2)
> > +                     return max;
> > +
> >               return UINT_MAX;
> > +     }
> >
> >       /* Update the thresholds for the next round. */
> >       if (avg - min > max - avg)
>
> It appears that this patch, which made it in 6.15, results in *a lot*
> of extra interrupts on one of my arm64 test machines.
>
> * Without this patch:
>
> maz@big-leg-emma:~$ vmstat -y 1
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0 65370828  29244 106088    0    0     0     0   66   26  0  0 100  0  0
>  1  0      0 65370828  29244 106088    0    0     0     0  103   66  0  0 100  0  0
>  1  0      0 65370828  29244 106088    0    0     0     0   34   12  0  0 100  0  0
>  1  0      0 65370828  29244 106088    0    0     0     0   25   12  0  0 100  0  0
>  1  0      0 65370828  29244 106088    0    0     0     0   28   14  0  0 100  0  0
>
> we're idling at only a few interrupts per second, which isn't bad for
> a 24 CPU toy.
>
> * With this patch:
>
> maz@big-leg-emma:~$ vmstat -y 1
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0 65361024  28420 105388    0    0     0     0 3710   27  0  0 100  0  0
>  1  0      0 65361024  28420 105388    0    0     0     0 3399   20  0  0 100  0  0
>  1  0      0 65361024  28420 105388    0    0     0     0 4439   78  0  0 100  0  0
>  1  0      0 65361024  28420 105388    0    0     0     0 5634   14  0  0 100  0  0
>  1  0      0 65361024  28420 105388    0    0     0     0 5575   14  0  0 100  0  0
>
> we're idling at anywhere between 3k and 6k interrupts per second. Not
> exactly what you want. This appears to be caused by the broadcast
> timer IPI.
>
> Reverting this patch on top of 6.16 restores sanity on this machine.

I don't know what is going on here, but it looks highly suspicious to me.

The only effect of the change in question should be the occasional
selection of a shallower idle state, so why would that alone cause the
number of wakeup interrupts to increase?

Arguably, it might interfere with the tick stopping logic if
predicted_ns happened to be less than TICK_NSEC sufficiently often,
but that is not expected to happen on an idle system because in that
case the average interval between genuine wakeups is relatively large.
The tick itself is not counted as a wakeup event, so returning a
shallower state at one point shouldn't affect future predictions, but
the data above suggests that it actually does affect them.
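The tick interaction mentioned above can be reduced to something like the following (a simplified sketch, not the actual menu_select() code; the TICK_NSEC value assumes HZ=250, and the real governor also weighs state residency and exit latency):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_NSEC 4000000ULL	/* assumed tick period for HZ=250 */

/*
 * Simplified sketch of the tick-retention decision: when the predicted
 * idle duration is shorter than one tick period, the governor asks the
 * idle loop to keep the scheduler tick running.  A lower predicted_ns
 * (e.g. one capped at the recent maximum interval) can therefore
 * change the tick-stopping behavior.
 */
static bool stop_tick(uint64_t predicted_ns)
{
	return predicted_ns >= TICK_NSEC;
}
```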

It looks like the governor selecting a shallower idle state at one
point causes more wakeup interrupts to occur in the future, which is
really not expected to happen.

Christian, what do you think?

> I suspect that we're entering some deep idle state in a much more
> aggressive way,

The change actually goes the other way around.  It causes shallower
idle states to be more likely to be selected overall.

> leading to a global timer firing as a wake-up mechanism,

What timer and why would it fire?

> and the broadcast IPI being used to kick everybody else
> back. This is further confirmed by seeing the broadcast IPI almost
> disappearing completely if I load the system a bit.
>
> Daniel, you should be able to reproduce this on a Synquacer box (this
> is what I used here).
>
> I'm happy to test things that could help restore some sanity.

Before anything can be tested, I need to understand what exactly is going on.

What cpuidle driver is used on this platform?

Any chance to try the teo governor on it to see if this problem can
also be observed?

Please send the output of

$ grep -r '.*' /sys/devices/system/cpu/cpu*/cpuidle

collected after a period of idleness, both from a kernel in which the
change in question is present and from a kernel without it.
Re: [RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information
Posted by Christian Loehle 6 months ago
On 8/5/25 14:23, Rafael J. Wysocki wrote:
> On Mon, Aug 4, 2025 at 6:54 PM Marc Zyngier <maz@kernel.org> wrote:
>>
>> [+ Thomas, Mark]
>>
>> On Thu, 06 Feb 2025 14:29:05 +0000,
>> "Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:
>>>
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> When giving up on making a high-confidence prediction,
>>> get_typical_interval() always returns UINT_MAX which means that the
>>> next idle interval prediction will be based entirely on the time till
>>> the next timer.  However, the information represented by the most
>>> recent intervals may not be completely useless in those cases.
>>>
>>> Namely, the largest recent idle interval is an upper bound on the
>>> recently observed idle duration, so it is reasonable to assume that
>>> the next idle duration is unlikely to exceed it.  Moreover, this is
>>> still true after eliminating the suspected outliers if the sample
>>> set still under consideration is at least as large as 50% of the
>>> maximum sample set size.
>>>
>>> Accordingly, make get_typical_interval() return the current maximum
>>> recent interval value in that case instead of UINT_MAX.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> ---
>>>  drivers/cpuidle/governors/menu.c |   13 ++++++++++++-
>>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> --- a/drivers/cpuidle/governors/menu.c
>>> +++ b/drivers/cpuidle/governors/menu.c
>>> @@ -190,8 +190,19 @@
>>>        * This can deal with workloads that have long pauses interspersed
>>>        * with sporadic activity with a bunch of short pauses.
>>>        */
>>> -     if ((divisor * 4) <= INTERVALS * 3)
>>> +     if (divisor * 4 <= INTERVALS * 3) {
>>> +             /*
>>> +              * If there are sufficiently many data points still under
>>> +              * consideration after the outliers have been eliminated,
>>> +              * returning without a prediction would be a mistake because it
>>> +              * is likely that the next interval will not exceed the current
>>> +              * maximum, so return the latter in that case.
>>> +              */
>>> +             if (divisor >= INTERVALS / 2)
>>> +                     return max;
>>> +
>>>               return UINT_MAX;
>>> +     }
>>>
>>>       /* Update the thresholds for the next round. */
>>>       if (avg - min > max - avg)
>>
>> It appears that this patch, which made it in 6.15, results in *a lot*
>> of extra interrupts on one of my arm64 test machines.
>>
>> * Without this patch:
>>
>> maz@big-leg-emma:~$ vmstat -y 1
>> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>  1  0      0 65370828  29244 106088    0    0     0     0   66   26  0  0 100  0  0
>>  1  0      0 65370828  29244 106088    0    0     0     0  103   66  0  0 100  0  0
>>  1  0      0 65370828  29244 106088    0    0     0     0   34   12  0  0 100  0  0
>>  1  0      0 65370828  29244 106088    0    0     0     0   25   12  0  0 100  0  0
>>  1  0      0 65370828  29244 106088    0    0     0     0   28   14  0  0 100  0  0
>>
>> we're idling at only a few interrupts per second, which isn't bad for
>> a 24 CPU toy.
>>
>> * With this patch:
>>
>> maz@big-leg-emma:~$ vmstat -y 1
>> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>  1  0      0 65361024  28420 105388    0    0     0     0 3710   27  0  0 100  0  0
>>  1  0      0 65361024  28420 105388    0    0     0     0 3399   20  0  0 100  0  0
>>  1  0      0 65361024  28420 105388    0    0     0     0 4439   78  0  0 100  0  0
>>  1  0      0 65361024  28420 105388    0    0     0     0 5634   14  0  0 100  0  0
>>  1  0      0 65361024  28420 105388    0    0     0     0 5575   14  0  0 100  0  0
>>
>> we're idling at anywhere between 3k and 6k interrupts per second. Not
>> exactly what you want. This appears to be caused by the broadcast
>> timer IPI.
>>
>> Reverting this patch on top of 6.16 restores sanity on this machine.
> 
> I don't know what is going on here, but it looks highly suspicious to me.
> 
> The only effect of the change in question should be selecting a
> shallower idle state occasionally and why would this alone cause the
> number of wakeup interrupts to increase?
> 
> Arguably, it might interfere with the tick stopping logic if
> predicted_ns happened to be less than TICK_NSEC sufficiently often,
> but that is not expected to happen on an idle system because in that
> case the average interval between genuine wakeups is relatively large.
> The tick itself is not counted as a wakeup event, so returning a
> shallower state at one point shouldn't affect future predictions, but
> the data above suggests that it actually does affect them.
> 
> It looks like selecting a shallower idle state by the governor at one
> point causes more wakeup interrupts to occur in the future which is
> really not expected to happen.
> 
> Christian, what do you think?
> 
So I looked into it a bit; unfortunately I don't have the box in question.
Apparently psci cpuidle is broken on it (and never worked); it reports:

/sys/devices/system/cpu/cpu0/cpuidle/driver/name: psci_idle
/sys/devices/system/cpu/cpu0/cpuidle/state0/disable: 0
/sys/devices/system/cpu/cpu0/cpuidle/state0/above: 0
/sys/devices/system/cpu/cpu0/cpuidle/state0/time: 1206224
/sys/devices/system/cpu/cpu0/cpuidle/state0/rejected: 0
/sys/devices/system/cpu/cpu0/cpuidle/state0/power: 4294967295
/sys/devices/system/cpu/cpu0/cpuidle/state0/residency: 1
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency: 1
/sys/devices/system/cpu/cpu0/cpuidle/state0/usage: 1447
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc: ARM WFI
/sys/devices/system/cpu/cpu0/cpuidle/state0/below: 182
/sys/devices/system/cpu/cpu0/cpuidle/state0/default_status: enabled
/sys/devices/system/cpu/cpu0/cpuidle/state0/name: WFI
/sys/devices/system/cpu/cpu0/cpuidle/state1/disable: 0
/sys/devices/system/cpu/cpu0/cpuidle/state1/above: 0
/sys/devices/system/cpu/cpu0/cpuidle/state1/time: 0
/sys/devices/system/cpu/cpu0/cpuidle/state1/rejected: 17021
/sys/devices/system/cpu/cpu0/cpuidle/state1/power: 0
/sys/devices/system/cpu/cpu0/cpuidle/state1/residency: 2000
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency: 1500
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage: 0
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc: cpu-sleep-0
/sys/devices/system/cpu/cpu0/cpuidle/state1/below: 0
/sys/devices/system/cpu/cpu0/cpuidle/state1/default_status: enabled
/sys/devices/system/cpu/cpu0/cpuidle/state1/name: cpu-sleep-0
/sys/devices/system/cpu/cpu0/cpuidle/state1/s2idle/time: 0
/sys/devices/system/cpu/cpu0/cpuidle/state1/s2idle/usage: 0
/sys/devices/system/cpu/cpu0/cpuidle/state2/disable: 0
/sys/devices/system/cpu/cpu0/cpuidle/state2/above: 0
/sys/devices/system/cpu/cpu0/cpuidle/state2/time: 0
/sys/devices/system/cpu/cpu0/cpuidle/state2/rejected: 19598056
/sys/devices/system/cpu/cpu0/cpuidle/state2/power: 0
/sys/devices/system/cpu/cpu0/cpuidle/state2/residency: 2500
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency: 1600
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage: 0
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc: cluster-sleep-0
/sys/devices/system/cpu/cpu0/cpuidle/state2/below: 0
/sys/devices/system/cpu/cpu0/cpuidle/state2/default_status: enabled
/sys/devices/system/cpu/cpu0/cpuidle/state2/name: cluster-sleep-0
/sys/devices/system/cpu/cpu0/cpuidle/state2/s2idle/time: 0
/sys/devices/system/cpu/cpu0/cpuidle/state2/s2idle/usage: 0

(Similar on all CPUs, i.e. it never accounts any non-WFI cpuidle state
as entered, only as rejected. This holds both with and without this
patch, but the rejected count is bigger by a factor of 10 or so with
it, which is what Marc is seeing and complaining about.)
If a cpuidle enter fails, cpuidle.c reports it as 0 idle residency,
which indeed makes these statistical tests bogus.
I guess we could add a safeguard here that at least requires most of
the values to be != 0? The real issue is that cpuidle is being used
while it's clearly not working as expected, so we could also bake some
sanity checks into the cpuidle core.
I'd be curious whether this is more common, or whether this is really
the only platform where this occurs (and it went unnoticed all this time).
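A safeguard along those lines might look like this (a hypothetical sketch, not existing kernel code; the function name intervals_look_sane() and the half-non-zero threshold are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>

#define INTERVALS 8	/* matches the menu governor's buffer size */

/*
 * Hypothetical sanity check: only trust the recent-interval statistics
 * when most of the recorded values are non-zero, i.e. when idle-state
 * entry is not persistently failing and being recorded as zero
 * residency.
 */
static bool intervals_look_sane(const unsigned int intervals[INTERVALS])
{
	int nonzero = 0;

	for (int i = 0; i < INTERVALS; i++)
		if (intervals[i] != 0)
			nonzero++;

	return nonzero * 2 >= INTERVALS;	/* at least half non-zero */
}
```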
Re: [RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information
Posted by Marc Zyngier 6 months ago
On Tue, 05 Aug 2025 14:23:56 +0100,
"Rafael J. Wysocki" <rafael@kernel.org> wrote:
> 
> On Mon, Aug 4, 2025 at 6:54 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > [+ Thomas, Mark]
> >
> > On Thu, 06 Feb 2025 14:29:05 +0000,
> > "Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:
> > >
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > When giving up on making a high-confidence prediction,
> > > get_typical_interval() always returns UINT_MAX which means that the
> > > next idle interval prediction will be based entirely on the time till
> > > the next timer.  However, the information represented by the most
> > > recent intervals may not be completely useless in those cases.
> > >
> > > Namely, the largest recent idle interval is an upper bound on the
> > > recently observed idle duration, so it is reasonable to assume that
> > > the next idle duration is unlikely to exceed it.  Moreover, this is
> > > still true after eliminating the suspected outliers if the sample
> > > set still under consideration is at least as large as 50% of the
> > > maximum sample set size.
> > >
> > > Accordingly, make get_typical_interval() return the current maximum
> > > recent interval value in that case instead of UINT_MAX.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >  drivers/cpuidle/governors/menu.c |   13 ++++++++++++-
> > >  1 file changed, 12 insertions(+), 1 deletion(-)
> > >
> > > --- a/drivers/cpuidle/governors/menu.c
> > > +++ b/drivers/cpuidle/governors/menu.c
> > > @@ -190,8 +190,19 @@
> > >        * This can deal with workloads that have long pauses interspersed
> > >        * with sporadic activity with a bunch of short pauses.
> > >        */
> > > -     if ((divisor * 4) <= INTERVALS * 3)
> > > +     if (divisor * 4 <= INTERVALS * 3) {
> > > +             /*
> > > +              * If there are sufficiently many data points still under
> > > +              * consideration after the outliers have been eliminated,
> > > +              * returning without a prediction would be a mistake because it
> > > +              * is likely that the next interval will not exceed the current
> > > +              * maximum, so return the latter in that case.
> > > +              */
> > > +             if (divisor >= INTERVALS / 2)
> > > +                     return max;
> > > +
> > >               return UINT_MAX;
> > > +     }
> > >
> > >       /* Update the thresholds for the next round. */
> > >       if (avg - min > max - avg)
> >
> > It appears that this patch, which made it in 6.15, results in *a lot*
> > of extra interrupts on one of my arm64 test machines.
> >
> > * Without this patch:
> >
> > maz@big-leg-emma:~$ vmstat -y 1
> > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> >  1  0      0 65370828  29244 106088    0    0     0     0   66   26  0  0 100  0  0
> >  1  0      0 65370828  29244 106088    0    0     0     0  103   66  0  0 100  0  0
> >  1  0      0 65370828  29244 106088    0    0     0     0   34   12  0  0 100  0  0
> >  1  0      0 65370828  29244 106088    0    0     0     0   25   12  0  0 100  0  0
> >  1  0      0 65370828  29244 106088    0    0     0     0   28   14  0  0 100  0  0
> >
> > we're idling at only a few interrupts per second, which isn't bad for
> > a 24 CPU toy.
> >
> > * With this patch:
> >
> > maz@big-leg-emma:~$ vmstat -y 1
> > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> >  1  0      0 65361024  28420 105388    0    0     0     0 3710   27  0  0 100  0  0
> >  1  0      0 65361024  28420 105388    0    0     0     0 3399   20  0  0 100  0  0
> >  1  0      0 65361024  28420 105388    0    0     0     0 4439   78  0  0 100  0  0
> >  1  0      0 65361024  28420 105388    0    0     0     0 5634   14  0  0 100  0  0
> >  1  0      0 65361024  28420 105388    0    0     0     0 5575   14  0  0 100  0  0
> >
> > we're idling at anywhere between 3k and 6k interrupts per second. Not
> > exactly what you want. This appears to be caused by the broadcast
> > timer IPI.
> >
> > Reverting this patch on top of 6.16 restores sanity on this machine.
> 
> I don't know what is going on here, but it looks highly suspicious to me.

What does? My observation? The likelihood of this patch being the
source (or the trigger) for an unwanted behaviour? Something else?

> The only effect of the change in question should be selecting a
> shallower idle state occasionally and why would this alone cause the
> number of wakeup interrupts to increase?

You tell me. I'm the messenger here.

> Arguably, it might interfere with the tick stopping logic if
> predicted_ns happened to be less than TICK_NSEC sufficiently often,
> but that is not expected to happen on an idle system because in that
> case the average interval between genuine wakeups is relatively large.
> The tick itself is not counted as a wakeup event, so returning a
> shallower state at one point shouldn't affect future predictions, but
> the data above suggests that it actually does affect them.
> 
> It looks like selecting a shallower idle state by the governor at one
> point causes more wakeup interrupts to occur in the future which is
> really not expected to happen.
> 
> Christian, what do you think?
> 
> > I suspect that we're entering some deep idle state in a much more
> > aggressive way,
> 
> The change actually goes the other way around.  It causes shallower
> idle states to be more likely to be selected overall.

Another proof that I don't understand a thing, and that I should go
play music instead of worrying about kernel issues.

> 
> > leading to a global timer firing as a wake-up mechanism,
> 
> What timer and why would it fire?

The arch_timer_mem timer, which is used as a backup timer when the
CPUs lose their timer context while going into a deep enough idle
state.

> 
> > and the broadcast IPI being used to kick everybody else
> > back. This is further confirmed by seeing the broadcast IPI almost
> > disappearing completely if I load the system a bit.
> >
> > Daniel, you should be able to reproduce this on a Synquacer box (this
> > is what I used here).
> >
> > I'm happy to test things that could help restore some sanity.
> 
> Before anything can be tested, I need to understand what exactly is going on.
> 
> What cpuidle driver is used on this platform?

psci_idle.

> Any chance to try the teo governor on it to see if this problem can
> also be observed?

Neither ladder nor teo have this issue. The number of broadcast timer
IPIs is minimal, and so is the number of interrupts delivered from the
backup timer. Only menu exhibits the IPI-hose behaviour on this box
(and only this one).

> Please send the output of
> 
> $ grep -r '.*' /sys/devices/system/cpu/cpu*/cpuidle
> 
> collected after a period of idleness from the kernel in which the
> change in question is present and from a kernel without it?

* with the change present: https://pastebin.com/Cb45Rysy

* with the change reverted: https://pastebin.com/qRy2xzeT

	M.

-- 
Without deviation from the norm, progress is not possible.
Re: [RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information
Posted by Rafael J. Wysocki 6 months ago
On Tue, Aug 5, 2025 at 6:00 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 05 Aug 2025 14:23:56 +0100,
> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> >
> > On Mon, Aug 4, 2025 at 6:54 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > [+ Thomas, Mark]
> > >
> > > On Thu, 06 Feb 2025 14:29:05 +0000,
> > > "Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:
> > > >
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > When giving up on making a high-confidence prediction,
> > > > get_typical_interval() always returns UINT_MAX which means that the
> > > > next idle interval prediction will be based entirely on the time till
> > > > the next timer.  However, the information represented by the most
> > > > recent intervals may not be completely useless in those cases.
> > > >
> > > > Namely, the largest recent idle interval is an upper bound on the
> > > > recently observed idle duration, so it is reasonable to assume that
> > > > the next idle duration is unlikely to exceed it.  Moreover, this is
> > > > still true after eliminating the suspected outliers if the sample
> > > > set still under consideration is at least as large as 50% of the
> > > > maximum sample set size.
> > > >
> > > > Accordingly, make get_typical_interval() return the current maximum
> > > > recent interval value in that case instead of UINT_MAX.
> > > >
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > ---
> > > >  drivers/cpuidle/governors/menu.c |   13 ++++++++++++-
> > > >  1 file changed, 12 insertions(+), 1 deletion(-)
> > > >
> > > > --- a/drivers/cpuidle/governors/menu.c
> > > > +++ b/drivers/cpuidle/governors/menu.c
> > > > @@ -190,8 +190,19 @@
> > > >        * This can deal with workloads that have long pauses interspersed
> > > >        * with sporadic activity with a bunch of short pauses.
> > > >        */
> > > > -     if ((divisor * 4) <= INTERVALS * 3)
> > > > +     if (divisor * 4 <= INTERVALS * 3) {
> > > > +             /*
> > > > +              * If there are sufficiently many data points still under
> > > > +              * consideration after the outliers have been eliminated,
> > > > +              * returning without a prediction would be a mistake because it
> > > > +              * is likely that the next interval will not exceed the current
> > > > +              * maximum, so return the latter in that case.
> > > > +              */
> > > > +             if (divisor >= INTERVALS / 2)
> > > > +                     return max;
> > > > +
> > > >               return UINT_MAX;
> > > > +     }
> > > >
> > > >       /* Update the thresholds for the next round. */
> > > >       if (avg - min > max - avg)
> > >
> > > It appears that this patch, which made it in 6.15, results in *a lot*
> > > of extra interrupts on one of my arm64 test machines.
> > >
> > > * Without this patch:
> > >
> > > maz@big-leg-emma:~$ vmstat -y 1
> > > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> > >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> > >  1  0      0 65370828  29244 106088    0    0     0     0   66   26  0  0 100  0  0
> > >  1  0      0 65370828  29244 106088    0    0     0     0  103   66  0  0 100  0  0
> > >  1  0      0 65370828  29244 106088    0    0     0     0   34   12  0  0 100  0  0
> > >  1  0      0 65370828  29244 106088    0    0     0     0   25   12  0  0 100  0  0
> > >  1  0      0 65370828  29244 106088    0    0     0     0   28   14  0  0 100  0  0
> > >
> > > we're idling at only a few interrupts per second, which isn't bad for
> > > a 24 CPU toy.
> > >
> > > * With this patch:
> > >
> > > maz@big-leg-emma:~$ vmstat -y 1
> > > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> > >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> > >  1  0      0 65361024  28420 105388    0    0     0     0 3710   27  0  0 100  0  0
> > >  1  0      0 65361024  28420 105388    0    0     0     0 3399   20  0  0 100  0  0
> > >  1  0      0 65361024  28420 105388    0    0     0     0 4439   78  0  0 100  0  0
> > >  1  0      0 65361024  28420 105388    0    0     0     0 5634   14  0  0 100  0  0
> > >  1  0      0 65361024  28420 105388    0    0     0     0 5575   14  0  0 100  0  0
> > >
> > > we're idling at anywhere between 3k and 6k interrupts per second. Not
> > > exactly what you want. This appears to be caused by the broadcast
> > > timer IPI.
> > >
> > > Reverting this patch on top of 6.16 restores sanity on this machine.
> >
> > I don't know what is going on here, but it looks highly suspicious to me.
>
> What does? My observation? The likelihood of this patch being the
> source (or the trigger) for an unwanted behaviour? Something else?

The behavior of the platform, which is basically confirmed by
Christian's observation here:

https://lore.kernel.org/linux-pm/7ffcb716-9a1b-48c2-aaa4-469d0df7c792@arm.com/

And yes, cpuidle has a problem with this corner case.

> > The only effect of the change in question should be selecting a
> > shallower idle state occasionally and why would this alone cause the
> > number of wakeup interrupts to increase?
>
> You tell me. I'm the messenger here.

The tick stopping logic in menu appears to be confused.
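For context, the tick stopping decision in question can be sketched as standalone C. This is an illustrative simplification, not actual menu.c code; the tick period and function name below are made up for the example:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative sketch (not copied from menu.c): menu retains the tick
 * when the predicted idle duration is shorter than the tick period, so
 * a stream of small predictions keeps periodic wakeups alive.  On a
 * platform whose CPUs would lose their local timer in deep idle, a
 * retained timer also means arming the broadcast mechanism.  The tick
 * period here is a made-up 250 Hz value.
 */
#define TICK_NSEC 4000000ULL

static bool stop_tick_sketch(unsigned long long predicted_ns)
{
	/* A short prediction means we expect to wake up soon anyway. */
	if (predicted_ns < TICK_NSEC)
		return false;
	return true;
}
```

So a prediction of UINT_MAX (no prediction) lets the tick stop, while a small prediction keeps it running, which is one way the change could shift wakeup behavior.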

> > Arguably, it might interfere with the tick stopping logic if
> > predicted_ns happened to be less than TICK_NSEC sufficiently often,
> > but that is not expected to happen on an idle system because in that
> > case the average interval between genuine wakeups is relatively large.
> > The tick itself is not counted as a wakeup event, so returning a
> > shallower state at one point shouldn't affect future predictions, but
> > the data above suggests that it actually does affect them.
> >
> > It looks like selecting a shallower idle state by the governor at one
> > point causes more wakeup interrupts to occur in the future, which is
> > really not expected to happen.
> >
> > Christian, what do you think?
> >
> > > I suspect that we're entering some deep idle state in a much more
> > > aggressive way,
> >
> > The change actually goes the other way around.  It causes shallower
> > idle states to be more likely to be selected overall.
>
> Another proof that I don't understand a thing, and that I should go
> play music instead of worrying about kernel issues.
>
> >
> > > leading to a global timer firing as a wake-up mechanism,
> >
> > What timer and why would it fire?
>
> The arch_timer_mem timer, which is used as a backup timer when the
> CPUs lose their timer context while going into a deep enough idle
> state.

Interesting.  They should not go anywhere below WFI as per the message
from Christian linked above.

> >
> > > and the broadcast IPI being used to kick everybody else
> > > back. This is further confirmed by seeing the broadcast IPI almost
> > > disappearing completely if I load the system a bit.
> > >
> > > Daniel, you should be able to reproduce this on a Synquacer box (this is
> > > what I used here).
> > >
> > > I'm happy to test things that could help restore some sanity.
> >
> > Before anything can be tested, I need to understand what exactly is going on.
> >
> > What cpuidle driver is used on this platform?
>
> psci_idle.

Right.

> > Any chance to try the teo governor on it to see if this problem can
> > also be observed?
>
> Neither ladder nor teo have this issue. The number of broadcast timer
> IPIs is minimal, and so is the number of interrupts delivered from the
> backup timer. Only menu exhibits the IPI-hose behaviour on this box
> (and only this one).

Good to know, thanks!

<shameless plug>Switch over to teo?</shameless plug>

> > Please send the output of
> >
> > $ grep -r '.*' /sys/devices/system/cpu/cpu*/cpuidle
> >
> > collected after a period of idleness from the kernel in which the
> > change in question is present and from a kernel without it?
>
> * with the change present: https://pastebin.com/Cb45Rysy

This is what Christian said: idle states 1 and 2 get rejected every time.

> * with the change reverted: https://pastebin.com/qRy2xzeT

And same here, but in this case menu doesn't get confused because
predicted_ns is basically UINT_MAX all the time.

The attached patch (completely untested) causes menu to insert an
"invalid interval" value to the array of recent intervals after the
idle state selected previously got rejected.  It basically should
prevent get_typical_interval() from returning small values if deeper
idle states get rejected all the time.
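A rough standalone rendition of that idea follows. The struct and function names are hypothetical stand-ins mirroring menu.c; the actual attached patch may differ in detail:

```c
#include <assert.h>
#include <limits.h>

#define INTERVALS 8

/* Simplified stand-in for struct menu_device in menu.c. */
struct menu_device_sketch {
	unsigned int intervals[INTERVALS];
	int interval_ptr;
};

static void record_interval(struct menu_device_sketch *data, unsigned int us)
{
	data->intervals[data->interval_ptr++] = us;
	if (data->interval_ptr >= INTERVALS)
		data->interval_ptr = 0;
}

/*
 * Sketch of the idea: a zero last_residency_ns indicates that the
 * previously selected idle state was rejected, so record an "invalid"
 * (maximum) interval instead of the measured one.  This keeps
 * get_typical_interval() from deriving a small prediction from data
 * produced by rejections rather than genuine wakeups.
 */
static void reflect_sketch(struct menu_device_sketch *data,
			   unsigned long long last_residency_ns,
			   unsigned int measured_us)
{
	if (last_residency_ns == 0)
		record_interval(data, UINT_MAX);
	else
		record_interval(data, measured_us);
}
```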
Re: [RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information
Posted by Marc Zyngier 6 months ago
On Tue, 05 Aug 2025 19:50:21 +0100,
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

[...]

> > > Any chance to try the teo governor on it to see if this problem can
> > > also be observed?
> >
> > Neither ladder nor teo have this issue. The number of broadcast timer
> > IPIs is minimal, and so is the number of interrupts delivered from the
> > backup timer. Only menu exhibits the IPI-hose behaviour on this box
> > (and only this one).
> 
> Good to know, thanks!
> 
> <shameless plug>Switch over to teo?</shameless plug>

Sure thing. Just start with:

	git rm drivers/cpuidle/governors/menu.c

and I'll gladly switch to something else! ;-)

[...]

> The attached patch (completely untested) causes menu to insert an
> "invalid interval" value to the array of recent intervals after the
> idle state selected previously got rejected.  It basically should
> prevent get_typical_interval() from returning small values if deeper
> idle states get rejected all the time.

Yup, this does the trick, thanks. When you get to post this, please
add my:

Tested-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.
Re: [RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information
Posted by Christian Loehle 6 months ago
On 8/5/25 19:50, Rafael J. Wysocki wrote:
> On Tue, Aug 5, 2025 at 6:00 PM Marc Zyngier <maz@kernel.org> wrote:
>>
>> On Tue, 05 Aug 2025 14:23:56 +0100,
>> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
>>>
>>> On Mon, Aug 4, 2025 at 6:54 PM Marc Zyngier <maz@kernel.org> wrote:
>>>>
>>>> [+ Thomas, Mark]
>>>>
>>>> On Thu, 06 Feb 2025 14:29:05 +0000,
>>>> "Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:
>>>>>
>>>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>>
>>>>> When giving up on making a high-confidence prediction,
>>>>> get_typical_interval() always returns UINT_MAX which means that the
>>>>> next idle interval prediction will be based entirely on the time till
>>>>> the next timer.  However, the information represented by the most
>>>>> recent intervals may not be completely useless in those cases.
>>>>>
>>>>> Namely, the largest recent idle interval is an upper bound on the
>>>>> recently observed idle duration, so it is reasonable to assume that
>>>>> the next idle duration is unlikely to exceed it.  Moreover, this is
>>>>> still true after eliminating the suspected outliers if the sample
>>>>> set still under consideration is at least as large as 50% of the
>>>>> maximum sample set size.
>>>>>
>>>>> Accordingly, make get_typical_interval() return the current maximum
>>>>> recent interval value in that case instead of UINT_MAX.
>>>>>
>>>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>> ---
>>>>>  drivers/cpuidle/governors/menu.c |   13 ++++++++++++-
>>>>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>
>>>>> --- a/drivers/cpuidle/governors/menu.c
>>>>> +++ b/drivers/cpuidle/governors/menu.c
>>>>> @@ -190,8 +190,19 @@
>>>>>        * This can deal with workloads that have long pauses interspersed
>>>>>        * with sporadic activity with a bunch of short pauses.
>>>>>        */
>>>>> -     if ((divisor * 4) <= INTERVALS * 3)
>>>>> +     if (divisor * 4 <= INTERVALS * 3) {
>>>>> +             /*
>>>>> +              * If there are sufficiently many data points still under
>>>>> +              * consideration after the outliers have been eliminated,
>>>>> +              * returning without a prediction would be a mistake because it
>>>>> +              * is likely that the next interval will not exceed the current
>>>>> +              * maximum, so return the latter in that case.
>>>>> +              */
>>>>> +             if (divisor >= INTERVALS / 2)
>>>>> +                     return max;
>>>>> +
>>>>>               return UINT_MAX;
>>>>> +     }
>>>>>
>>>>>       /* Update the thresholds for the next round. */
>>>>>       if (avg - min > max - avg)
>>>>
>>>> It appears that this patch, which made it in 6.15, results in *a lot*
>>>> of extra interrupts on one of my arm64 test machines.
>>>>
>>>> * Without this patch:
>>>>
>>>> maz@big-leg-emma:~$ vmstat -y 1
>>>> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>>>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>>>  1  0      0 65370828  29244 106088    0    0     0     0   66   26  0  0 100  0  0
>>>>  1  0      0 65370828  29244 106088    0    0     0     0  103   66  0  0 100  0  0
>>>>  1  0      0 65370828  29244 106088    0    0     0     0   34   12  0  0 100  0  0
>>>>  1  0      0 65370828  29244 106088    0    0     0     0   25   12  0  0 100  0  0
>>>>  1  0      0 65370828  29244 106088    0    0     0     0   28   14  0  0 100  0  0
>>>>
>>>> we're idling at only a few interrupts per second, which isn't bad for
>>>> a 24 CPU toy.
>>>>
>>>> * With this patch:
>>>>
>>>> maz@big-leg-emma:~$ vmstat -y 1
>>>> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>>>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>>>  1  0      0 65361024  28420 105388    0    0     0     0 3710   27  0  0 100  0  0
>>>>  1  0      0 65361024  28420 105388    0    0     0     0 3399   20  0  0 100  0  0
>>>>  1  0      0 65361024  28420 105388    0    0     0     0 4439   78  0  0 100  0  0
>>>>  1  0      0 65361024  28420 105388    0    0     0     0 5634   14  0  0 100  0  0
>>>>  1  0      0 65361024  28420 105388    0    0     0     0 5575   14  0  0 100  0  0
>>>>
>>>> we're idling at anywhere between 3k and 6k interrupts per second. Not
>>>> exactly what you want. This appears to be caused by the broadcast
>>>> timer IPI.
>>>>
>>>> Reverting this patch on top of 6.16 restores sanity on this machine.
>>>
>>> I don't know what is going on here, but it looks highly suspicious to me.
>>
>> What does? My observation? The likelihood of this patch being the
>> source (or the trigger) for an unwanted behaviour? Something else?
> 
> The behavior of the platform, which is basically confirmed by
> Christian's observation here:
> 
> https://lore.kernel.org/linux-pm/7ffcb716-9a1b-48c2-aaa4-469d0df7c792@arm.com/
> 
> And yes, cpuidle has a problem with this corner case.
> 
>>> The only effect of the change in question should be selecting a
>>> shallower idle state occasionally and why would this alone cause the
>>> number of wakeup interrupts to increase?
>>
>> You tell me. I'm the messenger here.
> 
> The tick stopping logic in menu appears to be confused.
> 
>>> Arguably, it might interfere with the tick stopping logic if
>>> predicted_ns happened to be less than TICK_NSEC sufficiently often,
>>> but that is not expected to happen on an idle system because in that
>>> case the average interval between genuine wakeups is relatively large.
>>> The tick itself is not counted as a wakeup event, so returning a
>>> shallower state at one point shouldn't affect future predictions, but
>>> the data above suggests that it actually does affect them.
>>>
>>> It looks like selecting a shallower idle state by the governor at one
>>> point causes more wakeup interrupts to occur in the future, which is
>>> really not expected to happen.
>>>
>>> Christian, what do you think?
>>>
>>>> I suspect that we're entering some deep idle state in a much more
>>>> aggressive way,
>>>
>>> The change actually goes the other way around.  It causes shallower
>>> idle states to be more likely to be selected overall.
>>
>> Another proof that I don't understand a thing, and that I should go
>> play music instead of worrying about kernel issues.
>>
>>>
>>>> leading to a global timer firing as a wake-up mechanism,
>>>
>>> What timer and why would it fire?
>>
>> The arch_timer_mem timer, which is used as a backup timer when the
>> CPUs lose their timer context while going into a deep enough idle
>> state.
> 
> Interesting.  They should not go anywhere below WFI as per the message
> from Christian linked above.
> 
>>>
>>>> and the broadcast IPI being used to kick everybody else
>>>> back. This is further confirmed by seeing the broadcast IPI almost
>>>> disappearing completely if I load the system a bit.
>>>>
>>>> Daniel, you should be able to reproduce this on a Synquacer box (this is
>>>> what I used here).
>>>>
>>>> I'm happy to test things that could help restore some sanity.
>>>
>>> Before anything can be tested, I need to understand what exactly is going on.
>>>
>>> What cpuidle driver is used on this platform?
>>
>> psci_idle.
> 
> Right.
> 
>>> Any chance to try the teo governor on it to see if this problem can
>>> also be observed?
>>
>> Neither ladder nor teo have this issue. The number of broadcast timer
>> IPIs is minimal, and so is the number of interrupts delivered from the
>> backup timer. Only menu exhibits the IPI-hose behaviour on this box
>> (and only this one).
> 
> Good to know, thanks!
> 
> <shameless plug>Switch over to teo?</shameless plug>
> 
>>> Please send the output of
>>>
>>> $ grep -r '.*' /sys/devices/system/cpu/cpu*/cpuidle
>>>
>>> collected after a period of idleness from the kernel in which the
>>> change in question is present and from a kernel without it?
>>
>> * with the change present: https://pastebin.com/Cb45Rysy
> 
> This is what Christian said: idle states 1 and 2 get rejected every time.
> 
>> * with the change reverted: https://pastebin.com/qRy2xzeT
> 
> And same here, but in this case menu doesn't get confused because
> predicted_ns is basically UINT_MAX all the time.
> 
> The attached patch (completely untested) causes menu to insert an
> "invalid interval" value to the array of recent intervals after the
> idle state selected previously got rejected.  It basically should
> prevent get_typical_interval() from returning small values if deeper
> idle states get rejected all the time.

Reran the same tests on the same platform (obviously one with a sane
rejection rate), no big change.

Tested-by: Christian Loehle <christian.loehle@arm.com>

I guess the code could do with a comment on why we need the special
last_residency_ns == 0 case; apart from that, LGTM.

Thanks, Rafael!

I still think it's worrying that cpuidle apparently never worked on that
platform and it went unnoticed. Maybe a warning / print would be helpful
in that case; I'll post something.
Re: [RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information
Posted by Christian Loehle 11 months, 3 weeks ago
On 2/6/25 14:29, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> When giving up on making a high-confidence prediction,
> get_typical_interval() always returns UINT_MAX which means that the
> next idle interval prediction will be based entirely on the time till
> the next timer.  However, the information represented by the most
> recent intervals may not be completely useless in those cases.
> 
> Namely, the largest recent idle interval is an upper bound on the
> recently observed idle duration, so it is reasonable to assume that
> the next idle duration is unlikely to exceed it.  Moreover, this is
> still true after eliminating the suspected outliers if the sample
> set still under consideration is at least as large as 50% of the
> maximum sample set size.
> 
> Accordingly, make get_typical_interval() return the current maximum
> recent interval value in that case instead of UINT_MAX.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpuidle/governors/menu.c |   13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -190,8 +190,19 @@
>  	 * This can deal with workloads that have long pauses interspersed
>  	 * with sporadic activity with a bunch of short pauses.
>  	 */
> -	if ((divisor * 4) <= INTERVALS * 3)
> +	if (divisor * 4 <= INTERVALS * 3) {
> +		/*
> +		 * If there are sufficiently many data points still under
> +		 * consideration after the outliers have been eliminated,
> +		 * returning without a prediction would be a mistake because it
> +		 * is likely that the next interval will not exceed the current
> +		 * maximum, so return the latter in that case.
> +		 */
> +		if (divisor >= INTERVALS / 2)
> +			return max;
> +
>  		return UINT_MAX;
> +	}
>  
>  	/* Update the thresholds for the next round. */
>  	if (avg - min > max - avg)
> 

You might want to amend the description at the top of menu.c then, given that
this now returns something without statistical significance. Similarly for
the admin-guide doc.
As reported by the tests, this does improve performance a lot in scenarios of
short intervals (where passing the statistical test is hard).
Teo exploits the idle state residencies for this (i.e. as long as they fall
into the same bin, they are equal for our purposes); this can be viewed as the
menu equivalent of it, without relying on idle states.
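For reference, the post-patch detection loop can be sketched as standalone C. This is simplified from get_typical_interval() for illustration; the real code uses shifts instead of divisions, guards against u64 overflow, and updates the thresholds on both the min and max side:

```c
#include <assert.h>
#include <limits.h>

#define INTERVALS 8

/*
 * Simplified sketch of get_typical_interval(): repeatedly drop the
 * largest remaining samples until the survivors have a small enough
 * standard deviation.  With this patch, when too many samples have
 * been dropped, return the surviving maximum if at least half of the
 * samples are still under consideration, instead of giving up with
 * UINT_MAX.  Assumes at least one sample is below the threshold on
 * every pass.
 */
static unsigned int typical_interval_sketch(const unsigned int intervals[INTERVALS])
{
	unsigned int thresh = UINT_MAX;

	for (;;) {
		unsigned long long sum = 0, variance = 0;
		unsigned int max = 0, avg;
		int i, divisor = 0;

		for (i = 0; i < INTERVALS; i++) {
			unsigned int v = intervals[i];

			if (v <= thresh) {
				sum += v;
				divisor++;
				if (v > max)
					max = v;
			}
		}
		avg = (unsigned int)(sum / divisor);

		for (i = 0; i < INTERVALS; i++) {
			unsigned int v = intervals[i];

			if (v <= thresh) {
				long long d = (long long)v - (long long)avg;

				variance += (unsigned long long)(d * d);
			}
		}
		variance /= divisor;

		/* High confidence: stddev within avg/6, most samples kept. */
		if ((unsigned long long)avg * avg > variance * 36 &&
		    divisor * 4 >= INTERVALS * 3)
			return avg;

		/*
		 * Too many samples dropped: return max if at least half
		 * survive (the new behavior), otherwise no prediction.
		 */
		if (divisor * 4 <= INTERVALS * 3)
			return divisor >= INTERVALS / 2 ? max : UINT_MAX;

		/* Drop the largest remaining sample(s) and retry. */
		thresh = max - 1;
	}
}
```

For example, with seven 100 us intervals and one 10000 us outlier, the outlier is dropped and 100 is returned; with four short intervals and four long ones, the long half is dropped and the patch now returns the short half's maximum where the old code returned UINT_MAX.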

Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Re: [RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information
Posted by Rafael J. Wysocki 11 months, 3 weeks ago
On Mon, Feb 17, 2025 at 2:39 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 2/6/25 14:29, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > When giving up on making a high-confidence prediction,
> > get_typical_interval() always returns UINT_MAX which means that the
> > next idle interval prediction will be based entirely on the time till
> > the next timer.  However, the information represented by the most
> > recent intervals may not be completely useless in those cases.
> >
> > Namely, the largest recent idle interval is an upper bound on the
> > recently observed idle duration, so it is reasonable to assume that
> > the next idle duration is unlikely to exceed it.  Moreover, this is
> > still true after eliminating the suspected outliers if the sample
> > set still under consideration is at least as large as 50% of the
> > maximum sample set size.
> >
> > Accordingly, make get_typical_interval() return the current maximum
> > recent interval value in that case instead of UINT_MAX.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/cpuidle/governors/menu.c |   13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > --- a/drivers/cpuidle/governors/menu.c
> > +++ b/drivers/cpuidle/governors/menu.c
> > @@ -190,8 +190,19 @@
> >        * This can deal with workloads that have long pauses interspersed
> >        * with sporadic activity with a bunch of short pauses.
> >        */
> > -     if ((divisor * 4) <= INTERVALS * 3)
> > +     if (divisor * 4 <= INTERVALS * 3) {
> > +             /*
> > +              * If there are sufficiently many data points still under
> > +              * consideration after the outliers have been eliminated,
> > +              * returning without a prediction would be a mistake because it
> > +              * is likely that the next interval will not exceed the current
> > +              * maximum, so return the latter in that case.
> > +              */
> > +             if (divisor >= INTERVALS / 2)
> > +                     return max;
> > +
> >               return UINT_MAX;
> > +     }
> >
> >       /* Update the thresholds for the next round. */
> >       if (avg - min > max - avg)
> >
>
> You might want to amend the description at the top of menu.c then, given that
> this now returns something without statistical significance. Similarly for
> the admin-guide doc.

OK, I'll send a documentation update patch on top of this.

> As reported by the tests, this does improve performance a lot in scenarios of
> short intervals (where passing the statistical test is hard).

Yes, that pretty much is what the SPECjbb critical-jOPS improvement means.

> Teo exploits the idle state residencies for this (i.e. as long as they fall
> into the same bin, they are equal for our purposes); this can be viewed as the
> menu equivalent of it, without relying on idle states.
>
> Reviewed-by: Christian Loehle <christian.loehle@arm.com>

Thank you!