While we don't want to skip calling update_idle_stats(), arrange for it
to not increment the overall time spent in the state we didn't really
enter.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: If we wanted to also move the tracing, then I think the part ahead
of the if() also would need moving. At that point we could as well
move update_last_cx_stat(), too, which afaict would allow skipping
update_idle_stats() on the "else" path (which therefore would go
away). Yet then, with the setting of power->safe_state moved up a
little (which imo it should have been anyway) the two
cpu_is_haltable() invocations would only have the lapic_timer_off()
invocation left in between. This would then seem to call for simply
ditching the 2nd one - acpi-idle also doesn't have a 2nd instance.
TBD: For the tracing I wonder if that really needs to come ahead of the
local_irq_enable(). Maybe trace_exit_reason() needs to, but quite
certainly TRACE_6D() doesn't.
---
v3: Also move cstate_restore_tsc() invocation and split ones to
update_idle_stats().
v2: New.
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -854,17 +854,23 @@ static void mwait_idle(void)
mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
local_irq_disable();
- }
- after = alternative_call(cpuidle_get_tick);
+ after = alternative_call(cpuidle_get_tick);
+
+ cstate_restore_tsc();
+
+ /* Now back in C0. */
+ update_idle_stats(power, cx, before, after);
+ } else {
+ /* Never left C0. */
+ after = alternative_call(cpuidle_get_tick);
+ update_idle_stats(power, cx, after, after);
+ }
- cstate_restore_tsc();
trace_exit_reason(irq_traced);
TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after,
irq_traced[0], irq_traced[1], irq_traced[2], irq_traced[3]);
- /* Now back in C0. */
- update_idle_stats(power, cx, before, after);
local_irq_enable();
if (!(lapic_timer_reliable_states & (1 << cx->type)))
On Thu, Jan 27, 2022 at 04:13:47PM +0100, Jan Beulich wrote:
> While we don't want to skip calling update_idle_stats(), arrange for it
> to not increment the overall time spent in the state we didn't really
> enter.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: If we wanted to also move the tracing, then I think the part ahead
> of the if() also would need moving. At that point we could as well
> move update_last_cx_stat(), too, which afaict would allow skipping
> update_idle_stats() on the "else" path (which therefore would go
> away). Yet then, with the setting of power->safe_state moved up a
> little (which imo it should have been anyway) the two
> cpu_is_haltable() invocations would only have the lapic_timer_off()
> invocation left in between. This would then seem to call for simply
> ditching the 2nd one - acpi-idle also doesn't have a 2nd instance.
It's possible for lapic_timer_off to take a non-trivial amount of time
when virtualized, but it's likely we won't be using mwait in that
case, so not sure it matter much to have the two cpu_is_haltable calls
if there's just a lapic_timer_off between them.
> TBD: For the tracing I wonder if that really needs to come ahead of the
> local_irq_enable(). Maybe trace_exit_reason() needs to, but quite
> certainly TRACE_6D() doesn't.
Would be good if it could be moved after the local_irq_enable call, as
it's not as trivial as I've expected, and will just add latency to any
pending interrupt waiting to be serviced. FWIW, I haven't spotted a
need to call it with interrupt disabled.
> ---
> v3: Also move cstate_restore_tsc() invocation and split ones to
> update_idle_stats().
> v2: New.
>
> --- a/xen/arch/x86/cpu/mwait-idle.c
> +++ b/xen/arch/x86/cpu/mwait-idle.c
> @@ -854,17 +854,23 @@ static void mwait_idle(void)
> mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
>
> local_irq_disable();
> - }
>
> - after = alternative_call(cpuidle_get_tick);
> + after = alternative_call(cpuidle_get_tick);
> +
> + cstate_restore_tsc();
> +
> + /* Now back in C0. */
> + update_idle_stats(power, cx, before, after);
> + } else {
> + /* Never left C0. */
> + after = alternative_call(cpuidle_get_tick);
> + update_idle_stats(power, cx, after, after);
While adjusting this, could you also modify update_idle_stats to avoid
increasing cx->usage if before == after (or !sleep_ticks). I don't
think it's fine to increase the state counter if we never actually
entered it.
I was also going to suggest that you don't set 'after' and just use
update_idle_stats(power, cx, before, before); but seeing as TRACE_6D
also makes use of 'after' there's not much point without further
rework. I also see the RFC note at the top, so while I think this is
an improvement, I agree it would be nice to avoid the trace altogether
if we never actually enter the state. If you want to rework the patch
or send a followup that's fine for me.
Thanks, Roger.
On 01.02.2022 12:04, Roger Pau Monné wrote:
> On Thu, Jan 27, 2022 at 04:13:47PM +0100, Jan Beulich wrote:
>> While we don't want to skip calling update_idle_stats(), arrange for it
>> to not increment the overall time spent in the state we didn't really
>> enter.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> RFC: If we wanted to also move the tracing, then I think the part ahead
>> of the if() also would need moving. At that point we could as well
>> move update_last_cx_stat(), too, which afaict would allow skipping
>> update_idle_stats() on the "else" path (which therefore would go
>> away). Yet then, with the setting of power->safe_state moved up a
>> little (which imo it should have been anyway) the two
>> cpu_is_haltable() invocations would only have the lapic_timer_off()
>> invocation left in between. This would then seem to call for simply
>> ditching the 2nd one - acpi-idle also doesn't have a 2nd instance.
>
> It's possible for lapic_timer_off to take a non-trivial amount of time
> when virtualized, but it's likely we won't be using mwait in that
> case, so not sure it matter much to have the two cpu_is_haltable calls
> if there's just a lapic_timer_off between them.
>
>> TBD: For the tracing I wonder if that really needs to come ahead of the
>> local_irq_enable(). Maybe trace_exit_reason() needs to, but quite
>> certainly TRACE_6D() doesn't.
>
> Would be good if it could be moved after the local_irq_enable call, as
> it's not as trivial as I've expected, and will just add latency to any
> pending interrupt waiting to be serviced. FWIW, I haven't spotted a
> need to call it with interrupt disabled.
Okay, I guess I'll to the larger rework then.
>> --- a/xen/arch/x86/cpu/mwait-idle.c
>> +++ b/xen/arch/x86/cpu/mwait-idle.c
>> @@ -854,17 +854,23 @@ static void mwait_idle(void)
>> mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
>>
>> local_irq_disable();
>> - }
>>
>> - after = alternative_call(cpuidle_get_tick);
>> + after = alternative_call(cpuidle_get_tick);
>> +
>> + cstate_restore_tsc();
>> +
>> + /* Now back in C0. */
>> + update_idle_stats(power, cx, before, after);
>> + } else {
>> + /* Never left C0. */
>> + after = alternative_call(cpuidle_get_tick);
>> + update_idle_stats(power, cx, after, after);
>
> While adjusting this, could you also modify update_idle_stats to avoid
> increasing cx->usage if before == after (or !sleep_ticks). I don't
> think it's fine to increase the state counter if we never actually
> entered it.
I did consider it but then decided against. Even leaving this aspect
aside the counter only counts _attempts_ to enter a certain state;
the CPU may find reasons to never actually enter it. And what we have
when before == after is still an attempt, albeit an unsuccessful one.
Jan
On Tue, Feb 01, 2022 at 12:37:27PM +0100, Jan Beulich wrote:
> On 01.02.2022 12:04, Roger Pau Monné wrote:
> > On Thu, Jan 27, 2022 at 04:13:47PM +0100, Jan Beulich wrote:
> >> While we don't want to skip calling update_idle_stats(), arrange for it
> >> to not increment the overall time spent in the state we didn't really
> >> enter.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> RFC: If we wanted to also move the tracing, then I think the part ahead
> >> of the if() also would need moving. At that point we could as well
> >> move update_last_cx_stat(), too, which afaict would allow skipping
> >> update_idle_stats() on the "else" path (which therefore would go
> >> away). Yet then, with the setting of power->safe_state moved up a
> >> little (which imo it should have been anyway) the two
> >> cpu_is_haltable() invocations would only have the lapic_timer_off()
> >> invocation left in between. This would then seem to call for simply
> >> ditching the 2nd one - acpi-idle also doesn't have a 2nd instance.
> >
> > It's possible for lapic_timer_off to take a non-trivial amount of time
> > when virtualized, but it's likely we won't be using mwait in that
> > case, so not sure it matter much to have the two cpu_is_haltable calls
> > if there's just a lapic_timer_off between them.
> >
> >> TBD: For the tracing I wonder if that really needs to come ahead of the
> >> local_irq_enable(). Maybe trace_exit_reason() needs to, but quite
> >> certainly TRACE_6D() doesn't.
> >
> > Would be good if it could be moved after the local_irq_enable call, as
> > it's not as trivial as I've expected, and will just add latency to any
> > pending interrupt waiting to be serviced. FWIW, I haven't spotted a
> > need to call it with interrupt disabled.
>
> Okay, I guess I'll to the larger rework then.
>
> >> --- a/xen/arch/x86/cpu/mwait-idle.c
> >> +++ b/xen/arch/x86/cpu/mwait-idle.c
> >> @@ -854,17 +854,23 @@ static void mwait_idle(void)
> >> mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
> >>
> >> local_irq_disable();
> >> - }
> >>
> >> - after = alternative_call(cpuidle_get_tick);
> >> + after = alternative_call(cpuidle_get_tick);
> >> +
> >> + cstate_restore_tsc();
> >> +
> >> + /* Now back in C0. */
> >> + update_idle_stats(power, cx, before, after);
> >> + } else {
> >> + /* Never left C0. */
> >> + after = alternative_call(cpuidle_get_tick);
> >> + update_idle_stats(power, cx, after, after);
> >
> > While adjusting this, could you also modify update_idle_stats to avoid
> > increasing cx->usage if before == after (or !sleep_ticks). I don't
> > think it's fine to increase the state counter if we never actually
> > entered it.
>
> I did consider it but then decided against. Even leaving this aspect
> aside the counter only counts _attempts_ to enter a certain state;
> the CPU may find reasons to never actually enter it. And what we have
> when before == after is still an attempt, albeit an unsuccessful one.
Right, in which case:
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Not sure whether you would like to commit this now and do the lager
rework as a followup patch. That would be fine by me.
Thanks, Roger.
On 01.02.2022 13:42, Roger Pau Monné wrote:
> On Tue, Feb 01, 2022 at 12:37:27PM +0100, Jan Beulich wrote:
>> On 01.02.2022 12:04, Roger Pau Monné wrote:
>>> On Thu, Jan 27, 2022 at 04:13:47PM +0100, Jan Beulich wrote:
>>>> While we don't want to skip calling update_idle_stats(), arrange for it
>>>> to not increment the overall time spent in the state we didn't really
>>>> enter.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> ---
>>>> RFC: If we wanted to also move the tracing, then I think the part ahead
>>>> of the if() also would need moving. At that point we could as well
>>>> move update_last_cx_stat(), too, which afaict would allow skipping
>>>> update_idle_stats() on the "else" path (which therefore would go
>>>> away). Yet then, with the setting of power->safe_state moved up a
>>>> little (which imo it should have been anyway) the two
>>>> cpu_is_haltable() invocations would only have the lapic_timer_off()
>>>> invocation left in between. This would then seem to call for simply
>>>> ditching the 2nd one - acpi-idle also doesn't have a 2nd instance.
>>>
>>> It's possible for lapic_timer_off to take a non-trivial amount of time
>>> when virtualized, but it's likely we won't be using mwait in that
>>> case, so not sure it matter much to have the two cpu_is_haltable calls
>>> if there's just a lapic_timer_off between them.
>>>
>>>> TBD: For the tracing I wonder if that really needs to come ahead of the
>>>> local_irq_enable(). Maybe trace_exit_reason() needs to, but quite
>>>> certainly TRACE_6D() doesn't.
>>>
>>> Would be good if it could be moved after the local_irq_enable call, as
>>> it's not as trivial as I've expected, and will just add latency to any
>>> pending interrupt waiting to be serviced. FWIW, I haven't spotted a
>>> need to call it with interrupt disabled.
>>
>> Okay, I guess I'll to the larger rework then.
>>
>>>> --- a/xen/arch/x86/cpu/mwait-idle.c
>>>> +++ b/xen/arch/x86/cpu/mwait-idle.c
>>>> @@ -854,17 +854,23 @@ static void mwait_idle(void)
>>>> mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
>>>>
>>>> local_irq_disable();
>>>> - }
>>>>
>>>> - after = alternative_call(cpuidle_get_tick);
>>>> + after = alternative_call(cpuidle_get_tick);
>>>> +
>>>> + cstate_restore_tsc();
>>>> +
>>>> + /* Now back in C0. */
>>>> + update_idle_stats(power, cx, before, after);
>>>> + } else {
>>>> + /* Never left C0. */
>>>> + after = alternative_call(cpuidle_get_tick);
>>>> + update_idle_stats(power, cx, after, after);
>>>
>>> While adjusting this, could you also modify update_idle_stats to avoid
>>> increasing cx->usage if before == after (or !sleep_ticks). I don't
>>> think it's fine to increase the state counter if we never actually
>>> entered it.
>>
>> I did consider it but then decided against. Even leaving this aspect
>> aside the counter only counts _attempts_ to enter a certain state;
>> the CPU may find reasons to never actually enter it. And what we have
>> when before == after is still an attempt, albeit an unsuccessful one.
>
> Right, in which case:
>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks, but ...
> Not sure whether you would like to commit this now and do the lager
> rework as a followup patch. That would be fine by me.
... no, I'd rather do this in a single step. In its current shape the
patch is actually moving us in the opposite direction.
Jan
© 2016 - 2026 Red Hat, Inc.