[RFC 2/3] tick-sched: Keep tick on if hrtimer is due imminently

Joel Fernandes (Google) posted 3 patches 2 weeks, 1 day ago
[RFC 2/3] tick-sched: Keep tick on if hrtimer is due imminently
Posted by Joel Fernandes (Google) 2 weeks, 1 day ago
In highres mode, the kernel only considers timer wheel events when
considering whether to keep the tick on (via get_next_interrupt()).

This seems odd because it consider several other reasons to keep the
tick on. Further, turning off the tick does not help because once idle
exit happens due to that imminent hrtimer interrupt, the tick hrtimer
interrupt is requeued. That means more hrtimer rbtree operations for not
much benefit.

Ideally we should not have to do anything because the cpuidle governor
should not try to the stop the tick because it knows about this
situation, but apparently it still does try to stop the tick.

To be more efficient, check for any immminent non-sched hrtimer event
and keep the tick on if we know such events exist.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/time/tick-sched.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 52a4eda664cf..4aa64266f2b0 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -890,7 +890,7 @@ u64 get_jiffies_update(unsigned long *basej)
  */
 static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
 {
-	u64 basemono, next_tick, delta, expires;
+	u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo;
 	unsigned long basejiff;
 	int tick_cpu;
 
@@ -932,7 +932,10 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
 	 * force prod the timer.
 	 */
 	delta = next_tick - basemono;
-	if (delta <= (u64)TICK_NSEC) {
+	next_hr_wo = hrtimer_next_event_without(&ts->sched_timer);
+	delta_hr = next_hr_wo - basemono;
+
+	if (delta <= (u64)TICK_NSEC || delta_hr <= (u64)TICK_NSEC) {
 		/*
 		 * We've not stopped the tick yet, and there's a timer in the
 		 * next period, so no point in stopping it either, bail.
-- 
2.47.0.277.g8800431eea-goog
Re: [RFC 2/3] tick-sched: Keep tick on if hrtimer is due imminently
Posted by Christian Loehle 1 week, 5 days ago
On 11/8/24 17:48, Joel Fernandes (Google) wrote:
> In highres mode, the kernel only considers timer wheel events when
> considering whether to keep the tick on (via get_next_interrupt()).
> 
> This seems odd because it consider several other reasons to keep the
> tick on. Further, turning off the tick does not help because once idle
> exit happens due to that imminent hrtimer interrupt, the tick hrtimer
> interrupt is requeued. That means more hrtimer rbtree operations for not
> much benefit.
> 
> Ideally we should not have to do anything because the cpuidle governor
> should not try to the stop the tick because it knows about this
> situation, but apparently it still does try to stop the tick.

Any details on this? Which governor?
Re: [RFC 2/3] tick-sched: Keep tick on if hrtimer is due imminently
Posted by Joel Fernandes 1 week, 5 days ago
On Mon, Nov 11, 2024 at 7:38 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 11/8/24 17:48, Joel Fernandes (Google) wrote:
> > In highres mode, the kernel only considers timer wheel events when
> > considering whether to keep the tick on (via get_next_interrupt()).
> >
> > This seems odd because it consider several other reasons to keep the
> > tick on. Further, turning off the tick does not help because once idle
> > exit happens due to that imminent hrtimer interrupt, the tick hrtimer
> > interrupt is requeued. That means more hrtimer rbtree operations for not
> > much benefit.
> >
> > Ideally we should not have to do anything because the cpuidle governor
> > should not try to the stop the tick because it knows about this
> > situation, but apparently it still does try to stop the tick.
>
> Any details on this? Which governor?

I noticed this in Qemu (virtualized hardware). Actually I need to
update the commit message. I think it is not because of the governor
but because of lack of guest cpuidle support.

static void cpuidle_idle_call(void)
{
....
  if (cpuidle_not_available(drv, dev)) {
    tick_nohz_idle_stop_tick();
    default_idle_call();
    goto exit_idle;
  }
...
Over here dev and drv are NULL for me. I will also test on real hardware.

Also maybe the " if (cpuidle_not_available(drv, dev))" condition
should do some more work to determine if tick_nohz_idle_stop_tick()
should be called instead of unconditionally calling it?

Pasting relevant parts of my .config:

# grep IDLE .config
CONFIG_NO_HZ_IDLE=y
CONFIG_ARCH_CPUIDLE_HALTPOLL=y
CONFIG_ACPI_PROCESSOR_IDLE=y
# CPU Idle
CONFIG_CPU_IDLE=y
# CONFIG_CPU_IDLE_GOV_LADDER is not set
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_CPU_IDLE_GOV_TEO is not set
CONFIG_CPU_IDLE_GOV_HALTPOLL=y
CONFIG_HALTPOLL_CPUIDLE=y
# end of CPU Idle
CONFIG_INTEL_IDLE=y

thanks,

 - Joel
Re: [RFC 2/3] tick-sched: Keep tick on if hrtimer is due imminently
Posted by Christian Loehle 1 week, 5 days ago
On 11/11/24 15:56, Joel Fernandes wrote:
> On Mon, Nov 11, 2024 at 7:38 AM Christian Loehle
> <christian.loehle@arm.com> wrote:
>>
>> On 11/8/24 17:48, Joel Fernandes (Google) wrote:
>>> In highres mode, the kernel only considers timer wheel events when
>>> considering whether to keep the tick on (via get_next_interrupt()).
>>>
>>> This seems odd because it consider several other reasons to keep the
>>> tick on. Further, turning off the tick does not help because once idle
>>> exit happens due to that imminent hrtimer interrupt, the tick hrtimer
>>> interrupt is requeued. That means more hrtimer rbtree operations for not
>>> much benefit.
>>>
>>> Ideally we should not have to do anything because the cpuidle governor
>>> should not try to the stop the tick because it knows about this
>>> situation, but apparently it still does try to stop the tick.
>>
>> Any details on this? Which governor?
> 
> I noticed this in Qemu (virtualized hardware). Actually I need to
> update the commit message. I think it is not because of the governor
> but because of lack of guest cpuidle support.

Ah indeed, then it makes sense.
FYI Anna-Maria proposed something like below a year ago:
https://lore.kernel.org/lkml/20231215130501.24542-1-anna-maria@linutronix.de/
I have no strong opinion on it either way.

Regards,
Christian

> 
> static void cpuidle_idle_call(void)
> {
> ....
>   if (cpuidle_not_available(drv, dev)) {
>     tick_nohz_idle_stop_tick();
>     default_idle_call();
>     goto exit_idle;
>   }
> ...
> Over here dev and drv are NULL for me. I will also test on real hardware.
> 
> Also maybe the " if (cpuidle_not_available(drv, dev))" condition
> should do some more work to determine if tick_nohz_idle_stop_tick()
> should be called instead of unconditionally calling it?
> 
> Pasting relevant parts of my .config:
> 
> # grep IDLE .config
> CONFIG_NO_HZ_IDLE=y
> CONFIG_ARCH_CPUIDLE_HALTPOLL=y
> CONFIG_ACPI_PROCESSOR_IDLE=y
> # CPU Idle
> CONFIG_CPU_IDLE=y
> # CONFIG_CPU_IDLE_GOV_LADDER is not set
> CONFIG_CPU_IDLE_GOV_MENU=y
> # CONFIG_CPU_IDLE_GOV_TEO is not set
> CONFIG_CPU_IDLE_GOV_HALTPOLL=y
> CONFIG_HALTPOLL_CPUIDLE=y
> # end of CPU Idle
> CONFIG_INTEL_IDLE=y
> 
> thanks,
> 
>  - Joel

Re: [RFC 2/3] tick-sched: Keep tick on if hrtimer is due imminently
Posted by Joel Fernandes 1 week, 5 days ago
On Mon, Nov 11, 2024 at 11:55 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 11/11/24 15:56, Joel Fernandes wrote:
> > On Mon, Nov 11, 2024 at 7:38 AM Christian Loehle
> > <christian.loehle@arm.com> wrote:
> >>
> >> On 11/8/24 17:48, Joel Fernandes (Google) wrote:
> >>> In highres mode, the kernel only considers timer wheel events when
> >>> considering whether to keep the tick on (via get_next_interrupt()).
> >>>
> >>> This seems odd because it consider several other reasons to keep the
> >>> tick on. Further, turning off the tick does not help because once idle
> >>> exit happens due to that imminent hrtimer interrupt, the tick hrtimer
> >>> interrupt is requeued. That means more hrtimer rbtree operations for not
> >>> much benefit.
> >>>
> >>> Ideally we should not have to do anything because the cpuidle governor
> >>> should not try to the stop the tick because it knows about this
> >>> situation, but apparently it still does try to stop the tick.
> >>
> >> Any details on this? Which governor?
> >
> > I noticed this in Qemu (virtualized hardware). Actually I need to
> > update the commit message. I think it is not because of the governor
> > but because of lack of guest cpuidle support.
>
> Ah indeed, then it makes sense.
> FYI Anna-Maria proposed something like below a year ago:
> https://lore.kernel.org/lkml/20231215130501.24542-1-anna-maria@linutronix.de/
> I have no strong opinion on it either way.

Thanks for the pointer, it is great to know this was discussed. One
thing I am not fully sure about that patch is, if we can just drop
tick-stoppage like that. I share Pierre's concern [1]. Maybe a better
approach is to do some rudimentary checks for imminent events when
there is no driver/device loaded and only then stop the tick. I will
try to explore such an approach and see if I can come up with
something.

thanks,

 - Joel

[1]
https://lore.kernel.org/lkml/06a2561f-557b-4eaa-8f11-75883bbbaef9@arm.com/