In highres mode, the kernel only considers timer wheel events when
considering whether to keep the tick on (via get_next_interrupt()).
This seems odd because it consider several other reasons to keep the
tick on. Further, turning off the tick does not help because once idle
exit happens due to that imminent hrtimer interrupt, the tick hrtimer
interrupt is requeued. That means more hrtimer rbtree operations for not
much benefit.
Ideally we should not have to do anything because the cpuidle governor
should not try to the stop the tick because it knows about this
situation, but apparently it still does try to stop the tick.
To be more efficient, check for any immminent non-sched hrtimer event
and keep the tick on if we know such events exist.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
kernel/time/tick-sched.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 52a4eda664cf..4aa64266f2b0 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -890,7 +890,7 @@ u64 get_jiffies_update(unsigned long *basej)
*/
static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
{
- u64 basemono, next_tick, delta, expires;
+ u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo;
unsigned long basejiff;
int tick_cpu;
@@ -932,7 +932,10 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
* force prod the timer.
*/
delta = next_tick - basemono;
- if (delta <= (u64)TICK_NSEC) {
+ next_hr_wo = hrtimer_next_event_without(&ts->sched_timer);
+ delta_hr = next_hr_wo - basemono;
+
+ if (delta <= (u64)TICK_NSEC || delta_hr <= (u64)TICK_NSEC) {
/*
* We've not stopped the tick yet, and there's a timer in the
* next period, so no point in stopping it either, bail.
--
2.47.0.277.g8800431eea-goog
On 11/8/24 17:48, Joel Fernandes (Google) wrote: > In highres mode, the kernel only considers timer wheel events when > considering whether to keep the tick on (via get_next_interrupt()). > > This seems odd because it consider several other reasons to keep the > tick on. Further, turning off the tick does not help because once idle > exit happens due to that imminent hrtimer interrupt, the tick hrtimer > interrupt is requeued. That means more hrtimer rbtree operations for not > much benefit. > > Ideally we should not have to do anything because the cpuidle governor > should not try to the stop the tick because it knows about this > situation, but apparently it still does try to stop the tick. Any details on this? Which governor?
On Mon, Nov 11, 2024 at 7:38 AM Christian Loehle <christian.loehle@arm.com> wrote: > > On 11/8/24 17:48, Joel Fernandes (Google) wrote: > > In highres mode, the kernel only considers timer wheel events when > > considering whether to keep the tick on (via get_next_interrupt()). > > > > This seems odd because it consider several other reasons to keep the > > tick on. Further, turning off the tick does not help because once idle > > exit happens due to that imminent hrtimer interrupt, the tick hrtimer > > interrupt is requeued. That means more hrtimer rbtree operations for not > > much benefit. > > > > Ideally we should not have to do anything because the cpuidle governor > > should not try to the stop the tick because it knows about this > > situation, but apparently it still does try to stop the tick. > > Any details on this? Which governor? I noticed this in Qemu (virtualized hardware). Actually I need to update the commit message. I think it is not because of the governor but because of lack of guest cpuidle support. static void cpuidle_idle_call(void) { .... if (cpuidle_not_available(drv, dev)) { tick_nohz_idle_stop_tick(); default_idle_call(); goto exit_idle; } ... Over here dev and drv are NULL for me. I will also test on real hardware. Also maybe the " if (cpuidle_not_available(drv, dev))" condition should do some more work to determine if tick_nohz_idle_stop_tick() should be called instead of unconditionally calling it? Pasting relevant parts of my .config: # grep IDLE .config CONFIG_NO_HZ_IDLE=y CONFIG_ARCH_CPUIDLE_HALTPOLL=y CONFIG_ACPI_PROCESSOR_IDLE=y # CPU Idle CONFIG_CPU_IDLE=y # CONFIG_CPU_IDLE_GOV_LADDER is not set CONFIG_CPU_IDLE_GOV_MENU=y # CONFIG_CPU_IDLE_GOV_TEO is not set CONFIG_CPU_IDLE_GOV_HALTPOLL=y CONFIG_HALTPOLL_CPUIDLE=y # end of CPU Idle CONFIG_INTEL_IDLE=y thanks, - Joel
On 11/11/24 15:56, Joel Fernandes wrote: > On Mon, Nov 11, 2024 at 7:38 AM Christian Loehle > <christian.loehle@arm.com> wrote: >> >> On 11/8/24 17:48, Joel Fernandes (Google) wrote: >>> In highres mode, the kernel only considers timer wheel events when >>> considering whether to keep the tick on (via get_next_interrupt()). >>> >>> This seems odd because it consider several other reasons to keep the >>> tick on. Further, turning off the tick does not help because once idle >>> exit happens due to that imminent hrtimer interrupt, the tick hrtimer >>> interrupt is requeued. That means more hrtimer rbtree operations for not >>> much benefit. >>> >>> Ideally we should not have to do anything because the cpuidle governor >>> should not try to the stop the tick because it knows about this >>> situation, but apparently it still does try to stop the tick. >> >> Any details on this? Which governor? > > I noticed this in Qemu (virtualized hardware). Actually I need to > update the commit message. I think it is not because of the governor > but because of lack of guest cpuidle support. Ah indeed, then it makes sense. FYI Anna-Maria proposed something like below a year ago: https://lore.kernel.org/lkml/20231215130501.24542-1-anna-maria@linutronix.de/ I have no strong opinion on it either way. Regards, Christian > > static void cpuidle_idle_call(void) > { > .... > if (cpuidle_not_available(drv, dev)) { > tick_nohz_idle_stop_tick(); > default_idle_call(); > goto exit_idle; > } > ... > Over here dev and drv are NULL for me. I will also test on real hardware. > > Also maybe the " if (cpuidle_not_available(drv, dev))" condition > should do some more work to determine if tick_nohz_idle_stop_tick() > should be called instead of unconditionally calling it? > > Pasting relevant parts of my .config: > > # grep IDLE .config > CONFIG_NO_HZ_IDLE=y > CONFIG_ARCH_CPUIDLE_HALTPOLL=y > CONFIG_ACPI_PROCESSOR_IDLE=y > # CPU Idle > CONFIG_CPU_IDLE=y > # CONFIG_CPU_IDLE_GOV_LADDER is not set > CONFIG_CPU_IDLE_GOV_MENU=y > # CONFIG_CPU_IDLE_GOV_TEO is not set > CONFIG_CPU_IDLE_GOV_HALTPOLL=y > CONFIG_HALTPOLL_CPUIDLE=y > # end of CPU Idle > CONFIG_INTEL_IDLE=y > > thanks, > > - Joel
On Mon, Nov 11, 2024 at 11:55 AM Christian Loehle <christian.loehle@arm.com> wrote: > > On 11/11/24 15:56, Joel Fernandes wrote: > > On Mon, Nov 11, 2024 at 7:38 AM Christian Loehle > > <christian.loehle@arm.com> wrote: > >> > >> On 11/8/24 17:48, Joel Fernandes (Google) wrote: > >>> In highres mode, the kernel only considers timer wheel events when > >>> considering whether to keep the tick on (via get_next_interrupt()). > >>> > >>> This seems odd because it consider several other reasons to keep the > >>> tick on. Further, turning off the tick does not help because once idle > >>> exit happens due to that imminent hrtimer interrupt, the tick hrtimer > >>> interrupt is requeued. That means more hrtimer rbtree operations for not > >>> much benefit. > >>> > >>> Ideally we should not have to do anything because the cpuidle governor > >>> should not try to the stop the tick because it knows about this > >>> situation, but apparently it still does try to stop the tick. > >> > >> Any details on this? Which governor? > > > > I noticed this in Qemu (virtualized hardware). Actually I need to > > update the commit message. I think it is not because of the governor > > but because of lack of guest cpuidle support. > > Ah indeed, then it makes sense. > FYI Anna-Maria proposed something like below a year ago: > https://lore.kernel.org/lkml/20231215130501.24542-1-anna-maria@linutronix.de/ > I have no strong opinion on it either way. Thanks for the pointer, it is great to know this was discussed. One thing I am not fully sure about that patch is, if we can just drop tick-stoppage like that. I share Pierre's concern [1]. Maybe a better approach is to do some rudimentary checks for imminent events when there is no driver/device loaded and only then stop the tick. I will try to explore such an approach and see if I can come up with something. thanks, - Joel [1] https://lore.kernel.org/lkml/06a2561f-557b-4eaa-8f11-75883bbbaef9@arm.com/
© 2016 - 2024 Red Hat, Inc.