[RFC PATCH 1/2] timer: Use is_idle_task() check instead of idle_cpu() on irq exit

Ze Gao posted 2 patches 1 year, 6 months ago
[RFC PATCH 1/2] timer: Use is_idle_task() check instead of idle_cpu() on irq exit
Posted by Ze Gao 1 year, 6 months ago
idle_cpu() was initially introduced in irq_enter()/exit() to check
whether an irq interrupts an idle cpu or not since commit
79bf2bb335b8 ("[PATCH] tick-management: dyntick / highres functionality")
and at that time, it's implemented via a simple check if the curr
of task of that rq is idle or not. And then commit 6378ddb59215 ("time:
track accurate idle time with tick_sched.idle_sleeptime") uses the same
check to do accurate idle time accounting.

But since commit 908a3283728d ("sched: Fix idle_cpu()"), idle_cpu()
takes scheduler stats into consideration and becomes more constrained,
and therefore it tells more than if we have interrupted an idle
process but also whether a cpu is going to be idle or not since it
takes queued tasks and queued to be woken tasks into account.

However for tick user, it is too much as now we only rely on this check
to do nohz idle time accounting in tick_nohz_start_idle() just in case
that tick_nohz_stop_idle() is called upon irq_enter() if we actually
rupture an idle cpu(process). The use of idle_cpu() simply complicates
things here, and the introduction of sched_core_idle_cpu() in
commit 548796e2e70b ("sched/core: introduce sched_core_idle_cpu()")
proves this.

The use of is_idle_task() just like in commit 0a8a2e78b7ee ("timer: Fix
bad idle check on irq entry") helps to save one unnecessary fix for idle
time accounting for the newly force idle state. Note this also preps for
the remove of sched_core_idle_cpu() in the following patch.

Signed-off-by: Ze Gao <zegao@tencent.com>
---
 kernel/softirq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 02582017759a..24c7bf3c3f6c 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -617,7 +617,7 @@ static inline void tick_irq_exit(void)
 	int cpu = smp_processor_id();
 
 	/* Make sure that timer wheel updates are propagated */
-	if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
+	if ((is_idle_task(current) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
 		if (!in_hardirq())
 			tick_nohz_irq_exit();
 	}
-- 
2.41.0
Re: [RFC PATCH 1/2] timer: Use is_idle_task() check instead of idle_cpu() on irq exit
Posted by Frederic Weisbecker 1 year, 6 months ago
Le Thu, May 30, 2024 at 08:24:00AM -0400, Ze Gao a écrit :
> idle_cpu() was initially introduced in irq_enter()/exit() to check
> whether an irq interrupts an idle cpu or not since commit
> 79bf2bb335b8 ("[PATCH] tick-management: dyntick / highres functionality")
> and at that time, it's implemented via a simple check if the curr
> of task of that rq is idle or not. And then commit 6378ddb59215 ("time:
> track accurate idle time with tick_sched.idle_sleeptime") uses the same
> check to do accurate idle time accounting.
> 
> But since commit 908a3283728d ("sched: Fix idle_cpu()"), idle_cpu()
> takes scheduler stats into consideration and becomes more constrained,
> and therefore it tells more than if we have interrupted an idle
> process but also whether a cpu is going to be idle or not since it
> takes queued tasks and queued to be woken tasks into account.
> 
> However for tick user, it is too much as now we only rely on this check
> to do nohz idle time accounting in tick_nohz_start_idle() just in case
> that tick_nohz_stop_idle() is called upon irq_enter() if we actually
> rupture an idle cpu(process). The use of idle_cpu() simply complicates
> things here, and the introduction of sched_core_idle_cpu() in
> commit 548796e2e70b ("sched/core: introduce sched_core_idle_cpu()")
> proves this.
> 
> The use of is_idle_task() just like in commit 0a8a2e78b7ee ("timer: Fix
> bad idle check on irq entry") helps to save one unnecessary fix for idle
> time accounting for the newly force idle state. Note this also preps for
> the remove of sched_core_idle_cpu() in the following patch.
> 
> Signed-off-by: Ze Gao <zegao@tencent.com>
> ---
>  kernel/softirq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 02582017759a..24c7bf3c3f6c 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -617,7 +617,7 @@ static inline void tick_irq_exit(void)
>  	int cpu = smp_processor_id();
>  
>  	/* Make sure that timer wheel updates are propagated */
> -	if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
> +	if ((is_idle_task(current) && !need_resched()) || tick_nohz_full_cpu(cpu)) {

The reason why there is a check here for idle_cpu() (or sched_core_idle_cpu())
is to avoid calling again tick_nohz_start_idle() and then again
tick_nohz_stop_idle() later from tick_nohz_idle_exit(). This can save two costly
calls to ktime_get() when a real task is waiting for the CPU. So any quick clue to
know if a task is going to be scheduled is good to get. And idle_cpu() gives
them all:

int idle_cpu(int cpu)
{
	struct rq *rq = cpu_rq(cpu);

	if (rq->curr != rq->idle)
		return 0;

// This is the necessary is_idle_task() check

	if (rq->nr_running)
		return 0;

// This tells if there is a real task pending. Ok that check
// is perhaps a bit redundant with need_resched()...

#ifdef CONFIG_SMP
	if (rq->ttwu_pending)
		return 0;
#endif

// This one tells if there is a remote wakeup pending for this CPU.
// And need_resched() doesn't tell about that yet...

	return 1;
}

So it looks to me that idle_cpu() is still a good fit at this place.
And sched_core_idle_cpu() doesn't bring more overhead since the static
key in sched_core_enabled() is rarely active (I guess...). And if it is,
then the check is even more simple.

Thanks.

>  		if (!in_hardirq())
>  			tick_nohz_irq_exit();
>  	}
> -- 
> 2.41.0
> 
Re: [RFC PATCH 1/2] timer: Use is_idle_task() check instead of idle_cpu() on irq exit
Posted by Ze Gao 1 year, 6 months ago
Hi Frederic,

Thanks for your reply and even more thanks for the detailed comments
and elaboration.

On Sat, Jun 8, 2024 at 6:46 AM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> Le Thu, May 30, 2024 at 08:24:00AM -0400, Ze Gao a écrit :
> > idle_cpu() was initially introduced in irq_enter()/exit() to check
> > whether an irq interrupts an idle cpu or not since commit
> > 79bf2bb335b8 ("[PATCH] tick-management: dyntick / highres functionality")
> > and at that time, it's implemented via a simple check if the curr
> > of task of that rq is idle or not. And then commit 6378ddb59215 ("time:
> > track accurate idle time with tick_sched.idle_sleeptime") uses the same
> > check to do accurate idle time accounting.
> >
> > But since commit 908a3283728d ("sched: Fix idle_cpu()"), idle_cpu()
> > takes scheduler stats into consideration and becomes more constrained,
> > and therefore it tells more than if we have interrupted an idle
> > process but also whether a cpu is going to be idle or not since it
> > takes queued tasks and queued to be woken tasks into account.
> >
> > However for tick user, it is too much as now we only rely on this check
> > to do nohz idle time accounting in tick_nohz_start_idle() just in case
> > that tick_nohz_stop_idle() is called upon irq_enter() if we actually
> > rupture an idle cpu(process). The use of idle_cpu() simply complicates
> > things here, and the introduction of sched_core_idle_cpu() in
> > commit 548796e2e70b ("sched/core: introduce sched_core_idle_cpu()")
> > proves this.
> >
> > The use of is_idle_task() just like in commit 0a8a2e78b7ee ("timer: Fix
> > bad idle check on irq entry") helps to save one unnecessary fix for idle
> > time accounting for the newly force idle state. Note this also preps for
> > the remove of sched_core_idle_cpu() in the following patch.
> >
> > Signed-off-by: Ze Gao <zegao@tencent.com>
> > ---
> >  kernel/softirq.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/softirq.c b/kernel/softirq.c
> > index 02582017759a..24c7bf3c3f6c 100644
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> > @@ -617,7 +617,7 @@ static inline void tick_irq_exit(void)
> >       int cpu = smp_processor_id();
> >
> >       /* Make sure that timer wheel updates are propagated */
> > -     if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
> > +     if ((is_idle_task(current) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
>
> The reason why there is a check here for idle_cpu() (or sched_core_idle_cpu())
> is to avoid calling again tick_nohz_start_idle() and then again
> tick_nohz_stop_idle() later from tick_nohz_idle_exit(). This can save two costly
> calls to ktime_get() when a real task is waiting for the CPU. So any quick clue to
> know if a task is going to be scheduled is good to get. And idle_cpu() gives
> them all:
>
> int idle_cpu(int cpu)
> {
>         struct rq *rq = cpu_rq(cpu);
>
>         if (rq->curr != rq->idle)
>                 return 0;
>
> // This is the necessary is_idle_task() check
>
>         if (rq->nr_running)
>                 return 0;
>
> // This tells if there is a real task pending. Ok that check
> // is perhaps a bit redundant with need_resched()...
>
> #ifdef CONFIG_SMP
>         if (rq->ttwu_pending)
>                 return 0;
> #endif
>
> // This one tells if there is a remote wakeup pending for this CPU.
> // And need_resched() doesn't tell about that yet...

Please correct me if I'm stupid here.

Is it possible that there is a time window between this becoming true and
schedule_idle(), which is TIF_NEED_RESCHED is not set in time and this CPU
will be doing arch idle again? If so, we're actually counting less
idle time than it is.

I will test if this is true and provide statistics later. Appreciate
your attention again:)

Thanks,
Ze

>         return 1;
> }
>
> So it looks to me that idle_cpu() is still a good fit at this place.
> And sched_core_idle_cpu() doesn't bring more overhead since the static
> key in sched_core_enabled() is rarely active (I guess...). And if it is,
> then the check is even more simple.
>
> Thanks.
>
> >               if (!in_hardirq())
> >                       tick_nohz_irq_exit();
> >       }
> > --
> > 2.41.0
> >