[PATCH v4 1/3] sched/fair: Move checking for nohz cpus after time check

Shrikanth Hegde posted 3 patches 3 weeks, 6 days ago
There is a newer version of this series
[PATCH v4 1/3] sched/fair: Move checking for nohz cpus after time check
Posted by Shrikanth Hegde 3 weeks, 6 days ago
NOHZ idle load balancer is kicked off only after time check. So move
the atomic read after the time check to access it only when needed.

When there are no idle CPUs(100% busy), even if the flag gets set to
NOHZ_STATS_KICK | NOHZ_NEXT_KICK, find_new_ilb will fail and
there will be no NOHZ idle balance. The current behaviour is retained.

Note: This patch doesn't solve any cacheline overheads. No improvement
in performance apart from saving a few cycles of atomic_read.

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
 kernel/sched/fair.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9743fc0b225c..17e4e8ac5fca 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12451,20 +12451,24 @@ static void nohz_balancer_kick(struct rq *rq)
 	 */
 	nohz_balance_exit_idle(rq);
 
-	/*
-	 * None are in tickless mode and hence no need for NOHZ idle load
-	 * balancing:
-	 */
-	if (likely(!atomic_read(&nohz.nr_cpus)))
-		return;
-
 	if (READ_ONCE(nohz.has_blocked_load) &&
 	    time_after(now, READ_ONCE(nohz.next_blocked)))
 		flags = NOHZ_STATS_KICK;
 
+	/*
+	 * If none are in tickless mode, though flag maybe set,
+	 * idle load balancing is not done as find_new_ilb fails
+	 */
 	if (time_before(now, nohz.next_balance))
 		goto out;
 
+	/*
+	 * None are in tickless mode and hence no need for NOHZ idle load
+	 * balancing:
+	 */
+	if (likely(!atomic_read(&nohz.nr_cpus)))
+		return;
+
 	if (rq->nr_running >= 2) {
 		flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
 		goto out;
-- 
2.47.3
Re: [PATCH v4 1/3] sched/fair: Move checking for nohz cpus after time check
Posted by Vincent Guittot 3 weeks, 5 days ago
On Mon, 12 Jan 2026 at 06:05, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>
> NOHZ idle load balancer is kicked off only after time check. So move
> the atomic read after the time check to access it only when needed.
>
> When there are no idle CPUs(100% busy), even if the flag gets set to
> NOHZ_STATS_KICK | NOHZ_NEXT_KICK, find_new_ilb will fail and
> there will be no NOHZ idle balance. The current behaviour is retained.
>
> Note: This patch doesn't solve any cacheline overheads. No improvement
> in performance apart from saving a few cycles of atomic_read.

But won't these cycles be then wasted by calling needlessly kick_ilb

>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
>  kernel/sched/fair.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9743fc0b225c..17e4e8ac5fca 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12451,20 +12451,24 @@ static void nohz_balancer_kick(struct rq *rq)
>          */
>         nohz_balance_exit_idle(rq);
>
> -       /*
> -        * None are in tickless mode and hence no need for NOHZ idle load
> -        * balancing:
> -        */
> -       if (likely(!atomic_read(&nohz.nr_cpus)))
> -               return;
> -
>         if (READ_ONCE(nohz.has_blocked_load) &&
>             time_after(now, READ_ONCE(nohz.next_blocked)))
>                 flags = NOHZ_STATS_KICK;
>
> +       /*
> +        * If none are in tickless mode, though flag maybe set,
> +        * idle load balancing is not done as find_new_ilb fails
> +        */
>         if (time_before(now, nohz.next_balance))
>                 goto out;
>
> +       /*
> +        * None are in tickless mode and hence no need for NOHZ idle load
> +        * balancing:
> +        */
> +       if (likely(!atomic_read(&nohz.nr_cpus)))
> +               return;
> +
>         if (rq->nr_running >= 2) {
>                 flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
>                 goto out;
> --
> 2.47.3
>
Re: [PATCH v4 1/3] sched/fair: Move checking for nohz cpus after time check
Posted by Shrikanth Hegde 3 weeks, 5 days ago

On 1/13/26 2:37 PM, Vincent Guittot wrote:
> On Mon, 12 Jan 2026 at 06:05, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>>
>> NOHZ idle load balancer is kicked off only after time check. So move
>> the atomic read after the time check to access it only when needed.
>>
>> When there are no idle CPUs(100% busy), even if the flag gets set to
>> NOHZ_STATS_KICK | NOHZ_NEXT_KICK, find_new_ilb will fail and
>> there will be no NOHZ idle balance. The current behaviour is retained.
>>
>> Note: This patch doesn't solve any cacheline overheads. No improvement
>> in performance apart from saving a few cycles of atomic_read.
> 
> But won't these cycles be then wasted by calling needlessly kick_ilb
> 

when there are nohz cpus, i.e nohz.nr_cpus > 0, there is no change in codeflow.

Only when system is 100%(which is expected to be rare), nohz.nr_cpus == 0,
then it is expected that has_blocked_load = 0. So flags shouldn't be set.
Note we are still doing a return if nohz.nr_cpus == 0. So kick_ilb shouldn't be
called.

Do you see any path still calling kick_ilb un-necessarily?


>>
>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>> ---
>>   kernel/sched/fair.c | 18 +++++++++++-------
>>   1 file changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 9743fc0b225c..17e4e8ac5fca 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -12451,20 +12451,24 @@ static void nohz_balancer_kick(struct rq *rq)
>>           */
>>          nohz_balance_exit_idle(rq);
>>
>> -       /*
>> -        * None are in tickless mode and hence no need for NOHZ idle load
>> -        * balancing:
>> -        */
>> -       if (likely(!atomic_read(&nohz.nr_cpus)))
>> -               return;
>> -
>>          if (READ_ONCE(nohz.has_blocked_load) &&
>>              time_after(now, READ_ONCE(nohz.next_blocked)))
>>                  flags = NOHZ_STATS_KICK;
>>
>> +       /*
>> +        * If none are in tickless mode, though flag maybe set,
>> +        * idle load balancing is not done as find_new_ilb fails
>> +        */
>>          if (time_before(now, nohz.next_balance))
>>                  goto out;
>>
>> +       /*
>> +        * None are in tickless mode and hence no need for NOHZ idle load
>> +        * balancing:
>> +        */
>> +       if (likely(!atomic_read(&nohz.nr_cpus)))
>> +               return;
>> +
>>          if (rq->nr_running >= 2) {
>>                  flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
>>                  goto out;
>> --
>> 2.47.3
>>
Re: [PATCH v4 1/3] sched/fair: Move checking for nohz cpus after time check
Posted by Vincent Guittot 3 weeks, 5 days ago
On Tue, 13 Jan 2026 at 10:23, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>
>
>
> On 1/13/26 2:37 PM, Vincent Guittot wrote:
> > On Mon, 12 Jan 2026 at 06:05, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
> >>
> >> NOHZ idle load balancer is kicked off only after time check. So move
> >> the atomic read after the time check to access it only when needed.
> >>
> >> When there are no idle CPUs(100% busy), even if the flag gets set to
> >> NOHZ_STATS_KICK | NOHZ_NEXT_KICK, find_new_ilb will fail and
> >> there will be no NOHZ idle balance. The current behaviour is retained.
> >>
> >> Note: This patch doesn't solve any cacheline overheads. No improvement
> >> in performance apart from saving a few cycles of atomic_read.
> >
> > But won't these cycles be then wasted by calling needlessly kick_ilb
> >
>
> when there are nohz cpus, i.e nohz.nr_cpus > 0, there is no change in codeflow.
>
> Only when system is 100%(which is expected to be rare), nohz.nr_cpus == 0,
> then it is expected that has_blocked_load = 0. So flags shouldn't be set.

The way we are setting/clearing has_blocked_load vs
nr_cpus/idle_cpus_mask implies that it's possible to get
has_blocked_load == 1 but nr_cpus == 0 although it's a corner case and
not a default behavior

No CPUs are idle: nr_cpus == 0

CPU 0 enters idle
  - inc nr_cpus and set idle_cpus_mask
  - set nohz.has_blocked

CPU0 wakes up

Tick fires on CPU0
  - dec nr_cpus and clear idle_cpus_mask
  - nohz.has_blocked == 1, most probably now > nohz.next_blocked, if
now < nohz.next_balance, we skip the test of nr_cpus and we call
kick_ilb() but nr_cpus == 0 and idle_cpus_mask is empty

> Note we are still doing a return if nohz.nr_cpus == 0. So kick_ilb shouldn't be
> called.

The return can be skipped by if (time_before(now, nohz.next_balance)) goto out

>
> Do you see any path still calling kick_ilb un-necessarily?

Yes but at the same time it's clearly not the main case


>
>
> >>
> >> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> >> ---
> >>   kernel/sched/fair.c | 18 +++++++++++-------
> >>   1 file changed, 11 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 9743fc0b225c..17e4e8ac5fca 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -12451,20 +12451,24 @@ static void nohz_balancer_kick(struct rq *rq)
> >>           */
> >>          nohz_balance_exit_idle(rq);
> >>
> >> -       /*
> >> -        * None are in tickless mode and hence no need for NOHZ idle load
> >> -        * balancing:
> >> -        */
> >> -       if (likely(!atomic_read(&nohz.nr_cpus)))
> >> -               return;
> >> -
> >>          if (READ_ONCE(nohz.has_blocked_load) &&
> >>              time_after(now, READ_ONCE(nohz.next_blocked)))
> >>                  flags = NOHZ_STATS_KICK;
> >>
> >> +       /*
> >> +        * If none are in tickless mode, though flag maybe set,
> >> +        * idle load balancing is not done as find_new_ilb fails
> >> +        */
> >>          if (time_before(now, nohz.next_balance))
> >>                  goto out;
> >>
> >> +       /*
> >> +        * None are in tickless mode and hence no need for NOHZ idle load
> >> +        * balancing:
> >> +        */
> >> +       if (likely(!atomic_read(&nohz.nr_cpus)))
> >> +               return;
> >> +
> >>          if (rq->nr_running >= 2) {
> >>                  flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
> >>                  goto out;
> >> --
> >> 2.47.3
> >>
>
Re: [PATCH v4 1/3] sched/fair: Move checking for nohz cpus after time check
Posted by Shrikanth Hegde 3 weeks, 5 days ago

On 1/13/26 3:51 PM, Vincent Guittot wrote:
> On Tue, 13 Jan 2026 at 10:23, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>>
>>
>>
>> On 1/13/26 2:37 PM, Vincent Guittot wrote:
>>> On Mon, 12 Jan 2026 at 06:05, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>>>>
>>>> NOHZ idle load balancer is kicked off only after time check. So move
>>>> the atomic read after the time check to access it only when needed.
>>>>
>>>> When there are no idle CPUs(100% busy), even if the flag gets set to
>>>> NOHZ_STATS_KICK | NOHZ_NEXT_KICK, find_new_ilb will fail and
>>>> there will be no NOHZ idle balance. The current behaviour is retained.
>>>>
>>>> Note: This patch doesn't solve any cacheline overheads. No improvement
>>>> in performance apart from saving a few cycles of atomic_read.
>>>
>>> But won't these cycles be then wasted by calling needlessly kick_ilb
>>>
>>
>> when there are nohz cpus, i.e nohz.nr_cpus > 0, there is no change in codeflow.
>>
>> Only when system is 100%(which is expected to be rare), nohz.nr_cpus == 0,
>> then it is expected that has_blocked_load = 0. So flags shouldn't be set.
> 
> The way we are setting/clearing has_blocked_load vs
> nr_cpus/idle_cpus_mask implies that it's possible to get
> has_blocked_load == 1 but nr_cpus == 0 although it's a corner case and
> not a default behavior
> 
> No CPUs are idle: nr_cpus == 0
> 
> CPU 0 enters idle
>    - inc nr_cpus and set idle_cpus_mask
>    - set nohz.has_blocked
> 
> CPU0 wakes up
> 
> Tick fires on CPU0
>    - dec nr_cpus and clear idle_cpus_mask
>    - nohz.has_blocked == 1, most probably now > nohz.next_blocked, if
> now < nohz.next_balance, we skip the test of nr_cpus and we call
> kick_ilb() but nr_cpus == 0 and idle_cpus_mask is empty
> 
>> Note we are still doing a return if nohz.nr_cpus == 0. So kick_ilb shouldn't be
>> called.
> 
> The return can be skipped by if (time_before(now, nohz.next_balance)) goto out
> 

Assuming HZ=1000,

I see LOAD_AVG_PERIOD = 32, whereas next_balance is usually 60. so it is 
a really narrow window of 28 ticks and system being close to 100% busy.


>>
>> Do you see any path still calling kick_ilb un-necessarily?
> 
> Yes but at the same time it's clearly not the main case
> 
> 

So i assume we can do this patch considering the common case?
If not, let me know, I can drop it.
Re: [PATCH v4 1/3] sched/fair: Move checking for nohz cpus after time check
Posted by Vincent Guittot 3 weeks, 5 days ago
On Tue, 13 Jan 2026 at 12:18, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>
>
>
> On 1/13/26 3:51 PM, Vincent Guittot wrote:
> > On Tue, 13 Jan 2026 at 10:23, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
> >>
> >>
> >>
> >> On 1/13/26 2:37 PM, Vincent Guittot wrote:
> >>> On Mon, 12 Jan 2026 at 06:05, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
> >>>>
> >>>> NOHZ idle load balancer is kicked off only after time check. So move
> >>>> the atomic read after the time check to access it only when needed.
> >>>>
> >>>> When there are no idle CPUs(100% busy), even if the flag gets set to
> >>>> NOHZ_STATS_KICK | NOHZ_NEXT_KICK, find_new_ilb will fail and
> >>>> there will be no NOHZ idle balance. The current behaviour is retained.
> >>>>
> >>>> Note: This patch doesn't solve any cacheline overheads. No improvement
> >>>> in performance apart from saving a few cycles of atomic_read.
> >>>
> >>> But won't these cycles be then wasted by calling needlessly kick_ilb
> >>>
> >>
> >> when there are nohz cpus, i.e nohz.nr_cpus > 0, there is no change in codeflow.
> >>
> >> Only when system is 100%(which is expected to be rare), nohz.nr_cpus == 0,
> >> then it is expected that has_blocked_load = 0. So flags shouldn't be set.
> >
> > The way we are setting/clearing has_blocked_load vs
> > nr_cpus/idle_cpus_mask implies that it's possible to get
> > has_blocked_load == 1 but nr_cpus == 0 although it's a corner case and
> > not a default behavior
> >
> > No CPUs are idle: nr_cpus == 0
> >
> > CPU 0 enters idle
> >    - inc nr_cpus and set idle_cpus_mask
> >    - set nohz.has_blocked
> >
> > CPU0 wakes up
> >
> > Tick fires on CPU0
> >    - dec nr_cpus and clear idle_cpus_mask
> >    - nohz.has_blocked == 1, most probably now > nohz.next_blocked, if
> > now < nohz.next_balance, we skip the test of nr_cpus and we call
> > kick_ilb() but nr_cpus == 0 and idle_cpus_mask is empty
> >
> >> Note we are still doing a return if nohz.nr_cpus == 0. So kick_ilb shouldn't be
> >> called.
> >
> > The return can be skipped by if (time_before(now, nohz.next_balance)) goto out
> >
>
> Assuming HZ=1000,
>
> I see LOAD_AVG_PERIOD = 32, whereas next_balance is usually 60. so it is
> a really narrow window of 28 ticks and system being close to 100% busy.
>
>
> >>
> >> Do you see any path still calling kick_ilb un-necessarily?
> >
> > Yes but at the same time it's clearly not the main case
> >
> >
>
> So i assume we can do this patch considering the common case?
> If not, let me know, I can drop it.

Yes, I suppose it's ok.
Could you add a comment in commit so we remember