nohz.has_blocked can be updated often, whenever a CPU enters the idle
state. But the blocked-load stats are updated only at regular
intervals, usually fixed to LOAD_AVG_PERIOD (32 ms).

Read the value only after the time check succeeds, to avoid cache
references to it.

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
kernel/sched/fair.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 55746274af06..5534822fd754 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12440,8 +12440,8 @@ static void nohz_balancer_kick(struct rq *rq)
 	 */
 	nohz_balance_exit_idle(rq);
 
-	if (READ_ONCE(nohz.has_blocked) &&
-	    time_after(now, READ_ONCE(nohz.next_blocked)))
+	if (time_after(now, READ_ONCE(nohz.next_blocked)) &&
+	    READ_ONCE(nohz.has_blocked))
 		flags = NOHZ_STATS_KICK;
 
 	if (time_before(now, nohz.next_balance))
--
2.43.0
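
[The reordering above leans on C's short-circuit evaluation of &&:
the right-hand operand is evaluated only when the left-hand one is
true, so placing the time check first means nohz.has_blocked is
loaded only when the check passes. Below is a minimal userspace
sketch of that effect, using hypothetical read_has_blocked() /
read_next_blocked() stand-ins rather than the kernel's actual
fields.]

#include <stdio.h>

/* Hypothetical stand-ins that count how often each field is read;
 * values are chosen so the time check fails (now is not yet past
 * next_blocked), which is the common case between stats updates. */
static int has_blocked_reads, next_blocked_reads;

static int read_has_blocked(void)
{
	has_blocked_reads++;
	return 1;		/* a CPU went idle recently */
}

static long read_next_blocked(void)
{
	next_blocked_reads++;
	return 100;		/* next update is still in the future */
}

int main(void)
{
	long now = 0;
	int kick;

	/* Old order: has_blocked is read even though the time check fails. */
	kick = read_has_blocked() && now > read_next_blocked();
	printf("old order: kick=%d has_blocked reads=%d next_blocked reads=%d\n",
	       kick, has_blocked_reads, next_blocked_reads);

	has_blocked_reads = next_blocked_reads = 0;

	/* New order: the failing time check short-circuits the other read. */
	kick = now > read_next_blocked() && read_has_blocked();
	printf("new order: kick=%d has_blocked reads=%d next_blocked reads=%d\n",
	       kick, has_blocked_reads, next_blocked_reads);

	return 0;
}

[As the follow-up below notes, both fields share a cacheline, so
this saves at most a single load, not a cacheline transfer.]
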
* Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
> nohz.has_blocked can be updated often, whenever a CPU enters the idle
> state. But the blocked-load stats are updated only at regular
> intervals, usually fixed to LOAD_AVG_PERIOD (32 ms).
>
> Read the value only after the time check succeeds, to avoid cache
> references to it.
>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
> kernel/sched/fair.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 55746274af06..5534822fd754 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12440,8 +12440,8 @@ static void nohz_balancer_kick(struct rq *rq)
>  	 */
>  	nohz_balance_exit_idle(rq);
>  
> -	if (READ_ONCE(nohz.has_blocked) &&
> -	    time_after(now, READ_ONCE(nohz.next_blocked)))
> +	if (time_after(now, READ_ONCE(nohz.next_blocked)) &&
> +	    READ_ONCE(nohz.has_blocked))
>  		flags = NOHZ_STATS_KICK;
So this patch makes no sense, as the two fields [1] and
[2] are almost next to each other:
static struct {
	cpumask_var_t idle_cpus_mask;                                            //  0
	atomic_t nr_cpus;                                                        //  8
	int has_blocked;            /* Idle CPUS has blocked load */             <========== [1] // 12
	int needs_update;           /* Newly idle CPUs need their next_balance collated */      // 16
	unsigned long next_balance; /* in jiffy units */                                         // 24
	unsigned long next_blocked; /* Next update of blocked load in jiffies */ <========== [2] // 32
} nohz ____cacheline_aligned;
... and thus they very likely share the same cacheline
and there can be no reduction in cacheline bouncing
from this change.
In fact with CONFIG_CPUMASK_OFFSTACK=y the
cpumask_var_t is 8 bytes, and thus the offsets of the
two fields will be 12 and 32 within the same 64-byte
cacheline, guaranteed. I've marked the field offsets
in the rightmost column for this case.
Thanks,
Ingo
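
[For what it's worth, the offsets can be double-checked from
userspace with a mirror of the struct. The sketch below holds only
under the assumptions Ingo spells out (64-bit build,
CONFIG_CPUMASK_OFFSTACK=y, so cpumask_var_t is a plain pointer, and
atomic_t being int-sized); struct nohz_mirror is a made-up name,
not the kernel's definition.]

#include <stddef.h>
#include <stdio.h>

/* Userspace mirror of the kernel's nohz struct, assuming a 64-bit
 * build with CONFIG_CPUMASK_OFFSTACK=y (cpumask_var_t is then a
 * pointer) and an int-sized atomic_t. Illustration only. */
struct nohz_mirror {
	void *idle_cpus_mask;		/* offset  0 */
	int nr_cpus;			/* offset  8 */
	int has_blocked;		/* offset 12 */
	int needs_update;		/* offset 16 */
	unsigned long next_balance;	/* offset 24 */
	unsigned long next_blocked;	/* offset 32 */
};

/* Both fields must fall into the same 64-byte cacheline. */
_Static_assert(offsetof(struct nohz_mirror, has_blocked) / 64 ==
	       offsetof(struct nohz_mirror, next_blocked) / 64,
	       "fields sit on different cachelines");

int main(void)
{
	printf("has_blocked:  %zu\n", offsetof(struct nohz_mirror, has_blocked));
	printf("next_blocked: %zu\n", offsetof(struct nohz_mirror, next_blocked));
	return 0;
}

[Running pahole on a built vmlinux would give the authoritative
layout for a given config.]
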
On 12/2/25 11:56 AM, Ingo Molnar wrote:
>
> * Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>
>> [...]
>
> So this patch makes no sense, as the two fields [1] and
> [2] are almost next to each other:
>
> static struct {
> 	cpumask_var_t idle_cpus_mask;                                            //  0
> 	atomic_t nr_cpus;                                                        //  8
> 	int has_blocked;            /* Idle CPUS has blocked load */             <========== [1] // 12
> 	int needs_update;           /* Newly idle CPUs need their next_balance collated */      // 16
> 	unsigned long next_balance; /* in jiffy units */                                         // 24
> 	unsigned long next_blocked; /* Next update of blocked load in jiffies */ <========== [2] // 32
> } nohz ____cacheline_aligned;
>
> ... and thus they very likely share the same cacheline
> and there can be no reduction in cacheline bouncing
> from this change.
>
> In fact with CONFIG_CPUMASK_OFFSTACK=y the
> cpumask_var_t is 8 bytes, and thus the offsets of the
> two fields will be 12 and 32 within the same 64-byte
> cacheline, guaranteed. I've marked the field offsets
> in the rightmost column for this case.
>
> Thanks,
>
> Ingo
Ok. Since we fetch the cacheline in either case, the read should be
minimal overhead; at best we may be saving one read. Likely not
worth it.

I got a bit carried away. We can ignore this change.