Read-mostly sched_clock_irqtime may share a cacheline with the
frequently updated nohz struct. Make it cacheline aligned to avoid
false sharing.
Details:
We observed a ~3% cycle hotspot in irqtime_account_irq when running
SPECjbb2015 on a 2-socket system. Most of the cycles are spent reading
sched_clock_irqtime, which is a read-mostly variable.
perf c2c (cacheline view) shows false sharing with the nohz struct:
Num RmtHitm LclHitm Offset records Symbol
6.25% 0.00% 0.00% 0x0 4 [k] _nohz_idle_balance.isra.0
18.75% 100.00% 0.00% 0x8 14 [k] nohz_balance_exit_idle
6.25% 0.00% 0.00% 0x8 8 [k] nohz_balance_enter_idle
6.25% 0.00% 0.00% 0xc 8 [k] sched_balance_newidle
6.25% 0.00% 0.00% 0x10 31 [k] nohz_balancer_kick
6.25% 0.00% 0.00% 0x20 16 [k] sched_balance_newidle
37.50% 0.00% 0.00% 0x38 50 [k] irqtime_account_irq
6.25% 0.00% 0.00% 0x38 47 [k] account_process_tick
6.25% 0.00% 0.00% 0x38 12 [k] account_idle_ticks
Offsets:
* 0x0 -- nohz.idle_cpu_mask (r)
* 0x8 -- nohz.nr_cpus (w)
* 0x38 -- sched_clock_irqtime (r), not in nohz, but shares its cacheline
The layout in /proc/kallsyms confirms this:
ffffffff88600d40 b nohz
ffffffff88600d68 B arch_needs_tick_broadcast
ffffffff88600d6c b __key.264
ffffffff88600d6c b __key.265
ffffffff88600d70 b dl_generation
ffffffff88600d78 b sched_clock_irqtime
With the patch applied, the irqtime_account_irq hotspot disappears.
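The cacheline sharing described above can be checked by hand from the two kallsyms addresses. The following is a minimal userspace sketch, not kernel code: it assumes 64-byte cachelines (the report does not state the line size), and the helper names `cacheline_base`, `cacheline_offset`, `same_cacheline` and `check_reported_layout` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines (typical for x86-64; not stated in the report). */
#define CACHELINE_SIZE 64ULL

/* Addresses copied from the /proc/kallsyms listing in the commit message. */
#define NOHZ_ADDR                0xffffffff88600d40ULL
#define SCHED_CLOCK_IRQTIME_ADDR 0xffffffff88600d78ULL

/* Start address of the cacheline that contains addr. */
static inline uint64_t cacheline_base(uint64_t addr)
{
	return addr & ~(CACHELINE_SIZE - 1);
}

/* Byte offset of addr within its cacheline. */
static inline uint64_t cacheline_offset(uint64_t addr)
{
	return addr & (CACHELINE_SIZE - 1);
}

/* True when both addresses fall into the same cacheline. */
static inline bool same_cacheline(uint64_t a, uint64_t b)
{
	return cacheline_base(a) == cacheline_base(b);
}

/* Hypothetical helper: asserts the layout that perf c2c reported. */
static inline void check_reported_layout(void)
{
	/* nohz starts exactly on a line boundary ... */
	assert(cacheline_offset(NOHZ_ADDR) == 0x0);
	/* ... and sched_clock_irqtime sits at offset 0x38 of that same line. */
	assert(cacheline_offset(SCHED_CLOCK_IRQTIME_ADDR) == 0x38);
	assert(same_cacheline(NOHZ_ADDR, SCHED_CLOCK_IRQTIME_ADDR));
}
```

Under the 64-byte assumption, both symbols map to the line starting at ffffffff88600d40, and the computed offset 0x38 matches the Offset column that perf c2c shows for irqtime_account_irq.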
Previous Discussion:
https://lore.kernel.org/all/20251211055612.4071266-1-wangyang.guo@intel.com/T/#u
Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reported-by: Benjamin Lei <benjamin.lei@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
---
kernel/sched/cputime.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 7097de2c8cda..34d15972f5fb 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -25,7 +25,7 @@
  */
 DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
 
-int sched_clock_irqtime;
+int sched_clock_irqtime __cacheline_aligned;
 
 void enable_sched_clock_irqtime(void)
 {
--
2.47.3
Hi Wangyang.
On 1/13/26 8:36 AM, Wangyang Guo wrote:
> Read-mostly sched_clock_irqtime may share the same cacheline with
> frequently updated nohz struct. Make it as cache aligned to avoid
> false sharing issue.
>
> Details:
> We observed ~3% cycles hotspots in irqtime_account_irq when running
> SPECjbb2015 in a 2-sockets system. Most of cycles spent in reading
> sched_clock_irqtime, which is a read-mostly var.
>
> perf c2c (cacheline view) shows it has false sharing with nohz struct:
> Num RmtHitm LclHitm Offset records Symbol
> 6.25% 0.00% 0.00% 0x0 4 [k] _nohz_idle_balance.isra.0
> 18.75% 100.00% 0.00% 0x8 14 [k] nohz_balance_exit_idle
> 6.25% 0.00% 0.00% 0x8 8 [k] nohz_balance_enter_idle
> 6.25% 0.00% 0.00% 0xc 8 [k] sched_balance_newidle
> 6.25% 0.00% 0.00% 0x10 31 [k] nohz_balancer_kick
> 6.25% 0.00% 0.00% 0x20 16 [k] sched_balance_newidle
> 37.50% 0.00% 0.00% 0x38 50 [k] irqtime_account_irq
> 6.25% 0.00% 0.00% 0x38 47 [k] account_process_tick
> 6.25% 0.00% 0.00% 0x38 12 [k] account_idle_ticks
>
> Offsets:
> * 0x0 -- nohz.idle_cpu_mask (r)
> * 0x8 -- nohz.nr_cpus (w)
> * 0x38 -- sched_clock_irqtime (r), not in nohz, but share cacheline
>
> The layout in /proc/kallsyms can also confirm that:
> ffffffff88600d40 b nohz
> ffffffff88600d68 B arch_needs_tick_broadcast
> ffffffff88600d6c b __key.264
> ffffffff88600d6c b __key.265
> ffffffff88600d70 b dl_generation
> ffffffff88600d78 b sched_clock_irqtime
>
> With the patch applied, irqtime_account_irq hotspot disappear.
>
> Previous Discussion:
> https://lore.kernel.org/all/20251211055612.4071266-1-wangyang.guo@intel.com/T/#u
>
> Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> Reported-by: Benjamin Lei <benjamin.lei@intel.com>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Tianyou Li <tianyou.li@intel.com>
> Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
> ---
> kernel/sched/cputime.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 7097de2c8cda..34d15972f5fb 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -25,7 +25,7 @@
> */
> DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
>
> -int sched_clock_irqtime;
> +int sched_clock_irqtime __cacheline_aligned;
>
> void enable_sched_clock_irqtime(void)
> {
Could you please apply this patch and check whether you still observe the issue?
https://lore.kernel.org/all/20260112050442.138446-1-sshegde@linux.ibm.com/
You have CONFIG_CPUMASK_OFFSTACK=y. You might still see occasional cache
misses with the patch applied, but far fewer.
Hello Wangyang,

On 1/13/2026 8:36 AM, Wangyang Guo wrote:
> Read-mostly sched_clock_irqtime may share the same cacheline with
> frequently updated nohz struct. Make it as cache aligned to avoid
> false sharing issue.

nit.

Since you mention this is a "Read-mostly" data ...

[..snip..]

> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 7097de2c8cda..34d15972f5fb 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -25,7 +25,7 @@
> */
> DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
>
> -int sched_clock_irqtime;
> +int sched_clock_irqtime __cacheline_aligned;

Have you tried annotating it with "__ro_after_init" or "__read_mostly"?

That should also ensure "sched_clock_irqtime" doesn't have false-sharing
with the nohz struct without using an entire cacheline worth of space.

-- 
Thanks and Regards,
Prateek
On 1/13/2026 12:26 PM, K Prateek Nayak wrote:
> Hello Wangyang,
>
> On 1/13/2026 8:36 AM, Wangyang Guo wrote:
>> Read-mostly sched_clock_irqtime may share the same cacheline with
>> frequently updated nohz struct. Make it as cache aligned to avoid
>> false sharing issue.
>
> nit.
>
> Since you mention this is a "Read-mostly" data ...
>
> [..snip..]
>
>> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
>> index 7097de2c8cda..34d15972f5fb 100644
>> --- a/kernel/sched/cputime.c
>> +++ b/kernel/sched/cputime.c
>> @@ -25,7 +25,7 @@
>> */
>> DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
>>
>> -int sched_clock_irqtime;
>> +int sched_clock_irqtime __cacheline_aligned;
>
> Have you tried annotating it with "__ro_after_init" or "__read_mostly"?
>
> That should also ensure "sched_clock_irqtime" doesn't have false-sharing
> with the nohz struct without using an entire cacheline worth of space.

__read_mostly works and avoids wasting space. It is better than
__cacheline_aligned. Patch v2 has been sent:
https://lore.kernel.org/all/20260113074807.3404180-1-wangyang.guo@intel.com/

BR
Wangyang
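The space trade-off discussed in this thread can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it uses C11 `alignas` and plain structs to stand in for linker placement, assumes 64-byte cachelines, and all struct and field names are hypothetical. In the kernel itself the effect comes from section placement rather than struct layout (on x86, `__read_mostly` puts the variable into a dedicated `.data..read_mostly` section), so the model shows the idea, not the kernel mechanism.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cachelines. */
#define CACHELINE_SIZE 64

/* Original layout: the read-mostly flag shares a line with write-hot data. */
struct layout_shared {
	int hot_counter;      /* stands in for nohz.nr_cpus (written often) */
	int read_mostly_flag; /* stands in for sched_clock_irqtime */
};

/* Cacheline-aligned variant: the flag gets a whole line to itself. */
struct layout_aligned {
	int hot_counter;
	alignas(CACHELINE_SIZE) int read_mostly_flag;
};

/* Read-mostly grouping: several rarely written variables share one line. */
struct read_mostly_group {
	int read_mostly_flag;
	int other_read_mostly_a; /* hypothetical neighbours */
	int other_read_mostly_b;
};

struct layout_grouped {
	int hot_counter;
	alignas(CACHELINE_SIZE) struct read_mostly_group ro;
};

/* Index of the cacheline that a given struct offset falls into. */
static inline size_t line_of(size_t offset)
{
	return offset / CACHELINE_SIZE;
}

/* In the shared layout the hot field and the flag sit on line 0 together. */
static_assert(offsetof(struct layout_shared, read_mostly_flag) < CACHELINE_SIZE,
	      "flag shares the first cacheline with hot_counter");
/* Alignment moves the flag to the next line and grows the struct to two lines. */
static_assert(offsetof(struct layout_aligned, read_mostly_flag) == CACHELINE_SIZE,
	      "aligned flag starts a new cacheline");
static_assert(sizeof(struct layout_aligned) == 2 * CACHELINE_SIZE,
	      "a single int now occupies a full cacheline");
```

In this model, `layout_aligned` spends a full 64-byte line on one `int`, while `layout_grouped` has the same total size but lets three read-mostly variables share that line, away from the write-hot field. That is the space argument made for `__read_mostly` over `__cacheline_aligned` above.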