kernel/sched/cputime.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Read-mostly sched_clock_irqtime may share the same cacheline with
frequently updated nohz struct. Mark it as __read_mostly to avoid
false sharing issue.
Details:
We observed a hotspot of ~3% of cycles in irqtime_account_irq when running
SPECjbb2015 on a 2-socket system. Most of the cycles are spent reading
sched_clock_irqtime, which is a read-mostly variable.
perf c2c (cacheline view) shows false sharing with the nohz struct:
     Num  RmtHitm  LclHitm  Offset  records  Symbol
   6.25%    0.00%    0.00%     0x0        4  [k] _nohz_idle_balance.isra.0
  18.75%  100.00%    0.00%     0x8       14  [k] nohz_balance_exit_idle
   6.25%    0.00%    0.00%     0x8        8  [k] nohz_balance_enter_idle
   6.25%    0.00%    0.00%     0xc        8  [k] sched_balance_newidle
   6.25%    0.00%    0.00%    0x10       31  [k] nohz_balancer_kick
   6.25%    0.00%    0.00%    0x20       16  [k] sched_balance_newidle
  37.50%    0.00%    0.00%    0x38       50  [k] irqtime_account_irq
   6.25%    0.00%    0.00%    0x38       47  [k] account_process_tick
   6.25%    0.00%    0.00%    0x38       12  [k] account_idle_ticks
Offsets:
* 0x0 -- nohz.idle_cpu_mask (r)
* 0x8 -- nohz.nr_cpus (w)
* 0x38 -- sched_clock_irqtime (r), not in nohz, but shares the cacheline
The layout in /proc/kallsyms can also confirm that:
ffffffff88600d40 b nohz
ffffffff88600d68 B arch_needs_tick_broadcast
ffffffff88600d6c b __key.264
ffffffff88600d6c b __key.265
ffffffff88600d70 b dl_generation
ffffffff88600d78 b sched_clock_irqtime
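
The kallsyms addresses above can be checked with a little arithmetic. The following is a minimal sketch, assuming 64-byte cache lines (the thread does not state the line size); the helper names are hypothetical and exist only for illustration. It shows that nohz starts exactly on a line boundary and that sched_clock_irqtime sits 0x38 bytes into the same line, which matches the 0x38 offset in the perf c2c output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on typical x86-64 parts. */
#define CACHELINE_SIZE 64ULL

static_assert((CACHELINE_SIZE & (CACHELINE_SIZE - 1)) == 0,
	      "cache line size must be a power of two");

/* Index of the cache line that contains addr (hypothetical helper). */
static inline uint64_t cacheline_index(uint64_t addr)
{
	return addr / CACHELINE_SIZE;
}

/* Byte offset of addr within its cache line (hypothetical helper). */
static inline uint64_t cacheline_offset(uint64_t addr)
{
	return addr % CACHELINE_SIZE;
}

/* True when both addresses fall in the same cache line. */
static inline bool same_cacheline(uint64_t a, uint64_t b)
{
	return cacheline_index(a) == cacheline_index(b);
}

/* Addresses copied from the /proc/kallsyms excerpt in the patch. */
static const uint64_t NOHZ_ADDR = 0xffffffff88600d40ULL;
static const uint64_t IRQTIME_ADDR = 0xffffffff88600d78ULL;
```

With these definitions, same_cacheline(NOHZ_ADDR, IRQTIME_ADDR) evaluates to true and cacheline_offset(IRQTIME_ADDR) is 0x38, so every write to nohz.nr_cpus invalidates the line that readers of sched_clock_irqtime depend on.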
With the patch applied, the irqtime_account_irq hotspot disappears.
Changes since V1:
- Use __read_mostly instead of __cacheline_aligned to avoid wasting
space.
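
The space cost mentioned in the changelog can be illustrated in userspace. This is only a sketch, assuming 64-byte cache lines; it uses C11 _Alignas as a stand-in for cacheline alignment and is not the kernel's __cacheline_aligned macro. The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. */
#define CACHELINE_SIZE 64

/* A plain flag: occupies sizeof(int) bytes. */
struct plain_flag {
	int enabled;
};

/* The same flag forced onto its own cache line (userspace stand-in). */
struct aligned_flag {
	_Alignas(CACHELINE_SIZE) int enabled;
};

/* Aligning the member pads the whole object out to a full line. */
static_assert(sizeof(struct aligned_flag) == CACHELINE_SIZE,
	      "aligned flag should occupy exactly one cache line");

/* Bytes of padding that the alignment adds for a single flag. */
static inline size_t alignment_padding(void)
{
	return sizeof(struct aligned_flag) - sizeof(struct plain_flag);
}
```

Where int is 4 bytes, alignment_padding() returns 60: a whole line is spent on one int. __read_mostly avoids that by collecting rarely written variables in a dedicated section (.data..read_mostly on x86), so they can share lines with each other instead of with write-hot data.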
History:
v1: https://lore.kernel.org/all/20260113022958.3379650-1-wangyang.guo@intel.com/
prev discussions: https://lore.kernel.org/all/20251211055612.4071266-1-wangyang.guo@intel.com/T/#u
Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reported-by: Benjamin Lei <benjamin.lei@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
---
kernel/sched/cputime.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 7097de2c8cda..fc8f8081c48f 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -25,7 +25,7 @@
*/
DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
-int sched_clock_irqtime;
+int sched_clock_irqtime __read_mostly;
void enable_sched_clock_irqtime(void)
{
--
2.47.3
On Tue, Jan 13, 2026 at 03:48:07PM +0800, Wangyang Guo wrote:
> Read-mostly sched_clock_irqtime may share the same cacheline with
> frequently updated nohz struct. Mark it as __read_mostly to avoid
> false sharing issue.
>

Is there a reason that thing can't be a static_key or so?
On Tue, 13 Jan 2026 at 11:08, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 13, 2026 at 03:48:07PM +0800, Wangyang Guo wrote:
> > Read-mostly sched_clock_irqtime may share the same cacheline with
> > frequently updated nohz struct. Mark it as __read_mostly to avoid
> > false sharing issue.
> >
>
> Is there a reason that thing can't be a static_key or so?

We tried that but disable_sched_clock_irqtime can be called in atomic
context with mark_tsc_unstable()
https://lore.kernel.org/all/174161357383.14745.8770394914047302959.tip-bot2@tip-bot2/
On Tue, Jan 13, 2026 at 11:14:54AM +0100, Vincent Guittot wrote:
> On Tue, 13 Jan 2026 at 11:08, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Jan 13, 2026 at 03:48:07PM +0800, Wangyang Guo wrote:
> > > Read-mostly sched_clock_irqtime may share the same cacheline with
> > > frequently updated nohz struct. Mark it as __read_mostly to avoid
> > > false sharing issue.
> > >
> >
> > Is there a reason that thing can't be a static_key or so?
>
> We tried that but disable_sched_clock_irqtime can be called in atomic
> context with mark_tsc_unstable()
> https://lore.kernel.org/all/174161357383.14745.8770394914047302959.tip-bot2@tip-bot2/

Right, but if you note, that very same path has
clear_sched_clock_stable(), and that is a static_key too.

It's just that that punts by pushing it into a workqueue. So why not
lift that workqueue and do both keys or something?
On Tue, 13 Jan 2026 at 11:36, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 13, 2026 at 11:14:54AM +0100, Vincent Guittot wrote:
> > On Tue, 13 Jan 2026 at 11:08, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Tue, Jan 13, 2026 at 03:48:07PM +0800, Wangyang Guo wrote:
> > > > Read-mostly sched_clock_irqtime may share the same cacheline with
> > > > frequently updated nohz struct. Mark it as __read_mostly to avoid
> > > > false sharing issue.
> > > >
> > >
> > > Is there a reason that thing can't be a static_key or so?
> >
> > We tried that but disable_sched_clock_irqtime can be called in atomic
> > context with mark_tsc_unstable()
> > https://lore.kernel.org/all/174161357383.14745.8770394914047302959.tip-bot2@tip-bot2/
>
> Right, but if you note, that very same path has
> clear_sched_clock_stable(), and that is a static_key too.
>
> It's just that that punts by pushing it into a workqueue. So why not
> lift that workqueue and do both keys or something?

fair enough
Hello Peter,
On 1/13/2026 3:37 PM, Peter Zijlstra wrote:
> On Tue, Jan 13, 2026 at 03:48:07PM +0800, Wangyang Guo wrote:
>> Read-mostly sched_clock_irqtime may share the same cacheline with
>> frequently updated nohz struct. Mark it as __read_mostly to avoid
>> false sharing issue.
>>
>
> Is there a reason that thing can't be a static_key or so?
Yes, the exact case is stated in commit b9f2b29b9494
("sched: Don't define sched_clock_irqtime as static key")
https://lore.kernel.org/lkml/173961695743.10177.17683780278419896262.tip-bot2@tip-bot2/
--
Thanks and Regards,
Prateek
On 1/13/26 1:18 PM, Wangyang Guo wrote:
> Read-mostly sched_clock_irqtime may share the same cacheline with
> frequently updated nohz struct. Mark it as __read_mostly to avoid
> false sharing issue.
>
> Details:
> We observed a hotspot of ~3% of cycles in irqtime_account_irq when running
> SPECjbb2015 on a 2-socket system. Most of the cycles are spent reading
> sched_clock_irqtime, which is a read-mostly variable.
>
> perf c2c (cacheline view) shows false sharing with the nohz struct:
>      Num  RmtHitm  LclHitm  Offset  records  Symbol
>    6.25%    0.00%    0.00%     0x0        4  [k] _nohz_idle_balance.isra.0
>   18.75%  100.00%    0.00%     0x8       14  [k] nohz_balance_exit_idle
>    6.25%    0.00%    0.00%     0x8        8  [k] nohz_balance_enter_idle
>    6.25%    0.00%    0.00%     0xc        8  [k] sched_balance_newidle
>    6.25%    0.00%    0.00%    0x10       31  [k] nohz_balancer_kick
>    6.25%    0.00%    0.00%    0x20       16  [k] sched_balance_newidle
>   37.50%    0.00%    0.00%    0x38       50  [k] irqtime_account_irq
>    6.25%    0.00%    0.00%    0x38       47  [k] account_process_tick
>    6.25%    0.00%    0.00%    0x38       12  [k] account_idle_ticks
>
> Offsets:
> * 0x0 -- nohz.idle_cpu_mask (r)
> * 0x8 -- nohz.nr_cpus (w)
> * 0x38 -- sched_clock_irqtime (r), not in nohz, but shares the cacheline
>
> The layout in /proc/kallsyms can also confirm that:
> ffffffff88600d40 b nohz
> ffffffff88600d68 B arch_needs_tick_broadcast
> ffffffff88600d6c b __key.264
> ffffffff88600d6c b __key.265
> ffffffff88600d70 b dl_generation
> ffffffff88600d78 b sched_clock_irqtime
>
> With the patch applied, the irqtime_account_irq hotspot disappears.
>
> Changes since V1:
> - Use __read_mostly instead of __cacheline_aligned to avoid wasting
> space.
>
> History:
> v1: https://lore.kernel.org/all/20260113022958.3379650-1-wangyang.guo@intel.com/
> prev discussions: https://lore.kernel.org/all/20251211055612.4071266-1-wangyang.guo@intel.com/T/#u
>
> Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
I think it would be appropriate to take Prateek's tag here.
> Reported-by: Benjamin Lei <benjamin.lei@intel.com>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Tianyou Li <tianyou.li@intel.com>
> Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
> ---
> kernel/sched/cputime.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 7097de2c8cda..fc8f8081c48f 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -25,7 +25,7 @@
> */
> DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
>
> -int sched_clock_irqtime;
> +int sched_clock_irqtime __read_mostly;
>
> void enable_sched_clock_irqtime(void)
> {
Patch should be good on its own.
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>