[v3] sched/clock: Avoid false sharing for sched_clock_irqtime

[PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by Wangyang Guo 3 weeks, 1 day ago

Read-mostly sched_clock_irqtime may share the same cacheline with
frequently updated nohz struct. Make it as static_key to avoid
false sharing issue.

Details:
We observed ~3% cycles hotspots in irqtime_account_irq when running
SPECjbb2015 in a 2-sockets system. Most of cycles spent in reading
sched_clock_irqtime, which is a read-mostly var.

perf c2c (cachelien view) shows it has false sharing with nohz struct:
     Num RmtHitm LclHitm  Offset records             Symbol
   6.25%   0.00%   0.00%   0x0       4   [k] _nohz_idle_balance.isra.0
  18.75% 100.00%   0.00%   0x8      14   [k] nohz_balance_exit_idle
   6.25%   0.00%   0.00%   0x8       8   [k] nohz_balance_enter_idle
   6.25%   0.00%   0.00%   0xc       8   [k] sched_balance_newidle
   6.25%   0.00%   0.00%  0x10      31   [k] nohz_balancer_kick
   6.25%   0.00%   0.00%  0x20      16   [k] sched_balance_newidle
  37.50%   0.00%   0.00%  0x38      50   [k] irqtime_account_irq
   6.25%   0.00%   0.00%  0x38      47   [k] account_process_tick
   6.25%   0.00%   0.00%  0x38      12   [k] account_idle_ticks

Offsets:
*  0x0 -- nohz.idle_cpu_mask (r)
*  0x8 -- nohz.nr_cpus (w)
* 0x38 -- sched_clock_irqtime (r), not in nohz, but share cacheline

The layout in /proc/kallsyms can also confirm that:
ffffffff88600d40 b nohz
ffffffff88600d68 B arch_needs_tick_broadcast
ffffffff88600d6c b __key.264
ffffffff88600d6c b __key.265
ffffffff88600d70 b dl_generation
ffffffff88600d78 b sched_clock_irqtime

With the patch applied, irqtime_account_irq hotspot disappear.

---
V2 -> V3:
- Use static_key instead of a __read_mostly var.

V1 -> V2:
- Use __read_mostly instead of __cacheline_aligned to avoid wasting
  spaces.

History:
  v2: https://lore.kernel.org/all/20260113074807.3404180-1-wangyang.guo@intel.com/
  v1: https://lore.kernel.org/all/20260113022958.3379650-1-wangyang.guo@intel.com/
  prev discussions: https://lore.kernel.org/all/20251211055612.4071266-1-wangyang.guo@intel.com/T/#u

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reported-by: Benjamin Lei <benjamin.lei@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
---
 kernel/sched/cputime.c | 20 ++++++++++++++++----
 kernel/sched/sched.h   |  4 ++--
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 7097de2c8cda..f37a27ed51d7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -12,6 +12,8 @@
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 
+DEFINE_STATIC_KEY_FALSE(sched_clock_irqtime);
+
 /*
  * There are no locks covering percpu hardirq/softirq time.
  * They are only modified in vtime_account, on corresponding CPU
@@ -25,16 +27,26 @@
  */
 DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
 
-int sched_clock_irqtime;
-
 void enable_sched_clock_irqtime(void)
 {
-	sched_clock_irqtime = 1;
+	static_branch_enable(&sched_clock_irqtime);
 }
 
+static void __disable_sched_clock_irqtime(struct work_struct *work)
+{
+	static_branch_disable(&sched_clock_irqtime);
+}
+
+static DECLARE_WORK(sched_clock_irqtime_work, __disable_sched_clock_irqtime);
+
 void disable_sched_clock_irqtime(void)
 {
-	sched_clock_irqtime = 0;
+	/* disable_sched_clock_irqtime can be called in atomic
+	 * context with mark_tsc_unstable(), use wq to avoid
+	 * "sleeping in atomic context" warning.
+	 */
+	if (irqtime_enabled())
+		schedule_work(&sched_clock_irqtime_work);
 }
 
 static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index adfb6e3409d7..ec963314287a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3172,11 +3172,11 @@ struct irqtime {
 };
 
 DECLARE_PER_CPU(struct irqtime, cpu_irqtime);
-extern int sched_clock_irqtime;
+DECLARE_STATIC_KEY_FALSE(sched_clock_irqtime);
 
 static inline int irqtime_enabled(void)
 {
-	return sched_clock_irqtime;
+	return static_branch_likely(&sched_clock_irqtime);
 }
 
 /*
-- 
2.47.3

Re: [PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by Shrikanth Hegde 3 weeks ago

Hi Wangyang.

On 1/16/26 8:09 AM, Wangyang Guo wrote:
> Read-mostly sched_clock_irqtime may share the same cacheline with
> frequently updated nohz struct. Make it as static_key to avoid
> false sharing issue.
> 
> Details:
> We observed ~3% cycles hotspots in irqtime_account_irq when running
> SPECjbb2015 in a 2-sockets system. Most of cycles spent in reading
> sched_clock_irqtime, which is a read-mostly var.
> 
> perf c2c (cachelien view) shows it has false sharing with nohz struct:
>       Num RmtHitm LclHitm  Offset records             Symbol
>     6.25%   0.00%   0.00%   0x0       4   [k] _nohz_idle_balance.isra.0
>    18.75% 100.00%   0.00%   0x8      14   [k] nohz_balance_exit_idle
>     6.25%   0.00%   0.00%   0x8       8   [k] nohz_balance_enter_idle
>     6.25%   0.00%   0.00%   0xc       8   [k] sched_balance_newidle
>     6.25%   0.00%   0.00%  0x10      31   [k] nohz_balancer_kick
>     6.25%   0.00%   0.00%  0x20      16   [k] sched_balance_newidle
>    37.50%   0.00%   0.00%  0x38      50   [k] irqtime_account_irq
>     6.25%   0.00%   0.00%  0x38      47   [k] account_process_tick
>     6.25%   0.00%   0.00%  0x38      12   [k] account_idle_ticks
> 
> Offsets:
> *  0x0 -- nohz.idle_cpu_mask (r)
> *  0x8 -- nohz.nr_cpus (w)

This is recently removed.

> * 0x38 -- sched_clock_irqtime (r), not in nohz, but share cacheline
> 
> The layout in /proc/kallsyms can also confirm that:
> ffffffff88600d40 b nohz
> ffffffff88600d68 B arch_needs_tick_broadcast
> ffffffff88600d6c b __key.264
> ffffffff88600d6c b __key.265
> ffffffff88600d70 b dl_generation
> ffffffff88600d78 b sched_clock_irqtime
> 
> With the patch applied, irqtime_account_irq hotspot disappear.
> 

This changelog likely needs to change with recent removal of nohz.nr_cpus.
Please re-run it once with latest sched/core.
(atleast offsets etc)

The patch has merit on its own since the variable doesn't update often.

Re: [PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by Guo, Wangyang 1 week, 5 days ago

On 1/17/2026 12:58 AM, Shrikanth Hegde wrote:
> Hi Wangyang.
> 
> On 1/16/26 8:09 AM, Wangyang Guo wrote:
>> Read-mostly sched_clock_irqtime may share the same cacheline with
>> frequently updated nohz struct. Make it as static_key to avoid
>> false sharing issue.
>>
>> Details:
>> We observed ~3% cycles hotspots in irqtime_account_irq when running
>> SPECjbb2015 in a 2-sockets system. Most of cycles spent in reading
>> sched_clock_irqtime, which is a read-mostly var.
>>
>> perf c2c (cachelien view) shows it has false sharing with nohz struct:
>>       Num RmtHitm LclHitm  Offset records             Symbol
>>     6.25%   0.00%   0.00%   0x0       4   [k] _nohz_idle_balance.isra.0
>>    18.75% 100.00%   0.00%   0x8      14   [k] nohz_balance_exit_idle
>>     6.25%   0.00%   0.00%   0x8       8   [k] nohz_balance_enter_idle
>>     6.25%   0.00%   0.00%   0xc       8   [k] sched_balance_newidle
>>     6.25%   0.00%   0.00%  0x10      31   [k] nohz_balancer_kick
>>     6.25%   0.00%   0.00%  0x20      16   [k] sched_balance_newidle
>>    37.50%   0.00%   0.00%  0x38      50   [k] irqtime_account_irq
>>     6.25%   0.00%   0.00%  0x38      47   [k] account_process_tick
>>     6.25%   0.00%   0.00%  0x38      12   [k] account_idle_ticks
>>
>> Offsets:
>> *  0x0 -- nohz.idle_cpu_mask (r)
>> *  0x8 -- nohz.nr_cpus (w)
> 
> This is recently removed.
> 
>> * 0x38 -- sched_clock_irqtime (r), not in nohz, but share cacheline
>>
>> The layout in /proc/kallsyms can also confirm that:
>> ffffffff88600d40 b nohz
>> ffffffff88600d68 B arch_needs_tick_broadcast
>> ffffffff88600d6c b __key.264
>> ffffffff88600d6c b __key.265
>> ffffffff88600d70 b dl_generation
>> ffffffff88600d78 b sched_clock_irqtime
>>
>> With the patch applied, irqtime_account_irq hotspot disappear.
>>
> 
> This changelog likely needs to change with recent removal of nohz.nr_cpus.
> Please re-run it once with latest sched/core.
> (atleast offsets etc)
> 
> The patch has merit on its own since the variable doesn't update often.

Thanks for the reminder, I have add kernel version info in the changelog 
(v4) to avoid the confusing.

BR
Wangyang

Re: [PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by K Prateek Nayak 3 weeks, 1 day ago

Hello Wangyang,

On 1/16/2026 8:09 AM, Wangyang Guo wrote:
>   */
>  DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
>  
> -int sched_clock_irqtime;
> -
>  void enable_sched_clock_irqtime(void)
>  {
> -	sched_clock_irqtime = 1;
> +	static_branch_enable(&sched_clock_irqtime);
>  }
>  
> +static void __disable_sched_clock_irqtime(struct work_struct *work)
> +{
> +	static_branch_disable(&sched_clock_irqtime);
> +}
> +
> +static DECLARE_WORK(sched_clock_irqtime_work, __disable_sched_clock_irqtime);
> +
>  void disable_sched_clock_irqtime(void)
>  {
> -	sched_clock_irqtime = 0;
> +	/* disable_sched_clock_irqtime can be called in atomic
> +	 * context with mark_tsc_unstable(), use wq to avoid
> +	 * "sleeping in atomic context" warning.
> +	 */
> +	if (irqtime_enabled())
> +		schedule_work(&sched_clock_irqtime_work);
>  }

Your approach looks good to avoid the scheduling while atomic issue.
Just a small observation: The only user of disable_sched_clock_irqtime()
is tsc_.*mark_unstable() which calls clear_sched_clock_stable() just
before doing disable_sched_clock_irqtime().

It makes me wonder if we can just reuse "sched_clock_work" to also\
disable sched_clock_irqtime()?

Peter, Vincent, do we need to do enable_sched_clock_irqtime() that early
when we detect TSC freq / sched_clock_register() or can we wait until we
do __set_sched_clock_stable()?

If we can wait until we mark sched_clock() as stable, we can consolidate
enabling / disabling of irqtime with that of __sched_clock_stable.
Something like the following on top of Wangyang's patch:

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 7d3e13e14eab..8bb9c0baa93d 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1143,7 +1143,6 @@ static void tsc_cs_mark_unstable(struct clocksource *cs)
 	tsc_unstable = 1;
 	if (using_native_sched_clock())
 		clear_sched_clock_stable();
-	disable_sched_clock_irqtime();
 	pr_info("Marking TSC unstable due to clocksource watchdog\n");
 }
 
@@ -1213,7 +1212,6 @@ void mark_tsc_unstable(char *reason)
 	tsc_unstable = 1;
 	if (using_native_sched_clock())
 		clear_sched_clock_stable();
-	disable_sched_clock_irqtime();
 	pr_info("Marking TSC unstable due to %s\n", reason);
 
 	clocksource_mark_unstable(&clocksource_tsc_early);
@@ -1234,6 +1232,22 @@ bool tsc_clocksource_watchdog_disabled(void)
 	       tsc_as_watchdog && !no_tsc_watchdog;
 }
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+/*
+ * Allow IRQ time accounting if the user hasn't
+ * disabled it and TSC is found to be stable at
+ * the time of late_initcall().
+ *
+ * If the TSC is detected to be unstable later,
+ * the IRQ time accounting will be disabled from
+ * clear_sched_clock_stable().
+ */
+bool sched_clock_supports_irqtime_acct(void)
+{
+	return !no_sched_irq_time && !tsc_unstable;
+}
+#endif
+
 static void __init check_system_tsc_reliable(void)
 {
 #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
@@ -1551,9 +1565,6 @@ void __init tsc_init(void)
 
 	cyc2ns_init_secondary_cpus();
 
-	if (!no_sched_irq_time)
-		enable_sched_clock_irqtime();
-
 	lpj_fine = get_loops_per_jiffy();
 
 	check_system_tsc_reliable();
diff --git a/include/linux/sched/clock.h b/include/linux/sched/clock.h
index 196f0ca351a2..e9b8c88fada5 100644
--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -104,11 +104,7 @@ extern u64 local_clock(void);
  * The reason for this explicit opt-in is not to have perf penalty with
  * slow sched_clocks.
  */
-extern void enable_sched_clock_irqtime(void);
-extern void disable_sched_clock_irqtime(void);
-#else
-static inline void enable_sched_clock_irqtime(void) {}
-static inline void disable_sched_clock_irqtime(void) {}
+bool sched_clock_supports_irqtime_acct(void);
 #endif
 
 #endif /* _LINUX_SCHED_CLOCK_H */
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index f5e6dd6a6b3a..4d43eef8c326 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -137,6 +137,8 @@ notrace static void __set_sched_clock_stable(void)
 			scd->tick_gtod, __gtod_offset,
 			scd->tick_raw,  __sched_clock_offset);
 
+	if (sched_clock_supports_irqtime_acct())
+		enable_sched_clock_irqtime();
 	static_branch_enable(&__sched_clock_stable);
 	tick_dep_clear(TICK_DEP_BIT_CLOCK_UNSTABLE);
 }
@@ -173,6 +175,7 @@ notrace static void __sched_clock_work(struct work_struct *work)
 			scd->tick_gtod, __gtod_offset,
 			scd->tick_raw,  __sched_clock_offset);
 
+	disable_sched_clock_irqtime();
 	static_branch_disable(&__sched_clock_stable);
 }
 
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index a5a8bd0a5ede..83dd9f299ee4 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -32,21 +32,9 @@ void enable_sched_clock_irqtime(void)
 	static_branch_enable(&sched_clock_irqtime);
 }
 
-static void __disable_sched_clock_irqtime(struct work_struct *work)
-{
-	static_branch_disable(&sched_clock_irqtime);
-}
-
-static DECLARE_WORK(sched_clock_irqtime_work, __disable_sched_clock_irqtime);
-
 void disable_sched_clock_irqtime(void)
 {
-	/* disable_sched_clock_irqtime can be called in atomic
-	 * context with mark_tsc_unstable(), use wq to avoid
-	 * "sleeping in atomic context" warning.
-	 */
-	if (irqtime_enabled())
-		schedule_work(&sched_clock_irqtime_work);
+	static_branch_disable(&sched_clock_irqtime);
 }
 
 static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 164ebf47e5fd..3bae8baf7c00 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3333,6 +3333,8 @@ static inline u64 irq_time_read(int cpu)
 	return total;
 }
 
+void enable_sched_clock_irqtime(void);
+void disable_sched_clock_irqtime(void);
 #else /* !CONFIG_IRQ_TIME_ACCOUNTING: */
 
 static inline int irqtime_enabled(void)
@@ -3340,6 +3342,13 @@ static inline int irqtime_enabled(void)
 	return 0;
 }
 
+static inline bool sched_clock_supports_irqtime_acct(void)
+{
+	return false;
+}
+
+static inline void enable_sched_clock_irqtime(void) {}
+static inline void disable_sched_clock_irqtime(void) {}
 #endif /* !CONFIG_IRQ_TIME_ACCOUNTING */
 
 #ifdef CONFIG_CPU_FREQ
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index f39111830ca3..1cdd10026279 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -174,6 +174,18 @@ static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt)
 	return HRTIMER_RESTART;
 }
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+/*
+ * Enable IRQ time accounting if we have a fast enough sched_clock().
+ * This is checked as a part of late_initcall() once all the clock
+ * devices have registered themselves.
+ */
+bool sched_clock_supports_irqtime_acct(void)
+{
+	return irqtime > 0 || (irqtime == -1 && cd.rate >= 1000000);
+}
+#endif
+
 void sched_clock_register(u64 (*read)(void), int bits, unsigned long rate)
 {
 	u64 res, wrap, new_mask, new_epoch, cyc, ns;
@@ -238,10 +250,6 @@ void sched_clock_register(u64 (*read)(void), int bits, unsigned long rate)
 	pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n",
 		bits, r, r_unit, res, wrap);
 
-	/* Enable IRQ time accounting if we have a fast enough sched_clock() */
-	if (irqtime > 0 || (irqtime == -1 && rate >= 1000000))
-		enable_sched_clock_irqtime();
-
 	local_irq_restore(flags);
 
 	pr_debug("Registered %pS as sched_clock source\n", read);
---

I don't know if it is any better (or even correct) but it does reduce
the scope of {enable,disable}_sched_clock_irqtime() to kernel/sched/.
Thoughts?

-- 
Thanks and Regards,
Prateek

Re: [PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by Vincent Guittot 3 weeks, 1 day ago

On Fri, 16 Jan 2026 at 10:43, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello Wangyang,
>
> On 1/16/2026 8:09 AM, Wangyang Guo wrote:
> >   */
> >  DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
> >
> > -int sched_clock_irqtime;
> > -
> >  void enable_sched_clock_irqtime(void)
> >  {
> > -     sched_clock_irqtime = 1;
> > +     static_branch_enable(&sched_clock_irqtime);
> >  }
> >
> > +static void __disable_sched_clock_irqtime(struct work_struct *work)
> > +{
> > +     static_branch_disable(&sched_clock_irqtime);
> > +}
> > +
> > +static DECLARE_WORK(sched_clock_irqtime_work, __disable_sched_clock_irqtime);
> > +
> >  void disable_sched_clock_irqtime(void)
> >  {
> > -     sched_clock_irqtime = 0;
> > +     /* disable_sched_clock_irqtime can be called in atomic
> > +      * context with mark_tsc_unstable(), use wq to avoid
> > +      * "sleeping in atomic context" warning.
> > +      */
> > +     if (irqtime_enabled())
> > +             schedule_work(&sched_clock_irqtime_work);
> >  }
>
> Your approach looks good to avoid the scheduling while atomic issue.
> Just a small observation: The only user of disable_sched_clock_irqtime()
> is tsc_.*mark_unstable() which calls clear_sched_clock_stable() just
> before doing disable_sched_clock_irqtime().
>
> It makes me wonder if we can just reuse "sched_clock_work" to also\
> disable sched_clock_irqtime()?
>
> Peter, Vincent, do we need to do enable_sched_clock_irqtime() that early
> when we detect TSC freq / sched_clock_register() or can we wait until we
> do __set_sched_clock_stable()?

By default we don't need a workqueue to disable sched clock irq time
but only tsc clock needs it just like when it disables
sched_lock_stable

So the enablement during init should remain the same. Why would all
sched clocks delay their irq time accounting just for tsc.

Furthermore, __set_sched_clock_stable() is under
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

I think that disabling irq time accounting if it was enabled in
__sched_clock_work() should be good

>
> If we can wait until we mark sched_clock() as stable, we can consolidate
> enabling / disabling of irqtime with that of __sched_clock_stable.
> Something like the following on top of Wangyang's patch:
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 7d3e13e14eab..8bb9c0baa93d 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1143,7 +1143,6 @@ static void tsc_cs_mark_unstable(struct clocksource *cs)
>         tsc_unstable = 1;
>         if (using_native_sched_clock())
>                 clear_sched_clock_stable();
> -       disable_sched_clock_irqtime();
>         pr_info("Marking TSC unstable due to clocksource watchdog\n");
>  }
>
> @@ -1213,7 +1212,6 @@ void mark_tsc_unstable(char *reason)
>         tsc_unstable = 1;
>         if (using_native_sched_clock())
>                 clear_sched_clock_stable();
> -       disable_sched_clock_irqtime();
>         pr_info("Marking TSC unstable due to %s\n", reason);
>
>         clocksource_mark_unstable(&clocksource_tsc_early);
> @@ -1234,6 +1232,22 @@ bool tsc_clocksource_watchdog_disabled(void)
>                tsc_as_watchdog && !no_tsc_watchdog;
>  }
>
> +#ifdef CONFIG_IRQ_TIME_ACCOUNTING
> +/*
> + * Allow IRQ time accounting if the user hasn't
> + * disabled it and TSC is found to be stable at
> + * the time of late_initcall().
> + *
> + * If the TSC is detected to be unstable later,
> + * the IRQ time accounting will be disabled from
> + * clear_sched_clock_stable().
> + */
> +bool sched_clock_supports_irqtime_acct(void)
> +{
> +       return !no_sched_irq_time && !tsc_unstable;
> +}
> +#endif
> +
>  static void __init check_system_tsc_reliable(void)
>  {
>  #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
> @@ -1551,9 +1565,6 @@ void __init tsc_init(void)
>
>         cyc2ns_init_secondary_cpus();
>
> -       if (!no_sched_irq_time)
> -               enable_sched_clock_irqtime();
> -
>         lpj_fine = get_loops_per_jiffy();
>
>         check_system_tsc_reliable();
> diff --git a/include/linux/sched/clock.h b/include/linux/sched/clock.h
> index 196f0ca351a2..e9b8c88fada5 100644
> --- a/include/linux/sched/clock.h
> +++ b/include/linux/sched/clock.h
> @@ -104,11 +104,7 @@ extern u64 local_clock(void);
>   * The reason for this explicit opt-in is not to have perf penalty with
>   * slow sched_clocks.
>   */
> -extern void enable_sched_clock_irqtime(void);
> -extern void disable_sched_clock_irqtime(void);
> -#else
> -static inline void enable_sched_clock_irqtime(void) {}
> -static inline void disable_sched_clock_irqtime(void) {}
> +bool sched_clock_supports_irqtime_acct(void);
>  #endif
>
>  #endif /* _LINUX_SCHED_CLOCK_H */
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index f5e6dd6a6b3a..4d43eef8c326 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -137,6 +137,8 @@ notrace static void __set_sched_clock_stable(void)
>                         scd->tick_gtod, __gtod_offset,
>                         scd->tick_raw,  __sched_clock_offset);
>
> +       if (sched_clock_supports_irqtime_acct())
> +               enable_sched_clock_irqtime();
>         static_branch_enable(&__sched_clock_stable);
>         tick_dep_clear(TICK_DEP_BIT_CLOCK_UNSTABLE);
>  }
> @@ -173,6 +175,7 @@ notrace static void __sched_clock_work(struct work_struct *work)
>                         scd->tick_gtod, __gtod_offset,
>                         scd->tick_raw,  __sched_clock_offset);
>
> +       disable_sched_clock_irqtime();
>         static_branch_disable(&__sched_clock_stable);
>  }
>
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index a5a8bd0a5ede..83dd9f299ee4 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -32,21 +32,9 @@ void enable_sched_clock_irqtime(void)
>         static_branch_enable(&sched_clock_irqtime);
>  }
>
> -static void __disable_sched_clock_irqtime(struct work_struct *work)
> -{
> -       static_branch_disable(&sched_clock_irqtime);
> -}
> -
> -static DECLARE_WORK(sched_clock_irqtime_work, __disable_sched_clock_irqtime);
> -
>  void disable_sched_clock_irqtime(void)
>  {
> -       /* disable_sched_clock_irqtime can be called in atomic
> -        * context with mark_tsc_unstable(), use wq to avoid
> -        * "sleeping in atomic context" warning.
> -        */
> -       if (irqtime_enabled())
> -               schedule_work(&sched_clock_irqtime_work);
> +       static_branch_disable(&sched_clock_irqtime);
>  }
>
>  static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 164ebf47e5fd..3bae8baf7c00 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -3333,6 +3333,8 @@ static inline u64 irq_time_read(int cpu)
>         return total;
>  }
>
> +void enable_sched_clock_irqtime(void);
> +void disable_sched_clock_irqtime(void);
>  #else /* !CONFIG_IRQ_TIME_ACCOUNTING: */
>
>  static inline int irqtime_enabled(void)
> @@ -3340,6 +3342,13 @@ static inline int irqtime_enabled(void)
>         return 0;
>  }
>
> +static inline bool sched_clock_supports_irqtime_acct(void)
> +{
> +       return false;
> +}
> +
> +static inline void enable_sched_clock_irqtime(void) {}
> +static inline void disable_sched_clock_irqtime(void) {}
>  #endif /* !CONFIG_IRQ_TIME_ACCOUNTING */
>
>  #ifdef CONFIG_CPU_FREQ
> diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
> index f39111830ca3..1cdd10026279 100644
> --- a/kernel/time/sched_clock.c
> +++ b/kernel/time/sched_clock.c
> @@ -174,6 +174,18 @@ static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt)
>         return HRTIMER_RESTART;
>  }
>
> +#ifdef CONFIG_IRQ_TIME_ACCOUNTING
> +/*
> + * Enable IRQ time accounting if we have a fast enough sched_clock().
> + * This is checked as a part of late_initcall() once all the clock
> + * devices have registered themselves.
> + */
> +bool sched_clock_supports_irqtime_acct(void)
> +{
> +       return irqtime > 0 || (irqtime == -1 && cd.rate >= 1000000);
> +}
> +#endif
> +
>  void sched_clock_register(u64 (*read)(void), int bits, unsigned long rate)
>  {
>         u64 res, wrap, new_mask, new_epoch, cyc, ns;
> @@ -238,10 +250,6 @@ void sched_clock_register(u64 (*read)(void), int bits, unsigned long rate)
>         pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n",
>                 bits, r, r_unit, res, wrap);
>
> -       /* Enable IRQ time accounting if we have a fast enough sched_clock() */
> -       if (irqtime > 0 || (irqtime == -1 && rate >= 1000000))
> -               enable_sched_clock_irqtime();
> -
>         local_irq_restore(flags);
>
>         pr_debug("Registered %pS as sched_clock source\n", read);
> ---
>
> I don't know if it is any better (or even correct) but it does reduce
> the scope of {enable,disable}_sched_clock_irqtime() to kernel/sched/.
> Thoughts?
>
> --
> Thanks and Regards,
> Prateek
>

Re: [PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by K Prateek Nayak 3 weeks, 1 day ago

Hello Vincent,

On 1/16/2026 3:52 PM, Vincent Guittot wrote:
> On Fri, 16 Jan 2026 at 10:43, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>>
>> Hello Wangyang,
>>
>> On 1/16/2026 8:09 AM, Wangyang Guo wrote:
>>>   */
>>>  DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
>>>
>>> -int sched_clock_irqtime;
>>> -
>>>  void enable_sched_clock_irqtime(void)
>>>  {
>>> -     sched_clock_irqtime = 1;
>>> +     static_branch_enable(&sched_clock_irqtime);
>>>  }
>>>
>>> +static void __disable_sched_clock_irqtime(struct work_struct *work)
>>> +{
>>> +     static_branch_disable(&sched_clock_irqtime);
>>> +}
>>> +
>>> +static DECLARE_WORK(sched_clock_irqtime_work, __disable_sched_clock_irqtime);
>>> +
>>>  void disable_sched_clock_irqtime(void)
>>>  {
>>> -     sched_clock_irqtime = 0;
>>> +     /* disable_sched_clock_irqtime can be called in atomic
>>> +      * context with mark_tsc_unstable(), use wq to avoid
>>> +      * "sleeping in atomic context" warning.
>>> +      */
>>> +     if (irqtime_enabled())
>>> +             schedule_work(&sched_clock_irqtime_work);
>>>  }
>>
>> Your approach looks good to avoid the scheduling while atomic issue.
>> Just a small observation: The only user of disable_sched_clock_irqtime()
>> is tsc_.*mark_unstable() which calls clear_sched_clock_stable() just
>> before doing disable_sched_clock_irqtime().
>>
>> It makes me wonder if we can just reuse "sched_clock_work" to also\
>> disable sched_clock_irqtime()?
>>
>> Peter, Vincent, do we need to do enable_sched_clock_irqtime() that early
>> when we detect TSC freq / sched_clock_register() or can we wait until we
>> do __set_sched_clock_stable()?
> 
> By default we don't need a workqueue to disable sched clock irq time
> but only tsc clock needs it just like when it disables
> sched_lock_stable
> 
> So the enablement during init should remain the same. Why would all
> sched clocks delay their irq time accounting just for tsc.

Yeah it was a stupid idea to consolidate the sched_clock() enable,
disable with __sched_clock_stable enable / disable. Clearly I missed the
whole CONFIG_HAVE_UNSTABLE_SCHED_CLOCK dependency.

> 
> Furthermore, __set_sched_clock_stable() is under
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> 
> I think that disabling irq time accounting if it was enabled in
> __sched_clock_work() should be good

I though about this and you can have this particular case as a result
of when TSC vs sched_clock is marked unstable:

  tsc_init()
    enable_sched_clock_irqtime() # irqtime accounting is enabled here
    ...
    if (unsynchronized_tsc()) # true
      mark_tsc_unstable()
        clear_sched_clock_stable()
          __sched_clock_stable_early = 0;
          ...
          if (static_key_count(&sched_clock_running.key) == 2) # Only happens at sched_clock_init_late()
            __clear_sched_clock_stable(); # Never executed
  ...

  # late_initcall() phase
  sched_clock_init_late()
    if (__sched_clock_stable_early) # Already false
      __set_sched_clock_stable(); # sched_clock is never marked stable

  # TSC unstable; irqtime_enabled() is true


The current approach from Wangyang covers this case so the v3 should be
good as is.

-- 
Thanks and Regards,
Prateek

Re: [PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by Vincent Guittot 3 weeks ago

On Fri, 16 Jan 2026 at 15:22, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello Vincent,
>
> On 1/16/2026 3:52 PM, Vincent Guittot wrote:
> > On Fri, 16 Jan 2026 at 10:43, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> >>
> >> Hello Wangyang,
> >>
> >> On 1/16/2026 8:09 AM, Wangyang Guo wrote:
> >>>   */
> >>>  DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
> >>>
> >>> -int sched_clock_irqtime;
> >>> -
> >>>  void enable_sched_clock_irqtime(void)
> >>>  {
> >>> -     sched_clock_irqtime = 1;
> >>> +     static_branch_enable(&sched_clock_irqtime);
> >>>  }
> >>>
> >>> +static void __disable_sched_clock_irqtime(struct work_struct *work)
> >>> +{
> >>> +     static_branch_disable(&sched_clock_irqtime);
> >>> +}
> >>> +
> >>> +static DECLARE_WORK(sched_clock_irqtime_work, __disable_sched_clock_irqtime);
> >>> +
> >>>  void disable_sched_clock_irqtime(void)
> >>>  {
> >>> -     sched_clock_irqtime = 0;
> >>> +     /* disable_sched_clock_irqtime can be called in atomic
> >>> +      * context with mark_tsc_unstable(), use wq to avoid
> >>> +      * "sleeping in atomic context" warning.
> >>> +      */
> >>> +     if (irqtime_enabled())
> >>> +             schedule_work(&sched_clock_irqtime_work);
> >>>  }
> >>
> >> Your approach looks good to avoid the scheduling while atomic issue.
> >> Just a small observation: The only user of disable_sched_clock_irqtime()
> >> is tsc_.*mark_unstable() which calls clear_sched_clock_stable() just
> >> before doing disable_sched_clock_irqtime().
> >>
> >> It makes me wonder if we can just reuse "sched_clock_work" to also\
> >> disable sched_clock_irqtime()?
> >>
> >> Peter, Vincent, do we need to do enable_sched_clock_irqtime() that early
> >> when we detect TSC freq / sched_clock_register() or can we wait until we
> >> do __set_sched_clock_stable()?
> >
> > By default we don't need a workqueue to disable sched clock irq time
> > but only tsc clock needs it just like when it disables
> > sched_lock_stable
> >
> > So the enablement during init should remain the same. Why would all
> > sched clocks delay their irq time accounting just for tsc.
>
> Yeah it was a stupid idea to consolidate the sched_clock() enable,
> disable with __sched_clock_stable enable / disable. Clearly I missed the
> whole CONFIG_HAVE_UNSTABLE_SCHED_CLOCK dependency.
>
> >
> > Furthermore, __set_sched_clock_stable() is under
> > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> >
> > I think that disabling irq time accounting if it was enabled in
> > __sched_clock_work() should be good
>
> I though about this and you can have this particular case as a result
> of when TSC vs sched_clock is marked unstable:
>
>   tsc_init()
>     enable_sched_clock_irqtime() # irqtime accounting is enabled here
>     ...
>     if (unsynchronized_tsc()) # true
>       mark_tsc_unstable()
>         clear_sched_clock_stable()
>           __sched_clock_stable_early = 0;
>           ...
>           if (static_key_count(&sched_clock_running.key) == 2) # Only happens at sched_clock_init_late()
>             __clear_sched_clock_stable(); # Never executed
>   ...
>
>   # late_initcall() phase
>   sched_clock_init_late()
>     if (__sched_clock_stable_early) # Already false
>       __set_sched_clock_stable(); # sched_clock is never marked stable
>
>   # TSC unstable; irqtime_enabled() is true
>
>
> The current approach from Wangyang covers this case so the v3 should be
> good as is.

The need of a workqueue is only for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
and I'd like to keep it there instead of making the use of a workqueue
the default behavior for disabling irq time accounting whereas it's
not needed

So either we can use a single workqueue in unstable clock to handle
the disabling of one or both __sched_clock_stable and
sched_clock_irqtime

or  unstable clock need another workqueue for disabling sched_clock_irqtime

>
> --
> Thanks and Regards,
> Prateek
>

Re: [PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by K Prateek Nayak 2 weeks, 5 days ago

Hello Vincent,

On 1/16/2026 10:11 PM, Vincent Guittot wrote:
>>> I think that disabling irq time accounting if it was enabled in
>>> __sched_clock_work() should be good
>>
>> I though about this and you can have this particular case as a result
>> of when TSC vs sched_clock is marked unstable:
>>
>>   tsc_init()
>>     enable_sched_clock_irqtime() # irqtime accounting is enabled here
>>     ...
>>     if (unsynchronized_tsc()) # true
>>       mark_tsc_unstable()
>>         clear_sched_clock_stable()
>>           __sched_clock_stable_early = 0;
>>           ...
>>           if (static_key_count(&sched_clock_running.key) == 2) # Only happens at sched_clock_init_late()
>>             __clear_sched_clock_stable(); # Never executed
>>   ...
>>
>>   # late_initcall() phase
>>   sched_clock_init_late()
>>     if (__sched_clock_stable_early) # Already false
>>       __set_sched_clock_stable(); # sched_clock is never marked stable
>>
>>   # TSC unstable; irqtime_enabled() is true
>>
>>
>> The current approach from Wangyang covers this case so the v3 should be
>> good as is.
> 
> The need of a workqueue is only for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> and I'd like to keep it there instead of making the use of a workqueue
> the default behavior for disabling irq time accounting whereas it's
> not needed
> 
> So either we can use a single workqueue in unstable clock to handle
> the disabling of one or both __sched_clock_stable and
> sched_clock_irqtime
> 
> or  unstable clock need another workqueue for disabling sched_clock_irqtime

Ack! We'll likely need a second workqueue and call it early in
__clear_sched_clock_stable().

AFAICT, PA-RISC already suffers from this since it marks
sched_clock_unstable() on SMP but since it uses the generic sched_clock,
on a >= 1MHz processor, it enabled irqtime anyways and nothing disables
it later (unlike TSC, the clocksource doesn't have a "mark_unstable"
callback).

If we can delay enabling the irqtime accounting until we do
__set_sched_clock_stable() for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK (I have
no clue if this is acceptable of not), then we can move the enabling /
disabling of irqtime to the same spots as when we flip
__sched_clock_stable.

-- 
Thanks and Regards,
Prateek

Re: [PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by Vincent Guittot 2 weeks, 2 days ago

On Mon, 19 Jan 2026 at 05:13, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello Vincent,
>
> On 1/16/2026 10:11 PM, Vincent Guittot wrote:
> >>> I think that disabling irq time accounting if it was enabled in
> >>> __sched_clock_work() should be good
> >>
> >> I though about this and you can have this particular case as a result
> >> of when TSC vs sched_clock is marked unstable:
> >>
> >>   tsc_init()
> >>     enable_sched_clock_irqtime() # irqtime accounting is enabled here
> >>     ...
> >>     if (unsynchronized_tsc()) # true
> >>       mark_tsc_unstable()
> >>         clear_sched_clock_stable()
> >>           __sched_clock_stable_early = 0;
> >>           ...
> >>           if (static_key_count(&sched_clock_running.key) == 2) # Only happens at sched_clock_init_late()
> >>             __clear_sched_clock_stable(); # Never executed
> >>   ...
> >>
> >>   # late_initcall() phase
> >>   sched_clock_init_late()
> >>     if (__sched_clock_stable_early) # Already false
> >>       __set_sched_clock_stable(); # sched_clock is never marked stable
> >>
> >>   # TSC unstable; irqtime_enabled() is true
> >>
> >>
> >> The current approach from Wangyang covers this case so the v3 should be
> >> good as is.
> >
> > The need of a workqueue is only for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> > and I'd like to keep it there instead of making the use of a workqueue
> > the default behavior for disabling irq time accounting whereas it's
> > not needed
> >
> > So either we can use a single workqueue in unstable clock to handle
> > the disabling of one or both __sched_clock_stable and
> > sched_clock_irqtime
> >
> > or  unstable clock need another workqueue for disabling sched_clock_irqtime
>
> Ack! We'll likely need a second workqueue and call it early in
> __clear_sched_clock_stable().
>
> AFAICT, PA-RISC already suffers from this since it marks
> sched_clock_unstable() on SMP but since it uses the generic sched_clock,
> on a >= 1MHz processor, it enabled irqtime anyways and nothing disables
> it later (unlike TSC, the clocksource doesn't have a "mark_unstable"
> callback).

>
> If we can delay enabling the irqtime accounting until we do
> __set_sched_clock_stable() for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK (I have
> no clue if this is acceptable of not), then we can move the enabling /
> disabling of irqtime to the same spots as when we flip
> __sched_clock_stable.

Do you mean to delay the enabling/disabling of irqtime accounting in
sched_clock_init_late for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK case ?

I don't have system with unstable clock so I'm not the best to answer
but if irqtime accounting is not compatible with unstable clock it
should be enabled at the same time as the clock is marked stable

>
> --
> Thanks and Regards,
> Prateek
>

Re: [PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by K Prateek Nayak 2 weeks, 1 day ago

Hello Vincent,

On 1/22/2026 4:06 PM, Vincent Guittot wrote:
>>> or  unstable clock need another workqueue for disabling sched_clock_irqtime
>>
>> Ack! We'll likely need a second workqueue and call it early in
>> __clear_sched_clock_stable().
>>
>> AFAICT, PA-RISC already suffers from this since it marks
>> sched_clock_unstable() on SMP but since it uses the generic sched_clock,
>> on a >= 1MHz processor, it enabled irqtime anyways and nothing disables
>> it later (unlike TSC, the clocksource doesn't have a "mark_unstable"
>> callback).
> 
>>
>> If we can delay enabling the irqtime accounting until we do
>> __set_sched_clock_stable() for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK (I have
>> no clue if this is acceptable of not), then we can move the enabling /
>> disabling of irqtime to the same spots as when we flip
>> __sched_clock_stable.
> 
> Do you mean to delay the enabling/disabling of irqtime accounting in
> sched_clock_init_late for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK case ?

Yup!

> 
> I don't have system with unstable clock so I'm not the best to answer
> but if irqtime accounting is not compatible with unstable clock it
> should be enabled at the same time as the clock is marked stable

Here are logs from my system with stable TSC highlighting the flow:

    [    0.000000] tsc: Fast TSC calibration using PIT
    [    0.000000] tsc: Detected 1996.299 MHz processor

    # native_sched_clock() stats using TSC from jiffies
    # Once we hit sched_clock_init() we'll start using
    # local_clock_noinstr which uses tsc.

    [    6.497022] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x398d0c7513b, max_idle_ns: 881590744042 ns
    [    6.507546] Calibrating delay loop (skipped), value calculated using timer frequency.. 3992.59 BogoMIPS (lpj=7985196)

    # tsc-early is registered

    [   10.455611] clocksource: Switched to clocksource tsc-early

    # Clocksource has switched to tsc-early

    [   13.863291] tsc: Refined TSC clocksource calibration: 1996.250 MHz
    [   13.869689] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x398caf77d7a, max_idle_ns: 881590459467 ns
    [   13.880722] clocksource: Switched to clocksource tsc

    # Full tsc switch now that HPET is registered and the granularity
    # of TSC calibration can be found more precisely

    [   15.382498] sched_clock: Marking stable (15304347096, 75265818)->(17063112601, -1683499687)

    # sched-clock is finally marked stable at sched_clock_init_late()
    # __sched_clock_offset is calculated and __sched_clock_stable is flipped


If TSC / any potentially unstable clock is marked unstable before
sched_clock_init_late(), we never flip __sched_clock_stable.

On x86 side, since we already start using TSC for native_sched_clock()
super early, we have defensive measures (and a __use_tsc static key on
the x86 side too) to switch the sched clock back to jiffies if we
realise TSC was unstable before we hit sched_clock_init_late().

This is necessary since post the sched_clock_init(),
"sched_clock_running" is true and the local clocks will always uses
sched_clock_noinstr() and it must be switch to jiffies on the x86 side
since arch defines it.

-- 
Thanks and Regards,
Prateek

Re: [PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by Vincent Guittot 2 weeks ago

On Fri, 23 Jan 2026 at 05:51, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello Vincent,
>
> On 1/22/2026 4:06 PM, Vincent Guittot wrote:
> >>> or  unstable clock need another workqueue for disabling sched_clock_irqtime
> >>
> >> Ack! We'll likely need a second workqueue and call it early in
> >> __clear_sched_clock_stable().
> >>
> >> AFAICT, PA-RISC already suffers from this since it marks
> >> sched_clock_unstable() on SMP but since it uses the generic sched_clock,
> >> on a >= 1MHz processor, it enabled irqtime anyways and nothing disables
> >> it later (unlike TSC, the clocksource doesn't have a "mark_unstable"
> >> callback).
> >
> >>
> >> If we can delay enabling the irqtime accounting until we do
> >> __set_sched_clock_stable() for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK (I have
> >> no clue if this is acceptable of not), then we can move the enabling /
> >> disabling of irqtime to the same spots as when we flip
> >> __sched_clock_stable.
> >
> > Do you mean to delay the enabling/disabling of irqtime accounting in
> > sched_clock_init_late for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK case ?
>
> Yup!
>
> >
> > I don't have system with unstable clock so I'm not the best to answer
> > but if irqtime accounting is not compatible with unstable clock it
> > should be enabled at the same time as the clock is marked stable
>
> Here are logs from my system with stable TSC highlighting the flow:
>
>     [    0.000000] tsc: Fast TSC calibration using PIT
>     [    0.000000] tsc: Detected 1996.299 MHz processor
>
>     # native_sched_clock() stats using TSC from jiffies
>     # Once we hit sched_clock_init() we'll start using
>     # local_clock_noinstr which uses tsc.
>
>     [    6.497022] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x398d0c7513b, max_idle_ns: 881590744042 ns
>     [    6.507546] Calibrating delay loop (skipped), value calculated using timer frequency.. 3992.59 BogoMIPS (lpj=7985196)
>
>     # tsc-early is registered
>
>     [   10.455611] clocksource: Switched to clocksource tsc-early
>
>     # Clocksource has switched to tsc-early
>
>     [   13.863291] tsc: Refined TSC clocksource calibration: 1996.250 MHz
>     [   13.869689] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x398caf77d7a, max_idle_ns: 881590459467 ns
>     [   13.880722] clocksource: Switched to clocksource tsc
>
>     # Full tsc switch now that HPET is registered and the granularity
>     # of TSC calibration can be found more precisely
>
>     [   15.382498] sched_clock: Marking stable (15304347096, 75265818)->(17063112601, -1683499687)
>
>     # sched-clock is finally marked stable at sched_clock_init_late()
>     # __sched_clock_offset is calculated and __sched_clock_stable is flipped
>
>
> If TSC / any potentially unstable clock is marked unstable before
> sched_clock_init_late(), we never flip __sched_clock_stable.
>
> On x86 side, since we already start using TSC for native_sched_clock()
> super early, we have defensive measures (and a __use_tsc static key on
> the x86 side too) to switch the sched clock back to jiffies if we
> realise TSC was unstable before we hit sched_clock_init_late().
>
> This is necessary since post the sched_clock_init(),
> "sched_clock_running" is true and the local clocks will always uses
> sched_clock_noinstr() and it must be switch to jiffies on the x86 side
> since arch defines it.

sched_clock_init_late() is always called so can't we check if
irqtime_enabled and clock is unstable then we disable irqtime
accounting

>
> --
> Thanks and Regards,
> Prateek
>

Re: [PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by K Prateek Nayak 2 weeks ago

Hello Vincent,

On 1/23/2026 9:28 PM, Vincent Guittot wrote:
>> If TSC / any potentially unstable clock is marked unstable before
>> sched_clock_init_late(), we never flip __sched_clock_stable.
>>
>> On x86 side, since we already start using TSC for native_sched_clock()
>> super early, we have defensive measures (and a __use_tsc static key on
>> the x86 side too) to switch the sched clock back to jiffies if we
>> realise TSC was unstable before we hit sched_clock_init_late().
>>
>> This is necessary since post the sched_clock_init(),
>> "sched_clock_running" is true and the local clocks will always uses
>> sched_clock_noinstr() and it must be switch to jiffies on the x86 side
>> since arch defines it.
> 
> sched_clock_init_late() is always called so can't we check if
> irqtime_enabled and clock is unstable then we disable irqtime
> accounting

That can be done, yes.

Wangyang, can you also consolidate the irqtime disable bits (TSC and
generic) to trigger at sched_clock_init_late() if the clock was found
unstable and in __sched_clock_work()? 

Both context can directly do a static_branch_disable() without
needing an extra delayed work.

-- 
Thanks and Regards,
Prateek

Re: [PATCH v3] sched/clock: Avoid false sharing for sched_clock_irqtime

Posted by Guo, Wangyang 1 week, 5 days ago

On 1/24/2026 12:48 AM, K Prateek Nayak wrote:
> Hello Vincent,
> 
> On 1/23/2026 9:28 PM, Vincent Guittot wrote:
>>> If TSC / any potentially unstable clock is marked unstable before
>>> sched_clock_init_late(), we never flip __sched_clock_stable.
>>>
>>> On x86 side, since we already start using TSC for native_sched_clock()
>>> super early, we have defensive measures (and a __use_tsc static key on
>>> the x86 side too) to switch the sched clock back to jiffies if we
>>> realise TSC was unstable before we hit sched_clock_init_late().
>>>
>>> This is necessary since post the sched_clock_init(),
>>> "sched_clock_running" is true and the local clocks will always uses
>>> sched_clock_noinstr() and it must be switch to jiffies on the x86 side
>>> since arch defines it.
>>
>> sched_clock_init_late() is always called so can't we check if
>> irqtime_enabled and clock is unstable then we disable irqtime
>> accounting
> 
> That can be done, yes.
> 
> Wangyang, can you also consolidate the irqtime disable bits (TSC and
> generic) to trigger at sched_clock_init_late() if the clock was found
> unstable and in __sched_clock_work()?
> 
> Both context can directly do a static_branch_disable() without
> needing an extra delayed work.

Thanks for the advise, v4 sent out, please help to review:
https://lore.kernel.org/all/20260126021401.1490163-1-wangyang.guo@intel.com/

BR
Wangyang