[PATCH v9 2/2] sched: Annotate sched_clock_irqtime with __read_mostly

Yafang Shao posted 2 patches 7 months, 1 week ago
[PATCH v9 2/2] sched: Annotate sched_clock_irqtime with __read_mostly
Posted by Yafang Shao 7 months, 1 week ago
Eric reported an issue [0] as follows,
: rebalance_domains() can attempt to change sched_balance_running
: more than 350,000 times per second on our servers.

: If sched_clock_irqtime and sched_balance_running share the
: same cache line, we see a very high cost on hosts with 480 threads
: dealing with many interrupts.

While the rebalance_domains() issue has been resolved [1], we should
proactively annotate sched_clock_irqtime with __read_mostly to prevent
potential cacheline false sharing. This optimization is particularly
justified since sched_clock_irqtime is only modified during TSC instability
events.

Link: https://lore.kernel.org/all/20250423174634.3009657-1-edumazet@google.com/ [0]
Link: https://lore.kernel.org/all/20250416035823.1846307-1-tim.c.chen@linux.intel.com/ [1]

Reported-by: Eric Dumazet <edumazet@google.com>
Debugged-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
---
 kernel/sched/cputime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 6dab4854c6c0..c499a42ceda4 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -22,7 +22,7 @@
  */
 DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
 
-int sched_clock_irqtime;
+int __read_mostly sched_clock_irqtime;
 
 void enable_sched_clock_irqtime(void)
 {
-- 
2.43.5
Re: [PATCH v9 2/2] sched: Annotate sched_clock_irqtime with __read_mostly
Posted by Michal Koutný 6 months, 2 weeks ago
On Sun, May 11, 2025 at 11:08:00AM +0800, Yafang Shao <laoar.shao@gmail.com> wrote:
> Eric reported an issue [0] as follows,
> : rebalance_domains() can attempt to change sched_balance_running
> : more than 350,000 times per second on our servers.
> 
> : If sched_clock_irqtime and sched_balance_running share the
> : same cache line, we see a very high cost on hosts with 480 threads
> : dealing with many interrupts.

I'd say this patch could be independent from the "series".

> While the rebalance_domains() issue has been resolved [1], we should
> proactively annotate sched_clock_irqtime with __read_mostly to prevent
> potential cacheline false sharing. This optimization is particularly
> justified since sched_clock_irqtime is only modified during TSC instability
> events.
> 
> Link: https://lore.kernel.org/all/20250423174634.3009657-1-edumazet@google.com/ [0]
> Link: https://lore.kernel.org/all/20250416035823.1846307-1-tim.c.chen@linux.intel.com/ [1]
> 
> Reported-by: Eric Dumazet <edumazet@google.com>
> Debugged-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Eric Dumazet <edumazet@google.com>

I can say
Reviewed-by: Michal Koutný <mkoutny@suse.com>

but it'd be good to have also Tested-by: wrt the cache traffic
reduction.

0.02€,
Michal
Re: [PATCH v9 2/2] sched: Annotate sched_clock_irqtime with __read_mostly
Posted by Yafang Shao 6 months, 2 weeks ago
On Tue, May 27, 2025 at 11:33 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Sun, May 11, 2025 at 11:08:00AM +0800, Yafang Shao <laoar.shao@gmail.com> wrote:
> > Eric reported an issue [0] as follows,
> > : rebalance_domains() can attempt to change sched_balance_running
> > : more than 350,000 times per second on our servers.
> >
> > : If sched_clock_irqtime and sched_balance_running share the
> > : same cache line, we see a very high cost on hosts with 480 threads
> > : dealing with many interrupts.
>
> I'd say this patch could be independent from the "series".

I will send it separately.

>
> > While the rebalance_domains() issue has been resolved [1], we should
> > proactively annotate sched_clock_irqtime with __read_mostly to prevent
> > potential cacheline false sharing. This optimization is particularly
> > justified since sched_clock_irqtime is only modified during TSC instability
> > events.
> >
> > Link: https://lore.kernel.org/all/20250423174634.3009657-1-edumazet@google.com/ [0]
> > Link: https://lore.kernel.org/all/20250416035823.1846307-1-tim.c.chen@linux.intel.com/ [1]
> >
> > Reported-by: Eric Dumazet <edumazet@google.com>
> > Debugged-by: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Eric Dumazet <edumazet@google.com>
>
> I can say
> Reviewed-by: Michal Koutný <mkoutny@suse.com>

Thanks for the review.

>
> but it'd be good to have also Tested-by: wrt the cache traffic
> reduction.


-- 
Regards
Yafang