Eric reported an issue [0] as follows,
: rebalance_domains() can attempt to change sched_balance_running
: more than 350,000 times per second on our servers.
: If sched_clock_irqtime and sched_balance_running share the
: same cache line, we see a very high cost on hosts with 480 threads
: dealing with many interrupts.
While the rebalance_domains() issue has been resolved [1], we should
proactively annotate sched_clock_irqtime with __read_mostly to prevent
potential cacheline false sharing. This optimization is particularly
justified since sched_clock_irqtime is only modified during TSC instability
events.
Link: https://lore.kernel.org/all/20250423174634.3009657-1-edumazet@google.com/ [0]
Link: https://lore.kernel.org/all/20250416035823.1846307-1-tim.c.chen@linux.intel.com/ [1]
Reported-by: Eric Dumazet <edumazet@google.com>
Debugged-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
---
kernel/sched/cputime.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 6dab4854c6c0..c499a42ceda4 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -22,7 +22,7 @@
*/
DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
-int sched_clock_irqtime;
+int __read_mostly sched_clock_irqtime;
void enable_sched_clock_irqtime(void)
{
--
2.43.5
On Sun, May 11, 2025 at 11:08:00AM +0800, Yafang Shao <laoar.shao@gmail.com> wrote: > Eric reported an issue [0] as follows, > : rebalance_domains() can attempt to change sched_balance_running > : more than 350,000 times per second on our servers. > > : If sched_clock_irqtime and sched_balance_running share the > : same cache line, we see a very high cost on hosts with 480 threads > : dealing with many interrupts. I'd say this patch could be independent from the "series". > While the rebalance_domains() issue has been resolved [1], we should > proactively annotate sched_clock_irqtime with __read_mostly to prevent > potential cacheline false sharing. This optimization is particularly > justified since sched_clock_irqtime is only modified during TSC instability > events. > > Link: https://lore.kernel.org/all/20250423174634.3009657-1-edumazet@google.com/ [0] > Link: https://lore.kernel.org/all/20250416035823.1846307-1-tim.c.chen@linux.intel.com/ [1] > > Reported-by: Eric Dumazet <edumazet@google.com> > Debugged-by: Eric Dumazet <edumazet@google.com> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > Cc: Eric Dumazet <edumazet@google.com> I can say Reviewed-by: Michal Koutný <mkoutny@suse.com> but it'd be good to have also Tested-by: wrt the cache traffic reduction. 0.02€, Michal
On Tue, May 27, 2025 at 11:33 PM Michal Koutný <mkoutny@suse.com> wrote: > > On Sun, May 11, 2025 at 11:08:00AM +0800, Yafang Shao <laoar.shao@gmail.com> wrote: > > Eric reported an issue [0] as follows, > > : rebalance_domains() can attempt to change sched_balance_running > > : more than 350,000 times per second on our servers. > > > > : If sched_clock_irqtime and sched_balance_running share the > > : same cache line, we see a very high cost on hosts with 480 threads > > : dealing with many interrupts. > > I'd say this patch could be independent from the "series". I will send it separately. > > > While the rebalance_domains() issue has been resolved [1], we should > > proactively annotate sched_clock_irqtime with __read_mostly to prevent > > potential cacheline false sharing. This optimization is particularly > > justified since sched_clock_irqtime is only modified during TSC instability > > events. > > > > Link: https://lore.kernel.org/all/20250423174634.3009657-1-edumazet@google.com/ [0] > > Link: https://lore.kernel.org/all/20250416035823.1846307-1-tim.c.chen@linux.intel.com/ [1] > > > > Reported-by: Eric Dumazet <edumazet@google.com> > > Debugged-by: Eric Dumazet <edumazet@google.com> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > Cc: Eric Dumazet <edumazet@google.com> > > I can say > Reviewed-by: Michal Koutný <mkoutny@suse.com> Thanks for the review. > > but it'd be good to have also Tested-by: wrt the cache traffic > reduction. -- Regards Yafang
© 2016 - 2025 Red Hat, Inc.