From: Wangyang Guo
To: K Prateek Nayak, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Wangyang Guo, Shrikanth Hegde, Benjamin Lei, Tim Chen, Tianyou Li
Subject: [PATCH v4] sched/clock: Avoid false sharing for sched_clock_irqtime
Date: Mon, 26 Jan 2026 10:14:01 +0800
Message-ID: <20260126021401.1490163-1-wangyang.guo@intel.com>

The read-mostly sched_clock_irqtime may share a cacheline with the
frequently updated nohz struct. Convert it to a static_key to avoid
the false sharing.

Details:
In kernel 6.14, we observed a hotspot in irqtime_account_irq accounting
for ~3% of cycles when running SPECjbb2015 on a 2-socket system.
Most of those cycles were spent reading sched_clock_irqtime, which is a
read-mostly variable. perf c2c (cacheline view) shows that it is falsely
shared with the nohz struct:

  Num      RmtHitm  LclHitm  Offset  records  Symbol
  6.25%    0.00%    0.00%    0x0     4        [k] _nohz_idle_balance.isra.0
  18.75%   100.00%  0.00%    0x8     14       [k] nohz_balance_exit_idle
  6.25%    0.00%    0.00%    0x8     8        [k] nohz_balance_enter_idle
  6.25%    0.00%    0.00%    0xc     8        [k] sched_balance_newidle
  6.25%    0.00%    0.00%    0x10    31       [k] nohz_balancer_kick
  6.25%    0.00%    0.00%    0x20    16       [k] sched_balance_newidle
  37.50%   0.00%    0.00%    0x38    50       [k] irqtime_account_irq
  6.25%    0.00%    0.00%    0x38    47       [k] account_process_tick
  6.25%    0.00%    0.00%    0x38    12       [k] account_idle_ticks

Offsets:
  * 0x0  -- nohz.idle_cpu_mask (r)
  * 0x8  -- nohz.nr_cpus (w)
  * 0x38 -- sched_clock_irqtime (r), not in nohz, but shares the cacheline

The layout in /proc/kallsyms confirms this:

  ffffffff88600d40 b nohz
  ffffffff88600d68 B arch_needs_tick_broadcast
  ffffffff88600d6c b __key.264
  ffffffff88600d6c b __key.265
  ffffffff88600d70 b dl_generation
  ffffffff88600d78 b sched_clock_irqtime

With the patch applied, the irqtime_account_irq hotspot disappears.

Reported-by: Benjamin Lei
Reviewed-by: Tianyou Li
Reviewed-by: Tim Chen
Suggested-by: K Prateek Nayak
Suggested-by: Peter Zijlstra
Suggested-by: Shrikanth Hegde
---
V3 -> V4:
- Avoid creating a new workqueue to disable the static_key
- Specify the kernel version for the c2c result in the changelog

V2 -> V3:
- Use a static_key instead of a __read_mostly variable.

V1 -> V2:
- Use __read_mostly instead of __cacheline_aligned to avoid wasting space.

History:
  v3: https://lore.kernel.org/all/20260116023945.1849329-1-wangyang.guo@intel.com/
  v2: https://lore.kernel.org/all/20260113074807.3404180-1-wangyang.guo@intel.com/
  v1: https://lore.kernel.org/all/20260113022958.3379650-1-wangyang.guo@intel.com/
  prev discussions: https://lore.kernel.org/all/20251211055612.4071266-1-wangyang.guo@intel.com/T/#u

Suggested-by: K Prateek Nayak
Suggested-by: Peter Zijlstra
Suggested-by: Shrikanth Hegde
Reported-by: Benjamin Lei
Reviewed-by: Tim Chen
Reviewed-by: Tianyou Li
Signed-off-by: Wangyang Guo
---
 arch/x86/kernel/tsc.c  | 2 --
 kernel/sched/clock.c   | 3 +++
 kernel/sched/cputime.c | 8 ++++----
 kernel/sched/sched.h   | 4 ++--
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 87e749106dda..9a62e18d1bff 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1142,7 +1142,6 @@ static void tsc_cs_mark_unstable(struct clocksource *cs)
 	tsc_unstable = 1;
 	if (using_native_sched_clock())
 		clear_sched_clock_stable();
-	disable_sched_clock_irqtime();
 	pr_info("Marking TSC unstable due to clocksource watchdog\n");
 }
 
@@ -1212,7 +1211,6 @@ void mark_tsc_unstable(char *reason)
 	tsc_unstable = 1;
 	if (using_native_sched_clock())
 		clear_sched_clock_stable();
-	disable_sched_clock_irqtime();
 	pr_info("Marking TSC unstable due to %s\n", reason);
 
 	clocksource_mark_unstable(&clocksource_tsc_early);
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index f5e6dd6a6b3a..2ae4fbf13431 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -173,6 +173,7 @@ notrace static void __sched_clock_work(struct work_struct *work)
 			scd->tick_gtod, __gtod_offset,
 			scd->tick_raw, __sched_clock_offset);
 
+	disable_sched_clock_irqtime();
 	static_branch_disable(&__sched_clock_stable);
 }
 
@@ -238,6 +239,8 @@ static int __init sched_clock_init_late(void)
 
 	if (__sched_clock_stable_early)
 		__set_sched_clock_stable();
+	else
+		disable_sched_clock_irqtime(); /* disable if clock unstable. */
 
 	return 0;
 }
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 7097de2c8cda..959a86206c64 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -12,6 +12,8 @@
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 
+DEFINE_STATIC_KEY_FALSE(sched_clock_irqtime);
+
 /*
  * There are no locks covering percpu hardirq/softirq time.
  * They are only modified in vtime_account, on corresponding CPU
@@ -25,16 +27,14 @@
  */
 DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
 
-int sched_clock_irqtime;
-
 void enable_sched_clock_irqtime(void)
 {
-	sched_clock_irqtime = 1;
+	static_branch_enable(&sched_clock_irqtime);
 }
 
 void disable_sched_clock_irqtime(void)
 {
-	sched_clock_irqtime = 0;
+	static_branch_disable(&sched_clock_irqtime);
 }
 
 static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index adfb6e3409d7..ec963314287a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3172,11 +3172,11 @@ struct irqtime {
 };
 
 DECLARE_PER_CPU(struct irqtime, cpu_irqtime);
-extern int sched_clock_irqtime;
+DECLARE_STATIC_KEY_FALSE(sched_clock_irqtime);
 
 static inline int irqtime_enabled(void)
 {
-	return sched_clock_irqtime;
+	return static_branch_likely(&sched_clock_irqtime);
 }
 
 /*
-- 
2.47.3