From nobody Sat Feb 7 08:07:40 2026
From: Wangyang Guo
To: K Prateek Nayak, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Wangyang Guo, Shrikanth Hegde, Benjamin Lei, Tim Chen, Tianyou Li
Subject: [PATCH v7] sched/clock: Avoid false sharing for sched_clock_irqtime
Date: Tue, 27 Jan 2026 15:25:09 +0800
Message-ID: <20260127072509.2627346-1-wangyang.guo@intel.com>
X-Mailer: git-send-email 2.47.3
X-Mailing-List: linux-kernel@vger.kernel.org

The read-mostly sched_clock_irqtime variable may share a cacheline with
the frequently updated nohz struct. Convert it to a static_key to avoid
the resulting false sharing.

The only users of disable_sched_clock_irqtime() are tsc_.*mark_unstable(),
which may be invoked in atomic context, while disabling a static_key
requires a workqueue.
Both of them call clear_sched_clock_stable() just before
disable_sched_clock_irqtime(), so "sched_clock_work" can be reused to
also disable sched_clock_irqtime.

One additional case needs handling: if the TSC is marked unstable before
the late_initcall() phase, sched_clock_work will not be invoked and
sched_clock_irqtime will stay enabled even though the clock is unstable:

  tsc_init()
    enable_sched_clock_irqtime()      # irqtime accounting is enabled here
    ...
    if (unsynchronized_tsc())         # true
      mark_tsc_unstable()
        clear_sched_clock_stable()
          __sched_clock_stable_early = 0;
          ...
          if (static_key_count(&sched_clock_running.key) == 2)
            # Only happens at sched_clock_init_late()
            __clear_sched_clock_stable();  # Never executed
  ...
  # late_initcall() phase
  sched_clock_init_late()
    if (__sched_clock_stable_early)   # Already false
      __set_sched_clock_stable();
  # sched_clock is never marked stable
  # TSC unstable, but sched_clock_work won't run to disable irqtime

So disable_sched_clock_irqtime() must also be called in
sched_clock_init_late() if the clock is unstable.

Reviewed-by: K Prateek Nayak
Tested-by: K Prateek Nayak
Suggested-by: K Prateek Nayak
Suggested-by: Peter Zijlstra
Suggested-by: Shrikanth Hegde
Reported-by: Benjamin Lei
Reviewed-by: Tim Chen
Reviewed-by: Tianyou Li
Signed-off-by: Wangyang Guo
Acked-by: Vincent Guittot
Reviewed-by: Shrikanth Hegde
---
v6 -> v7:
- Move the irqtime_enabled() check into disable_sched_clock_irqtime()
v5 -> v6:
- Only disable_sched_clock_irqtime() if irqtime_enabled() in
  sched_clock_init_late() to avoid unnecessary overhead
v4 -> v5:
- Changelog update to reflect static_key changes
v3 -> v4:
- Avoid creating a new workqueue to disable the static_key
- Specify kernel version for c2c result in changelog
v2 -> v3:
- Use static_key instead of a __read_mostly var
v1 -> v2:
- Use __read_mostly instead of __cacheline_aligned to avoid wasting space
History:
v6: https://lore.kernel.org/all/20260127044159.2254247-1-wangyang.guo@intel.com/
v5: https://lore.kernel.org/all/20260127031602.1907377-1-wangyang.guo@intel.com/
v4: https://lore.kernel.org/all/20260126021401.1490163-1-wangyang.guo@intel.com/
v3: https://lore.kernel.org/all/20260116023945.1849329-1-wangyang.guo@intel.com/
v2: https://lore.kernel.org/all/20260113074807.3404180-1-wangyang.guo@intel.com/
v1: https://lore.kernel.org/all/20260113022958.3379650-1-wangyang.guo@intel.com/
prev discussions: https://lore.kernel.org/all/20251211055612.4071266-1-wangyang.guo@intel.com/T/#u
---
 arch/x86/kernel/tsc.c  | 2 --
 kernel/sched/clock.c   | 3 +++
 kernel/sched/cputime.c | 9 +++++----
 kernel/sched/sched.h   | 4 ++--
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 87e749106dda..9a62e18d1bff 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1142,7 +1142,6 @@ static void tsc_cs_mark_unstable(struct clocksource *cs)
 	tsc_unstable = 1;
 	if (using_native_sched_clock())
 		clear_sched_clock_stable();
-	disable_sched_clock_irqtime();
 	pr_info("Marking TSC unstable due to clocksource watchdog\n");
 }
 
@@ -1212,7 +1211,6 @@ void mark_tsc_unstable(char *reason)
 	tsc_unstable = 1;
 	if (using_native_sched_clock())
 		clear_sched_clock_stable();
-	disable_sched_clock_irqtime();
 	pr_info("Marking TSC unstable due to %s\n", reason);
 
 	clocksource_mark_unstable(&clocksource_tsc_early);
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index f5e6dd6a6b3a..2ae4fbf13431 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -173,6 +173,7 @@ notrace static void __sched_clock_work(struct work_struct *work)
 		scd->tick_gtod, __gtod_offset,
 		scd->tick_raw, __sched_clock_offset);
 
+	disable_sched_clock_irqtime();
 	static_branch_disable(&__sched_clock_stable);
 }
 
@@ -238,6 +239,8 @@ static int __init sched_clock_init_late(void)
 
 	if (__sched_clock_stable_early)
 		__set_sched_clock_stable();
+	else
+		disable_sched_clock_irqtime(); /* disable if clock unstable. */
 
 	return 0;
 }
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 7097de2c8cda..556a70f344d0 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -12,6 +12,8 @@
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 
+DEFINE_STATIC_KEY_FALSE(sched_clock_irqtime);
+
 /*
  * There are no locks covering percpu hardirq/softirq time.
  * They are only modified in vtime_account, on corresponding CPU
@@ -25,16 +27,15 @@
  */
 DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
 
-int sched_clock_irqtime;
-
 void enable_sched_clock_irqtime(void)
 {
-	sched_clock_irqtime = 1;
+	static_branch_enable(&sched_clock_irqtime);
 }
 
 void disable_sched_clock_irqtime(void)
 {
-	sched_clock_irqtime = 0;
+	if (irqtime_enabled())
+		static_branch_disable(&sched_clock_irqtime);
 }
 
 static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index adfb6e3409d7..ec963314287a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3172,11 +3172,11 @@ struct irqtime {
 };
 
 DECLARE_PER_CPU(struct irqtime, cpu_irqtime);
-extern int sched_clock_irqtime;
+DECLARE_STATIC_KEY_FALSE(sched_clock_irqtime);
 
 static inline int irqtime_enabled(void)
 {
-	return sched_clock_irqtime;
+	return static_branch_likely(&sched_clock_irqtime);
 }
 
 /*
-- 
2.47.3