From nobody Fri Mar 14 13:29:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86C4DC54E94 for ; Wed, 25 Jan 2023 00:28:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234909AbjAYA2t (ORCPT ); Tue, 24 Jan 2023 19:28:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234844AbjAYA2o (ORCPT ); Tue, 24 Jan 2023 19:28:44 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F91B4FCF9 for ; Tue, 24 Jan 2023 16:28:12 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 13015B81733 for ; Wed, 25 Jan 2023 00:27:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B9194C433D2; Wed, 25 Jan 2023 00:27:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1674606452; bh=tNnAiq5HYDDhjHdWky6iQVYABLIWoBenLL5L50zs8N4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=VwL9RjIynMxgRjV7vJJ4DVSGgr9HP66z59GPqdNyaTTCH/mGO8pgQlEMliBHMBbWf K/v4erTgqLapYCyAQYvU1gcIWjNRJve/+Y7y8JvTWuFBSd/6IOMdyWMKzezOiFC0X9 kUtDuapIjbCkaMFgU4bc+gtcbEcyAAlngeMI2I96uzJoODzEWFNV2fZyTogbV9QVOJ Kpx6jGAdNRcmKNXn6c8o5uHzinrXPxVvj1e1iypF2YB9RCwMIal2NbiWDZvEP8oHMA 6bGLfg56FPc/taghp4N0ColrG4u57Jo2dZuY3togkI8893JEGEZT3eaiZKNiSkKeL/ 2meIwr+u4tkUA== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 6FE445C1052; Tue, 24 Jan 2023 16:27:32 -0800 (PST) From: "Paul E. 
McKenney" To: tglx@linutronix.de Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com, maz@kernel.org, kernel-team@meta.com, neeraju@codeaurora.org, ak@linux.intel.com, feng.tang@intel.com, zhengjun.xing@intel.com, Yunying Sun , "Paul E . McKenney" Subject: [PATCH v2 clocksource 1/7] clocksource: Print clocksource name when clocksource is tested unstable Date: Tue, 24 Jan 2023 16:27:24 -0800 Message-Id: <20230125002730.1471349-1-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> References: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Yunying Sun Some "TSC fall back to HPET" messages appear on systems having more than 2 NUMA nodes: clocksource: timekeeping watchdog on CPU168: hpet read-back delay of 4296200ns, attempt 4, marking unstable The "hpet" here is misleading, because the clocksource watchdog is really doing repeated reads of "hpet" in order to check for unrelated delays. Therefore, print the name of the clocksource under test, prefixed by "wd-" and suffixed by "-wd", for example, "wd-tsc-wd". Signed-off-by: Yunying Sun Signed-off-by: Paul E.
McKenney --- kernel/time/clocksource.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 9cf32ccda715d..4a2c3bb92e2e9 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -257,8 +257,8 @@ static enum wd_read_status cs_watchdog_read(struct cloc= ksource *cs, u64 *csnow, goto skip_test; } =20 - pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, att= empt %d, marking unstable\n", - smp_processor_id(), watchdog->name, wd_delay, nretries); + pr_warn("timekeeping watchdog on CPU%d: wd-%s-wd read-back delay of %lldn= s, attempt %d, marking unstable\n", + smp_processor_id(), cs->name, wd_delay, nretries); return WD_READ_UNSTABLE; =20 skip_test: --=20 2.31.1.189.g2e36527f23 From nobody Fri Mar 14 13:29:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 762F5C54E94 for ; Wed, 25 Jan 2023 00:28:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234917AbjAYA2w (ORCPT ); Tue, 24 Jan 2023 19:28:52 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234859AbjAYA2o (ORCPT ); Tue, 24 Jan 2023 19:28:44 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 65CA74CE73 for ; Tue, 24 Jan 2023 16:28:12 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 181BEB81732 for ; Wed, 25 Jan 2023 00:27:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id BC5A8C433EF; Wed, 25 
Jan 2023 00:27:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1674606452; bh=yfdddVAsVm0wKfhUXEcuvIc8bj3pYmqk8I+8WWgBY/w=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ZQ24BFTraLEIXc/xXMXIE8QW+TWApivhTNcmlC4LDDq1JzVlzmVikaF026ycEe6LX d37GJib2NSbzWdG5zBt8wB2VWY3DAxAmW7TQfxtg7qwGAqO8Tcrw1tmo7Fxod97H0+ s1OrvjGp0Wq8UYs5M1ocdYcn+sM+THIQAdSLF3DnTd2en8nXDRyYRacoYxfk3KUxCX YYFmPZiWdXv+xdrum3R6cUFYBGi6MKyRSuS3cDzvCziZsfvmC0PlUgK7WqCZ1fDJuM DTLyrAdhwPU+esSyxPtTu7WrAvb1njyGn2r9/OH9Fa0zLX0lwg5Nx/R9Js7Nn1J2O4 C/ZDf2Ny/DPMQ== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 723795C155D; Tue, 24 Jan 2023 16:27:32 -0800 (PST) From: "Paul E. McKenney" To: tglx@linutronix.de Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com, maz@kernel.org, kernel-team@meta.com, neeraju@codeaurora.org, ak@linux.intel.com, feng.tang@intel.com, zhengjun.xing@intel.com, "Paul E. McKenney" Subject: [PATCH v2 clocksource 2/7] clocksource: Loosen clocksource watchdog constraints Date: Tue, 24 Jan 2023 16:27:25 -0800 Message-Id: <20230125002730.1471349-2-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> References: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Currently, MAX_SKEW_USEC is set to 100 microseconds, which has worked reasonably well. However, NTP is willing to tolerate 500 microseconds of skew per second, and a clocksource that is good enough for NTP should be good enough for the clocksource watchdog. The watchdog's skew is controlled by MAX_SKEW_USEC and the CLOCKSOURCE_WATCHDOG_MAX_SKEW_US Kconfig option. 
However, these values are doubled before being associated with a clocksource's ->uncertainty_margin, and the ->uncertainty_margin values of the pair of clocksources being compared are summed before checking against the skew. Therefore, set both MAX_SKEW_USEC and the default for the CLOCKSOURCE_WATCHDOG_MAX_SKEW_US Kconfig option to 125 microseconds of skew per second, resulting in 500 microseconds of skew per second in the clocksource watchdog's skew comparison. Suggested-by: Rik van Riel Signed-off-by: Paul E. McKenney --- kernel/time/Kconfig | 6 +++++- kernel/time/clocksource.c | 15 +++++++++------ 2 files changed, 14 insertions(+), 7 deletions(-) diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig index a41753be1a2bf..bae8f11070bef 100644 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -200,10 +200,14 @@ config CLOCKSOURCE_WATCHDOG_MAX_SKEW_US int "Clocksource watchdog maximum allowable skew (in μs)" depends on CLOCKSOURCE_WATCHDOG range 50 1000 - default 100 + default 125 help Specify the maximum amount of allowable watchdog skew in microseconds before reporting the clocksource to be unstable. + The default is based on a half-second clocksource watchdog + interval and NTP's maximum frequency drift of 500 parts + per million. If the clocksource is good enough for NTP, + it is good enough for the clocksource watchdog! endmenu endif diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 4a2c3bb92e2e9..a3d19f6660ac7 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -95,6 +95,11 @@ static char override_name[CS_NAME_LEN]; static int finished_booting; static u64 suspend_start; +/* + * Interval: 0.5sec. + */ +#define WATCHDOG_INTERVAL (HZ >> 1) + /* * Threshold: 0.0312s, when doubled: 0.0625s. * Also a default for cs->uncertainty_margin when registering clocks. @@ -106,11 +111,14 @@ static u64 suspend_start; * clocksource surrounding a read of the clocksource being validated.
* This delay could be due to SMIs, NMIs, or to VCPU preemptions. Used as * a lower bound for cs->uncertainty_margin values when registering clocks. + * + * The default of 500 parts per million is based on NTP's limits. + * If a clocksource is good enough for NTP, it is good enough for us! */ #ifdef CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US #define MAX_SKEW_USEC CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US #else -#define MAX_SKEW_USEC 100 +#define MAX_SKEW_USEC (125 * WATCHDOG_INTERVAL / HZ) #endif =20 #define WATCHDOG_MAX_SKEW (MAX_SKEW_USEC * NSEC_PER_USEC) @@ -140,11 +148,6 @@ static inline void clocksource_watchdog_unlock(unsigne= d long *flags) static int clocksource_watchdog_kthread(void *data); static void __clocksource_change_rating(struct clocksource *cs, int rating= ); =20 -/* - * Interval: 0.5sec. - */ -#define WATCHDOG_INTERVAL (HZ >> 1) - static void clocksource_watchdog_work(struct work_struct *work) { /* --=20 2.31.1.189.g2e36527f23 From nobody Fri Mar 14 13:29:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83AB0C54E94 for ; Wed, 25 Jan 2023 00:28:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234774AbjAYA2R (ORCPT ); Tue, 24 Jan 2023 19:28:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40478 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234776AbjAYA2P (ORCPT ); Tue, 24 Jan 2023 19:28:15 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B10B34FCE4 for ; Tue, 24 Jan 2023 16:27:39 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
ams.source.kernel.org (Postfix) with ESMTPS id 1F85AB817B0 for ; Wed, 25 Jan 2023 00:27:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C5AE4C433A1; Wed, 25 Jan 2023 00:27:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1674606452; bh=ZxflhOZe+HfSeKoe2YE5lW7UCVsim4bFg/Q32iI5pZk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=CFCBsChk11U1OdzY0Ae1UZuelg2vUshpjAcvDpHpLrGrsl3hYCHZgf36c3Vz81xpq t5McnArBb5PK1zKcnF7XMajXFvs2O6c+NQ8xLdHqJs6MyGqt0i4hT6arXaMy1J4mhx tArxfPbW/plciZF0sFLosPE4BYqRcj4P0c8EyaJkbpaEiW90NJl758LJorI7zMALQd 5Y2e4d5m2QlXc8Kk6iCsbHgAvPoAqyLXkUkHyKgy43MXHo+Fvkfx6kaVnn/nk7iDRp Ne1dqc6suNhYtf177qfGNy46QpyXzQpaqmLajeFUocHT8BibTwe2dKVROtA6Hsh1bJ oEHHg1uzaU14A== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 748B75C1C66; Tue, 24 Jan 2023 16:27:32 -0800 (PST) From: "Paul E. McKenney" To: tglx@linutronix.de Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com, maz@kernel.org, kernel-team@meta.com, neeraju@codeaurora.org, ak@linux.intel.com, feng.tang@intel.com, zhengjun.xing@intel.com, "Paul E. McKenney" , John Stultz Subject: [PATCH v2 clocksource 3/7] clocksource: Improve read-back-delay message Date: Tue, 24 Jan 2023 16:27:26 -0800 Message-Id: <20230125002730.1471349-3-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> References: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" When cs_watchdog_read() is unable to get a qualifying clocksource read within the limit set by max_cswd_read_retries, it prints a message and marks the clocksource under test as unstable. 
But that message is unclear to anyone unfamiliar with the code: clocksource: timekeeping watchdog on CPU13: wd-tsc-wd read-back delay 1000614ns, attempt 3, marking unstable Therefore, add some context so that the message appears as follows: clocksource: timekeeping watchdog on CPU13: wd-tsc-wd excessive read-back delay of 1000614ns vs. limit of 125000ns, wd-wd read-back delay only 27ns, attempt 3, marking tsc unstable Signed-off-by: Paul E. McKenney Cc: John Stultz Cc: Thomas Gleixner Cc: Stephen Boyd Cc: Feng Tang --- kernel/time/clocksource.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index a3d19f6660ac7..b59914953809f 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -260,8 +260,8 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, goto skip_test; } - pr_warn("timekeeping watchdog on CPU%d: wd-%s-wd read-back delay of %lldns, attempt %d, marking unstable\n", - smp_processor_id(), cs->name, wd_delay, nretries); + pr_warn("timekeeping watchdog on CPU%d: wd-%s-wd excessive read-back delay of %lldns vs.
limit of %ldns, wd-wd read-back delay only %lldns, attempt = %d, marking %s unstable\n", + smp_processor_id(), cs->name, wd_delay, WATCHDOG_MAX_SKEW, wd_seq_delay,= nretries, cs->name); return WD_READ_UNSTABLE; =20 skip_test: --=20 2.31.1.189.g2e36527f23 From nobody Fri Mar 14 13:29:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97DE7C54E94 for ; Wed, 25 Jan 2023 00:28:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234823AbjAYA2m (ORCPT ); Tue, 24 Jan 2023 19:28:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41072 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234776AbjAYA2k (ORCPT ); Tue, 24 Jan 2023 19:28:40 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 34FFCF75D for ; Tue, 24 Jan 2023 16:28:10 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 65E0D61411 for ; Wed, 25 Jan 2023 00:27:33 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id BE5C1C4339E; Wed, 25 Jan 2023 00:27:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1674606452; bh=xEevU0ma7O1DL7C5uiFkN+haaj0y+rnQnb6YU8qQKmc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=uD7pPh8DvrqCt71QimBiukpmO6s3hCllmHEYIKbbA2ZbXBrOND1ENGWhgDXKsVe/d /0hFuP8UA4x4Xivu47dVTz5SNhTIgdx5oEYe8p6gkAXBx4xxEp9e4BGuzYM9BaRM7U 1N7YU0sHgyiC4W+ralpiZoAdFKibDQChSdL7D9Axg3rGirTNc7dTSlmGn5WePEDFX6 RHJL7SNPCfXJWkPVmi97VftpVM/PNgyigN6ggpgcNDAnmNvqBToK4+xWvTzZTyuX51 
21YnxwdFj3wC2jbZiMao9O4/iXRnhOmf0EVqfF0CB2xMMWJ/JDSWn5gTPIJJ9xo5fk czf3j5pZNNprQ== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 767EA5C1C79; Tue, 24 Jan 2023 16:27:32 -0800 (PST) From: "Paul E. McKenney" To: tglx@linutronix.de Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com, maz@kernel.org, kernel-team@meta.com, neeraju@codeaurora.org, ak@linux.intel.com, feng.tang@intel.com, zhengjun.xing@intel.com, "Paul E. McKenney" , John Stultz Subject: [PATCH v2 clocksource 4/7] clocksource: Improve "skew is too large" messages Date: Tue, 24 Jan 2023 16:27:27 -0800 Message-Id: <20230125002730.1471349-4-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> References: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" When clocksource_watchdog() detects excessive clocksource skew compared to the watchdog clocksource, it marks the clocksource under test as unstable and prints a message several lines long. But that message is unclear to anyone unfamiliar with the code: clocksource: timekeeping watchdog on CPU2: Marking clocksource 'wdtest-ktime' as unstable because the skew is too large: clocksource: 'kvm-clock' wd_nsec: 400744390 wd_now: 612625c2c wd_last: 5fa7f7c66 mask: ffffffffffffffff clocksource: 'wdtest-ktime' cs_nsec: 600744034 cs_now: 173081397a292d4f cs_last: 17308139565a8ced mask: ffffffffffffffff clocksource: 'kvm-clock' (not 'wdtest-ktime') is current clocksource.
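With the message in the form shown above, the reader has to work out the skew by hand from the cs_nsec and wd_nsec fields. A minimal userspace sketch of that arithmetic follows. The helper names skew_ns() and ns_to_ms() are hypothetical and do not appear in the kernel, and plain 64-bit division is assumed here in place of the kernel's div_u64_rem():

```c
/*
 * Illustrative userspace sketch only, not kernel code.
 * skew_ns() and ns_to_ms() are hypothetical helper names.
 */

/* Signed skew of the clocksource under test relative to the watchdog. */
long long skew_ns(long long cs_nsec, long long wd_nsec)
{
	return cs_nsec - wd_nsec;
}

/* Nanoseconds to whole milliseconds; C division truncates toward zero. */
long long ns_to_ms(long long ns)
{
	return ns / (1000LL * 1000LL);
}
```

For the example values above, skew_ns(600744034, 400744390) is 199999644 ns, which ns_to_ms() reports as 199 ms, and the watchdog interval of 400744390 ns comes out as 400 ms.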
Therefore, add the following line near the end of that message: Clocksource 'wdtest-ktime' skewed 199999644 ns (199 ms) over watchdog 'kvm-clock' interval of 400744390 ns (400 ms) This new line clearly indicates the amount of skew between the two clocksources, along with the duration of the time interval over which the skew occurred, both in nanoseconds and milliseconds. Signed-off-by: Paul E. McKenney Cc: John Stultz Cc: Thomas Gleixner Cc: Stephen Boyd Cc: Feng Tang --- kernel/time/clocksource.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index b59914953809f..fc486cd972635 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -446,12 +446,20 @@ static void clocksource_watchdog(struct timer_list *unused) /* Check the deviation from the watchdog clocksource. */ md = cs->uncertainty_margin + watchdog->uncertainty_margin; if (abs(cs_nsec - wd_nsec) > md) { + u64 cs_wd_msec; + u64 wd_msec; + u32 wd_rem; + pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n", smp_processor_id(), cs->name); pr_warn(" '%s' wd_nsec: %lld wd_now: %llx wd_last: %llx mask: %llx\n", watchdog->name, wd_nsec, wdnow, wdlast, watchdog->mask); pr_warn(" '%s' cs_nsec: %lld cs_now: %llx cs_last: %llx mask: %llx\n", cs->name, cs_nsec, csnow, cslast, cs->mask); + cs_wd_msec = div_u64_rem(cs_nsec - wd_nsec, 1000U * 1000U, &wd_rem); + wd_msec = div_u64_rem(wd_nsec, 1000U * 1000U, &wd_rem); + pr_warn(" Clocksource '%s' skewed %lld ns (%lld ms) over watchdog '%s' interval of %lld ns (%lld ms)\n", + cs->name, cs_nsec - wd_nsec, cs_wd_msec, watchdog->name, wd_nsec, wd_msec); if (curr_clocksource == cs) pr_warn(" '%s' is current clocksource.\n", cs->name); else if (curr_clocksource) -- 2.31.1.189.g2e36527f23 From nobody Fri Mar 14 13:29:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D0A2C54EED for ; Wed, 25 Jan 2023 00:28:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234862AbjAYA2o (ORCPT ); Tue, 24 Jan 2023 19:28:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41094 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234840AbjAYA2l (ORCPT ); Tue, 24 Jan 2023 19:28:41 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7011F4DE27 for ; Tue, 24 Jan 2023 16:28:10 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 7015A613DC for ; Wed, 25 Jan 2023 00:27:33 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C81BCC433A4; Wed, 25 Jan 2023 00:27:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1674606452; bh=qdP6NXSXpyI8pYe6XWt4MIApSDs2MCybUN24u0PUPWM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ZK08XQliqevn5CBB4TaWIMENAvtA3RRTjasnvfvdSSTJ/bYHYy+Q4+Yfwn062zEJR n8uJ5wyXlYDFdafjugzt8vswtiZ+lJoP/Hwvhh8kkbdEKdhr6SuC6ozXxVcfE0TTHk YjUDSLE9auItg+lnJ++fkCGGk2UWe/dZtWAsb5Ya2dCtJ5vLxy/LG2T8+NCHOrvjBG bUsN6UNzjxzXt6dy9Y/IgbgmOXMtlatXybXM7jnqNa0SonWXMMhfCqIltDuMU+rJc5 LJN4UX09lBE+9If4iqZX97KPfm+qSjjQNSaZgWC7gTIDd3L4Nn+9z17CqjP5EftVGK RyhcWDEyLizjg== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 793CD5C1CEF; Tue, 24 Jan 2023 16:27:32 -0800 (PST) From: "Paul E. 
McKenney" To: tglx@linutronix.de Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com, maz@kernel.org, kernel-team@meta.com, neeraju@codeaurora.org, ak@linux.intel.com, feng.tang@intel.com, zhengjun.xing@intel.com, Waiman Long , John Stultz , "Paul E . McKenney" Subject: [PATCH v2 clocksource 5/7] clocksource: Suspend the watchdog temporarily when high read latency detected Date: Tue, 24 Jan 2023 16:27:28 -0800 Message-Id: <20230125002730.1471349-5-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> References: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Feng Tang Bugs have been reported on 8-socket x86 machines in which the TSC was wrongly disabled when the system was under heavy load. [ 818.380354] clocksource: timekeeping watchdog on CPU336: hpet wd-wd read-back delay of 1203520ns [ 818.436160] clocksource: wd-tsc-wd read-back delay of 181880ns, clock-skew test skipped! [ 819.402962] clocksource: timekeeping watchdog on CPU338: hpet wd-wd read-back delay of 324000ns [ 819.448036] clocksource: wd-tsc-wd read-back delay of 337240ns, clock-skew test skipped! [ 819.880863] clocksource: timekeeping watchdog on CPU339: hpet read-back delay of 150280ns, attempt 3, marking unstable [ 819.936243] tsc: Marking TSC unstable due to clocksource watchdog [ 820.068173] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. [ 820.092382] sched_clock: Marking unstable (818769414384, 1195404998) [ 820.643627] clocksource: Checking clocksource tsc synchronization from CPU 267 to CPUs 0,4,25,70,126,430,557,564.
[ 821.067990] clocksource: Switched to clocksource hpet This can be reproduced by running memory-intensive 'stream' tests, or some of the stress-ng subcases such as 'ioport'. The reason for these issues is that when the system is under heavy load, the read latency of the clocksources can be very high. Even lightweight TSC reads can show high latencies, and latencies are much worse for external clocksources such as HPET or the ACPI PM timer. These latencies can result in false-positive clocksource-unstable determinations. These issues were initially reported by a customer running on a production system, and this problem was reproduced on several generations of Xeon servers, especially when running the stress-ng test. These Xeon servers were not production systems, but they did have the latest steppings and firmware. Given that the clocksource watchdog is a continual diagnostic check that runs twice a second, there is no need to rush it when the system is under heavy load. Therefore, when high clocksource read latencies are detected, suspend the watchdog timer for 5 minutes. Signed-off-by: Feng Tang Acked-by: Waiman Long Cc: John Stultz Cc: Thomas Gleixner Cc: Stephen Boyd Cc: Feng Tang Signed-off-by: Paul E.
McKenney --- kernel/time/clocksource.c | 45 ++++++++++++++++++++++++++++----------- 1 file changed, 32 insertions(+), 13 deletions(-) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index fc486cd972635..91836b727cef5 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -387,6 +387,15 @@ void clocksource_verify_percpu(struct clocksource *cs) } EXPORT_SYMBOL_GPL(clocksource_verify_percpu); =20 +static inline void clocksource_reset_watchdog(void) +{ + struct clocksource *cs; + + list_for_each_entry(cs, &watchdog_list, wd_list) + cs->flags &=3D ~CLOCK_SOURCE_WATCHDOG; +} + + static void clocksource_watchdog(struct timer_list *unused) { u64 csnow, wdnow, cslast, wdlast, delta; @@ -394,6 +403,7 @@ static void clocksource_watchdog(struct timer_list *unu= sed) int64_t wd_nsec, cs_nsec; struct clocksource *cs; enum wd_read_status read_ret; + unsigned long extra_wait =3D 0; u32 md; =20 spin_lock(&watchdog_lock); @@ -413,13 +423,30 @@ static void clocksource_watchdog(struct timer_list *u= nused) =20 read_ret =3D cs_watchdog_read(cs, &csnow, &wdnow); =20 - if (read_ret !=3D WD_READ_SUCCESS) { - if (read_ret =3D=3D WD_READ_UNSTABLE) - /* Clock readout unreliable, so give it up. */ - __clocksource_unstable(cs); + if (read_ret =3D=3D WD_READ_UNSTABLE) { + /* Clock readout unreliable, so give it up. */ + __clocksource_unstable(cs); continue; } =20 + /* + * When WD_READ_SKIP is returned, it means the system is likely + * under very heavy load, where the latency of reading + * watchdog/clocksource is very big, and affect the accuracy of + * watchdog check. So give system some space and suspend the + * watchdog check for 5 minutes. + */ + if (read_ret =3D=3D WD_READ_SKIP) { + /* + * As the watchdog timer will be suspended, and + * cs->last could keep unchanged for 5 minutes, reset + * the counters. + */ + clocksource_reset_watchdog(); + extra_wait =3D HZ * 300; + break; + } + /* Clocksource initialized ? 
*/ if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) || atomic_read(&watchdog_reset_pending)) { @@ -523,7 +550,7 @@ static void clocksource_watchdog(struct timer_list *unu= sed) * pair clocksource_stop_watchdog() clocksource_start_watchdog(). */ if (!timer_pending(&watchdog_timer)) { - watchdog_timer.expires +=3D WATCHDOG_INTERVAL; + watchdog_timer.expires +=3D WATCHDOG_INTERVAL + extra_wait; add_timer_on(&watchdog_timer, next_cpu); } out: @@ -548,14 +575,6 @@ static inline void clocksource_stop_watchdog(void) watchdog_running =3D 0; } =20 -static inline void clocksource_reset_watchdog(void) -{ - struct clocksource *cs; - - list_for_each_entry(cs, &watchdog_list, wd_list) - cs->flags &=3D ~CLOCK_SOURCE_WATCHDOG; -} - static void clocksource_resume_watchdog(void) { atomic_inc(&watchdog_reset_pending); --=20 2.31.1.189.g2e36527f23 From nobody Fri Mar 14 13:29:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9EFEEC54E94 for ; Wed, 25 Jan 2023 00:28:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234880AbjAYA2q (ORCPT ); Tue, 24 Jan 2023 19:28:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41114 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234845AbjAYA2m (ORCPT ); Tue, 24 Jan 2023 19:28:42 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 76E5D65AF for ; Tue, 24 Jan 2023 16:28:11 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 089CF61425 for ; Wed, 25 Jan 2023 00:27:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) 
with ESMTPSA id 20CB2C43321; Wed, 25 Jan 2023 00:27:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1674606453; bh=2niA6R2oYPuWrzIU0B6wVlNq4jSpbdluB2kfj2HTDEE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=P890L5dMPd0z14y217P4NJAsrRGhnNTPWJb+uAeG8aIx0xrLSURfo8KclMKBjrr/j QVpNEjrbBNAWCWVdYlzAXYuyhgIZfGFxc+VZ7iLzq+DDd/ty5W/VEE9aR3TGBwy3Eo TG2lF20+d1MZzl3woOiq2XdeWEo6b3P9ZzJJt7qrfMRvL2BnvkRHToVIZHpizGtF9S rgmcbOO7HCBGo4p7/AWWMq39sHof8LIUrc6OIw0LMExGdS8BDKMchdWlOmTU6QWUH3 RNgxElyg24b5zVZwYzBOvMfj2tGgxSmqPyjPDM0KYnRtYiFHI1mhuq2E/Mh88XW80N c5h9N7Nc69zBw== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 7B5EF5C1CF4; Tue, 24 Jan 2023 16:27:32 -0800 (PST) From: "Paul E. McKenney" To: tglx@linutronix.de Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com, maz@kernel.org, kernel-team@meta.com, neeraju@codeaurora.org, ak@linux.intel.com, feng.tang@intel.com, zhengjun.xing@intel.com, "Paul E. McKenney" , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Daniel Lezcano , Waiman Long , x86@kernel.org Subject: [PATCH v2 clocksource 6/7] clocksource: Verify HPET and PMTMR when TSC unverified Date: Tue, 24 Jan 2023 16:27:29 -0800 Message-Id: <20230125002730.1471349-6-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> References: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC, NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the TSC is disabled. 
This works well much of the time, but there is the occasional production-level system that meets all of these criteria, but which still has a TSC that skews significantly from atomic-clock time. This is usually attributed to a firmware or hardware fault. Yes, the various NTP daemons do express their opinions of userspace-to-atomic-clock time skew, but they put them in various places, depending on the daemon and distro in question. It would therefore be good for the kernel to have some clue that there is a problem. The old behavior of marking the TSC unstable is a non-starter because a great many workloads simply cannot tolerate the overheads and latencies of the various non-TSC clocksources. In addition, NTP-corrected systems sometimes can tolerate significant kernel-space time skew as long as the userspace time sources are within epsilon of atomic-clock time. Therefore, when watchdog verification of TSC is disabled, enable it for HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel time-skew diagnostic without degrading the system's performance. Signed-off-by: Paul E. McKenney Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: "H. 
Peter Anvin"
Cc: Daniel Lezcano
Cc: Waiman Long
Cc:
Tested-by: Feng Tang
---
 arch/x86/include/asm/time.h   | 1 +
 arch/x86/kernel/hpet.c        | 2 ++
 arch/x86/kernel/tsc.c         | 5 +++++
 drivers/clocksource/acpi_pm.c | 6 ++++--
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/time.h b/arch/x86/include/asm/time.h
index 8ac563abb567b..a53961c64a567 100644
--- a/arch/x86/include/asm/time.h
+++ b/arch/x86/include/asm/time.h
@@ -8,6 +8,7 @@ extern void hpet_time_init(void);
 extern void time_init(void);
 extern bool pit_timer_init(void);
+extern bool tsc_clocksource_watchdog_disabled(void);
 
 extern struct clock_event_device *global_clock_event;
 
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 71f336425e58a..c8eb1ac5125ab 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1091,6 +1091,8 @@ int __init hpet_enable(void)
 	if (!hpet_counting())
 		goto out_nohpet;
 
+	if (tsc_clocksource_watchdog_disabled())
+		clocksource_hpet.flags |= CLOCK_SOURCE_MUST_VERIFY;
 	clocksource_register_hz(&clocksource_hpet, (u32)hpet_freq);
 
 	if (id & HPET_ID_LEGSUP) {
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index a78e73da4a74b..af3782fb6200c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1186,6 +1186,11 @@ static void __init tsc_disable_clocksource_watchdog(void)
 	clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
 }
 
+bool tsc_clocksource_watchdog_disabled(void)
+{
+	return !(clocksource_tsc.flags & CLOCK_SOURCE_MUST_VERIFY);
+}
+
 static void __init check_system_tsc_reliable(void)
 {
 #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 279ddff81ab49..82338773602ca 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * The I/O port the PMTMR resides at.
@@ -210,8 +211,9 @@ static int __init init_acpi_pm_clocksource(void)
 		return -ENODEV;
 	}
 
-	return clocksource_register_hz(&clocksource_acpi_pm,
-						PMTMR_TICKS_PER_SEC);
+	if (tsc_clocksource_watchdog_disabled())
+		clocksource_acpi_pm.flags |= CLOCK_SOURCE_MUST_VERIFY;
+	return clocksource_register_hz(&clocksource_acpi_pm, PMTMR_TICKS_PER_SEC);
 }
 
 /* We use fs_initcall because we want the PCI fixups to have run
-- 
2.31.1.189.g2e36527f23

From nobody Fri Mar 14 13:29:26 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5054C38142 for ; Wed, 25 Jan 2023 00:28:16 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234798AbjAYA2P (ORCPT ); Tue, 24 Jan 2023 19:28:15 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40448 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234745AbjAYA2N (ORCPT ); Tue, 24 Jan 2023 19:28:13 -0500
Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0C719518E0; Tue, 24 Jan 2023 16:27:39 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id F33146141D; Wed, 25 Jan 2023 00:27:33 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2088BC43442; Wed, 25 Jan 2023 00:27:33 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1674606453; bh=Hf6D8jaVh9N46w1G0EEZbriQYqxBI7NIDla18BtPWI4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=GsY2fk5zT8DTcq3AwyfRvEy78M9JjdbFUe8YMnBV7hXK7/RIi5I8ZkMfNVaxnQfoi
 uT6pA7E7X2eN+KlCKaWkSG8pvviYwHmm2/ji9DqcUamRKwt0h/IKwmy1NPPekEvYXX i5JxsWEH6816qagZnEbQCN2dWOeNM+y87ChgdTEqHuOW10rYUruZxvIQu0x53ogep+ lcv7lypzWme4ZHo+TO/GX0SbaBrXR5srcqayHkvhPLyyLRt88zs3m+Dldxnzz+UEMn SIiAZLAhaqRx0i14Tfho4gcsTgBv0Bj7l7vAfSKoZoWJ4yoYGd4svMwYpbc9gTStQc FjFFY0BU+NGQg==
Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 7D74F5C1D0D; Tue, 24 Jan 2023 16:27:32 -0800 (PST)
From: "Paul E. McKenney"
To: tglx@linutronix.de
Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com, maz@kernel.org, kernel-team@meta.com, neeraju@codeaurora.org, ak@linux.intel.com, feng.tang@intel.com, zhengjun.xing@intel.com, Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , x86@kernel.org, linux-doc@vger.kernel.org, "Paul E . McKenney"
Subject: [PATCH v2 clocksource 7/7] x86/tsc: Add option to force frequency recalibration with HW timer
Date: Tue, 24 Jan 2023 16:27:30 -0800
Message-Id: <20230125002730.1471349-7-paulmck@kernel.org>
X-Mailer: git-send-email 2.31.1.189.g2e36527f23
In-Reply-To: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1>
References: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Feng Tang

The kernel assumes that the TSC frequency which is provided by the
hardware / firmware via MSRs or CPUID(0x15) is correct after applying
a few basic consistency checks. This disables the TSC recalibration
against HPET or PM timer.

As a result there is no mechanism to validate that frequency in cases
where a firmware or hardware defect is suspected. In one such case, a
user measured the TSC frequency against an atomic clock and reported
an inaccuracy, which was later fixed in firmware.
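
For a sense of scale, the deviation test that this patch adds to
tsc_refine_calibration_work() can be tried out in user space. What
follows is a minimal sketch, not kernel code: the helper names
khz_abs_diff(), tsc_deviation_exceeds_limit() and tsc_deviation_ppm()
are hypothetical and exist only for illustration. Note that
tsc_khz >> 11 is tsc_khz / 2048, which is roughly 488 ppm, hence
"about 500 PPM".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: absolute difference of two frequencies in kHz. */
static uint64_t khz_abs_diff(uint64_t a, uint64_t b)
{
	return a > b ? a - b : b - a;
}

/*
 * Mirrors the test in the patch: abs(tsc_khz - freq) > (tsc_khz >> 11).
 * tsc_khz >> 11 is tsc_khz / 2048, or roughly 488 ppm of tsc_khz.
 */
static bool tsc_deviation_exceeds_limit(uint64_t tsc_khz, uint64_t ref_khz)
{
	return khz_abs_diff(tsc_khz, ref_khz) > (tsc_khz >> 11);
}

/* Deviation in parts per million, truncated; assumes tsc_khz is nonzero. */
static uint64_t tsc_deviation_ppm(uint64_t tsc_khz, uint64_t ref_khz)
{
	return khz_abs_diff(tsc_khz, ref_khz) * 1000000ULL / tsc_khz;
}
```

With the frequencies reported in this commit log (1896000 kHz from
CPUID(0x15), 1975000 kHz from the HW timer), the difference of 79000 kHz
far exceeds the limit of 925 kHz, and the deviation is about 41666 ppm,
which is consistent with the reported 40ms/s divergence.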

Add a 'recalibrate' option to the 'tsc' kernel parameter to force
recalibration of the TSC frequency against the HPET or PM timer, and
warn if the result deviates from the previous value by more than about
500 PPM. This provides a way to verify the data from hardware /
firmware.

There is no functional change to the existing workflow.

Recently there was a real-world case: "The 40ms/s divergence between
TSC and HPET was observed on hardware that is quite recent" [1]. On
that platform, the TSC frequency of 1896 MHz was obtained from
CPUID(0x15), while forced recalibration against HPET and against the
PM timer both yielded 1975 MHz. That value also matched the check done
by 'chronyd', indicating a BIOS or firmware problem.

[Thanks tglx for helping improve the commit log]
[ paulmck: Wordsmith Kconfig help text. ]

[1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/

Signed-off-by: Feng Tang
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Cc: Jonathan Corbet
Cc:
Cc:
Signed-off-by: Paul E. McKenney
---
 .../admin-guide/kernel-parameters.txt         |  4 +++
 arch/x86/kernel/tsc.c                         | 34 ++++++++++++++++---
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6cfa6e3996cf7..95f0d104c2322 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6369,6 +6369,10 @@
 			in situations with strict latency requirements (where
 			interruptions from clocksource watchdog are not
 			acceptable).
+			[x86] recalibrate: force recalibration against a HW timer
+			(HPET or PM timer) on systems whose TSC frequency was
+			obtained from HW or FW using either an MSR or CPUID(0x15).
+			Warn if the difference is more than 500 ppm.
 
 	tsc_early_khz=  [X86] Skip early TSC calibration and use the given
 			value instead.
Useful when the early TSC frequency discovery
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index af3782fb6200c..a5371c6d4b64b 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -48,6 +48,8 @@ static DEFINE_STATIC_KEY_FALSE(__use_tsc);
 
 int tsc_clocksource_reliable;
 
+static int __read_mostly tsc_force_recalibrate;
+
 static u32 art_to_tsc_numerator;
 static u32 art_to_tsc_denominator;
 static u64 art_to_tsc_offset;
@@ -303,6 +305,8 @@ static int __init tsc_setup(char *str)
 		mark_tsc_unstable("boot parameter");
 	if (!strcmp(str, "nowatchdog"))
 		no_tsc_watchdog = 1;
+	if (!strcmp(str, "recalibrate"))
+		tsc_force_recalibrate = 1;
 	return 1;
 }
 
@@ -1379,6 +1383,25 @@ static void tsc_refine_calibration_work(struct work_struct *work)
 	else
 		freq = calc_pmtimer_ref(delta, ref_start, ref_stop);
 
+	/* Will hit this only if tsc_force_recalibrate has been set */
+	if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
+
+		/* Warn if the deviation exceeds 500 ppm */
+		if (abs(tsc_khz - freq) > (tsc_khz >> 11)) {
+			pr_warn("Warning: TSC freq calibrated by CPUID/MSR differs from what is calibrated by HW timer, please check with vendor!!\n");
+			pr_info("Previous calibrated TSC freq:\t %lu.%03lu MHz\n",
+				(unsigned long)tsc_khz / 1000,
+				(unsigned long)tsc_khz % 1000);
+		}
+
+		pr_info("TSC freq recalibrated by [%s]:\t %lu.%03lu MHz\n",
+			hpet ?
"HPET" : "PM_TIMER", + (unsigned long)freq / 1000, + (unsigned long)freq % 1000); + + return; + } + /* Make sure we're within 1% */ if (abs(tsc_khz - freq) > tsc_khz/100) goto out; @@ -1412,8 +1435,10 @@ static int __init init_tsc_clocksource(void) if (!boot_cpu_has(X86_FEATURE_TSC) || !tsc_khz) return 0; =20 - if (tsc_unstable) - goto unreg; + if (tsc_unstable) { + clocksource_unregister(&clocksource_tsc_early); + return 0; + } =20 if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3)) clocksource_tsc.flags |=3D CLOCK_SOURCE_SUSPEND_NONSTOP; @@ -1426,9 +1451,10 @@ static int __init init_tsc_clocksource(void) if (boot_cpu_has(X86_FEATURE_ART)) art_related_clocksource =3D &clocksource_tsc; clocksource_register_khz(&clocksource_tsc, tsc_khz); -unreg: clocksource_unregister(&clocksource_tsc_early); - return 0; + + if (!tsc_force_recalibrate) + return 0; } =20 schedule_delayed_work(&tsc_irqwork, 0); --=20 2.31.1.189.g2e36527f23 From nobody Fri Mar 14 13:29:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F565C636CC for ; Fri, 3 Feb 2023 04:37:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231748AbjBCEhF (ORCPT ); Thu, 2 Feb 2023 23:37:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52110 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229645AbjBCEhB (ORCPT ); Thu, 2 Feb 2023 23:37:01 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 473D860CB9 for ; Thu, 2 Feb 2023 20:37:00 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) 
 with ESMTPS id D669C61D7D for ; Fri, 3 Feb 2023 04:36:59 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 33A9CC433D2; Fri, 3 Feb 2023 04:36:59 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1675399019; bh=vSXG2FNpbLvUQNLylcNUj1Px15+hEgLUz7VrP/XsoDQ=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=HsK1ZAGq++mEDNF+UNZz+3dH1nzLFgjDRsrh8UrC8+IpzA8LZuOCKcJ4b7tzcT9Wk JinU9YJPTRy9BZlDz+2QLRVYiNIDdldbf62jo7eSdaMnraafzvQAZo4/vWbO0LxrVx 6PsyZZRpVwn72JKjBPfh3sr7+2f42XR9bthXi70Ex9KaVmxigATVA+SGE5lqAHmTFA NYCtJ+eTIbYc9TnlMNGPtnGb9r/PhYUBqPFwApZfU6EdMJH6I2lNW4bJsWa0npDruu 7rntRxb2AAx49L0qR3brnZ84G/0yDwY34TwsujnM/JfkuLAxgvSMAwOwbweJPfgyJP 5luMMEvw/cL9Q==
Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id CAFA35C0DE7; Thu, 2 Feb 2023 20:36:58 -0800 (PST)
Date: Thu, 2 Feb 2023 20:36:58 -0800
From: "Paul E. McKenney"
To: tglx@linutronix.de
Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com, maz@kernel.org, kernel-team@meta.com, neeraju@codeaurora.org, ak@linux.intel.com, feng.tang@intel.com, zhengjun.xing@intel.com, longman@redhat.com
Subject: [PATCH v2 clocksource 8/7] clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
Message-ID: <20230203043658.GA1513624@paulmck-ThinkPad-P17-Gen-1>
Reply-To: paulmck@kernel.org
References: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20230125002708.GA1471122@paulmck-ThinkPad-P17-Gen-1>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Unconditionally enabling TSC watchdog checking of the HPET and PMTMR
clocksources can degrade latency and performance. Therefore, provide
a new "watchdog" option to the tsc= boot parameter that opts into such
checking.
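
The opt-in logic can be modelled in user space. This is a minimal
sketch that mirrors the flag handling in this patch, not the kernel
code itself: struct tsc_opts, tsc_parse_opt() and
hpet_pmtmr_must_verify() are hypothetical names, and the
tsc_must_verify argument stands in for the CLOCK_SOURCE_MUST_VERIFY
bit of clocksource_tsc.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the two static flags in arch/x86/kernel/tsc.c. */
struct tsc_opts {
	bool no_tsc_watchdog;	/* set by tsc=nowatchdog */
	bool tsc_as_watchdog;	/* set by tsc=watchdog */
};

/* Apply one tsc= option, in the order it appears on the command line. */
static void tsc_parse_opt(struct tsc_opts *o, const char *str)
{
	if (!strcmp(str, "nowatchdog")) {
		o->no_tsc_watchdog = true;
		o->tsc_as_watchdog = false;	/* overrides an earlier "watchdog" */
	}
	if (!strcmp(str, "watchdog") && !o->no_tsc_watchdog)
		o->tsc_as_watchdog = true;	/* ignored after "nowatchdog" */
}

/*
 * HPET and PMTMR are marked for verification only when the TSC itself is
 * not being verified and the user asked for the TSC to act as watchdog.
 */
static bool hpet_pmtmr_must_verify(const struct tsc_opts *o, bool tsc_must_verify)
{
	return !tsc_must_verify && o->tsc_as_watchdog && !o->no_tsc_watchdog;
}
```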

Note that tsc=watchdog is overridden by a tsc=nowatchdog regardless
of their relative positions in the list of boot parameters.

Reported-by: Thomas Gleixner
Reported-by: Waiman Long
Signed-off-by: Paul E. McKenney
Acked-by: Waiman Long

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 95f0d104c2322..7b4df6d89d3c3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6373,6 +6373,12 @@
 			(HPET or PM timer) on systems whose TSC frequency was
 			obtained from HW or FW using either an MSR or CPUID(0x15).
 			Warn if the difference is more than 500 ppm.
+			[x86] watchdog: Use TSC as the watchdog clocksource with
+			which to check other HW timers (HPET or PM timer), but
+			only on systems where TSC has been deemed trustworthy.
+			This will be suppressed by an earlier tsc=nowatchdog and
+			can be overridden by a later tsc=nowatchdog. A console
+			message will flag any such suppression or overriding.
 
 	tsc_early_khz=  [X86] Skip early TSC calibration and use the given
 			value instead.
Useful when the early TSC frequency discovery
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index a5371c6d4b64b..306c233c98d84 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -294,6 +294,7 @@ __setup("notsc", notsc_setup);
 
 static int no_sched_irq_time;
 static int no_tsc_watchdog;
+static int tsc_as_watchdog;
 
 static int __init tsc_setup(char *str)
 {
@@ -303,10 +304,22 @@ static int __init tsc_setup(char *str)
 		no_sched_irq_time = 1;
 	if (!strcmp(str, "unstable"))
 		mark_tsc_unstable("boot parameter");
-	if (!strcmp(str, "nowatchdog"))
+	if (!strcmp(str, "nowatchdog")) {
 		no_tsc_watchdog = 1;
+		if (tsc_as_watchdog)
+			pr_alert("%s: Overriding earlier tsc=watchdog with tsc=nowatchdog\n",
+				 __func__);
+		tsc_as_watchdog = 0;
+	}
 	if (!strcmp(str, "recalibrate"))
 		tsc_force_recalibrate = 1;
+	if (!strcmp(str, "watchdog")) {
+		if (no_tsc_watchdog)
+			pr_alert("%s: tsc=watchdog overridden by earlier tsc=nowatchdog\n",
+				 __func__);
+		else
+			tsc_as_watchdog = 1;
+	}
 	return 1;
 }
 
@@ -1192,7 +1205,8 @@ static void __init tsc_disable_clocksource_watchdog(void)
 
 bool tsc_clocksource_watchdog_disabled(void)
 {
-	return !(clocksource_tsc.flags & CLOCK_SOURCE_MUST_VERIFY);
+	return !(clocksource_tsc.flags & CLOCK_SOURCE_MUST_VERIFY) &&
+	       tsc_as_watchdog && !no_tsc_watchdog;
 }
 
 static void __init check_system_tsc_reliable(void)