From nobody Fri Jun 12 16:00:22 2026
From: David Woodhouse
To: Richard Cochran, Wen Gu, David Woodhouse, Andrew Lunn,
	"David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	John Stultz, Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Shuah Khan, Peter Zijlstra, Thomas Weißschuh,
	Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: David Woodhouse
Subject: [RFC PATCH 1/4] timekeeping: Remove xtime_remainder from ntp_error accumulation
Date: Wed, 13 May 2026 21:44:36 +0100
Message-ID: <20260513210157.3410814-2-dwmw2@infradead.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20260513210157.3410814-1-dwmw2@infradead.org>
References: <20260513210157.3410814-1-dwmw2@infradead.org>

From: David Woodhouse

The ntp_error accumulator tracks the difference between intended and
actual clock advance.
Each tick it adds ntp_tick (the intended advance) and subtracts what
the clock actually advanced. The subtraction was (xtime_interval +
xtime_remainder), but only xtime_interval is actually added to
xtime_nsec each tick.

xtime_remainder was a boot-time constant representing the rounding
error from converting the tick period to an integer number of counter
cycles. It was never added to xtime_nsec, so subtracting it from
ntp_error created a phantom credit that biased the dithering ratio.

The effect is a systematic drift whose magnitude depends on the value
of xtime_remainder and the NTP frequency correction. NTP masks this by
continuously adjusting the frequency to compensate, but with a fixed
frequency (or an external reference clock like vmclock), the drift is
exposed.

Also remove xtime_remainder from the mult computation in
timekeeping_adjust(), which used it to offset the division for the
same (incorrect) reason.

Fixes: a386b5af8edd ("time: Compensate for rounding on odd-frequency clocksources")
Signed-off-by: David Woodhouse
---
 include/linux/timekeeper_internal.h | 2 --
 kernel/time/timekeeping.c           | 7 +++----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index e36d11e33e0c..2f4cfcfcaac0 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -84,7 +84,6 @@ struct tk_read_base {
  * @cycle_interval:	Number of clock cycles in one NTP interval
  * @xtime_interval:	Number of clock shifted nano seconds in one NTP
  *			interval.
- * @xtime_remainder:	Shifted nano seconds left over when rounding
  *			@cycle_interval
  * @raw_interval:	Shifted raw nano seconds accumulated per NTP interval.
  * @next_leap_ktime:	CLOCK_MONOTONIC time value of a pending leap-second
@@ -178,7 +177,6 @@ struct timekeeper {
 
 	u64			cycle_interval;
 	u64			xtime_interval;
-	s64			xtime_remainder;
 	u64			raw_interval;
 
 	ktime_t			next_leap_ktime;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index c493a4010305..3da7167ceb0d 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -360,7 +360,6 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 
 	/* Go back from cycles -> shifted ns */
 	tk->xtime_interval = interval * clock->mult;
-	tk->xtime_remainder = ntpinterval - tk->xtime_interval;
 	tk->raw_interval = interval * clock->mult;
 
 	/* if changing clocks, convert xtime_nsec shift units */
@@ -2337,8 +2336,8 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 		mult = tk->tkr_mono.mult - tk->ntp_err_mult;
 	} else {
 		tk->ntp_tick = ntp_tl;
-		mult = div64_u64((tk->ntp_tick >> tk->ntp_error_shift) -
-				 tk->xtime_remainder, tk->cycle_interval);
+		mult = div64_u64(tk->ntp_tick >> tk->ntp_error_shift,
+				 tk->cycle_interval);
 	}
 
 	/*
@@ -2463,7 +2462,7 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
 
 	/* Accumulate error between NTP and clock interval */
 	tk->ntp_error += tk->ntp_tick << shift;
-	tk->ntp_error -= (tk->xtime_interval + tk->xtime_remainder) <<
+	tk->ntp_error -= tk->xtime_interval <<
 			(tk->ntp_error_shift + shift);
 
 	return offset;
-- 
2.51.0

From nobody Fri Jun 12 16:00:22 2026
From: David Woodhouse
To: Richard Cochran, Wen Gu, David Woodhouse, Andrew Lunn,
	"David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	John Stultz, Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Shuah Khan, Peter Zijlstra, Thomas Weißschuh,
	Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: David Woodhouse
Subject: [RFC PATCH 2/4] WIP: kernel/time: Add /dev/vmclock_host miscdev
Date: Wed, 13 May 2026 21:44:37 +0100
Message-ID: <20260513210157.3410814-3-dwmw2@infradead.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20260513210157.3410814-1-dwmw2@infradead.org>
References: <20260513210157.3410814-1-dwmw2@infradead.org>

From: David Woodhouse

Expose the host's NTP-disciplined clock as a vmclock_abi page via
/dev/vmclock_host. A VMM can mmap or poll() this device to obtain
precision time parameters for relaying to guests.

The page is updated only when ntp_tick changes (i.e., when NTP actually
adjusts the frequency), not on every timekeeping tick. This avoids the
per-tick overhead of the existing pvclock_gtod notifier while providing
the same information.
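
For illustration only (not part of the patch): a minimal userspace sketch of
how a consumer such as a VMM might take a consistent snapshot of a
sequence-counted page like the one described above. The names
demo_clock_page, demo_snapshot and demo_read_snapshot() are hypothetical, and
the layout is a cut-down stand-in, not the real struct vmclock_abi. The only
behaviour assumed from this series is that the writer makes seq_count odd
while an update is in progress.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared page (NOT struct vmclock_abi). */
struct demo_clock_page {
	uint32_t seq_count;	/* odd while the writer is mid-update */
	uint64_t counter_value;
	uint64_t time_sec;
};

struct demo_snapshot {
	uint64_t counter_value;
	uint64_t time_sec;
};

/*
 * Copy a consistent snapshot out of the page, retrying at most
 * max_tries times. Returns 0 on success, -1 if no stable read was seen.
 *
 * The empty asm statements are compiler barriers only (GCC/Clang
 * syntax); a real consumer on a weakly ordered CPU would need proper
 * read barriers here.
 */
static int demo_read_snapshot(const volatile struct demo_clock_page *page,
			      struct demo_snapshot *out, int max_tries)
{
	assert(page && out);

	while (max_tries-- > 0) {
		uint32_t seq = page->seq_count;

		if (seq & 1)
			continue;	/* writer is mid-update, try again */
		__asm__ volatile("" ::: "memory");

		out->counter_value = page->counter_value;
		out->time_sec = page->time_sec;

		__asm__ volatile("" ::: "memory");
		if (page->seq_count == seq)
			return 0;	/* nothing changed while we copied */
	}
	return -1;
}
```

A caller would typically mmap() the device once, call a routine of this shape
whenever poll() reports the page changed, and treat -1 as "try again later"
rather than as a hard error.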

Fields populated:
 - counter_id: X86_TSC
 - time_type: TAI
 - counter_value: TSC at reference point
 - time_sec/time_frac_sec: TAI at reference point
 - counter_period_frac_sec: NTP-disciplined TSC period
 - tai_offset_sec: current UTC-TAI offset

NOT YET DONE:
 - Error bounds (esterror/maxerror)
 - Leap second indicator
 - Disruption marker (needs clocksource change hook)
 - Selftest

Signed-off-by: David Woodhouse
---
 include/linux/vmclock_host.h                  |  17 +
 kernel/time/Kconfig                           |   8 +
 kernel/time/Makefile                          |   1 +
 kernel/time/ntp.c                             |   3 +-
 kernel/time/ntp_internal.h                    |   1 +
 kernel/time/timekeeping.c                     |   6 +
 kernel/time/vmclock_host.c                    | 319 ++++++++++++++++++
 .../selftests/timers/vmclock_host_test.c      | 171 ++++++++++
 8 files changed, 525 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/vmclock_host.h
 create mode 100644 kernel/time/vmclock_host.c
 create mode 100644 tools/testing/selftests/timers/vmclock_host_test.c

diff --git a/include/linux/vmclock_host.h b/include/linux/vmclock_host.h
new file mode 100644
index 000000000000..388a5a1b470c
--- /dev/null
+++ b/include/linux/vmclock_host.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_VMCLOCK_HOST_H
+#define _LINUX_VMCLOCK_HOST_H
+
+struct timekeeper;
+
+extern void (*vmclock_host_update_fn)(struct timekeeper *tk);
+
+static inline void vmclock_host_update(struct timekeeper *tk)
+{
+	typeof(vmclock_host_update_fn) fn = READ_ONCE(vmclock_host_update_fn);
+
+	if (fn)
+		fn(tk);
+}
+
+#endif /* _LINUX_VMCLOCK_HOST_H */
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 02aac7c5aa76..493ffda434a8 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -221,4 +221,12 @@ config POSIX_AUX_CLOCKS
 	  and other clock domains, which are not correlated to the TAI/NTP
 	  notion of time.
 
+config VMCLOCK_HOST
+	tristate "VMClock host time provider (/dev/vmclock_host)"
+	depends on X86_TSC || ARM64
+	help
+	  Expose the host NTP-disciplined clock as a vmclock page via
+	  /dev/vmclock_host for VMMs to relay precision time to guests.
+
 endmenu
+
diff --git a/kernel/time/Makefile b/kernel/time/Makefile
index eaf290c972f9..549070254e3a 100644
--- a/kernel/time/Makefile
+++ b/kernel/time/Makefile
@@ -33,3 +33,4 @@ obj-$(CONFIG_TIME_NS)				+= namespace.o
 obj-$(CONFIG_TIME_NS_VDSO)			+= namespace_vdso.o
 obj-$(CONFIG_TEST_CLOCKSOURCE_WATCHDOG)		+= clocksource-wdtest.o
 obj-$(CONFIG_TIME_KUNIT_TEST)			+= time_test.o
+obj-$(CONFIG_VMCLOCK_HOST)			+= vmclock_host.o
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 97fa99b96dd0..3c318c96f35d 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -611,10 +611,11 @@ static inline int update_rtc(struct timespec64 *to_set, unsigned long *offset_ns
  * ntp_synced - Tells whether the NTP status is not UNSYNC
  * Returns: true if not UNSYNC, false otherwise
  */
-static inline bool ntp_synced(void)
+bool ntp_synced(void)
 {
 	return !(tk_ntp_data[TIMEKEEPER_CORE].time_status & STA_UNSYNC);
 }
+EXPORT_SYMBOL_GPL(ntp_synced);
 
 /*
  * If we have an externally synchronized Linux clock, then update RTC clock
diff --git a/kernel/time/ntp_internal.h b/kernel/time/ntp_internal.h
index 7084d839c207..b36a8090fc9c 100644
--- a/kernel/time/ntp_internal.h
+++ b/kernel/time/ntp_internal.h
@@ -3,6 +3,7 @@
 #define _LINUX_NTP_INTERNAL_H
 
 extern void ntp_init(void);
+extern bool ntp_synced(void);
 extern void ntp_clear(unsigned int tkid);
 /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
 extern u64 ntp_tick_length(unsigned int tkid);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 3da7167ceb0d..1935881041d0 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -27,6 +27,10 @@
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
 #include "ntp_internal.h"
+#include
+
+void (*vmclock_host_update_fn)(struct timekeeper *tk);
+EXPORT_SYMBOL_GPL(vmclock_host_update_fn);
 
 #define TK_CLEAR_NTP		(1 << 0)
 #define TK_CLOCK_WAS_SET	(1 << 1)
@@ -2340,6 +2344,8 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 				 tk->cycle_interval);
 	}
 
+	vmclock_host_update(tk);
+
 	/*
 	 * If the clock is behind the NTP time, increase the multiplier by 1
 	 * to catch up with it. If it's ahead and there was a remainder in the
diff --git a/kernel/time/vmclock_host.c b/kernel/time/vmclock_host.c
new file mode 100644
index 000000000000..f4baf9069e70
--- /dev/null
+++ b/kernel/time/vmclock_host.c
@@ -0,0 +1,319 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * /dev/vmclock_host - Expose host NTP-disciplined time as a vmclock page.
+ *
+ * This provides a vmclock_abi structure populated from the host's
+ * CLOCK_REALTIME (TAI), allowing a VMM to efficiently relay precision
+ * time to guests without per-tick overhead.
+ *
+ * The page is updated only when the NTP frequency (ntp_tick) changes
+ * or the clocksource changes — not on every timekeeping tick.
+ * Userspace can poll() for changes.
+ *
+ * Copyright © 2026 Amazon.com, Inc. or its affiliates.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+extern void (*vmclock_host_update_fn)(struct timekeeper *tk);
+extern bool ntp_synced(void);
+
+static struct vmclock_abi *vmclock_page;
+static DECLARE_WAIT_QUEUE_HEAD(vmclock_wait);
+static u64 vmclock_last_ntp_tick = 1; /* Sentinel: force first update */
+static enum clocksource_ids vmclock_last_cs_id;
+
+/*
+ * Compute counter_period_frac_sec from ntp_tick and cycle_interval.
+ *
+ * ntp_tick is ns_per_tick << 32.
+ * cycle_interval is counter cycles per tick.
+ *
+ * vmclock wants: period = frac_sec / 2^(64 + shift) in seconds.
+ *
+ * ns_per_cycle = ntp_tick / cycle_interval (in <<32 fixed point)
+/*
+ * Compute counter_period_frac_sec from ntp_tick and cycle_interval.
+ *
+ * period = ntp_tick / (cycle_interval * 10^9 * 2^32) seconds/cycle
+ * frac_sec = ntp_tick * 2^(32+shift) / (cycle_interval * 10^9)
+ *
+ * Use div64_u64 with maximum pre-shift for precision.
+ * The key: do TWO divisions to get 64 bits of quotient.
+ */
+static void vmclock_compute_period(struct timekeeper *tk,
+				   u64 *period_frac, u8 *period_shift)
+{
+	u64 ntp_tick = tk->ntp_tick;
+	u64 cycle_interval = tk->cycle_interval;
+	u64 divisor = cycle_interval * 1000000000ULL;
+	int headroom = __builtin_clzll(ntp_tick);
+	u64 rem, result;
+	int bits_so_far, need;
+
+	/*
+	 * Compute ntp_tick * 2^(headroom + N) / divisor with 64 bits
+	 * of precision, using iterative 32-bit chunk divisions.
+	 *
+	 * First division: ntp_tick << headroom / divisor
+	 */
+	result = div64_u64_rem(ntp_tick << headroom, divisor, &rem);
+	bits_so_far = 64 - __builtin_clzll(result ?: 1);
+
+	/* Fill remaining bits 32 at a time from the remainder */
+	while (bits_so_far < 64 && rem) {
+		int chunk = min(32, 64 - bits_so_far);
+		int rem_headroom = __builtin_clzll(rem);
+		u64 extra;
+
+		if (rem_headroom < chunk)
+			chunk = rem_headroom;
+
+		extra = div64_u64_rem(rem << chunk, divisor, &rem);
+		result = (result << chunk) | extra;
+		bits_so_far += chunk;
+		headroom += chunk;
+	}
+
+	/* Pad with zeros if we ran out of remainder */
+	if (bits_so_far < 64) {
+		result <<= (64 - bits_so_far);
+		headroom += (64 - bits_so_far);
+	}
+
+	/*
+	 * result = ntp_tick * 2^headroom / divisor
+	 *        = (ntp_tick / (cycle_interval * 10^9)) * 2^headroom
+	 *        = period_seconds * 2^32 * 2^headroom
+	 *        = period_seconds * 2^(32 + headroom)
+	 *
+	 * vmclock: frac_sec / 2^(64 + shift) = period_seconds
+	 * So: shift = 32 + headroom - 64 = headroom - 32
+	 */
+	*period_frac = result;
+	*period_shift = (u8)(headroom - 32);
+}
+
+
+static u8 vmclock_counter_id(struct timekeeper *tk)
+{
+	enum clocksource_ids id = tk->cs_id;
+
+	if (IS_ENABLED(CONFIG_X86) && id == CSID_X86_TSC)
+		return VMCLOCK_COUNTER_X86_TSC;
+	if (IS_ENABLED(CONFIG_ARM64) && id == CSID_ARM_ARCH_COUNTER)
+		return VMCLOCK_COUNTER_ARM_VCNT;
+	return VMCLOCK_COUNTER_INVALID;
+}
+
+/*
+ * Called from timekeeping_adjust() when ntp_tick changes.
+ * Also needs to be called on clocksource change.
+ */
+static void vmclock_host_do_update(struct timekeeper *tk)
+{
+	struct vmclock_abi *clk = vmclock_page;
+	u64 period_frac;
+	u8 period_shift, counter_id;
+
+	if (!clk)
+		return;
+
+	counter_id = vmclock_counter_id(tk);
+
+	/* Only do a full update when something meaningful changes */
+	if (tk->ntp_tick == vmclock_last_ntp_tick &&
+	    tk->cs_id == vmclock_last_cs_id)
+		return;
+
+	vmclock_last_ntp_tick = tk->ntp_tick;
+	vmclock_last_cs_id = tk->cs_id;
+
+	/* Increment seq_count to odd (update in progress) */
+	WRITE_ONCE(clk->seq_count, cpu_to_le32(le32_to_cpu(clk->seq_count) + 1));
+	smp_wmb();
+
+	clk->counter_id = counter_id;
+
+	if (counter_id != VMCLOCK_COUNTER_INVALID) {
+		u64 ns = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift;
+		u64 hi, rem;
+
+		/* Adjust for ntp_error: represent where the clock is
+		 * converging TO, not where it is right now. */
+		ns += tk->ntp_error >> (tk->tkr_mono.shift + tk->ntp_error_shift);
+
+		clk->counter_value = cpu_to_le64(tk->tkr_mono.cycle_last);
+		clk->time_sec = cpu_to_le64(tk->xtime_sec + tk->tai_offset);
+
+		hi = div64_u64_rem(ns << 32, 1000000000ULL, &rem);
+		clk->time_frac_sec = cpu_to_le64(
+			(hi << 32) | div64_u64(rem << 32, 1000000000ULL));
+
+		vmclock_compute_period(tk,
+				       &period_frac, &period_shift);
+		clk->counter_period_frac_sec = cpu_to_le64(period_frac);
+		clk->counter_period_shift = period_shift;
+
+		clk->clock_status = ntp_synced() ?
+			VMCLOCK_STATUS_SYNCHRONIZED :
+			VMCLOCK_STATUS_FREERUNNING;
+	} else {
+		clk->clock_status = VMCLOCK_STATUS_UNKNOWN;
+	}
+
+	clk->tai_offset_sec = cpu_to_le16((s16)tk->tai_offset);
+	clk->flags = cpu_to_le64(VMCLOCK_FLAG_TAI_OFFSET_VALID |
+				 VMCLOCK_FLAG_TIME_MONOTONIC |
+				 VMCLOCK_FLAG_NOTIFICATION_PRESENT);
+
+	smp_wmb();
+	WRITE_ONCE(clk->seq_count, cpu_to_le32(le32_to_cpu(clk->seq_count) + 1));
+
+	wake_up_interruptible(&vmclock_wait);
+}
+
+/* File operations */
+
+struct vmclock_host_file {
+	u32 last_seq;
+};
+
+static int vmclock_host_open(struct inode *inode, struct file *fp)
+{
+	struct vmclock_host_file *fst;
+
+	fst = kzalloc(sizeof(*fst), GFP_KERNEL);
+	if (!fst)
+		return -ENOMEM;
+
+	fp->private_data = fst;
+	return 0;
+}
+
+static int vmclock_host_release(struct inode *inode, struct file *fp)
+{
+	kfree(fp->private_data);
+	return 0;
+}
+
+static int vmclock_host_mmap(struct file *fp, struct vm_area_struct *vma)
+{
+	if ((vma->vm_flags & (VM_READ | VM_WRITE)) != VM_READ)
+		return -EROFS;
+
+	if (vma->vm_end - vma->vm_start != PAGE_SIZE || vma->vm_pgoff)
+		return -EINVAL;
+
+	return remap_pfn_range(vma, vma->vm_start,
+			       virt_to_phys(vmclock_page) >> PAGE_SHIFT,
+			       PAGE_SIZE, vma->vm_page_prot);
+}
+
+static ssize_t vmclock_host_read(struct file *fp, char __user *buf,
+				 size_t count, loff_t *ppos)
+{
+	struct vmclock_host_file *fst = fp->private_data;
+	u32 seq;
+
+	if (*ppos >= PAGE_SIZE)
+		return 0;
+	if (count > PAGE_SIZE - *ppos)
+		count = PAGE_SIZE - *ppos;
+
+	do {
+		seq = le32_to_cpu(READ_ONCE(vmclock_page->seq_count));
+		if (seq & 1) {
+			cpu_relax();
+			continue;
+		}
+		smp_rmb();
+		if (copy_to_user(buf, (char *)vmclock_page + *ppos, count))
+			return -EFAULT;
+		smp_rmb();
+	} while ((seq & 1) || le32_to_cpu(READ_ONCE(vmclock_page->seq_count)) != seq);
+
+	fst->last_seq = seq;
+	*ppos += count;
+	return count;
+}
+
+static __poll_t vmclock_host_poll(struct file *fp, poll_table *wait)
+{
+	struct vmclock_host_file *fst = fp->private_data;
+	u32 seq;
+
+	poll_wait(fp, &vmclock_wait, wait);
+
+	seq = le32_to_cpu(READ_ONCE(vmclock_page->seq_count));
+	if (fst->last_seq != seq)
+		return EPOLLIN | EPOLLRDNORM;
+
+	return 0;
+}
+
+static const struct file_operations vmclock_host_fops = {
+	.owner = THIS_MODULE,
+	.open = vmclock_host_open,
+	.release = vmclock_host_release,
+	.mmap = vmclock_host_mmap,
+	.read = vmclock_host_read,
+	.poll = vmclock_host_poll,
+};
+
+static struct miscdevice vmclock_host_miscdev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "vmclock_host",
+	.fops = &vmclock_host_fops,
+};
+
+static int __init vmclock_host_init(void)
+{
+	int ret;
+
+	vmclock_page = (struct vmclock_abi *)get_zeroed_page(GFP_KERNEL);
+	if (!vmclock_page)
+		return -ENOMEM;
+
+	/* Set constant fields */
+	vmclock_page->magic = cpu_to_le32(VMCLOCK_MAGIC);
+	vmclock_page->size = cpu_to_le32(PAGE_SIZE);
+	vmclock_page->version = cpu_to_le16(1);
+	vmclock_page->time_type = VMCLOCK_TIME_TAI;
+
+	ret = misc_register(&vmclock_host_miscdev);
+	if (ret) {
+		free_page((unsigned long)vmclock_page);
+		vmclock_page = NULL;
+		return ret;
+	}
+
+	WRITE_ONCE(vmclock_host_update_fn, vmclock_host_do_update);
+	pr_info("vmclock_host: registered /dev/vmclock_host\n");
+	return 0;
+}
+
+static void __exit vmclock_host_exit(void)
+{
+	WRITE_ONCE(vmclock_host_update_fn, NULL);
+	synchronize_rcu();
+	misc_deregister(&vmclock_host_miscdev);
+	free_page((unsigned long)vmclock_page);
+	vmclock_page = NULL;
+}
+
+module_init(vmclock_host_init);
+module_exit(vmclock_host_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("David Woodhouse ");
+MODULE_DESCRIPTION("VMClock host time provider");
diff --git a/tools/testing/selftests/timers/vmclock_host_test.c b/tools/testing/selftests/timers/vmclock_host_test.c
new file mode 100644
index 000000000000..c83cc7e6d404
--- /dev/null
+++ b/tools/testing/selftests/timers/vmclock_host_test.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test /dev/vmclock_host by comparing its time against CLOCK_TAI.
+ *
+ * Maps the vmclock page, reads time from it using the ABI formula,
+ * and compares with clock_gettime(CLOCK_TAI) using ABA timestamps
+ * to bound the uncertainty.
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#ifdef __x86_64__
+static inline uint64_t read_counter(void)
+{
+	unsigned int lo, hi;
+	asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
+	return ((uint64_t)hi << 32) | lo;
+}
+#elif defined(__aarch64__)
+static inline uint64_t read_counter(void)
+{
+	uint64_t val;
+	asm volatile("mrs %0, cntvct_el0" : "=r"(val));
+	return val;
+}
+#else
+#error "Unsupported architecture"
+#endif
+
+/*
+ * Compute time from vmclock: T = time_sec + time_frac_sec/2^64 +
+ *   (counter_now - counter_value) * counter_period_frac_sec >> (64 + shift)
+ *
+ * Returns nanoseconds since epoch.
+ */
+static int64_t vmclock_read_ns(const volatile struct vmclock_abi *clk,
+			       uint64_t counter_now)
+{
+	uint64_t delta = counter_now - clk->counter_value;
+	uint64_t period = clk->counter_period_frac_sec;
+	uint8_t shift = clk->counter_period_shift;
+	__uint128_t ns128;
+
+	/* delta * period gives seconds in 0.(64+shift) fixed point */
+	ns128 = (__uint128_t)delta * period;
+	ns128 >>= shift;
+	/* Now ns128 is seconds in 0.64 fixed point. Add time_frac_sec */
+	ns128 += clk->time_frac_sec;
+	/* Top 64 bits are whole seconds of fractional part — but we
+	 * need to add time_sec for the full result */
+	uint64_t frac_sec = (uint64_t)(ns128 >> 64);
+	uint64_t sub_sec_ns = (uint64_t)(((ns128 & 0xFFFFFFFFFFFFFFFFULL) *
+					  1000000000ULL) >> 64);
+
+	return (int64_t)(clk->time_sec + frac_sec) * 1000000000LL + sub_sec_ns;
+}
+
+static int64_t clock_tai_ns(void)
+{
+	struct timespec ts;
+	clock_gettime(CLOCK_TAI, &ts);
+	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
+}
+
+int main(void)
+{
+	int fd, ret = 0;
+	volatile struct vmclock_abi *clk;
+	int i, failures = 0;
+
+	fd = open("/dev/vmclock_host", O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			printf("SKIP: /dev/vmclock_host not available\n");
+			return 4;
+		}
+		perror("open /dev/vmclock_host");
+		return 1;
+	}
+
+	clk = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
+	if (clk == MAP_FAILED) {
+		perror("mmap");
+		close(fd);
+		return 1;
+	}
+
+	if (clk->magic != VMCLOCK_MAGIC) {
+		fprintf(stderr, "Bad magic: 0x%x\n", clk->magic);
+		ret = 1;
+		goto out;
+	}
+
+	if (clk->counter_id == VMCLOCK_COUNTER_INVALID) {
+		printf("SKIP: counter_id is INVALID (clocksource not TSC?)\n");
+		ret = 4;
+		goto out;
+	}
+
+	printf("vmclock_host: version=%u counter_id=%u time_type=%u status=%u\n",
+	       clk->version, clk->counter_id, clk->time_type, clk->clock_status);
+	printf(" tai_offset=%d\n", (int16_t)clk->tai_offset_sec);
+	printf(" counter_period_frac_sec=0x%" PRIx64 " shift=%u\n",
+	       (uint64_t)clk->counter_period_frac_sec, clk->counter_period_shift);
+
+	/* ABA comparison: read CLOCK_TAI, vmclock, CLOCK_TAI */
+	printf("\nABA comparison (vmclock vs CLOCK_TAI):\n");
+	for (i = 0; i < 10; i++) {
+		uint32_t seq;
+		int64_t tai_before, tai_after, vmclock_ns;
+		int64_t delta, window;
+
+		/* Read with seqcount retry */
+		do {
+			seq = clk->seq_count;
+			if (seq & 1) {
+				__asm__ volatile("" ::: "memory");
+				continue;
+			}
+			__asm__ volatile("" ::: "memory");
+
+			tai_before = clock_tai_ns();
+			uint64_t ctr = read_counter();
+			tai_after = clock_tai_ns();
+
+			__asm__ volatile("" ::: "memory");
+			if (clk->seq_count != seq)
+				continue;
+
+			vmclock_ns = vmclock_read_ns(clk, ctr);
+			break;
+		} while (1);
+
+		window = tai_after - tai_before;
+		/* vmclock should be between tai_before and tai_after */
+		delta = vmclock_ns - tai_before;
+
+		printf(" [%d] vmclock-tai_before=%+" PRId64 "ns window=%"
+		       PRId64 "ns", i, delta, window);
+
+		if (delta < -2000 || delta > window + 2000) {
+			printf(" FAIL (out of range)\n");
+			failures++;
+		} else {
+			printf(" OK\n");
+		}
+
+		usleep(100000); /* 100ms between samples */
+	}
+
+	if (failures) {
+		printf("\nFAIL: %d/%d samples out of range\n", failures, 10);
+		ret = 1;
+	} else {
+		printf("\nPASS: all samples within ABA window\n");
+	}
+
+out:
+	munmap((void *)clk, 4096);
+	close(fd);
+	return ret;
+}
-- 
2.51.0

From nobody Fri Jun 12 16:00:22 2026
From: David Woodhouse
To: Richard Cochran, Wen Gu, David Woodhouse, Andrew Lunn,
	"David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	John Stultz, Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Shuah Khan, Peter Zijlstra, Thomas Weißschuh,
	Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: David Woodhouse
Subject: [RFC PATCH 3/4] timekeeping: Add absolute reference for feed-forward clock discipline
Date: Wed, 13 May 2026 21:44:38 +0100
Message-ID: <20260513210157.3410814-4-dwmw2@infradead.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20260513210157.3410814-1-dwmw2@infradead.org>
References: <20260513210157.3410814-1-dwmw2@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: David Woodhouse

Add timekeeping_set_reference() which allows an external clock source
(e.g. vmclock from a hypervisor) to provide an absolute time reference.
The reference defines a linear TSC-to-time mapping that the tick
mechanism clamps to, replacing the relative ntp_error accumulator for
the dithering decision.

When a reference is active:
 - ntp_tick is set to match the reference rate via the normal NTP path
 - The dithering decision (mult vs mult+1) uses an absolute comparison
   against the reference line instead of the ntp_error accumulator
 - timekeeping_apply_adjustment() still runs for vDSO monotonicity
 - adjtimex reads back the correct frequency

The reference is automatically cleared when:
 - adjtimex ADJ_FREQUENCY is called (NTP takes over)
 - The clocksource changes

This eliminates the ~26 PPB systematic drift caused by the interaction
between timekeeping_apply_adjustment()'s monotonicity correction and
the ntp_error accumulator, which depends on interrupt latency.
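
For illustration only, the absolute comparison described above can be
sketched in userspace roughly as follows. This is not the kernel code:
struct ref_line, ref_time_at() and dither_bit() are hypothetical names,
the clock is compared in units of 2^-64 second rather than in shifted
nanoseconds, and unsigned __int128 is assumed to be available (GCC or
Clang on a 64-bit target).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the fields the reference line needs. */
struct ref_line {
	uint64_t counter_value;   /* counter reading at the reference point */
	uint64_t time_sec;        /* seconds at the reference point */
	uint64_t time_frac_sec;   /* fractional seconds, units of 2^-64 s */
	uint64_t period_frac_sec; /* counter period, units of 2^-(64+shift) s */
	uint8_t  period_shift;    /* extra fixed-point shift for the period */
};

/* Evaluate the reference line at 'counter': whole seconds plus fraction. */
void ref_time_at(const struct ref_line *r, uint64_t counter,
		 uint64_t *sec, uint64_t *frac)
{
	unsigned __int128 product;

	/* Shifting a 128-bit value by 128 or more would be undefined. */
	assert(r->period_shift < 128);

	product = (unsigned __int128)(counter - r->counter_value) *
		  r->period_frac_sec;
	product >>= r->period_shift;
	product += r->time_frac_sec;

	/* The upper 64 bits are whole seconds elapsed, the lower the fraction. */
	*sec = r->time_sec + (uint64_t)(product >> 64);
	*frac = (uint64_t)product;
}

/*
 * Dithering decision: 0 (plain mult) when the clock is strictly ahead of
 * the reference line, 1 (mult + 1) when it is level with or behind it.
 */
int dither_bit(const struct ref_line *r, uint64_t counter,
	       uint64_t clock_sec, uint64_t clock_frac)
{
	uint64_t sec, frac;
	bool ahead;

	ref_time_at(r, counter, &sec, &frac);
	ahead = clock_sec > sec || (clock_sec == sec && clock_frac > frac);
	return ahead ? 0 : 1;
}
```

Because the decision depends only on where the clock sits relative to
the line at the current counter value, an error made on one tick is not
carried forward the way it is in a relative accumulator.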

Signed-off-by: David Woodhouse
---
 include/linux/timekeeping_reference.h | 35 ++++++++++++
 kernel/time/ntp.c                     | 33 ++++++++++++
 kernel/time/ntp_internal.h            |  2 +
 kernel/time/timekeeping.c             | 76 ++++++++++++++++++++++++++-
 4 files changed, 145 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/timekeeping_reference.h

diff --git a/include/linux/timekeeping_reference.h b/include/linux/timekeeping_reference.h
new file mode 100644
index 000000000000..0cf248ace241
--- /dev/null
+++ b/include/linux/timekeeping_reference.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_TIMEKEEPING_REFERENCE_H
+#define _LINUX_TIMEKEEPING_REFERENCE_H
+
+#include
+#include
+
+struct timekeeper;
+
+/**
+ * struct tk_reference - Absolute time reference for feed-forward timekeeping
+ * @cs_id:		Clocksource counter this reference applies to
+ * @counter_value:	Counter reading at the reference point
+ * @cycle_interval:	Counter cycles per tick (for ntp_tick computation)
+ * @time_sec:		Seconds (UTC) at the reference point
+ * @time_frac_sec:	Fractional seconds (units of 1/2^64 second)
+ * @period_frac_sec:	Counter period (units of 1/2^(64+shift) seconds)
+ * @period_shift:	Additional shift for period fixed-point
+ */
+struct tk_reference {
+	enum clocksource_ids	cs_id;
+	u64			counter_value;
+	u64			cycle_interval;
+	u64			time_sec;
+	u64			time_frac_sec;
+	u64			period_frac_sec;
+	u8			period_shift;
+};
+
+int timekeeping_set_reference(const struct tk_reference *ref);
+bool timekeeping_has_reference(void);
+void timekeeping_clear_reference(void);
+bool timekeeping_ref_ahead(struct timekeeper *tk);
+
+#endif /* _LINUX_TIMEKEEPING_REFERENCE_H */
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 3c318c96f35d..cdd63589160d 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -21,6 +21,7 @@
 #include
 
 #include "ntp_internal.h"
+#include
 #include "timekeeping_internal.h"
 
 /**
@@ -364,6 +365,37 @@ u64 ntp_tick_length(unsigned int tkid)
 	return tk_ntp_data[tkid].tick_length;
 }
 
+void ntp_set_tick_length(unsigned int tkid, u64 tick_length)
+{
+	struct ntp_data *ntpdata = &tk_ntp_data[tkid];
+	u64 base;
+
+	/*
+	 * Reverse ntp_update_frequency() to find the time_freq that
+	 * produces this tick_length, keeping everything consistent.
+	 *
+	 * tick_length = ((tick_usec * 1000 * USER_HZ) << 32 +
+	 *                ntp_tick_adj + time_freq) / NTP_INTERVAL_FREQ
+	 *
+	 * time_freq = tick_length * NTP_INTERVAL_FREQ -
+	 *             (tick_usec * 1000 * USER_HZ) << 32 - ntp_tick_adj
+	 */
+	base = (u64)(ntpdata->tick_usec * NSEC_PER_USEC * USER_HZ) << NTP_SCALE_SHIFT;
+	base += ntpdata->ntp_tick_adj;
+
+	ntpdata->time_freq = (s64)(tick_length * NTP_INTERVAL_FREQ - base);
+	ntp_update_frequency(ntpdata);
+}
+
+void ntp_set_time_offset(unsigned int tkid, s64 offset_ns)
+{
+	struct ntp_data *ntpdata = &tk_ntp_data[tkid];
+
+	ntpdata->time_offset = div_s64((s64)offset_ns << NTP_SCALE_SHIFT,
+				       NTP_INTERVAL_FREQ);
+	ntpdata->time_adjust = 0;
+}
+
 /**
  * ntp_get_next_leap - Returns the next leapsecond in CLOCK_REALTIME ktime_t
  * @tkid:	Timekeeper ID
@@ -736,6 +768,7 @@ static inline void process_adjtimex_modes(struct ntp_data *ntpdata, const struct
 		ntpdata->time_freq = max(ntpdata->time_freq, -MAXFREQ_SCALED);
 		/* Update pps_freq */
 		pps_set_freq(ntpdata);
+		timekeeping_clear_reference();
 	}
 
 	if (txc->modes & ADJ_MAXERROR)
diff --git a/kernel/time/ntp_internal.h b/kernel/time/ntp_internal.h
index b36a8090fc9c..4531c162d229 100644
--- a/kernel/time/ntp_internal.h
+++ b/kernel/time/ntp_internal.h
@@ -7,6 +7,8 @@ extern bool ntp_synced(void);
 extern void ntp_clear(unsigned int tkid);
 /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
 extern u64 ntp_tick_length(unsigned int tkid);
+extern void ntp_set_tick_length(unsigned int tkid, u64 length);
+extern void ntp_set_time_offset(unsigned int tkid, s64 offset_ns);
 extern ktime_t ntp_get_next_leap(unsigned int tkid);
 extern int second_overflow(unsigned int tkid, time64_t secs);
 extern int ntp_adjtimex(unsigned int tkid, struct __kernel_timex *txc, const struct timespec64 *ts,
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 1935881041d0..1225efdf5dc0 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -27,6 +27,8 @@
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
 #include "ntp_internal.h"
+#include
+#include
 #include
 
 void (*vmclock_host_update_fn)(struct timekeeper *tk);
@@ -396,6 +398,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	tk->skip_second_overflow = 0;
 
 	tk->cs_id = clock->id;
+	timekeeping_clear_reference();
 
 	/* Coupled clockevent data */
 	if (IS_ENABLED(CONFIG_GENERIC_CLOCKEVENTS_COUPLED) &&
@@ -2323,9 +2326,77 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 	tk->tkr_mono.xtime_nsec -= offset;
 }
 
+static struct tk_reference tk_ref;
+static bool tk_ref_valid;
+
+int timekeeping_set_reference(const struct tk_reference *ref)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	__uint128_t product;
+	u64 delta, ref_frac, ref_ns;
+	s64 offset_ns;
+
+	tk_ref = *ref;
+	if (!tk_ref.cycle_interval)
+		tk_ref.cycle_interval = tk->cycle_interval;
+
+	/* Reject if the clocksource doesn't match */
+	if (tk->cs_id != ref->cs_id)
+		return -ENODEV;
+
+	tk_ref_valid = true;
+	ntp_set_tick_length(TIMEKEEPER_CORE,
+			    mul_u64_u64_shr(ref->period_frac_sec,
+					    (u64)tk_ref.cycle_interval * NSEC_PER_SEC,
+					    32 + ref->period_shift));
+
+	/* Compute phase offset: (reference_time - xtime) in ns */
+	delta = tk->tkr_mono.cycle_last - tk_ref.counter_value;
+	product = (__uint128_t)delta * tk_ref.period_frac_sec;
+	product >>= tk_ref.period_shift;
+	product += tk_ref.time_frac_sec;
+	ref_frac = (u64)product;
+	ref_ns = mul_u64_u64_shr(ref_frac, NSEC_PER_SEC, 64);
+
+	if (tk_ref.time_sec + (u64)(product >> 64) == tk->xtime_sec) {
+		offset_ns = (s64)ref_ns -
+			(s64)(tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift);
+		ntp_set_time_offset(TIMEKEEPER_CORE, offset_ns);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(timekeeping_set_reference);
+
+bool timekeeping_has_reference(void) { return tk_ref_valid; }
+void timekeeping_clear_reference(void) { tk_ref_valid = false; }
+
+bool timekeeping_ref_ahead(struct timekeeper *tk)
+{
+	u64 delta, ref_frac, ref_sec, ref_shifted_ns;
+	__uint128_t product;
+
+	if (tk->cs_id != tk_ref.cs_id)
+		return false;
+	delta = tk->tkr_mono.cycle_last - tk_ref.counter_value;
+	product = (__uint128_t)delta * tk_ref.period_frac_sec;
+	product >>= tk_ref.period_shift;
+	product += tk_ref.time_frac_sec;
+	ref_sec = tk_ref.time_sec + (u64)(product >> 64);
+	ref_frac = (u64)product;
+	ref_shifted_ns = mul_u64_u64_shr(ref_frac,
+			(u64)NSEC_PER_SEC << tk->tkr_mono.shift, 64);
+	if (tk->xtime_sec > ref_sec)
+		return true;
+	if (tk->xtime_sec == ref_sec &&
+	    tk->tkr_mono.xtime_nsec > ref_shifted_ns)
+		return true;
+	return false;
+}
+
 /*
  * Adjust the timekeeper's multiplier to the correct frequency
  * and also to reduce the accumulated error value.
  */
 static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 {
@@ -2352,7 +2423,10 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 	 * tick division, the clock will slow down. Otherwise it will stay
 	 * ahead until the tick length changes to a non-divisible value.
 	 */
-	tk->ntp_err_mult = tk->ntp_error > 0 ? 1 : 0;
+	if (timekeeping_has_reference())
+		tk->ntp_err_mult = timekeeping_ref_ahead(tk) ? 0 : 1;
+	else
+		tk->ntp_err_mult = tk->ntp_error > 0 ? 1 : 0;
 	mult += tk->ntp_err_mult;
 
 	timekeeping_apply_adjustment(tk, offset, mult - tk->tkr_mono.mult);
-- 
2.51.0

From nobody Fri Jun 12 16:00:22 2026
From: David Woodhouse
To: Richard Cochran, Wen Gu, David Woodhouse, Andrew Lunn,
	"David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	John Stultz, Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Shuah Khan, Peter Zijlstra, Thomas Weißschuh,
	Arnd Bergmann, linux-kernel@vger.kernel.org
Cc: David Woodhouse
Subject: [RFC PATCH 4/4] ptp_vmclock: Feed reference to timekeeping for feed-forward discipline
Date: Wed, 13 May 2026 21:44:39 +0100
Message-ID: <20260513210157.3410814-5-dwmw2@infradead.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20260513210157.3410814-1-dwmw2@infradead.org>
References: <20260513210157.3410814-1-dwmw2@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: David Woodhouse

When a vmclock device provides valid time, call
timekeeping_set_reference() to enable feed-forward clock discipline.
This eliminates drift between the system clock and the vmclock
reference.

The reference is set at probe time (after PTP registration) and
updated on each notification from the hypervisor (ACPI or DT
interrupt).

If cycle_interval is not provided (set to 0),
timekeeping_set_reference() fills it from the current timekeeper.

Signed-off-by: David Woodhouse
---
 drivers/ptp/ptp_vmclock.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/ptp/ptp_vmclock.c b/drivers/ptp/ptp_vmclock.c
index 8b630eb916b5..225cdc4366dc 100644
--- a/drivers/ptp/ptp_vmclock.c
+++ b/drivers/ptp/ptp_vmclock.c
@@ -27,6 +27,7 @@
 #include
 
 #include
+#include
 
 #ifdef CONFIG_X86
 #include
@@ -334,6 +335,27 @@ static const struct ptp_clock_info ptp_vmclock_info = {
 	.getcrosststamp	= ptp_vmclock_getcrosststamp,
 };
 
+static void vmclock_set_tk_reference(struct vmclock_state *st)
+{
+	struct vmclock_abi *clk = st->clk;
+	struct tk_reference ref = {
+		.cs_id = st->cs_id,
+		.counter_value = le64_to_cpu(clk->counter_value),
+		.time_sec = le64_to_cpu(clk->time_sec),
+		.time_frac_sec = le64_to_cpu(clk->time_frac_sec),
+		.period_frac_sec = le64_to_cpu(clk->counter_period_frac_sec),
+		.period_shift = clk->counter_period_shift,
+	};
+
+	/* Convert TAI to UTC for comparison with xtime_sec */
+	if (clk->time_type == VMCLOCK_TIME_TAI &&
+	    (le64_to_cpu(clk->flags) & VMCLOCK_FLAG_TAI_OFFSET_VALID))
+		ref.time_sec += (int16_t)le16_to_cpu(clk->tai_offset_sec);
+
+	if (clk->clock_status != VMCLOCK_STATUS_UNRELIABLE)
+		timekeeping_set_reference(&ref);
+}
+
 static struct ptp_clock *vmclock_ptp_register(struct device *dev,
 					      struct vmclock_state *st)
 {
@@ -525,6 +547,7 @@ vmclock_acpi_notification_handler(acpi_handle __always_unused handle,
 	struct device *device = dev;
 	struct vmclock_state *st = device->driver_data;
 
+	vmclock_set_tk_reference(st);
 	wake_up_interruptible(&st->disrupt_wait);
 }
 
@@ -580,6 +603,7 @@ static irqreturn_t vmclock_of_irq_handler(int __always_unused irq, void *_st)
 {
 	struct vmclock_state *st = _st;
 
+	vmclock_set_tk_reference(st);
 	wake_up_interruptible(&st->disrupt_wait);
 	return IRQ_HANDLED;
 }
@@ -751,6 +775,8 @@ static int vmclock_probe(struct platform_device *pdev)
 			st->ptp_clock = NULL;
 			return ret;
 		}
+		if (st->ptp_clock)
+			vmclock_set_tk_reference(st);
 	}
 
 	if (!st->miscdev.minor && !st->ptp_clock) {
-- 
2.51.0
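
As a rough userspace illustration of how vmclock_set_tk_reference() in
patch 4/4 derives the seconds field, the sketch below shows only the
sign extension of the 16-bit offset and the status gate. Everything
named sketch_* or SKETCH_* is a hypothetical stand-in rather than the
real vmclock ABI, the constant values are illustrative, and the
little-endian conversions are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins, not the real vmclock ABI definitions. */
#define SKETCH_TIME_UTC			0
#define SKETCH_TIME_TAI			1
#define SKETCH_FLAG_TAI_OFFSET_VALID	(1ULL << 0)
#define SKETCH_STATUS_UNRELIABLE	4

struct sketch_clock {
	uint8_t  time_type;
	uint8_t  clock_status;
	uint64_t flags;
	uint16_t tai_offset_sec; /* raw 16-bit field, two's complement */
	uint64_t time_sec;
};

/*
 * Returns false when the clock must not be used as a reference.
 * Otherwise stores the seconds value the reference would carry.
 */
bool sketch_reference_sec(const struct sketch_clock *clk, uint64_t *sec_out)
{
	uint64_t sec = clk->time_sec;

	/*
	 * Reinterpret the raw field as signed before widening, so that a
	 * negative offset subtracts instead of adding roughly 65000 seconds.
	 */
	if (clk->time_type == SKETCH_TIME_TAI &&
	    (clk->flags & SKETCH_FLAG_TAI_OFFSET_VALID))
		sec += (uint64_t)(int64_t)(int16_t)clk->tai_offset_sec;

	if (clk->clock_status == SKETCH_STATUS_UNRELIABLE)
		return false;

	*sec_out = sec;
	return true;
}
```

The sketch takes no position on the sign convention of the published
offset. It applies whatever signed value the device provides, as the
patch does.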