kernel/time/timekeeping.c | 2 ++ 1 file changed, 2 insertions(+)
Most drivers only populate the fields cycles and cs_id in their
get_time_fn() callback for get_device_system_crosststamp() unless
they explicitly provide nanosecond values.
When this new use_nsecs field was added and used most drivers did not
care.
Clock sources other than CSID_GENERIC could then get converted in
convert_base_to_cs() based on an uninitialized use_nsecs which usually
results in -EINVAL during the following range check.
Fixes: 6b2e29977518 ("timekeeping: Provide infrastructure for converting to/from a base clock")
Cc: stable@vger.kernel.org
Signed-off-by: Markus Blöchl <markus.bloechl@ipetronik.com>
---
Notes:
We observed this in the e1000e driver but at least stmmac and
ptp_kvm also seem affected by this.
ice was recently fixed by a5a441ae283d ("ice/ptp: fix crosstimestamp reporting").
kernel/time/timekeeping.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index a009c91f7b05..be0da807329f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1269,6 +1269,8 @@ int get_device_system_crosststamp(int (*get_time_fn)
do {
seq = read_seqcount_begin(&tk_core.seq);
+ system_counterval.use_nsecs = false;
+
/*
* Try to synchronously capture device time and a system
* counter value calling back into the device driver
base-commit: d7b8f8e20813f0179d8ef519541a3527e7661d3a
--
2.50.0
Impressum/Imprint: https://www.ipetronik.com/impressum
On Tue, Jul 8, 2025 at 9:46 AM Markus Blöchl <markus.bloechl@ipetronik.com> wrote: > > Most drivers only populate the fields cycles and cs_id in their > get_time_fn() callback for get_device_system_crosststamp() unless > they explicitly provide nanosecond values. > When this new use_nsecs field was added and used most drivers did not > care. > Clock sources other than CSID_GENERIC could then get converted in > convert_base_to_cs() based on an uninitialized use_nsecs which usually > results in -EINVAL during the following range check. > > Fixes: 6b2e29977518 ("timekeeping: Provide infrastructure for converting to/from a base clock") > Cc: stable@vger.kernel.org > Signed-off-by: Markus Blöchl <markus.bloechl@ipetronik.com> > --- > > Notes: > We observed this in the e1000e driver but at least stmmac and > ptp_kvm also seem affected by this. > ice was recently fixed by a5a441ae283d ("ice/ptp: fix crosstimestamp reporting"). > > > kernel/time/timekeeping.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index a009c91f7b05..be0da807329f 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -1269,6 +1269,8 @@ int get_device_system_crosststamp(int (*get_time_fn) > > do { > seq = read_seqcount_begin(&tk_core.seq); > + system_counterval.use_nsecs = false; > + So if the argument is the local system_counterval structure isn't being fully initialized by the get_time_fn() functions it is passed to, it seems like it would be better to do so at the top of get_device_system_crosststamp(), and not inside the seqloop. But having the responsibility to initialize/fill in the structure being split across the core and the implementation logic (leaving some of the fields as optional) feels prone to mistakes, so it makes me wonder if those drivers implementing the get_time_fn() really ought to fully fill out the structure, and thus the fix would be better done in those drivers. thanks -john
On Tue, Jul 08, 2025 at 12:09:40PM -0700, John Stultz wrote: > On Tue, Jul 8, 2025 at 9:46 AM Markus Blöchl > <markus.bloechl@ipetronik.com> wrote: > > > > Most drivers only populate the fields cycles and cs_id in their > > get_time_fn() callback for get_device_system_crosststamp() unless > > they explicitly provide nanosecond values. > > When this new use_nsecs field was added and used most drivers did not > > care. > > Clock sources other than CSID_GENERIC could then get converted in > > convert_base_to_cs() based on an uninitialized use_nsecs which usually > > results in -EINVAL during the following range check. > > > > Fixes: 6b2e29977518 ("timekeeping: Provide infrastructure for converting to/from a base clock") > > Cc: stable@vger.kernel.org > > Signed-off-by: Markus Blöchl <markus.bloechl@ipetronik.com> > > --- > > > > Notes: > > We observed this in the e1000e driver but at least stmmac and > > ptp_kvm also seem affected by this. > > ice was recently fixed by a5a441ae283d ("ice/ptp: fix crosstimestamp reporting"). > > > > > > kernel/time/timekeeping.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > > index a009c91f7b05..be0da807329f 100644 > > --- a/kernel/time/timekeeping.c > > +++ b/kernel/time/timekeeping.c > > @@ -1269,6 +1269,8 @@ int get_device_system_crosststamp(int (*get_time_fn) > > > > do { > > seq = read_seqcount_begin(&tk_core.seq); > > + system_counterval.use_nsecs = false; > > + > > So if the argument is the local system_counterval structure isn't > being fully initialized by the get_time_fn() functions it is passed > to, it seems like it would be better to do so at the top of > get_device_system_crosststamp(), and not inside the seqloop. Probably, I was just afraid of the case where get_time_fn() would take like *very* different paths during different iterations. But that seems really unlikely, indeed. > > But having the responsibility to initialize/fill in the structure > being split across the core and the implementation logic (leaving some > of the fields as optional) feels prone to mistakes, so it makes me > wonder if those drivers implementing the get_time_fn() really ought to > fully fill out the structure, and thus the fix would be better done in > those drivers. Yes, they should. I also intend to patch the current drivers to do so but the initial addition of the use_nsecs field could never have been safe without some default initialization. I know that we shouldn't worry too much about out-of-tree drivers, but the fact that get_device_system_crosststamp() is exported, the compiler is still perfectly happy *but* it breaks silently and occasionally bugs me. So this fix should cover all known and unknown drivers and is easier to backport into stable. Meanwhile I am preparing some follow-up patches for net to make the known drivers fully fill out the structure. Btw: Do you happen to have any patchwork queue you want those timekeeping patches to land in? Thanks for your input Markus > > thanks > -john Impressum/Imprint: https://www.ipetronik.com/impressum
On Wed, Jul 09 2025 at 10:32, Markus Blöchl wrote: > On Tue, Jul 08, 2025 at 12:09:40PM -0700, John Stultz wrote: >> On Tue, Jul 8, 2025 at 9:46 AM Markus Blöchl >> <markus.bloechl@ipetronik.com> wrote: >> > >> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c >> > index a009c91f7b05..be0da807329f 100644 >> > --- a/kernel/time/timekeeping.c >> > +++ b/kernel/time/timekeeping.c >> > @@ -1269,6 +1269,8 @@ int get_device_system_crosststamp(int (*get_time_fn) >> > >> > do { >> > seq = read_seqcount_begin(&tk_core.seq); >> > + system_counterval.use_nsecs = false; >> > + >> >> So if the argument is the local system_counterval structure isn't >> being fully initialized by the get_time_fn() functions it is passed >> to, it seems like it would be better to do so at the top of >> get_device_system_crosststamp(), and not inside the seqloop. > > Probably, I was just afraid of the case where get_time_fn() would take > like *very* different paths during different iterations. > But that seems really unlikely, indeed. It's impossible. xtstamp->device and the related get_time_fn() are immutable during the call. >> But having the responsibility to initialize/fill in the structure >> being split across the core and the implementation logic (leaving some >> of the fields as optional) feels prone to mistakes, so it makes me >> wonder if those drivers implementing the get_time_fn() really ought to >> fully fill out the structure, and thus the fix would be better done in >> those drivers. > > Yes, they should. No, they should not. The data structure is instantiated in get_device_system_crosststamp() and then handed in un-initialized to get_time_fn(), which is wrong to begin with. Why? That means if the structure is ever expanded, then you'd have to fix up all of the get_time_fn() implementations. Seriously? The obviously correct and future proof thing to do is: - struct system_counterval_t system_counterval; + struct system_counterval_t system_counterval = { }; Which fixes the problem you discovered once and forever, no? Thanks, tglx
Hi Thomas, On Fri, Jul 18, 2025 at 10:25:12PM +0200, Thomas Gleixner wrote: > On Wed, Jul 09 2025 at 10:32, Markus Blöchl wrote: > > On Tue, Jul 08, 2025 at 12:09:40PM -0700, John Stultz wrote: > >> On Tue, Jul 8, 2025 at 9:46 AM Markus Blöchl > >> <markus.bloechl@ipetronik.com> wrote: > >> > > >> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > >> > index a009c91f7b05..be0da807329f 100644 > >> > --- a/kernel/time/timekeeping.c > >> > +++ b/kernel/time/timekeeping.c > >> > @@ -1269,6 +1269,8 @@ int get_device_system_crosststamp(int (*get_time_fn) > >> > > >> > do { > >> > seq = read_seqcount_begin(&tk_core.seq); > >> > + system_counterval.use_nsecs = false; > >> > + > >> > >> So if the argument is the local system_counterval structure isn't > >> being fully initialized by the get_time_fn() functions it is passed > >> to, it seems like it would be better to do so at the top of > >> get_device_system_crosststamp(), and not inside the seqloop. > > > > Probably, I was just afraid of the case where get_time_fn() would take > > like *very* different paths during different iterations. > > But that seems really unlikely, indeed. > > It's impossible. xtstamp->device and the related get_time_fn() are > immutable during the call. I don't see it being entirely impossible, just nonsensical. get_time_fn() tends to read volatile external data and is thus far from being a pure function. Some implementations (e.g. ptp_vmclock_get_time_fn()) can take some weird turns, which I did not want to follow all the way. I have no clue what could happen e.g. during a VM migration. That's why resetting the struct before every invocation seemed to be the safer option to me. If you are certain that something like that can never happen, then I'm totally happy. > > >> But having the responsibility to initialize/fill in the structure > >> being split across the core and the implementation logic (leaving some > >> of the fields as optional) feels prone to mistakes, so it makes me > >> wonder if those drivers implementing the get_time_fn() really ought to > >> fully fill out the structure, and thus the fix would be better done in > >> those drivers. > > > > Yes, they should. > > No, they should not. > > The data structure is instantiated in get_device_system_crosststamp() > and then handed in un-initialized to get_time_fn(), which is wrong to > begin with. Why? > > That means if the structure is ever expanded, then you'd have to fix up > all of the get_time_fn() implementations. As long as all get_time_fn() implementations adhere to the convention of always assigning an entire, fully-initialized struct and are always compiled against the same headers as the kernel, I don't see that necessity. But yes, those are obviously a lot of "if"s... > > Seriously? > > The obviously correct and future proof thing to do is: > > - struct system_counterval_t system_counterval; > + struct system_counterval_t system_counterval = { }; > > Which fixes the problem you discovered once and forever, no? Works for me, too. Reroll follows. > > Thanks, > > tglx Thanks a lot, Markus --
© 2016 - 2025 Red Hat, Inc.