While the problem report was for extreme errors, even smaller ones are
better avoided: The calculated period to run the calibration loops over
can (and usually will) be shorter than the actual time elapsed between
the first and last platform timer and TSC reads. Adjust the values
returned from the init functions accordingly.
On a Skylake system I've tested this on, accuracy (using HPET) went from
a detected value that was in some cases more than 220kHz too high to
within about ±2kHz. On other systems (or on this system, but using the
PMTMR) the original error range was much smaller, with correspondingly
less (in some cases only very little) improvement.
Reported-by: James Dingwall <james-xen@dingwall.me.uk>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
There's still a time window for the issue to occur between the final
HPET/PMTMR read and the following TSC read. Improving this will be the
subject of yet another patch.
TBD: Accuracy could be slightly further improved by using a (to be
     introduced) rounding variant of muldiv64().
TBD: I'm not entirely sure how useful the conditional is - there
     shouldn't be any inaccuracy from the division when actual equals
     target (upon entry to the conditional), as then the divisor is
     exactly the value that was just multiplied in, and the two cancel.
     And as per the logic in the callers, actual can't be smaller than
     target.
TBD: I'm also no longer sure that the helper function is warranted. It
     started out with more contents, but by now it is effectively only
     the [conditional] muldiv64() invocation.
I'm afraid I don't see a way to deal with the same issue in init_pit().
In particular the (multiple) specs I have to hand don't make clear
whether the counter would continue counting after having reached zero.
Obviously it wouldn't help to check this on a few systems, as their
behavior could still be implementation specific.
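
For illustration only (not part of the patch), here's a minimal standalone
sketch of the effect of the correction. All numbers are made up; the
CALIBRATE_FRAC value, the HPET rate, the TSC rate, and the simplified
muldiv64() stand-in are assumptions made purely for this example:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define CALIBRATE_FRAC 20  /* assumed: calibration over 1s / 20 = 50ms */

/* Simplified stand-in for Xen's muldiv64() (no 128-bit intermediate). */
static uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
    return a * b / c;
}

int main(void)
{
    /* Hypothetical figures: 24MHz HPET, 2GHz TSC, last read ~12.5us late. */
    uint32_t target  = 1200000;    /* 50ms worth of HPET ticks */
    uint32_t actual  = 1200300;    /* ticks actually elapsed at the last read */
    uint64_t elapsed = 100025000;  /* TSC delta over the same (longer) window */

    uint64_t uncorrected = elapsed * CALIBRATE_FRAC;
    uint64_t corrected   = muldiv64(elapsed, target, actual) * CALIBRATE_FRAC;

    printf("uncorrected: %" PRIu64 " Hz\n", uncorrected); /* 2000500000 */
    printf("corrected:   %" PRIu64 " Hz\n", corrected);   /* 2000000000 */

    return 0;
}

With these made-up numbers the uncorrected result comes out 500kHz too
high, while the corrected one matches the nominal 2GHz exactly.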
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -287,6 +287,23 @@ static char *freq_string(u64 freq)
     return s;
 }
 
+static uint64_t adjust_elapsed(uint64_t elapsed, uint32_t actual,
+                               uint32_t target)
+{
+    if ( likely(actual > target) )
+    {
+        /*
+         * A (perhaps significant) delay before the last timer read (e.g. due
+         * to a SMI or NMI) can lead to (perhaps severe) inaccuracy if not
+         * accounting for the time elapsed beyond the originally calculated
+         * duration of the calibration interval.
+         */
+        elapsed = muldiv64(elapsed, target, actual);
+    }
+
+    return elapsed * CALIBRATE_FRAC;
+}
+
 /************************************************************
  * PLATFORM TIMER 1: PROGRAMMABLE INTERVAL TIMER (LEGACY PIT)
  */
@@ -455,7 +472,7 @@ static int64_t __init init_hpet(struct p
     while ( (elapsed = hpet_read32(HPET_COUNTER) - count) < target )
         continue;
 
-    return (rdtsc_ordered() - start) * CALIBRATE_FRAC;
+    return adjust_elapsed(rdtsc_ordered() - start, elapsed, target);
 }
 
 static void resume_hpet(struct platform_timesource *pts)
@@ -505,7 +522,7 @@ static s64 __init init_pmtimer(struct pl
     while ( (elapsed = (inl(pmtmr_ioport) & mask) - count) < target )
         continue;
 
-    return (rdtsc_ordered() - start) * CALIBRATE_FRAC;
+    return adjust_elapsed(rdtsc_ordered() - start, elapsed, target);
 }
 
 static struct platform_timesource __initdata plt_pmtimer =
On Wed, Jan 12, 2022 at 09:56:12AM +0100, Jan Beulich wrote:
> While the problem report was for extreme errors, even smaller ones are
> better avoided: The calculated period to run the calibration loops over
> can (and usually will) be shorter than the actual time elapsed between
> the first and last platform timer and TSC reads. Adjust the values
> returned from the init functions accordingly.
>
> On a Skylake system I've tested this on, accuracy (using HPET) went from
> a detected value that was in some cases more than 220kHz too high to
> within about ±2kHz. On other systems (or on this system, but using the
> PMTMR) the original error range was much smaller, with correspondingly
> less (in some cases only very little) improvement.
>
> Reported-by: James Dingwall <james-xen@dingwall.me.uk>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> There's still a time window for the issue to occur between the final
> HPET/PMTMR read and the following TSC read. Improving this will be the
> subject of yet another patch.
>
> TBD: Accuracy could be slightly further improved by using a (to be
>      introduced) rounding variant of muldiv64().

I'm unsure we care that much about such fine grained accuracy here.

> TBD: I'm not entirely sure how useful the conditional is - there
>      shouldn't be any inaccuracy from the division when actual equals
>      target (upon entry to the conditional), as then the divisor is
>      exactly the value that was just multiplied in, and the two cancel.
>      And as per the logic in the callers, actual can't be smaller than
>      target.

Right, it's just overhead to do the muldiv64 if target == actual.

> TBD: I'm also no longer sure that the helper function is warranted. It
>      started out with more contents, but by now it is effectively only
>      the [conditional] muldiv64() invocation.

Don't have a strong opinion, I'm fine with the helper, or else I would
likely request that the call to muldiv64 is not placed together with
the return in order to avoid overly long lines.

> I'm afraid I don't see a way to deal with the same issue in init_pit().
> In particular the (multiple) specs I have to hand don't make clear
> whether the counter would continue counting after having reached zero.
> Obviously it wouldn't help to check this on a few systems, as their
> behavior could still be implementation specific.

We could likely set the counter to the maximum value it can hold
and then perform reads in a loop (like it's done for HPET or the PM
timers) and stop when start - target is reached. Not a great solution
either.

Thanks, Roger.
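
Purely as an illustration of the loop structure being suggested above (not
actual PIT code: the hardware access is replaced by a simulated, wrapping
16-bit down-counter, and all names and values here are hypothetical):

#include <stdint.h>
#include <stdio.h>

/*
 * Simulated PIT-like channel: a 16-bit counter that counts down from its
 * maximum value and wraps. A real implementation would latch and read the
 * hardware counter via port I/O instead.
 */
static uint16_t sim_counter = 0xffff;

static uint16_t read_counter(void)
{
    return sim_counter -= 7;   /* pretend a few ticks pass per read */
}

int main(void)
{
    const uint16_t target = 5000;          /* ticks to wait for */
    uint16_t start = read_counter();
    unsigned long reads = 1;

    /* 16-bit wrap-safe "elapsed = start - now" for a down-counting timer. */
    while ( (uint16_t)(start - read_counter()) < target )
        reads++;

    printf("waited >= %u ticks after %lu reads\n", (unsigned)target, reads);
    return 0;
}

The unsigned 16-bit subtraction keeps the elapsed count correct across one
counter wrap, analogous to the "now - count" expressions in the HPET and
PMTMR loops.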
On 12.01.2022 11:53, Roger Pau Monné wrote:
> On Wed, Jan 12, 2022 at 09:56:12AM +0100, Jan Beulich wrote:
>> While the problem report was for extreme errors, even smaller ones are
>> better avoided: The calculated period to run the calibration loops over
>> can (and usually will) be shorter than the actual time elapsed between
>> the first and last platform timer and TSC reads. Adjust the values
>> returned from the init functions accordingly.
>>
>> On a Skylake system I've tested this on, accuracy (using HPET) went from
>> a detected value that was in some cases more than 220kHz too high to
>> within about ±2kHz. On other systems (or on this system, but using the
>> PMTMR) the original error range was much smaller, with correspondingly
>> less (in some cases only very little) improvement.
>>
>> Reported-by: James Dingwall <james-xen@dingwall.me.uk>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> I'm afraid I don't see a way to deal with the same issue in init_pit().
>> In particular the (multiple) specs I have to hand don't make clear
>> whether the counter would continue counting after having reached zero.
>> Obviously it wouldn't help to check this on a few systems, as their
>> behavior could still be implementation specific.
>
> We could likely set the counter to the maximum value it can hold
> and then perform reads in a loop (like it's done for HPET or the PM
> timers) and stop when start - target is reached. Not a great solution
> either.

Not the least because reading back the counter from the PIT requires
multiple port operations, i.e. is overall quite a bit slower.

Jan
On 12.01.2022 12:32, Jan Beulich wrote:
> On 12.01.2022 11:53, Roger Pau Monné wrote:
>> On Wed, Jan 12, 2022 at 09:56:12AM +0100, Jan Beulich wrote:
>>> I'm afraid I don't see a way to deal with the same issue in init_pit().
>>> In particular the (multiple) specs I have to hand don't make clear
>>> whether the counter would continue counting after having reached zero.
>>> Obviously it wouldn't help to check this on a few systems, as their
>>> behavior could still be implementation specific.
>>
>> We could likely set the counter to the maximum value it can hold
>> and then perform reads in a loop (like it's done for HPET or the PM
>> timers) and stop when start - target is reached. Not a great solution
>> either.
>
> Not the least because reading back the counter from the PIT requires
> multiple port operations, i.e. is overall quite a bit slower.

What's worse - even if programmed to the maximum value (65536), this
timer rolls over every 55ms; as said elsewhere, SMIs have been observed
to take significantly longer. I conclude that the PIT simply cannot
safely be used on platforms with such long lasting operations.

As a further consequence I wonder whether we wouldn't better calibrate
the APIC timer against the chosen platform timer rather than hardcoding
this to use the PIT.

Jan
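
For reference, the 55ms figure above follows directly from the standard
1.193182MHz i8254 input clock; a trivial back-of-the-envelope check
(values hard-coded here, nothing Xen-specific):

#include <stdio.h>

int main(void)
{
    const double pit_hz = 1193182.0;   /* standard i8254 input clock */
    const double rollover_ms = 65536.0 / pit_hz * 1000.0;

    printf("16-bit PIT rollover: %.1f ms\n", rollover_ms);  /* ~54.9 ms */
    return 0;
}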