While the problem report was for extreme errors, even smaller ones are
better avoided: The calculated period to run the calibration loops over
can (and usually will) be shorter than the actual time elapsed between
the first and last platform timer and TSC reads. Adjust the values
returned from the init functions accordingly.
On a Skylake system I've tested this on, accuracy (using HPET) went from
a detected value that was in some cases more than 220kHz too high to
within about ±2kHz. On other systems (or on this system, but using the
PMTMR) the original error range was much smaller, with correspondingly
less (in some cases only very little) improvement.
Reported-by: James Dingwall <james-xen@dingwall.me.uk>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
There's still a time window for the issue to occur between the final
HPET/PMTMR read and the following TSC read. Improving this will be the
subject of yet another patch.
TBD: Accuracy could be slightly further improved by using a (to be
     introduced) rounding variant of muldiv64().
TBD: I'm not entirely sure how useful the conditional is - there
     shouldn't be any inaccuracy from the division when actual equals
     target (upon entry to the conditional), as then the divisor is
     exactly the value that was just multiplied in, and the two cancel.
     And as per the logic in the callers, actual can't be smaller than
     target.
TBD: I'm also no longer sure that the helper function is warranted. It
     started out with more contents, but by now it is effectively only
     the [conditional] muldiv64() invocation.
I'm afraid I don't see a way to deal with the same issue in init_pit().
In particular the (multiple) specs I have to hand don't make clear
whether the counter would continue counting after having reached zero.
Obviously it wouldn't help to check this on a few systems, as their
behavior could still be implementation specific.
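
For illustration only (not part of the patch), here's a minimal standalone
sketch of the effect of the correction. All numbers are made up; the
CALIBRATE_FRAC value, the HPET rate, the TSC rate, and the simplified
muldiv64() stand-in are assumptions made purely for this example:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define CALIBRATE_FRAC 20  /* assumed: calibration over 1s / 20 = 50ms */

/* Simplified stand-in for Xen's muldiv64() (no 128-bit intermediate). */
static uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
    return a * b / c;
}

int main(void)
{
    /* Hypothetical figures: 24MHz HPET, 2GHz TSC, last read ~12.5us late. */
    uint32_t target  = 1200000;    /* 50ms worth of HPET ticks */
    uint32_t actual  = 1200300;    /* ticks actually elapsed at the last read */
    uint64_t elapsed = 100025000;  /* TSC delta over the same (longer) window */

    uint64_t uncorrected = elapsed * CALIBRATE_FRAC;
    uint64_t corrected   = muldiv64(elapsed, target, actual) * CALIBRATE_FRAC;

    printf("uncorrected: %" PRIu64 " Hz\n", uncorrected); /* 2000500000 */
    printf("corrected:   %" PRIu64 " Hz\n", corrected);   /* 2000000000 */

    return 0;
}

With these made-up numbers the uncorrected result comes out 500kHz too
high, while the corrected one matches the nominal 2GHz exactly.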
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -287,6 +287,23 @@ static char *freq_string(u64 freq)
     return s;
 }
 
+static uint64_t adjust_elapsed(uint64_t elapsed, uint32_t actual,
+                               uint32_t target)
+{
+    if ( likely(actual > target) )
+    {
+        /*
+         * A (perhaps significant) delay before the last timer read (e.g. due
+         * to a SMI or NMI) can lead to (perhaps severe) inaccuracy if not
+         * accounting for the time elapsed beyond the originally calculated
+         * duration of the calibration interval.
+         */
+        elapsed = muldiv64(elapsed, target, actual);
+    }
+
+    return elapsed * CALIBRATE_FRAC;
+}
+
 /************************************************************
  * PLATFORM TIMER 1: PROGRAMMABLE INTERVAL TIMER (LEGACY PIT)
  */
@@ -455,7 +472,7 @@ static int64_t __init init_hpet(struct p
     while ( (elapsed = hpet_read32(HPET_COUNTER) - count) < target )
         continue;
 
-    return (rdtsc_ordered() - start) * CALIBRATE_FRAC;
+    return adjust_elapsed(rdtsc_ordered() - start, elapsed, target);
 }
 
 static void resume_hpet(struct platform_timesource *pts)
@@ -505,7 +522,7 @@ static s64 __init init_pmtimer(struct pl
     while ( (elapsed = (inl(pmtmr_ioport) & mask) - count) < target )
         continue;
 
-    return (rdtsc_ordered() - start) * CALIBRATE_FRAC;
+    return adjust_elapsed(rdtsc_ordered() - start, elapsed, target);
 }
 
 static struct platform_timesource __initdata plt_pmtimer =
On Wed, Jan 12, 2022 at 09:56:12AM +0100, Jan Beulich wrote:
> While the problem report was for extreme errors, even smaller ones are
> better avoided: The calculated period to run the calibration loops over
> can (and usually will) be shorter than the actual time elapsed between
> the first and last platform timer and TSC reads. Adjust the values
> returned from the init functions accordingly.
>
> On a Skylake system I've tested this on, accuracy (using HPET) went from
> a detected value that was in some cases more than 220kHz too high to
> within about ±2kHz. On other systems (or on this system, but using the
> PMTMR) the original error range was much smaller, with correspondingly
> less (in some cases only very little) improvement.
>
> Reported-by: James Dingwall <james-xen@dingwall.me.uk>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> There's still a time window for the issue to occur between the final
> HPET/PMTMR read and the following TSC read. Improving this will be the
> subject of yet another patch.
>
> TBD: Accuracy could be slightly further improved by using a (to be
>      introduced) rounding variant of muldiv64().

I'm unsure we care that much about such fine grained accuracy here.

> TBD: I'm not entirely sure how useful the conditional is - there
>      shouldn't be any inaccuracy from the division when actual equals
>      target (upon entry to the conditional), as then the divisor is
>      exactly the value that was just multiplied in, and the two cancel.
>      And as per the logic in the callers, actual can't be smaller than
>      target.

Right, it's just overhead to do the muldiv64 if target == actual.

> TBD: I'm also no longer sure that the helper function is warranted. It
>      started out with more contents, but by now it is effectively only
>      the [conditional] muldiv64() invocation.

Don't have a strong opinion, I'm fine with the helper, or else I would
likely request that the call to muldiv64 is not placed together with
the return in order to avoid overly long lines.

> I'm afraid I don't see a way to deal with the same issue in init_pit().
> In particular the (multiple) specs I have to hand don't make clear
> whether the counter would continue counting after having reached zero.
> Obviously it wouldn't help to check this on a few systems, as their
> behavior could still be implementation specific.

We could likely set the counter to the maximum value it can hold
and then perform reads in a loop (like it's done for HPET or the PM
timers) and stop when start - target is reached. Not a great solution
either.

Thanks, Roger.
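
Purely as an illustration of the loop structure being suggested above (not
actual PIT code: the hardware access is replaced by a simulated, wrapping
16-bit down-counter, and all names and values here are hypothetical):

#include <stdint.h>
#include <stdio.h>

/*
 * Simulated PIT-like channel: a 16-bit counter that counts down from its
 * maximum value and wraps. A real implementation would latch and read the
 * hardware counter via port I/O instead.
 */
static uint16_t sim_counter = 0xffff;

static uint16_t read_counter(void)
{
    return sim_counter -= 7;   /* pretend a few ticks pass per read */
}

int main(void)
{
    const uint16_t target = 5000;          /* ticks to wait for */
    uint16_t start = read_counter();
    unsigned long reads = 1;

    /* 16-bit wrap-safe "elapsed = start - now" for a down-counting timer. */
    while ( (uint16_t)(start - read_counter()) < target )
        reads++;

    printf("waited >= %u ticks after %lu reads\n", (unsigned)target, reads);
    return 0;
}

The unsigned 16-bit subtraction keeps the elapsed count correct across one
counter wrap, analogous to the "now - count" expressions in the HPET and
PMTMR loops.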
On 12.01.2022 11:53, Roger Pau Monné wrote:
> On Wed, Jan 12, 2022 at 09:56:12AM +0100, Jan Beulich wrote:
>> While the problem report was for extreme errors, even smaller ones are
>> better avoided: The calculated period to run the calibration loops over
>> can (and usually will) be shorter than the actual time elapsed between
>> the first and last platform timer and TSC reads. Adjust the values
>> returned from the init functions accordingly.
>>
>> On a Skylake system I've tested this on, accuracy (using HPET) went from
>> a detected value that was in some cases more than 220kHz too high to
>> within about ±2kHz. On other systems (or on this system, but using the
>> PMTMR) the original error range was much smaller, with correspondingly
>> less (in some cases only very little) improvement.
>>
>> Reported-by: James Dingwall <james-xen@dingwall.me.uk>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> I'm afraid I don't see a way to deal with the same issue in init_pit().
>> In particular the (multiple) specs I have to hand don't make clear
>> whether the counter would continue counting after having reached zero.
>> Obviously it wouldn't help to check this on a few systems, as their
>> behavior could still be implementation specific.
>
> We could likely set the counter to the maximum value it can hold
> and then perform reads in a loop (like it's done for HPET or the PM
> timers) and stop when start - target is reached. Not a great solution
> either.

Not the least because reading back the counter from the PIT requires
multiple port operations, i.e. is overall quite a bit slower.

Jan
On 12.01.2022 12:32, Jan Beulich wrote:
> On 12.01.2022 11:53, Roger Pau Monné wrote:
>> On Wed, Jan 12, 2022 at 09:56:12AM +0100, Jan Beulich wrote:
>>> I'm afraid I don't see a way to deal with the same issue in init_pit().
>>> In particular the (multiple) specs I have to hand don't make clear
>>> whether the counter would continue counting after having reached zero.
>>> Obviously it wouldn't help to check this on a few systems, as their
>>> behavior could still be implementation specific.
>>
>> We could likely set the counter to the maximum value it can hold
>> and then perform reads in a loop (like it's done for HPET or the PM
>> timers) and stop when start - target is reached. Not a great solution
>> either.
>
> Not the least because reading back the counter from the PIT requires
> multiple port operations, i.e. is overall quite a bit slower.

What's worse - even if programmed to the maximum value (65536), this
timer rolls over every 55ms; as said elsewhere, SMIs have been observed
to take significantly longer. I conclude that the PIT simply cannot
safely be used on platforms with such long lasting operations.

As a further consequence I wonder whether we wouldn't better calibrate
the APIC timer against the chosen platform timer rather than hardcoding
this to use the PIT.

Jan
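
For reference, the 55ms figure above follows directly from the standard
1.193182MHz i8254 input clock; a trivial back-of-the-envelope check
(values hard-coded here, nothing Xen-specific):

#include <stdio.h>

int main(void)
{
    const double pit_hz = 1193182.0;   /* standard i8254 input clock */
    const double rollover_ms = 65536.0 / pit_hz * 1000.0;

    printf("16-bit PIT rollover: %.1f ms\n", rollover_ms);  /* ~54.9 ms */
    return 0;
}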