[PATCH] x86/time: further improve TSC / CPU freq calibration accuracy

Posted by Jan Beulich 2 years, 3 months ago
Calibration logic assumes that the platform timer (HPET or ACPI PM
timer) and the TSC are read at about the same time. This assumption may
not hold when a long latency event (e.g. SMI or NMI) occurs between the
two reads. Reduce the risk of reading uncorrelated values by doing at
least four pairs of reads, using the tuple where the delta between the
enclosing TSC reads was smallest. From the fourth iteration onwards bail
if the new TSC delta isn't better (smaller) than the best earlier one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Obviously (I think) instead of having both read_{hpet,pmtmr}_and_tsc(),
the calibration logic could be folded between HPET and PMTMR, at the
expense of a couple more indirect calls (which may not be much of a
problem, as this is all boot-time only); one possible shape of such a
helper is sketched below. Such folding would have been possible before
as well; it's just that the amount of duplicated code hasn't been this
large so far. IOW if so desired, the folding should perhaps go in as a
separate prereq patch.
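
For illustration only, such a folded helper might look like the sketch
below. It is not part of the patch: the helper name and the callback
parameter are made up here, and only the identifiers also appearing in
the diff further down (rdtsc_ordered() etc.) are real.

static uint32_t __init read_counter_and_tsc(uint32_t (*read_counter)(void),
                                            uint64_t *tsc)
{
    uint64_t tsc_prev = *tsc = rdtsc_ordered(), tsc_min = ~0;
    /*
     * Self-initialization suppresses a bogus "may be used uninitialized"
     * warning; the first loop iteration always assigns the variable.
     */
    uint32_t best = best;
    unsigned int i;

    /*
     * Take pairs of (counter, TSC) reads, keeping the counter value whose
     * enclosing TSC reads were closest together.  At least four pairs are
     * taken; from the fourth iteration onwards, stop as soon as the new
     * delta isn't better (smaller) than the best earlier one.
     */
    for ( i = 0; ; ++i )
    {
        uint32_t count = read_counter();
        uint64_t tsc_cur = rdtsc_ordered();
        uint64_t tsc_delta = tsc_cur - tsc_prev;

        if ( tsc_delta < tsc_min )
        {
            tsc_min = tsc_delta;
            *tsc = tsc_cur;
            best = count;
        }
        else if ( i > 2 )
            break;

        tsc_prev = tsc_cur;
    }

    return best;
}

The HPET and PM timer callers would then pass trivial wrappers around
hpet_read32(HPET_COUNTER) and inl(pmtmr_ioport) respectively; those
wrappers are the "couple more indirect calls" mentioned above.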

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -392,9 +392,36 @@ static u64 read_hpet_count(void)
     return hpet_read32(HPET_COUNTER);
 }
 
+static uint32_t __init read_hpet_and_tsc(uint64_t *tsc)
+{
+    uint64_t tsc_prev = *tsc = rdtsc_ordered(), tsc_min = ~0;
+    uint32_t best = best;
+    unsigned int i;
+
+    for ( i = 0; ; ++i )
+    {
+        uint32_t hpet = hpet_read32(HPET_COUNTER);
+        uint64_t tsc_cur = rdtsc_ordered();
+        uint64_t tsc_delta = tsc_cur - tsc_prev;
+
+        if ( tsc_delta < tsc_min )
+        {
+            tsc_min = tsc_delta;
+            *tsc = tsc_cur;
+            best = hpet;
+        }
+        else if ( i > 2 )
+            break;
+
+        tsc_prev = tsc_cur;
+    }
+
+    return best;
+}
+
 static int64_t __init init_hpet(struct platform_timesource *pts)
 {
-    uint64_t hpet_rate, start;
+    uint64_t hpet_rate, start, end;
     uint32_t count, target, elapsed;
     /*
      * Allow HPET to be setup, but report a frequency of 0 so it's not selected
@@ -466,13 +493,13 @@ static int64_t __init init_hpet(struct p
 
     pts->frequency = hpet_rate;
 
-    count = hpet_read32(HPET_COUNTER);
-    start = rdtsc_ordered();
+    count = read_hpet_and_tsc(&start);
     target = CALIBRATE_VALUE(hpet_rate);
     while ( (elapsed = hpet_read32(HPET_COUNTER) - count) < target )
         continue;
+    elapsed = read_hpet_and_tsc(&end) - count;
 
-    return adjust_elapsed(rdtsc_ordered() - start, elapsed, target);
+    return adjust_elapsed(end - start, elapsed, target);
 }
 
 static void resume_hpet(struct platform_timesource *pts)
@@ -505,9 +532,36 @@ static u64 read_pmtimer_count(void)
     return inl(pmtmr_ioport);
 }
 
+static uint32_t __init read_pmtmr_and_tsc(uint64_t *tsc)
+{
+    uint64_t tsc_prev = *tsc = rdtsc_ordered(), tsc_min = ~0;
+    uint32_t best = best;
+    unsigned int i;
+
+    for ( i = 0; ; ++i )
+    {
+        uint32_t pmtmr = inl(pmtmr_ioport);
+        uint64_t tsc_cur = rdtsc_ordered();
+        uint64_t tsc_delta = tsc_cur - tsc_prev;
+
+        if ( tsc_delta < tsc_min )
+        {
+            tsc_min = tsc_delta;
+            *tsc = tsc_cur;
+            best = pmtmr;
+        }
+        else if ( i > 2 )
+            break;
+
+        tsc_prev = tsc_cur;
+    }
+
+    return best;
+}
+
 static s64 __init init_pmtimer(struct platform_timesource *pts)
 {
-    uint64_t start;
+    uint64_t start, end;
     uint32_t count, target, mask, elapsed;
 
     if ( !pmtmr_ioport || (pmtmr_width != 24 && pmtmr_width != 32) )
@@ -516,13 +570,13 @@ static s64 __init init_pmtimer(struct pl
     pts->counter_bits = pmtmr_width;
     mask = 0xffffffff >> (32 - pmtmr_width);
 
-    count = inl(pmtmr_ioport);
-    start = rdtsc_ordered();
+    count = read_pmtmr_and_tsc(&start);
     target = CALIBRATE_VALUE(ACPI_PM_FREQUENCY);
-    while ( (elapsed = (inl(pmtmr_ioport) - count) & mask) < target )
+    while ( ((inl(pmtmr_ioport) - count) & mask) < target )
         continue;
+    elapsed = (read_pmtmr_and_tsc(&end) - count) & mask;
 
-    return adjust_elapsed(rdtsc_ordered() - start, elapsed, target);
+    return adjust_elapsed(end - start, elapsed, target);
 }
 
 static struct platform_timesource __initdata plt_pmtimer =


Re: [PATCH] x86/time: further improve TSC / CPU freq calibration accuracy
Posted by Jan Beulich 2 years, 3 months ago
On 13.01.2022 14:41, Jan Beulich wrote:
> Calibration logic assumes that the platform timer (HPET or ACPI PM
> timer) and the TSC are read at about the same time. This assumption may
> not hold when a long latency event (e.g. SMI or NMI) occurs between the
> two reads. Reduce the risk of reading uncorrelated values by doing at
> least four pairs of reads, using the tuple where the delta between the
> enclosing TSC reads was smallest. From the fourth iteration onwards bail
> if the new TSC delta isn't better (smaller) than the best earlier one.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

When running virtualized, scheduling in the host would also constitute
long latency events. I wonder whether, to compensate for that, we'd want
more than 3 "base" iterations, as I would expect scheduling events to
occur more frequently than e.g. SMI (and with a higher probability of
multiple ones occurring in close succession).
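
Illustratively, that could take the shape below, relative to the loop
in read_{hpet,pmtmr}_and_tsc(). This is a sketch only, not something
the patch does: cpu_has_hypervisor is the existing predicate for
running as a guest, while the constant 8 is made up.

    /*
     * Sketch: demand more mandatory read pairs when virtualized, where
     * host scheduling makes long-latency events both more frequent and
     * more likely to occur in close succession.
     */
    unsigned int min_iters = cpu_has_hypervisor ? 8 : 3;
    ...
        else if ( i >= min_iters )  /* replaces "i > 2", i.e. 3 today */
            break;

With min_iters == 3 this is equivalent to the posted "i > 2", so
behaviour on bare metal wouldn't change.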

Jan


Re: [PATCH] x86/time: further improve TSC / CPU freq calibration accuracy
Posted by David Vrabel 2 years, 3 months ago

On 18/01/2022 08:50, Jan Beulich wrote:
> On 13.01.2022 14:41, Jan Beulich wrote:
>> Calibration logic assumes that the platform timer (HPET or ACPI PM
>> timer) and the TSC are read at about the same time. This assumption may
>> not hold when a long latency event (e.g. SMI or NMI) occurs between the
>> two reads. Reduce the risk of reading uncorrelated values by doing at
>> least four pairs of reads, using the tuple where the delta between the
>> enclosing TSC reads was smallest. From the fourth iteration onwards bail
>> if the new TSC delta isn't better (smaller) than the best earlier one.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> When running virtualized, scheduling in the host would also constitute
> long latency events. I wonder whether, to compensate for that, we'd want
> more than 3 "base" iterations, as I would expect scheduling events to
> occur more frequently than e.g. SMI (and with a higher probability of
> multiple ones occurring in close succession).

Should Xen be continually or periodically recalibrating? Rather than 
trying to get it perfect at the start of day?

You may also be able to find inspiration from the design or 
implementation of the Precision Time Protocol which has to similarly 
filter out outliers due to transmission delays.
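
(For illustration: PTP implementations commonly gather a window of
delay-tagged samples and trust only the one with the smallest delay, on
the basis that the smallest round trip leaves the least room for outlier
latency. A sketch of that idea, with all names made up:)

struct sample {
    uint32_t counter;  /* platform timer value */
    uint64_t tsc;      /* TSC read right after the counter */
    uint64_t delta;    /* width of the bracketing TSC interval */
};

/* Return the sample whose bracketing interval was narrowest. */
static struct sample best_of(const struct sample *s, unsigned int n)
{
    struct sample best = s[0];
    unsigned int i;

    for ( i = 1; i < n; ++i )
        if ( s[i].delta < best.delta )
            best = s[i];

    return best;
}

The loop in the patch is effectively a streaming form of this
minimum-delay filter, with an early exit once (from the fourth pair
onwards) the delta stops improving.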

David

Re: [PATCH] x86/time: further improve TSC / CPU freq calibration accuracy
Posted by Jan Beulich 2 years, 3 months ago
On 18.01.2022 12:37, David Vrabel wrote:
> 
> 
> On 18/01/2022 08:50, Jan Beulich wrote:
>> On 13.01.2022 14:41, Jan Beulich wrote:
>>> Calibration logic assumes that the platform timer (HPET or ACPI PM
>>> timer) and the TSC are read at about the same time. This assumption may
>>> not hold when a long latency event (e.g. SMI or NMI) occurs between the
>>> two reads. Reduce the risk of reading uncorrelated values by doing at
>>> least four pairs of reads, using the tuple where the delta between the
>>> enclosing TSC reads was smallest. From the fourth iteration onwards bail
>>> if the new TSC delta isn't better (smaller) than the best earlier one.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> When running virtualized, scheduling in the host would also constitute
>> long latency events. I wonder whether, to compensate for that, we'd want
>> more than 3 "base" iterations, as I would expect scheduling events to
>> occur more frequently than e.g. SMI (and with a higher probability of
>> multiple ones occurring in close succession).
> 
> Should Xen be continually or periodically recalibrating? Rather than 
> trying to get it perfect at the start of day?

I wouldn't call dealing with bad samples "getting it perfect". IOW I
think recalibrating later may be an option, but independent of what
I'm doing here.

> You may also be able to find inspiration from the design or 
> implementation of the Precision Time Protocol which has to similarly 
> filter out outliers due to transmission delays.

Thanks for the pointer.

Jan