[PATCH RFC] x86/time: avoid early uses of NOW() to return zero

Jan Beulich posted 1 patch 3 weeks, 3 days ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/746ce9af-156b-4c16-8cc0-6e8d929107a0@suse.com
There is a newer version of this series
[PATCH RFC] x86/time: avoid early uses of NOW() to return zero
Posted by Jan Beulich 3 weeks, 3 days ago
Waiting loops like the one in flush_command_buffer() will degenerate to
infinite ones when used early enough for NOW() to still return constant
zero. Make sure the returned value at least monotonically increases.

Do this only in get_s_time(), as producing a sane value in
get_s_time_fixed() for non-zero inputs won't be reasonably possible.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: This breaks at least the TSM_BOOT case in printk_start_of_line(), which
     checks for NOW() returning 0 (falling back to TSM_RAW in this case).
     For now I have no idea how to avoid this, except that when CPUID leaf
     0x15 is available we could leverage that to put in place at least an
     approximate scale value. Doing so could, however, lead to a
     discontinuity (returned value moving backwards) once the final scale
     value was put in place. (Note, however, that such a discontinuity can
     also result from init_percpu_time() using the BSP's scale value as
     initial estimate for APs. Then again local_time_calibration() at
     least makes an attempt at avoiding such.)

RFC: While generally the mentioned waiting loops will take longer to time
     out, on a very fast CPU tight loops may time out too early.

RFC: In get_s_time_fixed(), should we perhaps assert that the scale was
     set?

I don't think Fixes: tags should be put here. If we did, we'd have to
enumerate all introductions of early uses of NOW() (or get_s_time()), with
the exception of those dealing with getting back 0 (which I expect is only
printk_start_of_line()).

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1668,6 +1668,20 @@ s_time_t get_s_time_fixed(u64 at_tsc)
 
 s_time_t get_s_time(void)
 {
+    /*
+     * Before the TSC scale is set, avoid returning constant 0 (or whatever
+     * this_cpu(cpu_time).stamp.local_stime is set to).  While the returned
+     * value is in no way representing time, it at least increases
+     * monotonically, thus avoiding e.g. waiting loops to degenerate to
+     * entirely infinite ones.
+     */
+    if ( unlikely(!this_cpu(cpu_time).tsc_scale.mul_frac) )
+    {
+        static s_time_t counter;
+
+        return arch_fetch_and_add(&counter, 1);
+    }
+
     return get_s_time_fixed(0);
 }
 

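The failure mode described in the commit message can be modelled outside of Xen. The following is a minimal sketch, not Xen code: `wait_times_out()`, `now_constant_zero()` and `now_counter()` are hypothetical names, the `max_iters` cap exists only so the model terminates, and C11 `atomic_fetch_add()` stands in for `arch_fetch_and_add()`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t s_time_t;

/* Models NOW() before the TSC scale is set: constant zero. */
static s_time_t now_constant_zero(void)
{
    return 0;
}

/* Models the fallback in the patch: the value grows by one per call. */
static s_time_t now_counter(void)
{
    static _Atomic s_time_t counter;

    return atomic_fetch_add(&counter, 1);
}

/*
 * Hypothetical polling loop with a deadline. Returns true if the timeout
 * fired within max_iters polls, false if the loop would have kept spinning.
 */
static bool wait_times_out(s_time_t (*now)(void), s_time_t timeout,
                           unsigned long max_iters)
{
    s_time_t start = now();

    for ( unsigned long i = 0; i < max_iters; i++ )
        if ( now() - start >= timeout )
            return true;

    return false;
}
```

With the constant-zero clock the deadline is never reached, whereas with the counter it is reached after roughly `timeout` polls. That also illustrates the second RFC remark: the elapsed "time" then depends only on how often the loop polls, not on wall-clock time.
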
Re: [PATCH RFC] x86/time: avoid early uses of NOW() to return zero
Posted by Roger Pau Monné 3 weeks, 3 days ago
On Wed, May 06, 2026 at 11:37:41AM +0200, Jan Beulich wrote:
> Waiting loops like the one in flush_command_buffer() will degenerate to
> infinite ones when used early enough for NOW() to still return constant
> zero. Make sure the returned value at least monotonically increases.
> 
> Do this only in get_s_time(), as producing a sane value in
> get_s_time_fixed() for non-zero inputs won't be reasonably possible.
> 
> Reported-by: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: This breaks at least the TSM_BOOT case in printk_start_of_line(), which
>      checks for NOW() returning 0 (falling back to TSM_RAW in this case).
>      For now I have no idea how to avoid this, except that when CPUID leaf
>      0x15 is available we could leverage that to put in place at least an
>      approximate scale value. Doing so could, however, lead to a
>      discontinuity (returned value moving backwards) once the final scale
>      value was put in place. (Note, however, that such a discontinuity can
>      also result from init_percpu_time() using the BSP's scale value as
>      initial estimate for APs. Then again local_time_calibration() at
>      least makes an attempt at avoiding such.)

For the purposes of printk_start_of_line() we could unconditionally
use get_cycles() when system_state < SYS_STATE_smp_boot IMO.  Using
the frequency value from CPUID seems like a good approach also on
boxes that expose this information.

I wonder, we seem to unconditionally perform the TSC calibration
against a known frequency time source, wouldn't it be more reliable to
use the information from leaf 0x15 when available?

> 
> RFC: While generally the mentioned waiting loops will take longer to time
>      out, on a very fast CPU tight loops may time out too early.

I was wondering about that, increasing just a nanosecond for each
call seems like it's going to make progress fairly slow?  Obviously
depends on how tight the calls to NOW() are in the outside loop.

Maybe when lacking frequency information from CPUID we could assume
something like 8GHz and scale the TSC based on that?  AFAICT it's
advisable to use a frequency greater than any CPU, as then we don't
risk NOW() running too fast.

> RFC: In get_s_time_fixed(), should we perhaps assert that the scale was
>      set?

Might be good, but I would like to see what explodes when doing
that...

> I don't think Fixes: tags should be put here. If we did, we'd have to
> enumerate all introductions of early uses of NOW() (or get_s_time()), with
> the exception of those dealing with getting back 0 (which I expect is only
> printk_start_of_line()).

I'm fine with no fixes tag, but we need to remember to backport this
one.

Thanks, Roger.
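
The idea of assuming a frequency above that of any real CPU can be sketched as follows. This is an illustration only, under the assumption that an early NOW() would be derived from a raw TSC delta: `ASSUMED_TSC_HZ` and `ticks_to_ns()` are hypothetical names, and 8GHz is merely the figure floated above.

```c
#include <stdint.h>

/* Hypothetical upper bound on any CPU's TSC frequency (an assumption). */
#define ASSUMED_TSC_HZ 8000000000ULL

#define NS_PER_SEC 1000000000ULL

/*
 * Convert a TSC delta to nanoseconds for a given frequency. Whole seconds
 * and the remainder are converted separately so the intermediate product
 * stays below 2^64 for any hz up to about 18GHz.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t hz)
{
    return ticks / hz * NS_PER_SEC + ticks % hz * NS_PER_SEC / hz;
}
```

On a CPU whose TSC really runs at 4GHz, one real second corresponds to 4000000000 ticks, which `ticks_to_ns(4000000000ULL, ASSUMED_TSC_HZ)` converts to 500000000ns. The early clock thus runs slow by the ratio of actual to assumed frequency, but never fast, as long as the assumed frequency is not exceeded.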

Re: [PATCH RFC] x86/time: avoid early uses of NOW() to return zero
Posted by Jan Beulich 3 weeks, 2 days ago
On 06.05.2026 12:11, Roger Pau Monné wrote:
> On Wed, May 06, 2026 at 11:37:41AM +0200, Jan Beulich wrote:
>> RFC: This breaks at least the TSM_BOOT case in printk_start_of_line(), which
>>      checks for NOW() returning 0 (falling back to TSM_RAW in this case).
>>      For now I have no idea how to avoid this, except that when CPUID leaf
>>      0x15 is available we could leverage that to put in place at least an
>>      approximate scale value. Doing so could, however, lead to a
>>      discontinuity (returned value moving backwards) once the final scale
>>      value was put in place. (Note, however, that such a discontinuity can
>>      also result from init_percpu_time() using the BSP's scale value as
>>      initial estimate for APs. Then again local_time_calibration() at
>>      least makes an attempt at avoiding such.)
> 
> For the purposes of printk_start_of_line() we could unconditionally
> use get_cycles() when system_state < SYS_STATE_smp_boot IMO.

Hmm, "raw" console timestamps are quite a bit uglier to deal with as a
human. Also, while init_xen_time() is pretty close to us setting
SYS_STATE_smp_boot, early_time_init() occurs earlier (and with
init_percpu_time() also called from there that's enough for "good"
timestamps).

>  Using
> the frequency value from CPUID seems like a good approach also on
> boxes that expose this information.

As per what you suggest below, we may then need to increase that value
by some margin, to have NOW() rather move a little too slow than too
fast. Plus of course it won't help for AMD at all.

> I wonder, we seem to unconditionally perform the TSC calibration
> against a known frequency time source, wouldn't it be more reliable to
> use the information from leaf 0x15 when available?

Andrew has been suggesting this, but I can only keep saying that what
CPUID reports are nominal values aiui, not actual ones. From what I
know, there's always some (small) variation as to the frequency of
actual crystals. And it's unclear whether our calibration is more
precise than what CPUID tells us. (If we knew at least average errors,
we could maybe calculate the value to use from both the calculated and
the nominal value.)

>> RFC: While generally the mentioned waiting loops will take longer to time
>>      out, on a very fast CPU tight loops may time out too early.
> 
> I was wondering about that, increasing just a nanosecond for each
> call seems like it's going to make progress fairly slow?  Obviously
> depends on how tight the calls to NOW() are in the outside loop.
> 
> Maybe when lacking frequency information from CPUID we could assume
> something like 8GHz and scale the TSC based on that?  AFAICT it's
> advisable to use a frequency greater than any CPU, as then we don't
> risk NOW() running too fast.

Whatever value we pick, something faster may later appear. And too high
a value isn't good either.

>> RFC: In get_s_time_fixed(), should we perhaps assert that the scale was
>>      set?
> 
> Might be good, but I would like to see what explodes when doing
> that...

Of course that would need checking first. I've audited the callers, and
all looked safe to me. Will do for v2.

>> I don't think Fixes: tags should be put here. If we did, we'd have to
>> enumerate all introductions of early uses of NOW() (or get_s_time()), with
>> the exception of those dealing with getting back 0 (which I expect is only
>> printk_start_of_line()).
> 
> I'm fine with no fixes tag, but we need to remember to backport this
> one.

Definitely.

Jan
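
For reference, the leaf 0x15 arithmetic discussed in this thread can be sketched as below. CPUID leaf 0x15 reports the TSC/crystal ratio denominator in EAX, the numerator in EBX, and the nominal crystal frequency in Hz in ECX (any of which may be zero when not enumerated). The function names and the per-mille margin are hypothetical, chosen only to illustrate padding the nominal value so that an early NOW() errs on the slow side; this is not what Xen does.

```c
#include <stdint.h>

/*
 * Nominal TSC frequency from CPUID leaf 0x15 register values:
 * crystal_hz (ECX) * numerator (EBX) / denominator (EAX).
 * Returns 0 when the leaf does not enumerate all three values.
 */
static uint64_t leaf15_tsc_hz(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    if ( !eax || !ebx || !ecx )
        return 0;

    return (uint64_t)ecx * ebx / eax;
}

/* Inflate a frequency by a margin given in parts per thousand. */
static uint64_t with_margin(uint64_t hz, unsigned int permille)
{
    return hz + hz / 1000 * permille;
}
```

For example, a 24MHz crystal with a ratio of 176/2 yields a nominal 2112000000Hz, and a 1% margin raises that to 2133120000Hz. Since the reported values are nominal, the result is only an estimate of the actual frequency.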