[v1] clock: Get rid of clock_get_ns()

[PATCH 1/4] clock: Introduce clock_ticks_to_ns()

Posted by Peter Maydell 5 years, 2 months ago

The clock_get_ns() API claims to return the period of a clock in
nanoseconds. Unfortunately since it returns an integer and a
clock's period is represented in units of 2^-32 nanoseconds,
the result is often an approximation, and calculating a clock
expiry deadline by multiplying clock_get_ns() by a number-of-ticks
is unacceptably inaccurate.

Introduce a new API clock_ticks_to_ns() which returns the number
of nanoseconds it takes the clock to make a given number of ticks.
This function can do the complete calculation internally and
will thus give a more accurate result.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
The 64x64->128 multiply is a bit painful for 32-bit and I
guess in theory since we know we only want bits [95:32]
of the result we could special-case it, but TBH I don't
think 32-bit hosts merit much optimization effort these days.
---
 docs/devel/clocks.rst | 15 +++++++++++++++
 include/hw/clock.h    | 29 +++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/docs/devel/clocks.rst b/docs/devel/clocks.rst
index e5da28e2111..aebeedbb95e 100644
--- a/docs/devel/clocks.rst
+++ b/docs/devel/clocks.rst
@@ -258,6 +258,21 @@ Here is an example:
                         clock_get_ns(dev->my_clk_input));
     }
 
+Calculating expiry deadlines
+----------------------------
+
+A commonly required operation for a clock is to calculate how long
+it will take for the clock to tick N times; this can then be used
+to set a timer expiry deadline. Use the function ``clock_ticks_to_ns()``,
+which takes an unsigned 64-bit count of ticks and returns the length
+of time in nanoseconds required for the clock to tick that many times.
+
+It is important not to try to calculate expiry deadlines using a
+shortcut like multiplying a "period of clock in nanoseconds" value
+by the tick count, because clocks can have periods which are not a
+whole number of nanoseconds, and the accumulated error in the
+multiplication can be significant.
+
 Changing a clock period
 -----------------------
 
diff --git a/include/hw/clock.h b/include/hw/clock.h
index 81bcf3e505a..a9425d9fb14 100644
--- a/include/hw/clock.h
+++ b/include/hw/clock.h
@@ -16,6 +16,7 @@
 
 #include "qom/object.h"
 #include "qemu/queue.h"
+#include "qemu/host-utils.h"
 
 #define TYPE_CLOCK "clock"
 OBJECT_DECLARE_SIMPLE_TYPE(Clock, CLOCK)
@@ -218,6 +219,34 @@ static inline unsigned clock_get_ns(Clock *clk)
     return CLOCK_PERIOD_TO_NS(clock_get(clk));
 }
 
+/**
+ * clock_ticks_to_ns:
+ * @clk: the clock to query
+ * @ticks: number of ticks
+ *
+ * Returns the length of time in nanoseconds for this clock
+ * to tick @ticks times. Because a clock can have a period
+ * which is not a whole number of nanoseconds, it is important
+ * to use this function when calculating things like timer
+ * expiry deadlines, rather than attempting to obtain a "period
+ * in nanoseconds" value and then multiplying that by a number
+ * of ticks.
+ */
+static inline uint64_t clock_ticks_to_ns(const Clock *clk, uint64_t ticks)
+{
+    uint64_t ns_low, ns_high;
+
+    /*
+     * clk->period is the period in units of 2^-32 ns, so
+     * (clk->period * ticks) is the required length of time in those
+     * units, and we can convert to nanoseconds by multiplying by
+     * 2^32, which is the same as shifting the 128-bit multiplication
+     * result right by 32.
+     */
+    mulu64(&ns_low, &ns_high, clk->period, ticks);
+    return ns_low >> 32 | ns_high << 32;
+}
+
 /**
  * clock_is_enabled:
  * @clk: a clock
-- 
2.20.1

Re: [PATCH 1/4] clock: Introduce clock_ticks_to_ns()

Posted by Richard Henderson 5 years, 2 months ago

On 12/8/20 12:15 PM, Peter Maydell wrote:
> The clock_get_ns() API claims to return the period of a clock in
> nanoseconds. Unfortunately since it returns an integer and a
> clock's period is represented in units of 2^-32 nanoseconds,
> the result is often an approximation, and calculating a clock
> expiry deadline by multiplying clock_get_ns() by a number-of-ticks
> is unacceptably inaccurate.
> 
> Introduce a new API clock_ticks_to_ns() which returns the number
> of nanoseconds it takes the clock to make a given number of ticks.
> This function can do the complete calculation internally and
> will thus give a more accurate result.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> The 64x64->128 multiply is a bit painful for 32-bit and I
> guess in theory since we know we only want bits [95:32]
> of the result we could special-case it, but TBH I don't
> think 32-bit hosts merit much optimization effort these days.
> ---
>  docs/devel/clocks.rst | 15 +++++++++++++++
>  include/hw/clock.h    | 29 +++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/docs/devel/clocks.rst b/docs/devel/clocks.rst
> index e5da28e2111..aebeedbb95e 100644
> --- a/docs/devel/clocks.rst
> +++ b/docs/devel/clocks.rst
> @@ -258,6 +258,21 @@ Here is an example:
>                          clock_get_ns(dev->my_clk_input));
>      }
>  
> +Calculating expiry deadlines
> +----------------------------
> +
> +A commonly required operation for a clock is to calculate how long
> +it will take for the clock to tick N times; this can then be used
> +to set a timer expiry deadline. Use the function ``clock_ticks_to_ns()``,
> +which takes an unsigned 64-bit count of ticks and returns the length
> +of time in nanoseconds required for the clock to tick that many times.
> +
> +It is important not to try to calculate expiry deadlines using a
> +shortcut like multiplying a "period of clock in nanoseconds" value
> +by the tick count, because clocks can have periods which are not a
> +whole number of nanoseconds, and the accumulated error in the
> +multiplication can be significant.
> +
>  Changing a clock period
>  -----------------------
>  
> diff --git a/include/hw/clock.h b/include/hw/clock.h
> index 81bcf3e505a..a9425d9fb14 100644
> --- a/include/hw/clock.h
> +++ b/include/hw/clock.h
> @@ -16,6 +16,7 @@
>  
>  #include "qom/object.h"
>  #include "qemu/queue.h"
> +#include "qemu/host-utils.h"
>  
>  #define TYPE_CLOCK "clock"
>  OBJECT_DECLARE_SIMPLE_TYPE(Clock, CLOCK)
> @@ -218,6 +219,34 @@ static inline unsigned clock_get_ns(Clock *clk)
>      return CLOCK_PERIOD_TO_NS(clock_get(clk));
>  }
>  
> +/**
> + * clock_ticks_to_ns:
> + * @clk: the clock to query
> + * @ticks: number of ticks
> + *
> + * Returns the length of time in nanoseconds for this clock
> + * to tick @ticks times. Because a clock can have a period
> + * which is not a whole number of nanoseconds, it is important
> + * to use this function when calculating things like timer
> + * expiry deadlines, rather than attempting to obtain a "period
> + * in nanoseconds" value and then multiplying that by a number
> + * of ticks.
> + */
> +static inline uint64_t clock_ticks_to_ns(const Clock *clk, uint64_t ticks)
> +{
> +    uint64_t ns_low, ns_high;
> +
> +    /*
> +     * clk->period is the period in units of 2^-32 ns, so
> +     * (clk->period * ticks) is the required length of time in those
> +     * units, and we can convert to nanoseconds by multiplying by
> +     * 2^32, which is the same as shifting the 128-bit multiplication
> +     * result right by 32.
> +     */
> +    mulu64(&ns_low, &ns_high, clk->period, ticks);
> +    return ns_low >> 32 | ns_high << 32;

With the shift, you're discarding the high 32 bits of the result.  You'll lose
those same bits if you shift one of the inputs left by 32, and use only the
high part of the result, e.g.

    mulu(&discard, &ret, clk->period, ticks << 32);
    return ret;

Which on some hosts, e.g. aarch64, only requires umulh and not two multiply
instructions.

Either way, I wonder if you want to either use uint32_t ticks, or assert that
ticks <= UINT32_MAX?  Or if you don't shift ticks, assert that ns_high <=
UINT32_MAX, so that you don't lose output bits?


r~

Re: [PATCH 1/4] clock: Introduce clock_ticks_to_ns()

Posted by Luc Michel 5 years, 2 months ago

On 12/9/20 12:39 AM, Richard Henderson wrote:
> On 12/8/20 12:15 PM, Peter Maydell wrote:
>> The clock_get_ns() API claims to return the period of a clock in
>> nanoseconds. Unfortunately since it returns an integer and a
>> clock's period is represented in units of 2^-32 nanoseconds,
>> the result is often an approximation, and calculating a clock
>> expiry deadline by multiplying clock_get_ns() by a number-of-ticks
>> is unacceptably inaccurate.
>>
>> Introduce a new API clock_ticks_to_ns() which returns the number
>> of nanoseconds it takes the clock to make a given number of ticks.
>> This function can do the complete calculation internally and
>> will thus give a more accurate result.
>>
>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>> ---
>> The 64x64->128 multiply is a bit painful for 32-bit and I
>> guess in theory since we know we only want bits [95:32]
>> of the result we could special-case it, but TBH I don't
>> think 32-bit hosts merit much optimization effort these days.
>> ---
>>   docs/devel/clocks.rst | 15 +++++++++++++++
>>   include/hw/clock.h    | 29 +++++++++++++++++++++++++++++
>>   2 files changed, 44 insertions(+)
>>
>> diff --git a/docs/devel/clocks.rst b/docs/devel/clocks.rst
>> index e5da28e2111..aebeedbb95e 100644
>> --- a/docs/devel/clocks.rst
>> +++ b/docs/devel/clocks.rst
>> @@ -258,6 +258,21 @@ Here is an example:
>>                           clock_get_ns(dev->my_clk_input));
>>       }
>>   
>> +Calculating expiry deadlines
>> +----------------------------
>> +
>> +A commonly required operation for a clock is to calculate how long
>> +it will take for the clock to tick N times; this can then be used
>> +to set a timer expiry deadline. Use the function ``clock_ticks_to_ns()``,
>> +which takes an unsigned 64-bit count of ticks and returns the length
>> +of time in nanoseconds required for the clock to tick that many times.
>> +
>> +It is important not to try to calculate expiry deadlines using a
>> +shortcut like multiplying a "period of clock in nanoseconds" value
>> +by the tick count, because clocks can have periods which are not a
>> +whole number of nanoseconds, and the accumulated error in the
>> +multiplication can be significant.
>> +
>>   Changing a clock period
>>   -----------------------
>>   
>> diff --git a/include/hw/clock.h b/include/hw/clock.h
>> index 81bcf3e505a..a9425d9fb14 100644
>> --- a/include/hw/clock.h
>> +++ b/include/hw/clock.h
>> @@ -16,6 +16,7 @@
>>   
>>   #include "qom/object.h"
>>   #include "qemu/queue.h"
>> +#include "qemu/host-utils.h"
>>   
>>   #define TYPE_CLOCK "clock"
>>   OBJECT_DECLARE_SIMPLE_TYPE(Clock, CLOCK)
>> @@ -218,6 +219,34 @@ static inline unsigned clock_get_ns(Clock *clk)
>>       return CLOCK_PERIOD_TO_NS(clock_get(clk));
>>   }
>>   
>> +/**
>> + * clock_ticks_to_ns:
>> + * @clk: the clock to query
>> + * @ticks: number of ticks
>> + *
>> + * Returns the length of time in nanoseconds for this clock
>> + * to tick @ticks times. Because a clock can have a period
>> + * which is not a whole number of nanoseconds, it is important
>> + * to use this function when calculating things like timer
>> + * expiry deadlines, rather than attempting to obtain a "period
>> + * in nanoseconds" value and then multiplying that by a number
>> + * of ticks.
>> + */
>> +static inline uint64_t clock_ticks_to_ns(const Clock *clk, uint64_t ticks)
>> +{
>> +    uint64_t ns_low, ns_high;
>> +
>> +    /*
>> +     * clk->period is the period in units of 2^-32 ns, so
>> +     * (clk->period * ticks) is the required length of time in those
>> +     * units, and we can convert to nanoseconds by multiplying by
>> +     * 2^32, which is the same as shifting the 128-bit multiplication
>> +     * result right by 32.
>> +     */
>> +    mulu64(&ns_low, &ns_high, clk->period, ticks);
>> +    return ns_low >> 32 | ns_high << 32;
> 
> With the shift, you're discarding the high 32 bits of the result.  You'll lose
> those same bits if you shift one of the inputs left by 32, and use only the
> high part of the result, e.g.
> 
>      mulu(&discard, &ret, clk->period, ticks << 32);
>      return ret;
> 
> Which on some hosts, e.g. aarch64, only requires umulh and not two multiply
> instructions.
> 
> Either way, I wonder if you want to either use uint32_t ticks, or assert that
> ticks <= UINT32_MAX?  Or if you don't shift ticks, assert that ns_high <=
> UINT32_MAX, so that you don't lose output bits?

If I'm not mistaken, loosing bits in the 32 bits upper part would mean 
that the number of ticks correspond to a period greater or equal to:
   2^96 ns ~= 251230855258 years.
So I guess this case is not that relevant anyways. Maybe asserting here 
would help the developer using this function to catch a bug in her/his code.

> 
> 
> r~
>

Re: [PATCH 1/4] clock: Introduce clock_ticks_to_ns()

Posted by Richard Henderson 5 years, 2 months ago

On 12/9/20 2:49 AM, Luc Michel wrote:
> On 12/9/20 12:39 AM, Richard Henderson wrote:
>> On 12/8/20 12:15 PM, Peter Maydell wrote:
>>> The clock_get_ns() API claims to return the period of a clock in
>>> nanoseconds. Unfortunately since it returns an integer and a
>>> clock's period is represented in units of 2^-32 nanoseconds,
>>> the result is often an approximation, and calculating a clock
>>> expiry deadline by multiplying clock_get_ns() by a number-of-ticks
>>> is unacceptably inaccurate.
>>>
>>> Introduce a new API clock_ticks_to_ns() which returns the number
>>> of nanoseconds it takes the clock to make a given number of ticks.
>>> This function can do the complete calculation internally and
>>> will thus give a more accurate result.
>>>
>>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>>> ---
>>> The 64x64->128 multiply is a bit painful for 32-bit and I
>>> guess in theory since we know we only want bits [95:32]
>>> of the result we could special-case it, but TBH I don't
>>> think 32-bit hosts merit much optimization effort these days.
>>> ---
>>>   docs/devel/clocks.rst | 15 +++++++++++++++
>>>   include/hw/clock.h    | 29 +++++++++++++++++++++++++++++
>>>   2 files changed, 44 insertions(+)
>>>
>>> diff --git a/docs/devel/clocks.rst b/docs/devel/clocks.rst
>>> index e5da28e2111..aebeedbb95e 100644
>>> --- a/docs/devel/clocks.rst
>>> +++ b/docs/devel/clocks.rst
>>> @@ -258,6 +258,21 @@ Here is an example:
>>>                           clock_get_ns(dev->my_clk_input));
>>>       }
>>>   +Calculating expiry deadlines
>>> +----------------------------
>>> +
>>> +A commonly required operation for a clock is to calculate how long
>>> +it will take for the clock to tick N times; this can then be used
>>> +to set a timer expiry deadline. Use the function ``clock_ticks_to_ns()``,
>>> +which takes an unsigned 64-bit count of ticks and returns the length
>>> +of time in nanoseconds required for the clock to tick that many times.
>>> +
>>> +It is important not to try to calculate expiry deadlines using a
>>> +shortcut like multiplying a "period of clock in nanoseconds" value
>>> +by the tick count, because clocks can have periods which are not a
>>> +whole number of nanoseconds, and the accumulated error in the
>>> +multiplication can be significant.
>>> +
>>>   Changing a clock period
>>>   -----------------------
>>>   diff --git a/include/hw/clock.h b/include/hw/clock.h
>>> index 81bcf3e505a..a9425d9fb14 100644
>>> --- a/include/hw/clock.h
>>> +++ b/include/hw/clock.h
>>> @@ -16,6 +16,7 @@
>>>     #include "qom/object.h"
>>>   #include "qemu/queue.h"
>>> +#include "qemu/host-utils.h"
>>>     #define TYPE_CLOCK "clock"
>>>   OBJECT_DECLARE_SIMPLE_TYPE(Clock, CLOCK)
>>> @@ -218,6 +219,34 @@ static inline unsigned clock_get_ns(Clock *clk)
>>>       return CLOCK_PERIOD_TO_NS(clock_get(clk));
>>>   }
>>>   +/**
>>> + * clock_ticks_to_ns:
>>> + * @clk: the clock to query
>>> + * @ticks: number of ticks
>>> + *
>>> + * Returns the length of time in nanoseconds for this clock
>>> + * to tick @ticks times. Because a clock can have a period
>>> + * which is not a whole number of nanoseconds, it is important
>>> + * to use this function when calculating things like timer
>>> + * expiry deadlines, rather than attempting to obtain a "period
>>> + * in nanoseconds" value and then multiplying that by a number
>>> + * of ticks.
>>> + */
>>> +static inline uint64_t clock_ticks_to_ns(const Clock *clk, uint64_t ticks)
>>> +{
>>> +    uint64_t ns_low, ns_high;
>>> +
>>> +    /*
>>> +     * clk->period is the period in units of 2^-32 ns, so
>>> +     * (clk->period * ticks) is the required length of time in those
>>> +     * units, and we can convert to nanoseconds by multiplying by
>>> +     * 2^32, which is the same as shifting the 128-bit multiplication
>>> +     * result right by 32.
>>> +     */
>>> +    mulu64(&ns_low, &ns_high, clk->period, ticks);
>>> +    return ns_low >> 32 | ns_high << 32;
>>
>> With the shift, you're discarding the high 32 bits of the result.  You'll lose
>> those same bits if you shift one of the inputs left by 32, and use only the
>> high part of the result, e.g.
>>
>>      mulu(&discard, &ret, clk->period, ticks << 32);
>>      return ret;
>>
>> Which on some hosts, e.g. aarch64, only requires umulh and not two multiply
>> instructions.
>>
>> Either way, I wonder if you want to either use uint32_t ticks, or assert that
>> ticks <= UINT32_MAX?  Or if you don't shift ticks, assert that ns_high <=
>> UINT32_MAX, so that you don't lose output bits?
> 
> If I'm not mistaken, loosing bits in the 32 bits upper part would mean that the
> number of ticks correspond to a period greater or equal to:
>   2^96 ns ~= 251230855258 years.

No, would be the bit above above ns_high (this integer 64x64->128
multiplication is computing fixed point 64.0 x 32.32 -> 96.32).

This function is truncating back to 64.0, dropping the 32 high bits and 32 low
bits.  We lose bits at 2^64 ns ~= 584 years.  Which is still unreasonably long,
but could still be had from a timer setting ~= never.

An alternate to an assert could be saturation.  Input "infinity", return
"infinity".  More or less.


r~

Re: [PATCH 1/4] clock: Introduce clock_ticks_to_ns()

Posted by Peter Maydell 5 years, 2 months ago

On Wed, 9 Dec 2020 at 14:11, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This function is truncating back to 64.0, dropping the 32 high bits and 32 low
> bits.  We lose bits at 2^64 ns ~= 584 years.  Which is still unreasonably long,
> but could still be had from a timer setting ~= never.
>
> An alternate to an assert could be saturation.  Input "infinity", return
> "infinity".  More or less.

Might be an idea. We have never really properly nailed down what
QEMU's simulation of time does when it hits INT64_MAX nanoseconds
(which is the furthest forward absolute time you can set a QEMUTimer).

In particular if you use the icount sleep=on option it is actually
possible for the simulation to get there (set a far-future timer,
no other interrupts or simulation input, do a sleep-til-next-interrupt)
and I have no idea what QEMU should do at that point (print "Welcome
to the end of the universe" and exit?).

FWIW, the reason I made this API take a 64-bit tick count and return
a 64-bit nanosecond count is because we do have timer devices where
the tick count is 64 bits (eg the Arm Generic Timers in the CPU)
and the QEMUTimer APIs all want "expiry date in nanoseconds as a
signed 64-bit value".

thanks
-- PMM

Re: [PATCH 1/4] clock: Introduce clock_ticks_to_ns()

Posted by Richard Henderson 5 years, 2 months ago

On 12/9/20 8:26 AM, Peter Maydell wrote:
> (print "Welcome to the end of the universe" and exit?)

Sounds appropriately Hitchhiker's Guide.  ;-)


r~

Re: [PATCH 1/4] clock: Introduce clock_ticks_to_ns()

Posted by Peter Maydell 5 years, 2 months ago

On Tue, 8 Dec 2020 at 23:39, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 12/8/20 12:15 PM, Peter Maydell wrote:
> > +static inline uint64_t clock_ticks_to_ns(const Clock *clk, uint64_t ticks)
> > +{
> > +    uint64_t ns_low, ns_high;
> > +
> > +    /*
> > +     * clk->period is the period in units of 2^-32 ns, so
> > +     * (clk->period * ticks) is the required length of time in those
> > +     * units, and we can convert to nanoseconds by multiplying by
> > +     * 2^32, which is the same as shifting the 128-bit multiplication
> > +     * result right by 32.
> > +     */
> > +    mulu64(&ns_low, &ns_high, clk->period, ticks);
> > +    return ns_low >> 32 | ns_high << 32;
>
> With the shift, you're discarding the high 32 bits of the result.  You'll lose
> those same bits if you shift one of the inputs left by 32, and use only the
> high part of the result, e.g.
>
>     mulu(&discard, &ret, clk->period, ticks << 32);
>     return ret;
>
> Which on some hosts, e.g. aarch64, only requires umulh and not two multiply
> instructions.

We can't do this if we want to allow a full 64-bit 'ticks' input, right?

> Either way, I wonder if you want to either use uint32_t ticks, or assert that
> ticks <= UINT32_MAX?  Or if you don't shift ticks, assert that ns_high <=
> UINT32_MAX, so that you don't lose output bits?

So I think my plan for v2 of this series is just to add in the
saturation-to-INT64_MAX logic.

thanks
-- PMM

Re: [PATCH 1/4] clock: Introduce clock_ticks_to_ns()

Posted by Richard Henderson 5 years, 2 months ago

On 12/10/20 2:47 PM, Peter Maydell wrote:
>> With the shift, you're discarding the high 32 bits of the result.  You'll lose
>> those same bits if you shift one of the inputs left by 32, and use only the
>> high part of the result, e.g.
>>
>>     mulu(&discard, &ret, clk->period, ticks << 32);
>>     return ret;
>>
>> Which on some hosts, e.g. aarch64, only requires umulh and not two multiply
>> instructions.
> 
> We can't do this if we want to allow a full 64-bit 'ticks' input, right?

Correct.

> So I think my plan for v2 of this series is just to add in the
> saturation-to-INT64_MAX logic.

Sounds good.

r~