hw/ptimer: Assert next_event is newer than last_event

[PATCH] hw/ptimer: Assert next_event is newer than last_event

Posted by Philippe Mathieu-Daudé 6 years, 4 months ago

If the period is too big, the 'delta * period' product result
might overflow, resulting in a negative number, then the
next_event ends before the last_event. This is buggy, as there
is no forward progress. Assert this can not happen.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 hw/core/ptimer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
index d58e2dfdb0..88085d4c81 100644
--- a/hw/core/ptimer.c
+++ b/hw/core/ptimer.c
@@ -125,6 +125,9 @@ static void ptimer_reload(ptimer_state *s, int delta_adjust)
 
     s->last_event = s->next_event;
     s->next_event = s->last_event + delta * period;
+    /* Verify forward progress */
+    g_assert(s->next_event > s->last_event);
+
     if (period_frac) {
         s->next_event += ((int64_t)period_frac * delta) >> 32;
     }
-- 
2.20.1

Re: [PATCH] hw/ptimer: Assert next_event is newer than last_event

Posted by Peter Maydell 6 years, 4 months ago

On Sat, 21 Sep 2019 at 11:17, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>
> If the period is too big, the 'delta * period' product result
> might overflow, resulting in a negative number, then the
> next_event ends before the last_event. This is buggy, as there
> is no forward progress. Assert this can not happen.
>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  hw/core/ptimer.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
> index d58e2dfdb0..88085d4c81 100644
> --- a/hw/core/ptimer.c
> +++ b/hw/core/ptimer.c
> @@ -125,6 +125,9 @@ static void ptimer_reload(ptimer_state *s, int delta_adjust)
>
>      s->last_event = s->next_event;
>      s->next_event = s->last_event + delta * period;
> +    /* Verify forward progress */
> +    g_assert(s->next_event > s->last_event);
> +
>      if (period_frac) {
>          s->next_event += ((int64_t)period_frac * delta) >> 32;
>      }
> --

Can this only happen if a QEMU timer model using the ptimer
code has a bug, or is it guest-triggerable for some of our
timer models?

thanks
-- PMM

Re: [PATCH] hw/ptimer: Assert next_event is newer than last_event

Posted by Philippe Mathieu-Daudé 6 years, 4 months ago

On 9/23/19 4:40 PM, Peter Maydell wrote:
> On Sat, 21 Sep 2019 at 11:17, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>>
>> If the period is too big, the 'delta * period' product result
>> might overflow, resulting in a negative number, then the
>> next_event ends before the last_event. This is buggy, as there
>> is no forward progress. Assert this can not happen.
>>
>> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>> ---
>>  hw/core/ptimer.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
>> index d58e2dfdb0..88085d4c81 100644
>> --- a/hw/core/ptimer.c
>> +++ b/hw/core/ptimer.c
>> @@ -125,6 +125,9 @@ static void ptimer_reload(ptimer_state *s, int delta_adjust)
>>
>>      s->last_event = s->next_event;
>>      s->next_event = s->last_event + delta * period;
>> +    /* Verify forward progress */
>> +    g_assert(s->next_event > s->last_event);
>> +
>>      if (period_frac) {
>>          s->next_event += ((int64_t)period_frac * delta) >> 32;
>>      }
>> --
> 
> Can this only happen if a QEMU timer model using the ptimer
> code has a bug, or is it guest-triggerable for some of our
> timer models?

I hit this running a raspi4 guest, I had incorrectly initialized a clock
using the core cpu frequency, while I had to use the APB one (in my
case, core_cpu_freq / 2). The guest use a high value to configure a slow
timer, which in my buggy case made QEMU hang in hard way to debug.

So yes, it seems guest-triggerable if the implementation is broken.
Using assert() is OK for broken implementation, right?
Or should we audit all ptimer calls?

Regards,

Phil.

Re: [PATCH] hw/ptimer: Assert next_event is newer than last_event

Posted by Peter Maydell 6 years, 4 months ago

On Mon, 23 Sep 2019 at 15:54, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>
> On 9/23/19 4:40 PM, Peter Maydell wrote:
> > On Sat, 21 Sep 2019 at 11:17, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> >>
> >> If the period is too big, the 'delta * period' product result
> >> might overflow, resulting in a negative number, then the
> >> next_event ends before the last_event. This is buggy, as there
> >> is no forward progress. Assert this can not happen.

> > Can this only happen if a QEMU timer model using the ptimer
> > code has a bug, or is it guest-triggerable for some of our
> > timer models?
>
> I hit this running a raspi4 guest, I had incorrectly initialized a clock
> using the core cpu frequency, while I had to use the APB one (in my
> case, core_cpu_freq / 2). The guest use a high value to configure a slow
> timer, which in my buggy case made QEMU hang in hard way to debug.
>
> So yes, it seems guest-triggerable if the implementation is broken.
> Using assert() is OK for broken implementation, right?

Yeah, if this can only happen if QEMU code is broken then
an assert is OK. I was just trying to find out what the
cause was, since "this is buggy" isn't specific about where
the bug is.

> Or should we audit all ptimer calls?

I don't think we specifically need an audit. We could perhaps
expand the comment by the assert to specifically say that if
the calculation of the next event overflowed then this indicates
a bug in the QEMU device model using the ptimer API, so if
somebody else runs into the assert they have a hint about
what to look at. (An overflowed next_event indicates a time
incredibly far in the future, given that it's a nanosecond
time in an int64_t.)

The other approach I thought of would be to make the ptimer
code handle this sort of after-the-end-of-QEMU-universe time
by saturating next_event to INT64_MAX rather than letting it
overflow and wrap. Unfortunately while this would be fine for
the 'timer event' part of the code, it would break
ptimer_get_count() which calculates the current counter
value by looking at the difference between the current
time and the time of the next event (fixable but only with
a bunch of messing about to treat a next_event of INT64_MAX
as equivalent to the counter being disabled and tracking
the counter value in s->delta). So an assert is the
best thing I think.

thanks
-- PMM

Re: [PATCH] hw/ptimer: Assert next_event is newer than last_event

Posted by Philippe Mathieu-Daudé 6 years, 4 months ago

On 9/23/19 5:08 PM, Peter Maydell wrote:
> On Mon, 23 Sep 2019 at 15:54, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>>
>> On 9/23/19 4:40 PM, Peter Maydell wrote:
>>> On Sat, 21 Sep 2019 at 11:17, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>>>>
>>>> If the period is too big, the 'delta * period' product result
>>>> might overflow, resulting in a negative number, then the
>>>> next_event ends before the last_event. This is buggy, as there
>>>> is no forward progress. Assert this can not happen.
> 
>>> Can this only happen if a QEMU timer model using the ptimer
>>> code has a bug, or is it guest-triggerable for some of our
>>> timer models?
>>
>> I hit this running a raspi4 guest, I had incorrectly initialized a clock
>> using the core cpu frequency, while I had to use the APB one (in my
>> case, core_cpu_freq / 2). The guest use a high value to configure a slow
>> timer, which in my buggy case made QEMU hang in hard way to debug.
>>
>> So yes, it seems guest-triggerable if the implementation is broken.
>> Using assert() is OK for broken implementation, right?
> 
> Yeah, if this can only happen if QEMU code is broken then
> an assert is OK. I was just trying to find out what the
> cause was, since "this is buggy" isn't specific about where
> the bug is.
> 
>> Or should we audit all ptimer calls?
> 
> I don't think we specifically need an audit. We could perhaps
> expand the comment by the assert to specifically say that if
> the calculation of the next event overflowed then this indicates
> a bug in the QEMU device model using the ptimer API, so if
> somebody else runs into the assert they have a hint about
> what to look at. (An overflowed next_event indicates a time
> incredibly far in the future, given that it's a nanosecond
> time in an int64_t.)

OK I'll improve the comment.

> The other approach I thought of would be to make the ptimer
> code handle this sort of after-the-end-of-QEMU-universe time
> by saturating next_event to INT64_MAX rather than letting it
> overflow and wrap. Unfortunately while this would be fine for
> the 'timer event' part of the code, it would break
> ptimer_get_count() which calculates the current counter
> value by looking at the difference between the current
> time and the time of the next event (fixable but only with
> a bunch of messing about to treat a next_event of INT64_MAX
> as equivalent to the counter being disabled and tracking
> the counter value in s->delta). So an assert is the
> best thing I think.

Yes :/

Thanks!

Phil.