If the period is too big, the 'delta * period' product result
might overflow, resulting in a negative number, then the
next_event ends before the last_event. This is buggy, as there
is no forward progress. Assert this can not happen.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
hw/core/ptimer.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
index d58e2dfdb0..88085d4c81 100644
--- a/hw/core/ptimer.c
+++ b/hw/core/ptimer.c
@@ -125,6 +125,9 @@ static void ptimer_reload(ptimer_state *s, int delta_adjust)
s->last_event = s->next_event;
s->next_event = s->last_event + delta * period;
+ /* Verify forward progress */
+ g_assert(s->next_event > s->last_event);
+
if (period_frac) {
s->next_event += ((int64_t)period_frac * delta) >> 32;
}
--
2.20.1
On Sat, 21 Sep 2019 at 11:17, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: > > If the period is too big, the 'delta * period' product result > might overflow, resulting in a negative number, then the > next_event ends before the last_event. This is buggy, as there > is no forward progress. Assert this can not happen. > > Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> > --- > hw/core/ptimer.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c > index d58e2dfdb0..88085d4c81 100644 > --- a/hw/core/ptimer.c > +++ b/hw/core/ptimer.c > @@ -125,6 +125,9 @@ static void ptimer_reload(ptimer_state *s, int delta_adjust) > > s->last_event = s->next_event; > s->next_event = s->last_event + delta * period; > + /* Verify forward progress */ > + g_assert(s->next_event > s->last_event); > + > if (period_frac) { > s->next_event += ((int64_t)period_frac * delta) >> 32; > } > -- Can this only happen if a QEMU timer model using the ptimer code has a bug, or is it guest-triggerable for some of our timer models? thanks -- PMM
On 9/23/19 4:40 PM, Peter Maydell wrote: > On Sat, 21 Sep 2019 at 11:17, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >> >> If the period is too big, the 'delta * period' product result >> might overflow, resulting in a negative number, then the >> next_event ends before the last_event. This is buggy, as there >> is no forward progress. Assert this can not happen. >> >> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> >> --- >> hw/core/ptimer.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c >> index d58e2dfdb0..88085d4c81 100644 >> --- a/hw/core/ptimer.c >> +++ b/hw/core/ptimer.c >> @@ -125,6 +125,9 @@ static void ptimer_reload(ptimer_state *s, int delta_adjust) >> >> s->last_event = s->next_event; >> s->next_event = s->last_event + delta * period; >> + /* Verify forward progress */ >> + g_assert(s->next_event > s->last_event); >> + >> if (period_frac) { >> s->next_event += ((int64_t)period_frac * delta) >> 32; >> } >> -- > > Can this only happen if a QEMU timer model using the ptimer > code has a bug, or is it guest-triggerable for some of our > timer models? I hit this running a raspi4 guest, I had incorrectly initialized a clock using the core cpu frequency, while I had to use the APB one (in my case, core_cpu_freq / 2). The guest use a high value to configure a slow timer, which in my buggy case made QEMU hang in hard way to debug. So yes, it seems guest-triggerable if the implementation is broken. Using assert() is OK for broken implementation, right? Or should we audit all ptimer calls? Regards, Phil.
On Mon, 23 Sep 2019 at 15:54, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: > > On 9/23/19 4:40 PM, Peter Maydell wrote: > > On Sat, 21 Sep 2019 at 11:17, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: > >> > >> If the period is too big, the 'delta * period' product result > >> might overflow, resulting in a negative number, then the > >> next_event ends before the last_event. This is buggy, as there > >> is no forward progress. Assert this can not happen. > > Can this only happen if a QEMU timer model using the ptimer > > code has a bug, or is it guest-triggerable for some of our > > timer models? > > I hit this running a raspi4 guest, I had incorrectly initialized a clock > using the core cpu frequency, while I had to use the APB one (in my > case, core_cpu_freq / 2). The guest use a high value to configure a slow > timer, which in my buggy case made QEMU hang in hard way to debug. > > So yes, it seems guest-triggerable if the implementation is broken. > Using assert() is OK for broken implementation, right? Yeah, if this can only happen if QEMU code is broken then an assert is OK. I was just trying to find out what the cause was, since "this is buggy" isn't specific about where the bug is. > Or should we audit all ptimer calls? I don't think we specifically need an audit. We could perhaps expand the comment by the assert to specifically say that if the calculation of the next event overflowed then this indicates a bug in the QEMU device model using the ptimer API, so if somebody else runs into the assert they have a hint about what to look at. (An overflowed next_event indicates a time incredibly far in the future, given that it's a nanosecond time in an int64_t.) The other approach I thought of would be to make the ptimer code handle this sort of after-the-end-of-QEMU-universe time by saturating next_event to INT64_MAX rather than letting it overflow and wrap. Unfortunately while this would be fine for the 'timer event' part of the code, it would break ptimer_get_count() which calculates the current counter value by looking at the difference between the current time and the time of the next event (fixable but only with a bunch of messing about to treat a next_event of INT64_MAX as equivalent to the counter being disabled and tracking the counter value in s->delta). So an assert is the best thing I think. thanks -- PMM
On 9/23/19 5:08 PM, Peter Maydell wrote: > On Mon, 23 Sep 2019 at 15:54, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >> >> On 9/23/19 4:40 PM, Peter Maydell wrote: >>> On Sat, 21 Sep 2019 at 11:17, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >>>> >>>> If the period is too big, the 'delta * period' product result >>>> might overflow, resulting in a negative number, then the >>>> next_event ends before the last_event. This is buggy, as there >>>> is no forward progress. Assert this can not happen. > >>> Can this only happen if a QEMU timer model using the ptimer >>> code has a bug, or is it guest-triggerable for some of our >>> timer models? >> >> I hit this running a raspi4 guest, I had incorrectly initialized a clock >> using the core cpu frequency, while I had to use the APB one (in my >> case, core_cpu_freq / 2). The guest use a high value to configure a slow >> timer, which in my buggy case made QEMU hang in hard way to debug. >> >> So yes, it seems guest-triggerable if the implementation is broken. >> Using assert() is OK for broken implementation, right? > > Yeah, if this can only happen if QEMU code is broken then > an assert is OK. I was just trying to find out what the > cause was, since "this is buggy" isn't specific about where > the bug is. > >> Or should we audit all ptimer calls? > > I don't think we specifically need an audit. We could perhaps > expand the comment by the assert to specifically say that if > the calculation of the next event overflowed then this indicates > a bug in the QEMU device model using the ptimer API, so if > somebody else runs into the assert they have a hint about > what to look at. (An overflowed next_event indicates a time > incredibly far in the future, given that it's a nanosecond > time in an int64_t.) OK I'll improve the comment. > The other approach I thought of would be to make the ptimer > code handle this sort of after-the-end-of-QEMU-universe time > by saturating next_event to INT64_MAX rather than letting it > overflow and wrap. Unfortunately while this would be fine for > the 'timer event' part of the code, it would break > ptimer_get_count() which calculates the current counter > value by looking at the difference between the current > time and the time of the next event (fixable but only with > a bunch of messing about to treat a next_event of INT64_MAX > as equivalent to the counter being disabled and tracking > the counter value in s->delta). So an assert is the > best thing I think. Yes :/ Thanks! Phil.
© 2016 - 2024 Red Hat, Inc.