In the existing design, TTCR is prone to undercounting when running in
continuous mode. This manifests as a timer interrupt appearing to
trigger a few cycles prior to the deadline set in SPR_TTMR_TP.
When the timer triggers, the virtual time delta in nanoseconds between
the time when the timer was set and the time when it triggers is
calculated. This nanosecond value is then divided by TIMER_PERIOD (50)
to compute an increment of cycles to apply to TTCR.

However, this calculation rounds down the number of cycles, causing the
undercounting.

A simplistic solution would be to instead round up the number of cycles;
however, this would result in the accumulation of timing error over
time.
This patch corrects the issue by calculating the time delta in
nanoseconds between when the timer was last reset and the timer event.
This approach allows the TTCR value to be rounded up, but without
accumulating error over time.
Signed-off-by: Joel Holdsworth <jholdsworth@nvidia.com>
---
hw/openrisc/cputimer.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/hw/openrisc/cputimer.c b/hw/openrisc/cputimer.c
index 835986c4db..ddc129aa48 100644
--- a/hw/openrisc/cputimer.c
+++ b/hw/openrisc/cputimer.c
@@ -29,7 +29,8 @@
/* Tick Timer global state to allow all cores to be in sync */
typedef struct OR1KTimerState {
uint32_t ttcr;
- uint64_t last_clk;
+ uint32_t ttcr_offset;
+ uint64_t clk_offset;
} OR1KTimerState;
static OR1KTimerState *or1k_timer;
@@ -37,6 +38,8 @@ static OR1KTimerState *or1k_timer;
void cpu_openrisc_count_set(OpenRISCCPU *cpu, uint32_t val)
{
or1k_timer->ttcr = val;
+ or1k_timer->ttcr_offset = val;
+ or1k_timer->clk_offset = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
}
uint32_t cpu_openrisc_count_get(OpenRISCCPU *cpu)
@@ -53,9 +56,8 @@ void cpu_openrisc_count_update(OpenRISCCPU *cpu)
return;
}
now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
- or1k_timer->ttcr += (uint32_t)((now - or1k_timer->last_clk)
- / TIMER_PERIOD);
- or1k_timer->last_clk = now;
+ or1k_timer->ttcr = (now - or1k_timer->clk_offset + TIMER_PERIOD - 1) / TIMER_PERIOD +
+ or1k_timer->ttcr_offset;
}
/* Update the next timeout time as difference between ttmr and ttcr */
@@ -69,7 +71,7 @@ void cpu_openrisc_timer_update(OpenRISCCPU *cpu)
}
cpu_openrisc_count_update(cpu);
- now = or1k_timer->last_clk;
+ now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
if ((cpu->env.ttmr & TTMR_TP) <= (or1k_timer->ttcr & TTMR_TP)) {
wait = TTMR_TP - (or1k_timer->ttcr & TTMR_TP) + 1;
@@ -110,7 +112,8 @@ static void openrisc_timer_cb(void *opaque)
case TIMER_NONE:
break;
case TIMER_INTR:
- or1k_timer->ttcr = 0;
+ /* Zero the count by applying a negative offset to the counter */
+ or1k_timer->ttcr_offset += UINT32_MAX - (cpu->env.ttmr & TTMR_TP);
break;
case TIMER_SHOT:
cpu_openrisc_count_stop(cpu);
@@ -137,8 +140,8 @@ static void openrisc_count_reset(void *opaque)
/* Reset the global timer state. */
static void openrisc_timer_reset(void *opaque)
{
- or1k_timer->ttcr = 0x00000000;
- or1k_timer->last_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+ OpenRISCCPU *cpu = opaque;
+ cpu_openrisc_count_set(cpu, 0);
}
static const VMStateDescription vmstate_or1k_timer = {
@@ -147,7 +150,8 @@ static const VMStateDescription vmstate_or1k_timer = {
.minimum_version_id = 1,
.fields = (const VMStateField[]) {
VMSTATE_UINT32(ttcr, OR1KTimerState),
- VMSTATE_UINT64(last_clk, OR1KTimerState),
+ VMSTATE_UINT32(ttcr_offset, OR1KTimerState),
+ VMSTATE_UINT64(clk_offset, OR1KTimerState),
VMSTATE_END_OF_LIST()
}
};
--
2.44.1
On Fri, Jun 07, 2024 at 03:29:33PM -0700, Joel Holdsworth via wrote:
> [...]
> @@ -110,7 +112,8 @@ static void openrisc_timer_cb(void *opaque)
>      case TIMER_INTR:
> -        or1k_timer->ttcr = 0;
> +        /* Zero the count by applying a negative offset to the counter */
> +        or1k_timer->ttcr_offset += UINT32_MAX - (cpu->env.ttmr & TTMR_TP);

Hi Joel,

I am trying to get this merged, as I am finally getting some time for this
again after a long project at work.

Why do you do += UINT32_MAX - (cpu->env.ttmr & TTMR_TP) here? Is there an
edge case I am not thinking of that is making you use UINT32_MAX? Wouldn't
this be the same as:

    or1k_timer->ttcr_offset -= 1 - (cpu->env.ttmr & TTMR_TP);

-Stafford
Hi Joel,

I am away and won't be able to have too much time to look at this, but I
have a few comments and questions below.

- You sent this 2 times, is the only change in v2 the sender address?

On Fri, Jun 07, 2024 at 03:29:33PM -0700, Joel Holdsworth via wrote:
> In the existing design, TTCR is prone to undercounting when running in
> continuous mode. This manifests as a timer interrupt appearing to
> trigger a few cycles prior to the deadline set in SPR_TTMR_TP.

This is a good find. I have noticed the timer is off when running on
OpenRISC but never tracked it down to this undercounting issue. I also
notice unexplained RCU stalls when running Linux when there is no load;
this timer issue might be related.

Did you notice this via other system symptoms when running OpenRISC, or
just via code auditing of QEMU?

> This patch corrects the issue by calculating the time delta in
> nanoseconds between when the timer was last reset and the timer event.
> This approach allows the TTCR value to be rounded up, but without
> accumulating error over time.

In QEMU there is a function clock_ns_to_ticks(). Could this maybe be used
instead to give us a more standard fix?
> @@ -147,7 +150,8 @@ static const VMStateDescription vmstate_or1k_timer = {
>      .minimum_version_id = 1,
>      .fields = (const VMStateField[]) {
>          VMSTATE_UINT32(ttcr, OR1KTimerState),
> -        VMSTATE_UINT64(last_clk, OR1KTimerState),
> +        VMSTATE_UINT32(ttcr_offset, OR1KTimerState),
> +        VMSTATE_UINT64(clk_offset, OR1KTimerState),
>          VMSTATE_END_OF_LIST()
>      }
>  };

This is a change to the VM state; we would need to update the version.

-Stafford
Hi Stafford, thanks for your response.

> - You sent this 2 times, is the only change in v2 the sender address?

Yes, I was just having some difficulty with Git and SMTP. Should be fixed
now.

> This is a good find [...] Did you notice this via other system symptoms
> when running OpenRISC or just via code auditing of QEMU?

I'm working on an OpenRISC port of Zephyr. The undercounting issue causes
consistent deadlocks in my experiments with the test suite. I wouldn't be
surprised if it causes problems for other OSes.

> In QEMU there is a function clock_ns_to_ticks(). Could this maybe be used
> instead to give us a more standard fix?

Seems like a good idea, and I now have a nearly-complete patch that brings
hw/openrisc/cputimer.c into closer alignment with
target/mips/sysemu/cp0_timer.c.

However, don't we run into problems with undercounting with
clock_ns_to_ticks()? If I understand correctly, it will round ticks down,
not up, which is the problem I was trying to avoid in the first place.

Joel
On Mon, Jun 10, 2024 at 07:29:15PM +0000, Joel Holdsworth wrote:
> Yes, I was just having some difficulty with Git and SMTP. Should be fixed
> now.

OK.

> I'm working on an OpenRISC port of Zephyr. The undercounting issue causes
> consistent deadlocks in my experiments with the test suite. I wouldn't be
> surprised if it causes problems for other OSes.

That's cool. I got around to testing the patch with Linux; unfortunately I
didn't see an improvement in the lockups I have been seeing during boot
time. But I am sure this is a step in the right direction.

> Seems like a good idea, and I now have a nearly-complete patch that brings
> hw/openrisc/cputimer.c into closer alignment with
> target/mips/sysemu/cp0_timer.c.

I was waiting for this second version patch, v2. Have you ever completed it?

> However, don't we run into problems with undercounting with
> clock_ns_to_ticks()? If I understand correctly, it will round ticks down,
> not up, which is the problem I was trying to avoid in the first place.

You might be right, but if that is the case maybe it's a bug to raise with
the maintainers directly. I was planning to look into this more closely
after you sent the followup patch.

-Stafford