[PATCH] x86/time: do not kill calibration timer on suspend

Roger Pau Monne posted 1 patch 2 days, 12 hours ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20260410085504.32925-1-roger.pau@citrix.com
Posted by Roger Pau Monne 2 days, 12 hours ago
A killed timer will ignore further set_timer() calls, and hence won't be
re-armed unless it's initialized again.  Use stop_timer() instead of
kill_timer() in time_suspend(), so that the set_timer() call in
time_resume() successfully re-arms the timer.  Otherwise time calibration
is no longer scheduled (and executed) after resuming from S3 suspend.

Fixes: 6d90db1a2ca1 ("x86: rendezvous-based local time calibration")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/time.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index fed30a919d2c..4233ea507d40 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2728,7 +2728,7 @@ int time_suspend(void)
     {
         cmos_utc_offset = -get_wallclock_time();
         cmos_utc_offset += get_sec();
-        kill_timer(&calibration_timer);
+        stop_timer(&calibration_timer);
 
         /* Sync platform timer stamps. */
         platform_time_calibration();
-- 
2.53.0
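
The difference between the two calls described in the commit message can be illustrated with a small standalone model. This is only a sketch: the `toy_*` names, the three-state enum, and the expiry values are hypothetical and are not Xen's timer implementation. It assumes only the behaviour stated above, namely that a killed timer ignores later arm requests until it is initialized again, while a stopped timer can be re-armed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: mimics the semantics described in the commit message.
 * None of these names or types exist in Xen. */
enum toy_status { TOY_INACTIVE, TOY_ACTIVE, TOY_KILLED };

struct toy_timer {
    enum toy_status status;
    uint64_t expires;
};

/* (Re)initialize a timer; the only way to make a killed timer usable again. */
static void toy_init_timer(struct toy_timer *t)
{
    t->status = TOY_INACTIVE;
    t->expires = 0;
}

/* Arm the timer. The request is ignored if the timer has been killed. */
static bool toy_set_timer(struct toy_timer *t, uint64_t expires)
{
    if (t->status == TOY_KILLED)
        return false;
    t->expires = expires;
    t->status = TOY_ACTIVE;
    return true;
}

/* Deactivate the timer but leave it available for re-arming. */
static void toy_stop_timer(struct toy_timer *t)
{
    if (t->status == TOY_ACTIVE)
        t->status = TOY_INACTIVE;
}

/* Deactivate the timer permanently (until it is initialized again). */
static void toy_kill_timer(struct toy_timer *t)
{
    t->status = TOY_KILLED;
}

/* Model one suspend/resume cycle; returns true if the timer ends up armed. */
static bool toy_suspend_resume(bool use_kill)
{
    struct toy_timer t;
    bool armed;

    toy_init_timer(&t);
    armed = toy_set_timer(&t, 1000);   /* armed during normal operation */
    assert(armed);
    (void)armed;

    if (use_kill)
        toy_kill_timer(&t);            /* behaviour before the patch */
    else
        toy_stop_timer(&t);            /* behaviour after the patch */

    toy_set_timer(&t, 2000);           /* resume path tries to re-arm */
    return t.status == TOY_ACTIVE;
}
```

In this model `toy_suspend_resume(true)` leaves the timer unarmed, while `toy_suspend_resume(false)` leaves it armed, which mirrors why calibration stopped being scheduled after S3 resume.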


Re: [PATCH] x86/time: do not kill calibration timer on suspend
Posted by Marek Marczykowski-Górecki 2 days, 10 hours ago
On Fri, Apr 10, 2026 at 10:55:04AM +0200, Roger Pau Monne wrote:
> A killed timer will ignore further set_timer() calls, and hence won't be
> re-armed unless it's initialized again.  Use stop_timer() instead of
> kill_timer() in time_suspend(), so that the set_timer() call in
> time_resume() successfully re-arms the timer.  Otherwise time calibration
> is no longer scheduled (and executed) after resuming from S3 suspend.
> 
> Fixes: 6d90db1a2ca1 ("x86: rendezvous-based local time calibration")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

I confirm this fixes the issue, thanks!

Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: [PATCH] x86/time: do not kill calibration timer on suspend
Posted by Andrew Cooper 2 days, 11 hours ago
On 10/04/2026 9:55 am, Roger Pau Monne wrote:
> A killed timer will ignore further set_timer() calls, and hence won't be
> re-armed unless it's initialized again.  Use stop_timer() instead of
> kill_timer() in time_suspend(), so that the set_timer() call in
> time_resume() successfully re-arms the timer.  Otherwise time calibration
> is no longer scheduled (and executed) after resuming from S3 suspend.
>
> Fixes: 6d90db1a2ca1 ("x86: rendezvous-based local time calibration")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/time.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index fed30a919d2c..4233ea507d40 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2728,7 +2728,7 @@ int time_suspend(void)
>      {
>          cmos_utc_offset = -get_wallclock_time();
>          cmos_utc_offset += get_sec();
> -        kill_timer(&calibration_timer);
> +        stop_timer(&calibration_timer);
>  
>          /* Sync platform timer stamps. */
>          platform_time_calibration();

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

This should definitely be stop_timer() and not kill_timer().

However, the fact it "stops" drift after S3 really does concern me. 
There's clearly a different issue here which this is covering over.

The systems which we've been testing on all have ITSC so the TSC doesn't
drift.  Whether there's a step or not is a different question (I'd
expect firmware to arrange to avoid a step being seen), but a step would
not explain our symptoms.

Given it's only once a second, can we dump the scale/offset which the
rendezvous produced each time?

This feels suspiciously like we've gauged the frequency too fast, and
are relying on micro-fixes every second to keep time looking normal.
(This is pure speculation; I don't have any evidence).

~Andrew
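
To put rough numbers on the speculation above, the following sketch computes how much error accumulates when a frequency estimate is off by a fixed amount and nothing corrects it. The function name and the 50 ppm figure used below are purely illustrative assumptions; no frequency error has been measured in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not Xen code: accumulated time error in nanoseconds
 * after elapsed_ns of extrapolation with a frequency estimate that is off
 * by error_ppm parts per million (signed), assuming no correction happens.
 * elapsed_ns is truncated to whole milliseconds by the integer division.
 */
static int64_t toy_drift_ns(int64_t elapsed_ns, int64_t error_ppm)
{
    assert(elapsed_ns >= 0);
    return elapsed_ns / 1000000 * error_ppm;
}
```

With an assumed error of 50 ppm, `toy_drift_ns(1000000000LL, 50)` gives 50000 ns (50 us) for one second, the most that could build up between once-a-second calibrations. Over an hour without calibration, `toy_drift_ns(3600LL * 1000000000LL, 50)` gives 180000000 ns (180 ms). A mis-gauged frequency could therefore stay hidden while the calibration timer runs and become visible only once it stops, which is why dumping the per-rendezvous scale/offset would help confirm or rule this out.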