x86/tsc: Always save/restore TSC sched_clock on suspend/resume

[PATCH] x86/tsc: Always save/restore TSC sched_clock on suspend/resume

Posted by Guilherme G. Piccoli 11 months, 4 weeks ago

The TSC can be reset in deep ACPI sleep states, even when it is invariant.
That's the reason we have sched_clock() save/restore functions: to deal
with this situation. But it happens that these functions are guarded by a
check for the stability of sched_clock - if it is not considered stable,
the save/restore routines aren't executed.

On top of that, we have a clear comment in native_sched_clock() saying
that *even* with an unstable TSC, we continue using the TSC for
sched_clock due to its speed. In other words, once the TSC is detected
as unstable, sched_clock is marked as unstable as well, so subsequent S3
sleep cycles can produce bogus sched_clock values due to the lack of the
save/restore mechanism, causing warnings like this:

[22.954918] ------------[ cut here ]------------
[22.954923] Delta way too big! 18446743750843854390 ts=18446744072977390405 before=322133536015 after=322133536015 write stamp=18446744072977390405
[22.954923] If you just came from a suspend/resume,
[22.954923] please switch to the trace global clock:
[22.954923]   echo global > /sys/kernel/tracing/trace_clock
[22.954923] or add trace_clock=global to the kernel command line
[22.954937] WARNING: CPU: 2 PID: 5728 at kernel/trace/ring_buffer.c:2890 rb_add_timestamp+0x193/0x1c0

Notice that the above was reproduced even with "trace_clock=global".

The fix is to _always_ save/restore sched_clock across a suspend
cycle _if the TSC is used_ as sched_clock - the sched_clock_stable()
check only remains relevant to the save/restore when we fall back to
jiffies.

Cc: stable@vger.kernel.org
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/x86/kernel/tsc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 34dec0b72ea8..88e5a4ed9db3 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -959,7 +959,7 @@ static unsigned long long cyc2ns_suspend;
 
 void tsc_save_sched_clock_state(void)
 {
-	if (!sched_clock_stable())
+	if (!static_branch_likely(&__use_tsc) && !sched_clock_stable())
 		return;
 
 	cyc2ns_suspend = sched_clock();
@@ -979,7 +979,7 @@ void tsc_restore_sched_clock_state(void)
 	unsigned long flags;
 	int cpu;
 
-	if (!sched_clock_stable())
+	if (!static_branch_likely(&__use_tsc) && !sched_clock_stable())
 		return;
 
 	local_irq_save(flags);
-- 
2.47.1

Re: [PATCH] x86/tsc: Always save/restore TSC sched_clock on suspend/resume

Posted by Guilherme G. Piccoli 11 months, 3 weeks ago

On 15/02/2025 17:58, Guilherme G. Piccoli wrote:
> The TSC can be reset in deep ACPI sleep states, even when it is invariant.
> That's the reason we have sched_clock() save/restore functions: to deal
> with this situation. But it happens that these functions are guarded by a
> check for the stability of sched_clock - if it is not considered stable,
> the save/restore routines aren't executed.
> 
> On top of that, we have a clear comment in native_sched_clock() saying
> that *even* with an unstable TSC, we continue using the TSC for
> sched_clock due to its speed. In other words, once the TSC is detected
> as unstable, sched_clock is marked as unstable as well, so subsequent S3
> sleep cycles can produce bogus sched_clock values due to the lack of the
> save/restore mechanism, causing warnings like this:
> 
> [22.954918] ------------[ cut here ]------------
> [22.954923] Delta way too big! 18446743750843854390 ts=18446744072977390405 before=322133536015 after=322133536015 write stamp=18446744072977390405
> [22.954923] If you just came from a suspend/resume,
> [22.954923] please switch to the trace global clock:
> [22.954923]   echo global > /sys/kernel/tracing/trace_clock
> [22.954923] or add trace_clock=global to the kernel command line
> [22.954937] WARNING: CPU: 2 PID: 5728 at kernel/trace/ring_buffer.c:2890 rb_add_timestamp+0x193/0x1c0
> 
> Notice that the above was reproduced even with "trace_clock=global".
> 
> The fix is to _always_ save/restore sched_clock across a suspend
> cycle _if the TSC is used_ as sched_clock - the sched_clock_stable()
> check only remains relevant to the save/restore when we fall back to
> jiffies.
> 

Hi folks, I would like to ask whether it is possible to add the following tag:

Debugged-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>

Cascardo helped me a lot with debugging this issue, but I forgot to add
the tag earlier, so it is only fair to add it now!

Thanks,

Guilherme

> Cc: stable@vger.kernel.org
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> [...]

[tip: sched/core] x86/tsc: Always save/restore TSC sched_clock() on suspend/resume

Posted by tip-bot2 for Guilherme G. Piccoli 11 months, 3 weeks ago

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     d90c9de9de2f1712df56de6e4f7d6982d358cabe
Gitweb:        https://git.kernel.org/tip/d90c9de9de2f1712df56de6e4f7d6982d358cabe
Author:        Guilherme G. Piccoli <gpiccoli@igalia.com>
AuthorDate:    Sat, 15 Feb 2025 17:58:16 -03:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 21 Feb 2025 15:27:38 +01:00

x86/tsc: Always save/restore TSC sched_clock() on suspend/resume

TSC could be reset in deep ACPI sleep states, even with invariant TSC.

That's the reason we have sched_clock() save/restore functions, to deal
with this situation. But what happens is that such functions are guarded
with a check for the stability of sched_clock - if not considered stable,
the save/restore routines aren't executed.

On top of that, we have a clear comment in native_sched_clock() saying
that *even* with TSC unstable, we continue using TSC for sched_clock due
to its speed.

In other words, if we have a situation of TSC getting detected as unstable,
it marks the sched_clock as unstable as well, so subsequent S3 sleep cycles
could bring bogus sched_clock values due to the lack of the save/restore
mechanism, causing warnings like this:

  [22.954918] ------------[ cut here ]------------
  [22.954923] Delta way too big! 18446743750843854390 ts=18446744072977390405 before=322133536015 after=322133536015 write stamp=18446744072977390405
  [22.954923] If you just came from a suspend/resume,
  [22.954923] please switch to the trace global clock:
  [22.954923]   echo global > /sys/kernel/tracing/trace_clock
  [22.954923] or add trace_clock=global to the kernel command line
  [22.954937] WARNING: CPU: 2 PID: 5728 at kernel/trace/ring_buffer.c:2890 rb_add_timestamp+0x193/0x1c0

Notice that the above was reproduced even with "trace_clock=global".

The fix is to _always_ save/restore sched_clock across a suspend
cycle _if the TSC is used_ as sched_clock - the sched_clock_stable()
check only remains relevant to the save/restore when we fall back to
jiffies.

Debugged-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250215210314.351480-1-gpiccoli@igalia.com
---
 arch/x86/kernel/tsc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 34dec0b..88e5a4e 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -959,7 +959,7 @@ static unsigned long long cyc2ns_suspend;
 
 void tsc_save_sched_clock_state(void)
 {
-	if (!sched_clock_stable())
+	if (!static_branch_likely(&__use_tsc) && !sched_clock_stable())
 		return;
 
 	cyc2ns_suspend = sched_clock();
@@ -979,7 +979,7 @@ void tsc_restore_sched_clock_state(void)
 	unsigned long flags;
 	int cpu;
 
-	if (!sched_clock_stable())
+	if (!static_branch_likely(&__use_tsc) && !sched_clock_stable())
 		return;
 
 	local_irq_save(flags);

Re: [tip: sched/core] x86/tsc: Always save/restore TSC sched_clock() on suspend/resume

Posted by Peter Zijlstra 11 months, 3 weeks ago

On Fri, Feb 21, 2025 at 02:37:42PM -0000, tip-bot2 for Guilherme G. Piccoli wrote:
> The following commit has been merged into the sched/core branch of tip:
> 
> Commit-ID:     d90c9de9de2f1712df56de6e4f7d6982d358cabe
> Gitweb:        https://git.kernel.org/tip/d90c9de9de2f1712df56de6e4f7d6982d358cabe
> Author:        Guilherme G. Piccoli <gpiccoli@igalia.com>
> AuthorDate:    Sat, 15 Feb 2025 17:58:16 -03:00
> Committer:     Ingo Molnar <mingo@kernel.org>
> CommitterDate: Fri, 21 Feb 2025 15:27:38 +01:00
> 
> x86/tsc: Always save/restore TSC sched_clock() on suspend/resume

Should this not go into x86/core or somesuch?