[PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot

Guilherme G. Piccoli posted 1 patch 9 months, 3 weeks ago
arch/x86/kernel/tsc.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
[PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Guilherme G. Piccoli 9 months, 3 weeks ago
Right now, we can force the TSC to be marked as unstable through
boot parameter. There are debug / test cases though in which would
be preferable to simulate the clocksource watchdog behavior, i.e.,
marking TSC as unstable during the system run. Some paths might
change, for example: the tracing clock is auto switched to global
if TSC is marked as unstable on boot, but it could remain local if
TSC gets marked as unstable after tracing initialization.

Hence, the proposal here is to have a simple debugfs file that
gets TSC marked as unstable when written.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/x86/kernel/tsc.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 88e5a4ed9db3..6bce5f3a2848 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -9,6 +9,7 @@
 #include <linux/timer.h>
 #include <linux/acpi_pmtmr.h>
 #include <linux/cpufreq.h>
+#include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/clocksource.h>
 #include <linux/percpu.h>
@@ -1602,3 +1603,24 @@ unsigned long calibrate_delay_is_known(void)
 	return 0;
 }
 #endif
+
+#ifdef CONFIG_DEBUG_FS
+static ssize_t tsc_debugfs_write(struct file *f, const char __user *ubuf,
+				 size_t count, loff_t *ppos)
+{
+	mark_tsc_unstable("debugfs write");
+	return count;
+}
+
+static const struct file_operations tsc_debugfs_fops = {
+	.write	= tsc_debugfs_write,
+};
+
+static __init int tsc_debugfs_init(void)
+{
+	debugfs_create_file("mark_tsc_unstable", 0200, /* write only */
+			    arch_debugfs_dir, NULL, &tsc_debugfs_fops);
+	return 0;
+}
+late_initcall(tsc_debugfs_init);
+#endif /* CONFIG_DEBUG_FS */
-- 
2.48.1
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Borislav Petkov 9 months ago
On Wed, Feb 26, 2025 at 10:27:13AM -0300, Guilherme G. Piccoli wrote:
> Right now, we can force the TSC to be marked as unstable through

Who's "we"?

> boot parameter. There are debug / test cases though in which would

Which are those test cases?

> be preferable to simulate the clocksource watchdog behavior, i.e.,
> marking TSC as unstable during the system run. Some paths might
> change, for example: the tracing clock is auto switched to global
> if TSC is marked as unstable on boot, but it could remain local if
> TSC gets marked as unstable after tracing initialization.
> 
> Hence, the proposal here is to have a simple debugfs file that
> gets TSC marked as unstable when written.

What happens if someone marks the TSC as unstable and comes reporting to us
that her/his machine is kaputt? And we go on a wild goose chase ...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Guilherme G. Piccoli 9 months ago
Hi Boris! Thanks for the attention, responses below.

On 17/03/2025 11:40, Borislav Petkov wrote:
> On Wed, Feb 26, 2025 at 10:27:13AM -0300, Guilherme G. Piccoli wrote:
>> Right now, we can force the TSC to be marked as unstable through
> 
> Who's "we"?

We as in we, the Linux users. I can change to something like "Right now,
TSC can be marked as unstable" - let me know your preference =)


> 
>> boot parameter. There are debug / test cases though in which would
> 
> Which are those test cases?
>

For example, my team and I debugged recently a problem with
TSC+sched_clock: after TSC being marked as unstable by the watchdog,
sched_clock continues to use it BUT the suspend/resume TSC routines stop
being executed; for more details, please check [1]. But the thing is:
during this debug we tried forcing TSC unstable, and did that by
changing the command-line.

Problem: with that, tracing code sets its clock to global on boot time.
We were suspicious that the issue was related to local trace clock, so
we couldn't mark TSC unstable and let the trace code run with local
clock as it would, if TSC was marked as unstable by the watchdog late on
runtime.

That was one case (easily "workarounded" with trace_clock=), but in the
end, I thought that would be way better to just have this switch on
debugfs to be able to reproduce real-life TSC cases of instability,
while system runs. Hope that explains better my reasoning for adding
this debugs entry.


>> be preferable to simulate the clocksource watchdog behavior, i.e.,
>> marking TSC as unstable during the system run. Some paths might
>> change, for example: the tracing clock is auto switched to global
>> if TSC is marked as unstable on boot, but it could remain local if
>> TSC gets marked as unstable after tracing initialization.
>>
>> Hence, the proposal here is to have a simple debugfs file that
>> gets TSC marked as unstable when written.
> 
> What happens if someone marks the TSC as unstable and comes reporting to us
> that her/his machine is kaputt? And we go on a wild goose chase ...
> 

The same that happens if today someone marks it as unstable via
command-line, right? You will see that on logs, and could simply reply
that the user marked as unstable themselves, so..no bug at all!!

But let's think the other way around: what if some user marks TSC
unstable via debugfs, later on runtime, and with that, unveils a real
bug as [1] and then, we can then fix it? That would be a win heheh
Cheers,


Guilherme


[1]
https://web.git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=d90c9de9de2f1712df56de6e4f7d6982d358cabe
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Borislav Petkov 9 months ago
On Mon, Mar 17, 2025 at 12:03:02PM -0300, Guilherme G. Piccoli wrote:
> We as in we, the Linux users. I can change to something like "Right now,
> TSC can be marked as unstable" - let me know your preference =)

Yes please.

Personal pronouns are ambiguous in text, especially with so many
parties/companies/etc developing the kernel so let's avoid them please.

> For example, my team and I debugged recently a problem with
> TSC+sched_clock: after TSC being marked as unstable by the watchdog,
> sched_clock continues to use it BUT the suspend/resume TSC routines stop
> being executed; for more details, please check [1]. But the thing is:
> during this debug we tried forcing TSC unstable, and did that by
> changing the command-line.
> 
> Problem: with that, tracing code sets its clock to global on boot time.
> We were suspicious that the issue was related to local trace clock, so
> we couldn't mark TSC unstable and let the trace code run with local
> clock as it would, if TSC was marked as unstable by the watchdog late on
> runtime.
> 
> That was one case (easily "workarounded" with trace_clock=), but in the
> end, I thought that would be way better to just have this switch on
> debugfs to be able to reproduce real-life TSC cases of instability,
> while system runs. Hope that explains better my reasoning for adding
> this debugs entry.

That sounds like a debugging session and we all change the kernel to do things
it doesn't normally do just for the purposes of debugging. It doesn't mean
that that should be exposed upstream.

> The same that happens if today someone marks it as unstable via
> command-line, right? You will see that on logs, and could simply reply
> that the user marked as unstable themselves, so..no bug at all!!
> 
> But let's think the other way around: what if some user marks TSC
> unstable via debugfs, later on runtime, and with that, unveils a real

I don't understand what you mean here.

> bug as [1] and then, we can then fix it? That would be a win heheh

No, marking the most important clocksource on x86 deliberately as unstable
better scream bloody hell in dmesg. But regardless, I'm not convinced this is
nearly as needed to have upstream. Just use it locally or cmdline.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Guilherme G. Piccoli 9 months ago
On 17/03/2025 12:14, Borislav Petkov wrote:
> [...]
>> The same that happens if today someone marks it as unstable via
>> command-line, right? You will see that on logs, and could simply reply
>> that the user marked as unstable themselves, so..no bug at all!!
>>
>> But let's think the other way around: what if some user marks TSC
>> unstable via debugfs, later on runtime, and with that, unveils a real
> 
> I don't understand what you mean here.
> 

Oh, I meant that: since the behavior of other components (like tracing
clock) is different if TSC is marked as unstable (a) on boot time,
through command-line, (b) some time after boot, like on watchdog skews
(or using this debugfs approach), we may as well have some user marking
it unstable through debugfs and due to that, finding a real bug
somewhere, that we can fix it!


>> bug as [1] and then, we can then fix it? That would be a win heheh
> 
> No, marking the most important clocksource on x86 deliberately as unstable
> better scream bloody hell in dmesg. But regardless, I'm not convinced this is
> nearly as needed to have upstream. Just use it locally or cmdline.
> 
> Thx.
> 

OK Boris, thanks for the feedback - if you don't see much value in this
one, let's forget about that - can always change locally as you said.
Cheers,


Guilherme
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Guilherme G. Piccoli 9 months ago
On 26/02/2025 10:27, Guilherme G. Piccoli wrote:
> Right now, we can force the TSC to be marked as unstable through
> boot parameter. There are debug / test cases though in which would
> be preferable to simulate the clocksource watchdog behavior, i.e.,
> marking TSC as unstable during the system run. Some paths might
> change, for example: the tracing clock is auto switched to global
> if TSC is marked as unstable on boot, but it could remain local if
> TSC gets marked as unstable after tracing initialization.
> 
> Hence, the proposal here is to have a simple debugfs file that
> gets TSC marked as unstable when written.
> 
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>  arch/x86/kernel/tsc.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)

Hi folks, gentle ping about this one - any suggestions?
Cheers,


Guilherme
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by H. Peter Anvin 9 months ago
On March 17, 2025 7:35:45 AM PDT, "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
>On 26/02/2025 10:27, Guilherme G. Piccoli wrote:
>> Right now, we can force the TSC to be marked as unstable through
>> boot parameter. There are debug / test cases though in which would
>> be preferable to simulate the clocksource watchdog behavior, i.e.,
>> marking TSC as unstable during the system run. Some paths might
>> change, for example: the tracing clock is auto switched to global
>> if TSC is marked as unstable on boot, but it could remain local if
>> TSC gets marked as unstable after tracing initialization.
>> 
>> Hence, the proposal here is to have a simple debugfs file that
>> gets TSC marked as unstable when written.
>> 
>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
>> ---
>>  arch/x86/kernel/tsc.c | 22 ++++++++++++++++++++++
>>  1 file changed, 22 insertions(+)
>
>Hi folks, gentle ping about this one - any suggestions?
>Cheers,
>
>
>Guilherme

To be honest I don't think this belongs in debugfs; rather it belongs in sysfs.

Debugfs should not be necessarily in serious production systems – it is way too large of an attack surface, which is a very good reason why it is its own filesystem – but if this is a real issue on hardware then it may be needed.

   -hpa
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Guilherme G. Piccoli 9 months ago
On 17/03/2025 15:42, H. Peter Anvin wrote:
> On March 17, 2025 7:35:45 AM PDT, "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
>> On 26/02/2025 10:27, Guilherme G. Piccoli wrote:
>>> Right now, we can force the TSC to be marked as unstable through
>>> boot parameter. There are debug / test cases though in which would
>>> be preferable to simulate the clocksource watchdog behavior, i.e.,
>>> marking TSC as unstable during the system run. Some paths might
>>> change, for example: the tracing clock is auto switched to global
>>> if TSC is marked as unstable on boot, but it could remain local if
>>> TSC gets marked as unstable after tracing initialization.
>>>
>>> Hence, the proposal here is to have a simple debugfs file that
>>> gets TSC marked as unstable when written.
>>>
>>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
>> Guilherme
> 
> To be honest I don't think this belongs in debugfs; rather it belongs in sysfs.
> 
> Debugfs should not be necessarily in serious production systems – it is way too large of an attack surface, which is a very good reason why it is its own filesystem – but if this is a real issue on hardware then it may be needed.
> 
>    -hpa

Thanks! I agree, but as per Boris comment, he suggested to drop this
one, right?


In other words, we have 2 options in my understanding:

(a) Drop it;

(b) Re-implement using sysfs entry instead of debugfs;


What do you all think?
Thanks again,


Guilherme
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Thomas Gleixner 9 months ago
On Fri, Mar 21 2025 at 16:26, Guilherme G. Piccoli wrote:
> On 17/03/2025 15:42, H. Peter Anvin wrote:
>> To be honest I don't think this belongs in debugfs; rather it belongs in sysfs.

No.

>> Debugfs should not be necessarily in serious production systems – it
>> is way too large of an attack surface, which is a very good reason
>> why it is its own filesystem – but if this is a real issue on
>> hardware then it may be needed.

There is ZERO reason to do that on a production system.

If the in kernel detection does not work, then switching it over after
someone detected the problem five hours after the fact does not help at
all.

The admin can force the TSC to be removed from timekeeping, which is the
really crucial part, already today by changing the clocksource via sysfs.

> In other words, we have 2 options in my understanding:
>
> (a) Drop it;
>
> (b) Re-implement using sysfs entry instead of debugfs;

Neither (a) nor (b) nor the proposed implementation.

There is actually a good reason why a debug/validation mechanism of some
sort makes sense, i.e. testing:

  1) The hardware, which exposed these issues frequently is starting to
     get into museum or junkyard state, which reduces the test base
     significantly.

  2) Modern hardware, which exposes this issue in large fleets
     occasionally due to aging and misdirected neutrons, is not really a
     good testbed either.

So we have no real test coverage for something, which can be crucial in
the actual failure case.

Sure, it could be argued that this can be implemented in qemu, but that's
fundamentally the wrong approach.

This is something which must be easily available to developers and CI
and not require to have a special setup with debug nonsense enabled in
some external tool.

The proposed implementation is just an ad hoc band aid as well. Why?

  1) It has zero relation to the actual failure detection code paths.

  2) It covers only a small part of the problem space. On all modern
     systems, which have TSC_ADJUST the clocksource watchdog is disabled
     and just asynchronously invoking TSC unstable is a hack which only
     tests the unstable logic.

So I rather want to see a more complete solution, which

  1) lets the clocksource watchdog logic fail the test

  2) lets the TSC sync (including TSC_ADJUST) logic on CPU hotplug fail

  3) tweaks the TSC_ADJUST register and validates that the detection and
     mitigation logic on systems w/o clocksource watchdog works
     correctly.

Ideally that's a kunit test for CI integration plus a debugfs interface
for developers, which comes with a related selftest.

Thanks,

        tglx






  

   
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Guilherme G. Piccoli 9 months ago
Thanks Thomas for your comprehensive response, quite enriching.
Some comments inline:


On 21/03/2025 18:19, Thomas Gleixner wrote:
> [...]
> The proposed implementation is just an ad hoc band aid as well. Why?
> 
>   1) It has zero relation to the actual failure detection code paths.
> 
>   2) It covers only a small part of the problem space. On all modern
>      systems, which have TSC_ADJUST the clocksource watchdog is disabled
>      and just asynchronously invoking TSC unstable is a hack which only
>      tests the unstable logic.

But what about AMD systems? Even the modern ones apparently lack
TSC_ADJUST - or is it changing recently?

Checking TSC code, it is full of checks "if Intel" as well, like in
native calibration. Our issue is present on AMD and my impression is
that, in this respect, these systems are way more unstable (from TSC
perspective) than the ones having TSC_ADJUST.


> 
> So I rather want to see a more complete solution, which
> 
>   1) lets the clocksource watchdog logic fail the test
> 
>   2) lets the TSC sync (including TSC_ADJUST) logic on CPU hotplug fail
> 
>   3) tweaks the TSC_ADJUST register and validates that the detection and
>      mitigation logic on systems w/o clocksource watchdog works
>      correctly.
> 
> Ideally that's a kunit test for CI integration plus a debugfs interface
> for developers, which comes with a related selftest.
> 

This is a great suggestion. I'll try to come up with something in next
weeks (as time allows), I agree this area indeed seems to lack good/easy
testing.
Cheers,


Guilherme
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Borislav Petkov 9 months ago
On Sun, Mar 23, 2025 at 02:53:05PM -0300, Guilherme G. Piccoli wrote:
> But what about AMD systems? Even the modern ones apparently lack
> TSC_ADJUST - or is it changing recently?

Yes, it is.

> Checking TSC code, it is full of checks "if Intel" as well, like in
> native calibration. Our issue is present on AMD and my impression is
> that, in this respect, these systems are way more unstable (from TSC
> perspective) than the ones having TSC_ADJUST.

The only one I know of is a Zen2 laptop where BIOS botches a perfectly fine
TSC because those BIOS programmers are soo smart.

If you know of other cases, where are those bug reports?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Guilherme G. Piccoli 8 months, 4 weeks ago
On 23/03/2025 15:14, Borislav Petkov wrote:
> On Sun, Mar 23, 2025 at 02:53:05PM -0300, Guilherme G. Piccoli wrote:
>> But what about AMD systems? Even the modern ones apparently lack
>> TSC_ADJUST - or is it changing recently?
> 
> Yes, it is.

This is great to hear - is there any starting point model that you know
AMD introduced/is introducing TSC_ADJUST?


> 
> The only one I know of is a Zen2 laptop where BIOS botches a perfectly fine
> TSC because those BIOS programmers are soo smart.
> 
> If you know of other cases, where are those bug reports?
>

We don't know yet what's going on, some TSC skews reported eventually in
some machines, in a fleet - but "old" processors, Zen 1st gen.
Once things are more clear, definitely could provide a more mature
report of the issue.

Cheers,


Guilherme
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Borislav Petkov 8 months, 4 weeks ago
On Sun, Mar 23, 2025 at 04:21:44PM -0300, Guilherme G. Piccoli wrote:
> This is great to hear - is there any starting point model that you know
> AMD introduced/is introducing TSC_ADJUST?

Zen5.

> We don't know yet what's going on, some TSC skews reported eventually in
> some machines, in a fleet - but "old" processors, Zen 1st gen.  Once things
> are more clear, definitely could provide a more mature report of the issue.

Sure.

Zen1 should not have TSC problems either. If it does, whack BIOS people.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot
Posted by Guilherme G. Piccoli 8 months, 4 weeks ago
On 23/03/2025 16:51, Borislav Petkov wrote:
> On Sun, Mar 23, 2025 at 04:21:44PM -0300, Guilherme G. Piccoli wrote:
>> This is great to hear - is there any starting point model that you know
>> AMD introduced/is introducing TSC_ADJUST?
> 
> Zen5.
>

Thanks! Nice improvement from AMD =)


> [...]
> Zen1 should not have TSC problems either. If it does, whack BIOS people.
> 

Heheh
OK, thanks for confirming!