arch/x86/kernel/tsc.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
Right now, we can force the TSC to be marked as unstable through
boot parameter. There are debug / test cases though in which would
be preferable to simulate the clocksource watchdog behavior, i.e.,
marking TSC as unstable during the system run. Some paths might
change, for example: the tracing clock is auto switched to global
if TSC is marked as unstable on boot, but it could remain local if
TSC gets marked as unstable after tracing initialization.
Hence, the proposal here is to have a simple debugfs file that
gets TSC marked as unstable when written.
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
arch/x86/kernel/tsc.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 88e5a4ed9db3..6bce5f3a2848 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -9,6 +9,7 @@
#include <linux/timer.h>
#include <linux/acpi_pmtmr.h>
#include <linux/cpufreq.h>
+#include <linux/debugfs.h>
#include <linux/delay.h>
#include <linux/clocksource.h>
#include <linux/percpu.h>
@@ -1602,3 +1603,24 @@ unsigned long calibrate_delay_is_known(void)
return 0;
}
#endif
+
+#ifdef CONFIG_DEBUG_FS
+static ssize_t tsc_debugfs_write(struct file *f, const char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+ mark_tsc_unstable("debugfs write");
+ return count;
+}
+
+static const struct file_operations tsc_debugfs_fops = {
+ .write = tsc_debugfs_write,
+};
+
+static __init int tsc_debugfs_init(void)
+{
+ debugfs_create_file("mark_tsc_unstable", 0200, /* write only */
+ arch_debugfs_dir, NULL, &tsc_debugfs_fops);
+ return 0;
+}
+late_initcall(tsc_debugfs_init);
+#endif /* CONFIG_DEBUG_FS */
--
2.48.1
On Wed, Feb 26, 2025 at 10:27:13AM -0300, Guilherme G. Piccoli wrote:
> Right now, we can force the TSC to be marked as unstable through
Who's "we"?
> boot parameter. There are debug / test cases though in which would
Which are those test cases?
> be preferable to simulate the clocksource watchdog behavior, i.e.,
> marking TSC as unstable during the system run. Some paths might
> change, for example: the tracing clock is auto switched to global
> if TSC is marked as unstable on boot, but it could remain local if
> TSC gets marked as unstable after tracing initialization.
>
> Hence, the proposal here is to have a simple debugfs file that
> gets TSC marked as unstable when written.
What happens if someone marks the TSC as unstable and comes reporting to us
that her/his machine is kaputt? And we go on a wild goose chase ...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hi Boris! Thanks for the attention, responses below. On 17/03/2025 11:40, Borislav Petkov wrote: > On Wed, Feb 26, 2025 at 10:27:13AM -0300, Guilherme G. Piccoli wrote: >> Right now, we can force the TSC to be marked as unstable through > > Who's "we"? We as in we, the Linux users. I can change to something like "Right now, TSC can be marked as unstable" - let me know your preference =) > >> boot parameter. There are debug / test cases though in which would > > Which are those test cases? > For example, my team and I debugged recently a problem with TSC+sched_clock: after TSC being marked as unstable by the watchdog, sched_clock continues to use it BUT the suspend/resume TSC routines stop being executed; for more details, please check [1]. But the thing is: during this debug we tried forcing TSC unstable, and did that by changing the command-line. Problem: with that, tracing code sets its clock to global on boot time. We were suspicious that the issue was related to local trace clock, so we couldn't mark TSC unstable and let the trace code run with local clock as it would, if TSC was marked as unstable by the watchdog late on runtime. That was one case (easily "workarounded" with trace_clock=), but in the end, I thought that would be way better to just have this switch on debugfs to be able to reproduce real-life TSC cases of instability, while system runs. Hope that explains better my reasoning for adding this debugs entry. >> be preferable to simulate the clocksource watchdog behavior, i.e., >> marking TSC as unstable during the system run. Some paths might >> change, for example: the tracing clock is auto switched to global >> if TSC is marked as unstable on boot, but it could remain local if >> TSC gets marked as unstable after tracing initialization. >> >> Hence, the proposal here is to have a simple debugfs file that >> gets TSC marked as unstable when written. > > What happens if someone marks the TSC as unstable and comes reporting to us > that her/his machine is kaputt? And we go on a wild goose chase ... > The same that happens if today someone marks it as unstable via command-line, right? You will see that on logs, and could simply reply that the user marked as unstable themselves, so..no bug at all!! But let's think the other way around: what if some user marks TSC unstable via debugfs, later on runtime, and with that, unveils a real bug as [1] and then, we can then fix it? That would be a win heheh Cheers, Guilherme [1] https://web.git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=d90c9de9de2f1712df56de6e4f7d6982d358cabe
On Mon, Mar 17, 2025 at 12:03:02PM -0300, Guilherme G. Piccoli wrote:
> We as in we, the Linux users. I can change to something like "Right now,
> TSC can be marked as unstable" - let me know your preference =)
Yes please.
Personal pronouns are ambiguous in text, especially with so many
parties/companies/etc developing the kernel so let's avoid them please.
> For example, my team and I debugged recently a problem with
> TSC+sched_clock: after TSC being marked as unstable by the watchdog,
> sched_clock continues to use it BUT the suspend/resume TSC routines stop
> being executed; for more details, please check [1]. But the thing is:
> during this debug we tried forcing TSC unstable, and did that by
> changing the command-line.
>
> Problem: with that, tracing code sets its clock to global on boot time.
> We were suspicious that the issue was related to local trace clock, so
> we couldn't mark TSC unstable and let the trace code run with local
> clock as it would, if TSC was marked as unstable by the watchdog late on
> runtime.
>
> That was one case (easily "workarounded" with trace_clock=), but in the
> end, I thought that would be way better to just have this switch on
> debugfs to be able to reproduce real-life TSC cases of instability,
> while system runs. Hope that explains better my reasoning for adding
> this debugs entry.
That sounds like a debugging session and we all change the kernel to do things
it doesn't normally do just for the purposes of debugging. It doesn't mean
that that should be exposed upstream.
> The same that happens if today someone marks it as unstable via
> command-line, right? You will see that on logs, and could simply reply
> that the user marked as unstable themselves, so..no bug at all!!
>
> But let's think the other way around: what if some user marks TSC
> unstable via debugfs, later on runtime, and with that, unveils a real
I don't understand what you mean here.
> bug as [1] and then, we can then fix it? That would be a win heheh
No, marking the most important clocksource on x86 deliberately as unstable
better scream bloody hell in dmesg. But regardless, I'm not convinced this is
nearly as needed to have upstream. Just use it locally or cmdline.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 17/03/2025 12:14, Borislav Petkov wrote: > [...] >> The same that happens if today someone marks it as unstable via >> command-line, right? You will see that on logs, and could simply reply >> that the user marked as unstable themselves, so..no bug at all!! >> >> But let's think the other way around: what if some user marks TSC >> unstable via debugfs, later on runtime, and with that, unveils a real > > I don't understand what you mean here. > Oh, I meant that: since the behavior of other components (like tracing clock) is different if TSC is marked as unstable (a) on boot time, through command-line, (b) some time after boot, like on watchdog skews (or using this debugfs approach), we may as well have some user marking it unstable through debugfs and due to that, finding a real bug somewhere, that we can fix it! >> bug as [1] and then, we can then fix it? That would be a win heheh > > No, marking the most important clocksource on x86 deliberately as unstable > better scream bloody hell in dmesg. But regardless, I'm not convinced this is > nearly as needed to have upstream. Just use it locally or cmdline. > > Thx. > OK Boris, thanks for the feedback - if you don't see much value in this one, let's forget about that - can always change locally as you said. Cheers, Guilherme
On 26/02/2025 10:27, Guilherme G. Piccoli wrote: > Right now, we can force the TSC to be marked as unstable through > boot parameter. There are debug / test cases though in which would > be preferable to simulate the clocksource watchdog behavior, i.e., > marking TSC as unstable during the system run. Some paths might > change, for example: the tracing clock is auto switched to global > if TSC is marked as unstable on boot, but it could remain local if > TSC gets marked as unstable after tracing initialization. > > Hence, the proposal here is to have a simple debugfs file that > gets TSC marked as unstable when written. > > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> > --- > arch/x86/kernel/tsc.c | 22 ++++++++++++++++++++++ > 1 file changed, 22 insertions(+) Hi folks, gentle ping about this one - any suggestions? Cheers, Guilherme
On March 17, 2025 7:35:45 AM PDT, "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote: >On 26/02/2025 10:27, Guilherme G. Piccoli wrote: >> Right now, we can force the TSC to be marked as unstable through >> boot parameter. There are debug / test cases though in which would >> be preferable to simulate the clocksource watchdog behavior, i.e., >> marking TSC as unstable during the system run. Some paths might >> change, for example: the tracing clock is auto switched to global >> if TSC is marked as unstable on boot, but it could remain local if >> TSC gets marked as unstable after tracing initialization. >> >> Hence, the proposal here is to have a simple debugfs file that >> gets TSC marked as unstable when written. >> >> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> >> --- >> arch/x86/kernel/tsc.c | 22 ++++++++++++++++++++++ >> 1 file changed, 22 insertions(+) > >Hi folks, gentle ping about this one - any suggestions? >Cheers, > > >Guilherme To be honest I don't think this belongs in debugfs; rather it belongs in sysfs. Debugfs should not be necessarily in serious production systems – it is way too large of an attack surface, which is a very good reason why it is its own filesystem – but if this is a real issue on hardware then it may be needed. -hpa
On 17/03/2025 15:42, H. Peter Anvin wrote: > On March 17, 2025 7:35:45 AM PDT, "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote: >> On 26/02/2025 10:27, Guilherme G. Piccoli wrote: >>> Right now, we can force the TSC to be marked as unstable through >>> boot parameter. There are debug / test cases though in which would >>> be preferable to simulate the clocksource watchdog behavior, i.e., >>> marking TSC as unstable during the system run. Some paths might >>> change, for example: the tracing clock is auto switched to global >>> if TSC is marked as unstable on boot, but it could remain local if >>> TSC gets marked as unstable after tracing initialization. >>> >>> Hence, the proposal here is to have a simple debugfs file that >>> gets TSC marked as unstable when written. >>> >>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> >> Guilherme > > To be honest I don't think this belongs in debugfs; rather it belongs in sysfs. > > Debugfs should not be necessarily in serious production systems – it is way too large of an attack surface, which is a very good reason why it is its own filesystem – but if this is a real issue on hardware then it may be needed. > > -hpa Thanks! I agree, but as per Boris comment, he suggested to drop this one, right? In other words, we have 2 options in my understanding: (a) Drop it; (b) Re-implement using sysfs entry instead of debugfs; What do you all think? Thanks again, Guilherme
On Fri, Mar 21 2025 at 16:26, Guilherme G. Piccoli wrote:
> On 17/03/2025 15:42, H. Peter Anvin wrote:
>> To be honest I don't think this belongs in debugfs; rather it belongs in sysfs.
No.
>> Debugfs should not be necessarily in serious production systems – it
>> is way too large of an attack surface, which is a very good reason
>> why it is its own filesystem – but if this is a real issue on
>> hardware then it may be needed.
There is ZERO reason to do that on a production system.
If the in kernel detection does not work, then switching it over after
someone detected the problem five hours after the fact does not help at
all.
The admin can force the TSC to be removed from timekeeping, which is the
really crucial part, already today by changing the clocksource via sysfs.
> In other words, we have 2 options in my understanding:
>
> (a) Drop it;
>
> (b) Re-implement using sysfs entry instead of debugfs;
Neither (a) nor (b) nor the proposed implementation.
There is actually a good reason why a debug/validation mechanism of some
sort makes sense, i.e. testing:
1) The hardware, which exposed these issues frequently is starting to
get into museum or junkyard state, which reduces the test base
significantly.
2) Modern hardware, which exposes this issue in large fleets
occasionally due to aging and misdirected neutrons, is not really a
good testbed either.
So we have no real test coverage for something, which can be crucial in
the actual failure case.
Sure, it could be argued that this can be implemented in qemu, but that's
fundamentally the wrong approach.
This is something which must be easily available to developers and CI
and not require to have a special setup with debug nonsense enabled in
some external tool.
The proposed implementation is just an ad hoc band aid as well. Why?
1) It has zero relation to the actual failure detection code paths.
2) It covers only a small part of the problem space. On all modern
systems, which have TSC_ADJUST the clocksource watchdog is disabled
and just asynchronously invoking TSC unstable is a hack which only
tests the unstable logic.
So I rather want to see a more complete solution, which
1) lets the clocksource watchdog logic fail the test
2) lets the TSC sync (including TSC_ADJUST) logic on CPU hotplug fail
3) tweaks the TSC_ADJUST register and validates that the detection and
mitigation logic on systems w/o clocksource watchdog works
correctly.
Ideally that's a kunit test for CI integration plus a debugfs interface
for developers, which comes with a related selftest.
Thanks,
tglx
Thanks Thomas for your comprehensive response, quite enriching. Some comments inline: On 21/03/2025 18:19, Thomas Gleixner wrote: > [...] > The proposed implementation is just an ad hoc band aid as well. Why? > > 1) It has zero relation to the actual failure detection code paths. > > 2) It covers only a small part of the problem space. On all modern > systems, which have TSC_ADJUST the clocksource watchdog is disabled > and just asynchronously invoking TSC unstable is a hack which only > tests the unstable logic. But what about AMD systems? Even the modern ones apparently lack TSC_ADJUST - or is it changing recently? Checking TSC code, it is full of checks "if Intel" as well, like in native calibration. Our issue is present on AMD and my impression is that, in this respect, these systems are way more unstable (from TSC perspective) than the ones having TSC_ADJUST. > > So I rather want to see a more complete solution, which > > 1) lets the clocksource watchdog logic fail the test > > 2) lets the TSC sync (including TSC_ADJUST) logic on CPU hotplug fail > > 3) tweaks the TSC_ADJUST register and validates that the detection and > mitigation logic on systems w/o clocksource watchdog works > correctly. > > Ideally that's a kunit test for CI integration plus a debugfs interface > for developers, which comes with a related selftest. > This is a great suggestion. I'll try to come up with something in next weeks (as time allows), I agree this area indeed seems to lack good/easy testing. Cheers, Guilherme
On Sun, Mar 23, 2025 at 02:53:05PM -0300, Guilherme G. Piccoli wrote:
> But what about AMD systems? Even the modern ones apparently lack
> TSC_ADJUST - or is it changing recently?
Yes, it is.
> Checking TSC code, it is full of checks "if Intel" as well, like in
> native calibration. Our issue is present on AMD and my impression is
> that, in this respect, these systems are way more unstable (from TSC
> perspective) than the ones having TSC_ADJUST.
The only one I know of is a Zen2 laptop where BIOS botches a perfectly fine
TSC because those BIOS programmers are soo smart.
If you know of other cases, where are those bug reports?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 23/03/2025 15:14, Borislav Petkov wrote: > On Sun, Mar 23, 2025 at 02:53:05PM -0300, Guilherme G. Piccoli wrote: >> But what about AMD systems? Even the modern ones apparently lack >> TSC_ADJUST - or is it changing recently? > > Yes, it is. This is great to hear - is there any starting point model that you know AMD introduced/is introducing TSC_ADJUST? > > The only one I know of is a Zen2 laptop where BIOS botches a perfectly fine > TSC because those BIOS programmers are soo smart. > > If you know of other cases, where are those bug reports? > We don't know yet what's going on, some TSC skews reported eventually in some machines, in a fleet - but "old" processors, Zen 1st gen. Once things are more clear, definitely could provide a more mature report of the issue. Cheers, Guilherme
On Sun, Mar 23, 2025 at 04:21:44PM -0300, Guilherme G. Piccoli wrote:
> This is great to hear - is there any starting point model that you know
> AMD introduced/is introducing TSC_ADJUST?
Zen5.
> We don't know yet what's going on, some TSC skews reported eventually in
> some machines, in a fleet - but "old" processors, Zen 1st gen. Once things
> are more clear, definitely could provide a more mature report of the issue.
Sure.
Zen1 should not have TSC problems either. If it does, whack BIOS people.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 23/03/2025 16:51, Borislav Petkov wrote: > On Sun, Mar 23, 2025 at 04:21:44PM -0300, Guilherme G. Piccoli wrote: >> This is great to hear - is there any starting point model that you know >> AMD introduced/is introducing TSC_ADJUST? > > Zen5. > Thanks! Nice improvement from AMD =) > [...] > Zen1 should not have TSC problems either. If it does, whack BIOS people. > Heheh OK, thanks for confirming!
© 2016 - 2025 Red Hat, Inc.