The current NMI watchdog relies on performance counters and permanently occupies one on each CPU. When running virtual machines, we want to pass performance counters through to the virtual machines so they can make use of them. In addition, the host wants to use performance counters to monitor the system and identify when anything looks abnormal, such as split locks.

That makes PMCs a precious resource, and any PMC we can free up is a PMC we can use for something useful. That made me look at the NMI watchdog.

The PMC-based NMI watchdog implementation does not actually need any performance counting. It just needs a per-CPU NMI timer source. x86 systems can turn anything that emits an interrupt descriptor (IOAPIC, MSI(-X), etc.) into an NMI source. So any timer will do, including the HPET. And while the HPET can't really operate per-CPU, in almost all cases you only want the NMI on *all* CPUs rather than per-CPU.

So I took a stab at building an HPET-based NMI watchdog. In my (QEMU-based) testing, it's fully functional and successfully detects when CPUs get stuck. It even survives suspend/resume cycles.

For now, enabling it is a config-time option because the hardlockup framework does not support dynamic switching between multiple detectors. That's ok for our use case, but maybe something for the interested reader to tackle eventually :).

You can enable the HPET watchdog by default by setting

  CONFIG_HARDLOCKUP_DETECTOR_HPET_DEFAULT=y

or by passing "hpet=watchdog" on the kernel command line. When active, it emits a kernel log message to indicate that it works:

  [    0.179176] hpet: HPET watchdog initialized on timer 0, GSI 2

The HPET can be in either watchdog or generic mode, but not both. I am a bit worried about the IO-APIC pin allocation logic, so I opted to reuse the generic timer pin. That means I'm effectively breaking the normal interrupt delivery path, so the easy way out was to say that when the watchdog is active, the PIT and HPET are not available as timer sources.
That is ok on modern systems: there are way too many (unreliable) timer sources on x86 already, so trimming a few surely won't hurt.

I'm open to input on how to make the HPET multi-purpose, though, in case anyone feels strongly about it.

Alex

Alexander Graf (2):
  x86/ioapic: Add NMI delivery configuration helper
  hpet: Add HPET-based NMI watchdog support

 .../admin-guide/kernel-parameters.txt |   5 +-
 arch/x86/Kconfig                      |  19 ++
 arch/x86/include/asm/io_apic.h        |   2 +
 arch/x86/kernel/apic/io_apic.c        |  32 ++++
 arch/x86/kernel/hpet.c                | 172 ++++++++++++++++++
 arch/x86/kernel/i8253.c               |   9 +
 drivers/char/hpet.c                   |   3 +
 include/linux/hpet.h                  |  14 ++
 8 files changed, 255 insertions(+), 1 deletion(-)

-- 
2.47.1

Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Managing Directors: Christof Hellmis, Andreas Stieger
Registered at the Amtsgericht Charlottenburg under HRB 257764 B
Registered office: Berlin
VAT ID: DE 365 538 597
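The cover letter relies on two hardware facts: an IO-APIC pin can be programmed to deliver its interrupt as an NMI, and an HPET timer can fire at a fixed period derived from its counter clock. The following is a minimal C sketch of both, under stated assumptions. It assumes the 82093AA-style IO-APIC redirection table entry layout (delivery mode in bits 8-10, with 100b meaning NMI) and the IA-PC HPET General Capabilities register layout (counter period in femtoseconds in bits 63:32). `ioapic_nmi_entry()`, `hpet_period_fs()` and `hpet_ticks_for()` are hypothetical helpers for illustration, not the code added by this series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: field layout of a 64-bit IO-APIC redirection table entry
 * (82093AA-style). Names are hypothetical, not taken from this series.
 */
#define IOAPIC_DELMOD_SHIFT     8
#define IOAPIC_DELMOD_NMI       0x4ULL          /* 100b: signal the CPU's NMI input */
#define IOAPIC_DESTMOD_LOGICAL  (1ULL << 11)
#define IOAPIC_POLARITY_LOW     (1ULL << 13)
#define IOAPIC_TRIGGER_LEVEL    (1ULL << 15)    /* must stay clear for NMI */
#define IOAPIC_MASKED           (1ULL << 16)    /* left clear: entry unmasked */
#define IOAPIC_DEST_SHIFT       56

/* Femtoseconds per second; the HPET reports its counter period in fs. */
#define FSEC_PER_SEC            1000000000000000ULL

/*
 * Hypothetical helper: build an unmasked redirection entry that delivers
 * the pin's interrupt as an NMI to the given destination.
 */
static uint64_t ioapic_nmi_entry(uint8_t dest, int logical, int active_low)
{
	uint64_t entry = 0;	/* vector (bits 0-7) is ignored for NMI; keep it 0 */

	entry |= IOAPIC_DELMOD_NMI << IOAPIC_DELMOD_SHIFT;
	if (logical)
		entry |= IOAPIC_DESTMOD_LOGICAL;
	if (active_low)
		entry |= IOAPIC_POLARITY_LOW;
	/* Trigger mode bit left clear: NMI entries must be edge-triggered. */
	entry |= (uint64_t)dest << IOAPIC_DEST_SHIFT;
	return entry;
}

/* Bits 63:32 of the HPET General Capabilities register: period in fs. */
static uint32_t hpet_period_fs(uint64_t caps)
{
	return (uint32_t)(caps >> 32);
}

/*
 * Hypothetical helper: number of main-counter ticks in one watchdog
 * period, i.e. the period a periodic HPET timer would be programmed with.
 * Returns 0 for an invalid (zero) counter period.
 */
static uint64_t hpet_ticks_for(uint64_t seconds, uint32_t period_fs)
{
	if (period_fs == 0)
		return 0;
	assert(seconds <= UINT64_MAX / FSEC_PER_SEC);	/* avoid overflow */
	return seconds * FSEC_PER_SEC / period_fs;
}
```

For instance, a 100 MHz counter reports a period of 10,000,000 fs, so a 10 second period (assuming the watchdog mirrors the default `watchdog_thresh` of 10) comes out to 1,000,000,000 ticks. On real hardware, the 64-bit entry is written as two 32-bit halves through the IO-APIC's IOREGSEL/IOWIN register window, which the sketch leaves out.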
On 02.02.26 18:43, Alexander Graf wrote:
> The current NMI watchdog relies on performance counters and consistently
> occupies one on each CPU. [...]

Sorry for the resend. I caught an issue while sending out the series, hit Ctrl-C before thinking, and suddenly had a half-sent series. Please discard this one. Happy reviewing on the real, full one :)

Alex