[v1] Add HPET NMI Watchdog support

[PATCH 0/2] Add HPET NMI Watchdog support

Posted by Alexander Graf 4 days, 9 hours ago

The current NMI watchdog relies on performance counters and consistently
occupies one on each CPU. When running virtual machines, we want to pass
performance counters to virtual machines so they can make use of them.
In addition, the host system wants to use performance counters to
monitor itself and identify when anything looks abnormal, such as split
locks.

That makes PMCs a precious resource. So any PMC we can free up is a PMC
we can use for something useful. That made me look at the NMI watchdog.

The PMC based NMI watchdog implementation does not actually need any
performance counting. It just needs a per-CPU NMI timer source. X86
systems can make anything that emits an interrupt descriptor (IOAPIC,
MSI(-X), etc) become an NMI source. So any timer goes. Including the
HPET. And while the HPET can't really operate per-CPU, in almost all
cases you only really want the NMI on *all* CPUs, rather than per-CPU.
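
To illustrate the "anything that emits an interrupt descriptor" point: in
an I/O APIC redirection table entry, bits 10:8 select the delivery mode,
and the value 100b means NMI. In that mode the vector field is ignored and
the entry should be programmed as edge triggered. The following is only a
minimal, self-contained sketch of that bit layout. The helper names
(rte_set_nmi, rte_delivery_mode) are hypothetical and are not the helper
added by patch 1/2.

```c
#include <stdint.h>

/* Delivery mode values for bits 10:8 of an I/O APIC redirection entry. */
enum ioapic_delivery_mode {
	IOAPIC_DM_FIXED = 0x0,
	IOAPIC_DM_NMI   = 0x4,
};

#define RTE_VECTOR_MASK     0xffull          /* bits 7:0, ignored for NMI */
#define RTE_DELIVERY_SHIFT  8
#define RTE_DELIVERY_MASK   (0x7ull << RTE_DELIVERY_SHIFT)
#define RTE_TRIGGER_LEVEL   (1ull << 15)     /* 0 = edge, 1 = level */

/* Hypothetical helper: extract the delivery mode of an entry. */
static inline unsigned int rte_delivery_mode(uint64_t rte)
{
	return (unsigned int)((rte & RTE_DELIVERY_MASK) >> RTE_DELIVERY_SHIFT);
}

/*
 * Hypothetical helper: turn an existing entry into an NMI entry.
 * The vector is cleared (it is ignored for NMI), the trigger mode is
 * forced to edge, and all other bits (mask, destination, ...) are kept.
 */
static inline uint64_t rte_set_nmi(uint64_t rte)
{
	rte &= ~(RTE_DELIVERY_MASK | RTE_VECTOR_MASK | RTE_TRIGGER_LEVEL);
	rte |= (uint64_t)IOAPIC_DM_NMI << RTE_DELIVERY_SHIFT;
	return rte;
}
```

The sketch deliberately leaves the mask bit and the destination field
alone, so the caller still decides when the pin is unmasked and which
CPUs receive the NMI.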

So I took a stab at building an HPET based NMI watchdog. In my (QEMU
based) testing, it's fully functional and can successfully detect when
CPUs get stuck. It even survives suspend/resume cycles.
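
For readers unfamiliar with how a broadcast NMI can detect a stuck CPU,
here is a small userspace model of the general idea: every CPU counts its
regular timer interrupts, and each watchdog NMI checks whether that count
moved since the previous NMI. This is a simplified sketch, not the code
from this series. All names (cpu_state, timer_tick, watchdog_nmi,
broadcast_nmi) and the fixed NR_CPUS are assumptions made for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

struct cpu_state {
	unsigned long timer_ticks;        /* bumped by the regular timer interrupt */
	unsigned long ticks_at_last_nmi;  /* snapshot taken by the previous NMI */
};

static struct cpu_state cpus[NR_CPUS];

/* Regular timer interrupt: only runs while the CPU still takes interrupts. */
static void timer_tick(size_t cpu)
{
	cpus[cpu].timer_ticks++;
}

/*
 * NMI handler body for one CPU: the CPU counts as stuck if its timer
 * interrupt count did not advance since the previous NMI.
 * Note: with zero-initialized state, the very first NMI reports "stuck"
 * unless at least one tick happened before it.
 */
static bool watchdog_nmi(size_t cpu)
{
	struct cpu_state *s = &cpus[cpu];
	bool stuck = (s->timer_ticks == s->ticks_at_last_nmi);

	s->ticks_at_last_nmi = s->timer_ticks;
	return stuck;
}

/* Model of one NMI delivered to all CPUs; returns a bitmask of stuck CPUs. */
static unsigned int broadcast_nmi(void)
{
	unsigned int stuck_mask = 0;

	for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
		if (watchdog_nmi(cpu))
			stuck_mask |= 1u << cpu;
	return stuck_mask;
}
```

Because the check only uses per-CPU state, a single timer that interrupts
all CPUs at once is enough for it. The timer itself does not need to be
per-CPU.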

For now, its enablement is a config time option because the hardlockup
framework does not support dynamic switching of multiple detectors.
That's ok for our use case. But maybe something for the interested
reader to tackle eventually :).

You can enable the HPET watchdog by default by setting

  CONFIG_HARDLOCKUP_DETECTOR_HPET_DEFAULT=y

or passing "hpet=watchdog" to the kernel command line. When active, it
will emit a kernel log message to indicate it works:

  [    0.179176] hpet: HPET watchdog initialized on timer 0, GSI 2

The HPET can only be in either watchdog or generic mode. I am a bit
worried about IO-APIC pin allocation logic, so I opted to reuse the
generic timer pin. That means I'm effectively breaking the normal
interrupt delivery path, so the easy way out was to say that when the
watchdog is active, PIT and HPET are not available as timer sources.
That is ok on modern systems. There are way too many (unreliable) timer
sources on x86 already. Trimming a few surely won't hurt.

I'm open to inputs on how to make the HPET multi-purpose though, in case
anyone feels strongly about it.

Alex

Alexander Graf (2):
  x86/ioapic: Add NMI delivery configuration helper
  hpet: Add HPET-based NMI watchdog support

 .../admin-guide/kernel-parameters.txt         |   5 +-
 arch/x86/Kconfig                              |  19 ++
 arch/x86/include/asm/io_apic.h                |   2 +
 arch/x86/kernel/apic/io_apic.c                |  32 ++++
 arch/x86/kernel/hpet.c                        | 172 ++++++++++++++++++
 arch/x86/kernel/i8253.c                       |   9 +
 drivers/char/hpet.c                           |   3 +
 include/linux/hpet.h                          |  14 ++
 8 files changed, 255 insertions(+), 1 deletion(-)

-- 
2.47.1




Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

Re: [PATCH 0/2] Add HPET NMI Watchdog support

Posted by Alexander Graf 4 days, 9 hours ago

On 02.02.26 18:43, Alexander Graf wrote:
> [...]


Sorry for the resend. I caught an issue while sending out the series, 
hit ctrl-c before thinking and suddenly had a half sent series. Discard 
this one. Happy review on the real, full one :)


Alex





[PATCH 1/2] x86/ioapic: Add NMI delivery configuration helper