Hi, guys.

I have implemented a low-overhead method for detecting interrupt storms
in softlockup. Please review it; all comments are welcome.

Changes from v5 to v6:

- Use "./scripts/checkpatch.pl --strict" to get a few extra style nits
  and fix them.

- Squash patch #3 into patch #1, and wrap the help text to 80 columns.

- Sort existing headers alphabetically in watchdog.c.

- Drop "softlockup_hardirq_cpus", just read "hardirq_counts" and see
  if it's non-NULL.

- Store "nr_irqs" in a local variable.

- Simplify the calculation of "cpu_diff".

Changes from v4 to v5:

- Rearrange variable placement to make the code look neater.

Changes from v3 to v4:

- Rename some variables and functions to make the code logic more
  readable.

- Change the code location to avoid predeclaring.

- Just swap rather than a double loop in tabulate_irq_count.

- Since nr_irqs has the potential to grow at runtime, bounds-check
  logic has been implemented.

- Add SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob.

Changes from v2 to v3:

- From Liu Song, using enum instead of macro for cpu_stats, shortening
  the name 'idx_to_stat' to 'stats', adding 'get_16bit_precesion'
  instead of using right shift operations, and using
  'struct irq_counts'.

- From kernel robot test, using '__this_cpu_read' and
  '__this_cpu_write' instead of accessing a per-cpu array directly,
  in order to avoid this warning:
  'sparse: incorrect type in initializer (different modifiers)'

Changes from v1 to v2:

- From Douglas, optimize the memory of cpustats. With the maximum
  number of CPUs, that's now this:
  2 * 8192 * 4 + 1 * 8192 * 5 * 4 + 1 * 8192 = 237,568 bytes.

- From Liu Song, refactor the code format and add necessary comments.

- From Douglas, use interrupt counts instead of interrupt time to
  determine the cause of softlockup.

- Remove the cmdline parameter added in PATCHv1.
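The v2 memory figure in the changelog above can be checked mechanically.
The sketch below only evaluates the expression as it is written there.
The helper name cpustats_bytes and the macro MAX_CPUS are made up for
illustration, and since the changelog does not say what each term
stores, the terms are left unnamed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 8192 is the "maximum number of CPUs" the changelog means. */
#define MAX_CPUS 8192

/*
 * Evaluate the changelog expression for a given CPU count:
 *   2 * cpus * 4 + 1 * cpus * 5 * 4 + 1 * cpus
 * which works out to 8 + 20 + 1 = 29 bytes per CPU.
 * Hypothetical helper, not a function from the patch.
 */
static size_t cpustats_bytes(size_t nr_cpus)
{
	return 2 * nr_cpus * 4 + 1 * nr_cpus * 5 * 4 + 1 * nr_cpus;
}

/* Compile-time check that the quoted total matches the expression. */
static_assert(2 * MAX_CPUS * 4 + 1 * MAX_CPUS * 5 * 4 + 1 * MAX_CPUS == 237568,
	      "v2 changelog total should be 237,568 bytes");
```

Calling cpustats_bytes(MAX_CPUS) gives the same 237,568 bytes, so the
figure quoted in the changelog is arithmetically consistent.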
Bitao Hu (2):
  watchdog/softlockup: low-overhead detection of interrupt
  watchdog/softlockup: report the most frequent interrupts

 kernel/watchdog.c | 244 +++++++++++++++++++++++++++++++++++++++++++++-
 lib/Kconfig.debug |  13 +++
 2 files changed, 253 insertions(+), 4 deletions(-)

-- 
2.37.1 (Apple Git-137.1)
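The cover letter mentions using interrupt counts to find the cause of a
softlockup, reporting the most frequent interrupts, and using a swap
rather than a double loop when tabulating. A minimal userspace sketch of
that general idea might look like the following. It is not the code from
the patch: every name here (irq_delta, tabulate, top_irqs, TOP_N) is
hypothetical, and the two snapshot arrays stand in for whatever per-IRQ
counters the kernel actually reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOP_N 3	/* hypothetical: how many IRQs to report */

struct irq_delta {
	int irq;	/* IRQ number, -1 while the slot is unused */
	uint32_t count;	/* interrupts seen between the two snapshots */
};

/*
 * Insert one candidate into a table kept in descending order.
 * A single pass of swaps is enough: a displaced entry is carried
 * down the table and compared against the remaining slots.
 */
static void tabulate(struct irq_delta top[TOP_N], int irq, uint32_t count)
{
	struct irq_delta cand = { irq, count };

	for (size_t i = 0; i < TOP_N; i++) {
		if (cand.count > top[i].count) {
			struct irq_delta tmp = top[i];

			top[i] = cand;
			cand = tmp;
		}
	}
}

/* Diff two per-IRQ count snapshots and keep the TOP_N largest deltas. */
static void top_irqs(const uint32_t *before, const uint32_t *after,
		     size_t nr, struct irq_delta top[TOP_N])
{
	assert(before && after && top);

	for (size_t i = 0; i < TOP_N; i++) {
		top[i].irq = -1;
		top[i].count = 0;
	}

	for (size_t irq = 0; irq < nr; irq++) {
		/* Unsigned subtraction stays correct across counter wraparound. */
		uint32_t delta = after[irq] - before[irq];

		if (delta)
			tabulate(top, (int)irq, delta);
	}
}
```

Working from count deltas means only two snapshots and one pass over
the IRQs are needed, with no per-interrupt timestamps.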
On Thu 2024-02-08 20:54:24, Bitao Hu wrote:
> Hi, guys.
> I have implemented a low-overhead method for detecting interrupt
> storm in softlockup. Please review it, all comments are welcome.

I like this work.

I wonder if you might be interested also in reporting problems
when soft IRQs are offloaded to the "ksoftirqd/X" kthreads
for too long.

The kthreads are processes with normal priority. As a result,
offloading soft IRQs to kthreads might cause huge difference
on loaded systems.

I have seen several problems when a flood of softIRQs triggered
offloading them. And it caused several second delays on networking
interfaces.

Best Regards,
Petr
On 2024/2/9 22:48, Petr Mladek wrote:
> On Thu 2024-02-08 20:54:24, Bitao Hu wrote:
>> Hi, guys.
>> I have implemented a low-overhead method for detecting interrupt
>> storm in softlockup. Please review it, all comments are welcome.
>
> I like this work.
>
> I wonder if you might be interested also in reporting problems
> when soft IRQs are offloaded to the "ksoftirqd/X" kthreads
> for too long.
>
> The kthreads are processes with normal priority. As a result,
> offloading soft IRQs to kthreads might cause huge difference
> on loaded systems.
>
> I have seen several problems when a flood of softIRQs triggered
> offloading them. And it caused several second delays on networking
> interfaces.
>

This is an interesting issue! I had considered the matter of softirq
while working on this, but since there were no actual issues at hand,
I didn't conduct an analysis. Your mention of this problem has opened
my eyes.

Best Regards,
Bitao