From nobody Wed Dec 24 21:32:36 2025 Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 164F85DF30 for ; Tue, 23 Jan 2024 12:12:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706011964; cv=none; b=TqJ/euDQWThaoihwDZ8c8Vbp6tgyK6+1LyHzNUyj70p6W2TT1J12SS71DGqRDJZheEcuMUYpUyVRzzxYg1Pb2fN0eNNxa/ozTh5OG/aA/zx1TiE8+8fokuxaHR+XaT0CP/EfNhbNFQX7fqdETxvPz/ijVZg/SCBSAKoic6211sI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706011964; c=relaxed/simple; bh=bf8N058MM2i5Zf5QpJiVM/3LphTudf9h6j37YkLGJZg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=RQU5+zOppUetXGF+PmUmYjXT1hzk2Wzt4MQeV1SOecucqZr6lkF5KiyTYUSfcY0eonx2X0bBSXDaMbsHX87rYk7x3belaw3jMN8b7wGq+ulKPkGOF0G84fi5759sW0RG7kI1XfldknnWLYdfrSmoKRlW3V0DXVlk5wa0WWCAmIQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; arc=none smtp.client-ip=115.124.30.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R151e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046051;MF=yaoma@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0W.CnxF7_1706011949; Received: from localhost.localdomain(mailfrom:yaoma@linux.alibaba.com fp:SMTPD_---0W.CnxF7_1706011949) by smtp.aliyun-inc.com; Tue, 23 Jan 2024 20:12:32 +0800 From: Bitao Hu To: dianders@chromium.org, akpm@linux-foundation.org, pmladek@suse.com, 
tglx@linutronix.de, maz@kernel.org, liusong@linux.alibaba.com
Cc: linux-kernel@vger.kernel.org, Bitao Hu <yaoma@linux.alibaba.com>
Subject: [PATCH 1/3] watchdog/softlockup: low-overhead detection of interrupt storm
Date: Tue, 23 Jan 2024 20:12:21 +0800
Message-Id: <20240123121223.22318-2-yaoma@linux.alibaba.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20240123121223.22318-1-yaoma@linux.alibaba.com>
References: <20240123121223.22318-1-yaoma@linux.alibaba.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The following softlockup is caused by an interrupt storm, but it cannot
be identified from the call tree, because the call tree is just a
snapshot and doesn't fully capture the behavior of the CPU during the
soft lockup.

  watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
  ...
  Call trace:
    __do_softirq+0xa0/0x37c
    __irq_exit_rcu+0x108/0x140
    irq_exit+0x14/0x20
    __handle_domain_irq+0x84/0xe0
    gic_handle_irq+0x80/0x108
    el0_irq_naked+0x50/0x58

Therefore, I think it is necessary to report CPU utilization during the
softlockup_thresh period (reported once per sample_period, five reports
in total), like this:

  watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
  CPU#28 Utilization every 4s during lockup:
    #1: 0.0% system,  0.0% softirq,  100.0% hardirq,  0.0% idle
    #2: 0.0% system,  0.0% softirq,  100.0% hardirq,  0.0% idle
    #3: 0.0% system,  0.0% softirq,  100.0% hardirq,  0.0% idle
    #4: 0.0% system,  0.0% softirq,  100.0% hardirq,  0.0% idle
    #5: 0.0% system,  0.0% softirq,  100.0% hardirq,  0.0% idle
  ...

This is helpful in determining whether an interrupt storm has occurred
and in identifying the cause of the softlockup. The criteria for
determination are as follows:

a. 
If the hardirq utilization is high, an interrupt storm should be
considered, and the root cause cannot be determined from the call tree.
b. If the softirq utilization is high, we can analyze the call tree,
but it may not reflect the root cause.
c. If the system utilization is high, we can analyze the root cause
from the call tree.

Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
---
 kernel/watchdog.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 81a8862295d6..9fad10e0a147 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -23,6 +23,8 @@
 #include
 #include
 #include
+#include
+#include

 #include
 #include
@@ -441,6 +443,58 @@ static int is_softlockup(unsigned long touch_ts,
 	return 0;
 }

+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+static DEFINE_PER_CPU(u64, cpustat_old[NR_STATS]);
+static DEFINE_PER_CPU(u64, cpustat_diff[5][NR_STATS]);
+static DEFINE_PER_CPU(int, cpustat_tail);
+
+static void update_cpustat(void)
+{
+	u64 *old = this_cpu_ptr(cpustat_old);
+	u64 (*diff)[NR_STATS] = this_cpu_ptr(cpustat_diff);
+	int tail = this_cpu_read(cpustat_tail), i;
+	struct kernel_cpustat kcpustat;
+	u64 *cpustat = kcpustat.cpustat;
+
+	kcpustat_cpu_fetch(&kcpustat, smp_processor_id());
+	for (i = 0; i < NR_STATS; i++) {
+		diff[tail][i] = cpustat[i] - old[i];
+		old[i] = cpustat[i];
+	}
+	this_cpu_write(cpustat_tail, (tail + 1) % 5);
+}
+
+static void print_cpustat(void)
+{
+	int i, j, k;
+	u64 a[5][NR_STATS], b[5][NR_STATS];
+	u64 (*diff)[NR_STATS] = this_cpu_ptr(cpustat_diff);
+	int tail = this_cpu_read(cpustat_tail);
+	u32 period_us = sample_period / 1000;
+
+	for (i = 0; i < 5; i++) {
+		for (j = 0; j < NR_STATS; j++) {
+			a[i][j] = 100 * (diff[i][j] / 1000);
+			b[i][j] = 10 * do_div(a[i][j], period_us);
+			do_div(b[i][j], period_us);
+		}
+	}
+	printk(KERN_CRIT "CPU#%d Utilization every %us during lockup:\n",
+	       smp_processor_id(), period_us/1000000);
+	for (k = 
0, i = tail; k < 5; k++, i = (i + 1) % 5) {
+		printk(KERN_CRIT "\t#%d: %llu.%llu%% system,\t%llu.%llu%% softirq,\t"
+		       "%llu.%llu%% hardirq,\t%llu.%llu%% idle\n", k+1,
+		       a[i][CPUTIME_SYSTEM], b[i][CPUTIME_SYSTEM],
+		       a[i][CPUTIME_SOFTIRQ], b[i][CPUTIME_SOFTIRQ],
+		       a[i][CPUTIME_IRQ], b[i][CPUTIME_IRQ],
+		       a[i][CPUTIME_IDLE], b[i][CPUTIME_IDLE]);
+	}
+}
+#else
+static inline void update_cpustat(void) { }
+static inline void print_cpustat(void) { }
+#endif
+
 /* watchdog detector functions */
 static DEFINE_PER_CPU(struct completion, softlockup_completion);
 static DEFINE_PER_CPU(struct cpu_stop_work, softlockup_stop_work);
@@ -504,6 +558,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	 */
 	period_ts = READ_ONCE(*this_cpu_ptr(&watchdog_report_ts));

+	/* update cpu usage stat */
+	update_cpustat();
+
 	/* Reset the interval when touched by known problematic code. */
 	if (period_ts == SOFTLOCKUP_DELAY_REPORT) {
 		if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
@@ -539,6 +596,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! 
[%s:%d]\n",
 			smp_processor_id(), duration,
 			current->comm, task_pid_nr(current));
+		print_cpustat();
 		print_modules();
 		print_irqtrace_events(current);
 		if (regs)
-- 
2.37.1 (Apple Git-137.1)

From nobody Wed Dec 24 21:32:36 2025
From: Bitao Hu <yaoma@linux.alibaba.com>
To: dianders@chromium.org, akpm@linux-foundation.org, pmladek@suse.com, tglx@linutronix.de, maz@kernel.org, liusong@linux.alibaba.com
Cc: linux-kernel@vger.kernel.org, Bitao Hu <yaoma@linux.alibaba.com>
Subject: [PATCH 2/3] watchdog/softlockup: report the most time-consuming hardirq
Date: Tue, 23 Jan 2024 20:12:22 +0800
Message-Id: <20240123121223.22318-3-yaoma@linux.alibaba.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20240123121223.22318-1-yaoma@linux.alibaba.com>
References: <20240123121223.22318-1-yaoma@linux.alibaba.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

When the watchdog determines, based on CPU utilization, that the current
soft lockup is caused by an interrupt storm, reporting the three most
time-consuming hardirqs can be quite useful for further troubleshooting.

Below is an example of an interrupt storm. The call tree does not
provide useful information, but we can analyze which interrupt caused
the soft lockup from the time-consumption statistics of each hardirq.

[   67.714044] watchdog: BUG: soft lockup - CPU#9 stuck for 28s! [swapper/9:0]
[   67.714548] CPU#9 Utilization every 4s during lockup:
[   67.714549]   #1: 0.0% system,  0.0% softirq,  95.0% hardirq,  0.0% idle
[   67.714551]   #2: 0.0% system,  0.0% softirq,  90.0% hardirq,  0.0% idle
[   67.714553]   #3: 0.0% system,  0.0% softirq,  90.0% hardirq,  0.0% idle
[   67.714555]   #4: 0.0% system,  0.0% softirq,  95.0% hardirq,  0.0% idle
[   67.714556]   #5: 0.0% system,  0.0% softirq,  90.0% hardirq,  0.0% idle
[   67.714570] CPU#9 Detect HardIRQ Time exceeds 50% since 45s. Most time-consuming HardIRQs:
[   67.714571]   #1: 99.9% irq#7(IPI)
[   67.714575]   #2: 0.0% irq#10(arch_timer)
[   67.714578]   #3: 0.0% irq#2(IPI)
...
[   67.714654] Call trace:
[   67.714656]  __do_softirq+0xa8/0x364

Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
---
 include/linux/irq.h     |   9 ++
 include/linux/irqdesc.h |   2 +
 kernel/irq/irqdesc.c    |   9 +-
 kernel/watchdog.c       | 186 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 205 insertions(+), 1 deletion(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 90081afa10ce..3a3e8b1e227e 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -251,6 +251,15 @@ enum {
 	IRQD_RESEND_WHEN_IN_PROGRESS	= BIT(30),
 };

+
+/*
+ * struct per_irqtime - per irq stats for interrupt storm detection
+ */
+struct per_irqtime {
+	u64 start_time;
+	u64 delta;
+};
+
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)

 static inline bool irqd_is_setaffinity_pending(struct irq_data *d)
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index d9451d456a73..c5407cdcf2d6 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -29,6 +29,7 @@ struct pt_regs;
  * @wake_depth:		enable depth, for multiple irq_set_irq_wake() callers
  * @tot_count:		stats field for non-percpu irqs
  * @irq_count:		stats field to detect stalled irqs
+ * @irq_time:		stats field to detect interrupt storm
  * @last_unhandled:	aging timer for unhandled count
  * @irqs_unhandled:	stats field for spurious unhandled interrupts
  * @threads_handled:	stats field for deferred spurious detection of threaded handlers
@@ -64,6 +65,7 @@ struct irq_desc {
 	unsigned int		wake_depth;	/* nested wake enables */
 	unsigned int		tot_count;
 	unsigned int		irq_count;	/* For detecting broken IRQs */
+	struct per_irqtime __percpu *irq_time;
 	unsigned long		last_unhandled;	/* Aging timer for unhandled count */
 	unsigned int		irqs_unhandled;
 	atomic_t		threads_handled;
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 27ca1c866f29..a4d7c324cba2 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -422,9 +422,12 @@ static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags, 
 	desc->kstat_irqs = alloc_percpu(unsigned int);
 	if (!desc->kstat_irqs)
 		goto err_desc;
+	desc->irq_time = alloc_percpu(struct per_irqtime);
+	if (!desc->irq_time)
+		goto err_kstat;

 	if (alloc_masks(desc, node))
-		goto err_kstat;
+		goto err_irqtime;

 	raw_spin_lock_init(&desc->lock);
 	lockdep_set_class(&desc->lock, &irq_desc_lock_class);
@@ -439,6 +442,8 @@ static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags,

 	return desc;

+err_irqtime:
+	free_percpu(desc->irq_time);
 err_kstat:
 	free_percpu(desc->kstat_irqs);
 err_desc:
@@ -451,6 +456,7 @@ static void irq_kobj_release(struct kobject *kobj)
 	struct irq_desc *desc = container_of(kobj, struct irq_desc, kobj);

 	free_masks(desc);
+	free_percpu(desc->irq_time);
 	free_percpu(desc->kstat_irqs);
 	kfree(desc);
 }
@@ -594,6 +600,7 @@ int __init early_irq_init(void)

 	for (i = 0; i < count; i++) {
 		desc[i].kstat_irqs = alloc_percpu(unsigned int);
+		desc[i].irq_time = alloc_percpu(struct per_irqtime);
 		alloc_masks(&desc[i], node);
 		raw_spin_lock_init(&desc[i].lock);
 		lockdep_set_class(&desc[i].lock, &irq_desc_lock_class);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 9fad10e0a147..f347c5d8c5c1 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -25,6 +25,12 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
+#include
+#include

 #include
 #include
@@ -431,11 +437,15 @@ void touch_softlockup_watchdog_sync(void)
 	__this_cpu_write(watchdog_report_ts, SOFTLOCKUP_DELAY_REPORT);
 }

+static void set_potential_softlockup(unsigned long now, unsigned long touch_ts);
+
 static int is_softlockup(unsigned long touch_ts,
 			 unsigned long period_ts,
 			 unsigned long now)
 {
 	if ((watchdog_enabled & WATCHDOG_SOFTOCKUP_ENABLED) && watchdog_thresh) {
+		/* Softlockup may occur in the current period */
+		set_potential_softlockup(now, period_ts);
 		/* Warn about unreasonable delays. 
*/
 		if (time_after(now, period_ts + get_softlockup_thresh()))
 			return now - touch_ts;
@@ -448,6 +458,8 @@ static DEFINE_PER_CPU(u64, cpustat_old[NR_STATS]);
 static DEFINE_PER_CPU(u64, cpustat_diff[5][NR_STATS]);
 static DEFINE_PER_CPU(int, cpustat_tail);

+static void print_hardirq_time(void);
+
 static void update_cpustat(void)
 {
 	u64 *old = this_cpu_ptr(cpustat_old);
@@ -489,10 +501,179 @@ static void print_cpustat(void)
 		       a[i][CPUTIME_IRQ], b[i][CPUTIME_IRQ],
 		       a[i][CPUTIME_IDLE], b[i][CPUTIME_IDLE]);
 	}
+	print_hardirq_time();
+}
+
+static DEFINE_PER_CPU(u64, hardirq_start_time);
+
+#define SOFTLOCKUP_HARDIRQ 0
+static DEFINE_PER_CPU(long, softlockup_flags);
+
+static void find_irqtime_top3(u64 *irqtime, int *irq, u64 perirq_time, int perirq_id)
+{
+	if (perirq_time > irqtime[0] || (perirq_time == irqtime[0] && perirq_id < irq[0])) {
+		irqtime[2] = irqtime[1];
+		irq[2] = irq[1];
+		irqtime[1] = irqtime[0];
+		irq[1] = irq[0];
+		irqtime[0] = perirq_time;
+		irq[0] = perirq_id;
+	} else if (perirq_time > irqtime[1] || (perirq_time == irqtime[1] && perirq_id < irq[1])) {
+		irqtime[2] = irqtime[1];
+		irq[2] = irq[1];
+		irqtime[1] = perirq_time;
+		irq[1] = perirq_id;
+	} else if (perirq_time > irqtime[2] || (perirq_time == irqtime[2] && perirq_id < irq[2])) {
+		irqtime[2] = perirq_time;
+		irq[2] = perirq_id;
+	}
+}
+
+/*
+ * If the proportion of time spent handling irq exceeds 50% during a sampling period,
+ * then it is necessary to tally the handling time of each irq. 
+ */
+static inline bool need_trace_irqtime(int type)
+{
+	int tail = this_cpu_read(cpustat_tail);
+
+	if (--tail == -1)
+		tail = 4;
+	return this_cpu_read(cpustat_diff[tail][type]) > sample_period/2;
+}
+
+static void irq_handler_entry_callback(void *data, int irq, struct irqaction *action)
+{
+	this_cpu_ptr(irq_to_desc(irq)->irq_time)->start_time = local_clock();
+}
+
+static void irq_handler_exit_callback(void *data, int irq, struct irqaction *action, int ret)
+{
+	u64 delta;
+	struct per_irqtime *irq_time;
+
+	if (test_bit(SOFTLOCKUP_HARDIRQ, this_cpu_ptr(&softlockup_flags))) {
+		irq_time = this_cpu_ptr(irq_to_desc(irq)->irq_time);
+		delta = local_clock() - irq_time->start_time;
+		irq_time->delta += delta;
+	}
+}
+/*
+ * Mark softlockup as potentially caused by hardirq
+ */
+static void set_potential_softlockup_hardirq(void)
+{
+	if (!need_trace_irqtime(CPUTIME_IRQ))
+		return;
+
+	if (test_and_set_bit(SOFTLOCKUP_HARDIRQ, this_cpu_ptr(&softlockup_flags)))
+		return;
+
+	__this_cpu_write(hardirq_start_time, local_clock());
+}
+
+static void clear_potential_softlockup_hardirq(void)
+{
+	unsigned int i;
+	unsigned long flags;
+	struct irq_desc *desc;
+
+	local_irq_save(flags);
+	clear_bit(SOFTLOCKUP_HARDIRQ, this_cpu_ptr(&softlockup_flags));
+	for_each_irq_desc(i, desc) {
+		if (!desc)
+			continue;
+		memset(this_cpu_ptr(desc->irq_time), 0, sizeof(struct per_irqtime));
+	}
+	local_irq_restore(flags);
+}
+
+static void hook_hardirq_events(void)
+{
+	int ret;
+
+	ret = register_trace_irq_handler_entry(irq_handler_entry_callback, NULL);
+	if (ret)
+		goto out_err;
+	ret = register_trace_irq_handler_exit(irq_handler_exit_callback, NULL);
+	if (ret)
+		goto out_unreg_entry;
+	return;
+
+out_unreg_entry:
+	unregister_trace_irq_handler_entry(irq_handler_entry_callback, NULL);
+out_err:
+	return;
+}
+
+static void unhook_hardirq_events(void)
+{
+	unregister_trace_irq_handler_entry(irq_handler_entry_callback, NULL);
+	
unregister_trace_irq_handler_exit(irq_handler_exit_callback, NULL);
+}
+
+/*
+ * Mark that softlockup may occur
+ */
+static void set_potential_softlockup(unsigned long now, unsigned long period_ts)
+{
+	if (time_after_eq(now, period_ts + get_softlockup_thresh()/5))
+		set_potential_softlockup_hardirq();
+}
+
+static void clear_potential_softlockup(void)
+{
+	clear_potential_softlockup_hardirq();
+}
+
+static void print_hardirq_time(void)
+{
+	struct per_irqtime *irq_time;
+	struct irq_desc *desc;
+	u64 hardirqtime_top3[3] = {0, 0, 0};
+	int hardirq_top3[3] = {-1, -1, -1};
+	u64 start_time, now, a;
+	u32 period_us, i, b;
+
+	if (test_bit(SOFTLOCKUP_HARDIRQ, this_cpu_ptr(&softlockup_flags))) {
+		start_time = __this_cpu_read(hardirq_start_time);
+		now = local_clock();
+		period_us = (now - start_time)/1000;
+		/* Find the top three most time-consuming interrupts */
+		for_each_irq_desc(i, desc) {
+			if (!desc)
+				continue;
+			irq_time = this_cpu_ptr(desc->irq_time);
+			find_irqtime_top3(hardirqtime_top3, hardirq_top3, irq_time->delta, i);
+		}
+		printk(KERN_CRIT "CPU#%d Detect HardIRQ Time exceeds 50%% since %llds. 
Most time-consuming HardIRQs:\n",
+		       smp_processor_id(), ns_to_timespec64(start_time - sample_period).tv_sec);
+		for (i = 0; i < 3; i++) {
+			if (hardirq_top3[i] == -1)
+				break;
+			a = 100 * (hardirqtime_top3[i] / 1000);
+			b = 10 * do_div(a, period_us);
+			do_div(b, period_us);
+			desc = irq_to_desc(hardirq_top3[i]);
+			if (desc && desc->action)
+				printk(KERN_CRIT "\t#%u: %llu.%u%% irq#%d(%s)\n",
+				       i+1, a, b, hardirq_top3[i], desc->action->name);
+			else
+				printk(KERN_CRIT "\t#%u: %llu.%u%% irq#%d\n",
+				       i+1, a, b, hardirq_top3[i]);
+		}
+		if (!need_trace_irqtime(CPUTIME_IRQ))
+			clear_potential_softlockup_hardirq();
+	}
+}
+
 #else
 static inline void update_cpustat(void) { }
 static inline void print_cpustat(void) { }
+static inline void set_potential_softlockup(unsigned long now, unsigned long period_ts) { }
+static inline void clear_potential_softlockup(void) { }
+static inline void hook_hardirq_events(void) { }
+static inline void unhook_hardirq_events(void) { }
 #endif

 /* watchdog detector functions */
@@ -510,6 +691,7 @@ static DEFINE_PER_CPU(struct cpu_stop_work, softlockup_stop_work);
 static int softlockup_fn(void *data)
 {
 	update_touch_ts();
+	clear_potential_softlockup();
 	complete(this_cpu_ptr(&softlockup_completion));

 	return 0;
@@ -674,6 +856,8 @@ static void softlockup_stop_all(void)
 	if (!softlockup_initialized)
 		return;

+	unhook_hardirq_events();
+
 	for_each_cpu(cpu, &watchdog_allowed_mask)
 		smp_call_on_cpu(cpu, softlockup_stop_fn, NULL, false);

@@ -690,6 +874,8 @@ static void softlockup_start_all(void)
 {
 	int cpu;

+	hook_hardirq_events();
+
 	cpumask_copy(&watchdog_allowed_mask, &watchdog_cpumask);
 	for_each_cpu(cpu, &watchdog_allowed_mask)
 		smp_call_on_cpu(cpu, softlockup_start_fn, NULL, false);
-- 
2.37.1 (Apple Git-137.1)

From nobody Wed Dec 24 21:32:36 2025
From: Bitao Hu <yaoma@linux.alibaba.com>
To: dianders@chromium.org, akpm@linux-foundation.org, pmladek@suse.com, tglx@linutronix.de, maz@kernel.org, liusong@linux.alibaba.com
Cc: linux-kernel@vger.kernel.org, Bitao Hu <yaoma@linux.alibaba.com>
Subject: [PATCH 3/3] watchdog/softlockup: add parameter to control the reporting of time-consuming hardirq
Date: Tue, 23 
Jan 2024 20:12:23 +0800
Message-Id: <20240123121223.22318-4-yaoma@linux.alibaba.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20240123121223.22318-1-yaoma@linux.alibaba.com>
References: <20240123121223.22318-1-yaoma@linux.alibaba.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

To obtain a more accurate cause of the softlockup, we use tracepoints
to measure the time spent in each hardirq, which may have some impact
on performance. Add a parameter so that users can enable this feature
only on demand.

Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
---
 kernel/watchdog.c | 51 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index f347c5d8c5c1..314dfd301d8c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -303,6 +303,9 @@ unsigned int __read_mostly softlockup_panic =
 static bool softlockup_initialized __read_mostly;
 static u64 __read_mostly sample_period;

+static int __read_mostly softlockup_irqtrace;
+static bool softlockup_irqtrace_initialized __read_mostly;
+
 /* Timestamp taken after the last successful reschedule. */
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 /* Timestamp of the last softlockup report. 
*/
@@ -318,6 +321,13 @@ static int __init softlockup_panic_setup(char *str)
 }
 __setup("softlockup_panic=", softlockup_panic_setup);

+static int __init softlockup_irqtrace_setup(char *str)
+{
+	get_option(&str, &softlockup_irqtrace);
+	return 1;
+}
+__setup("softlockup_irqtrace=", softlockup_irqtrace_setup);
+
 static int __init nowatchdog_setup(char *str)
 {
 	watchdog_user_enabled = 0;
@@ -635,7 +645,7 @@ static void print_hardirq_time(void)
 	u64 start_time, now, a;
 	u32 period_us, i, b;

-	if (test_bit(SOFTLOCKUP_HARDIRQ, this_cpu_ptr(&softlockup_flags))) {
+	if (softlockup_irqtrace && test_bit(SOFTLOCKUP_HARDIRQ, this_cpu_ptr(&softlockup_flags))) {
 		start_time = __this_cpu_read(hardirq_start_time);
 		now = local_clock();
 		period_us = (now - start_time)/1000;
@@ -856,7 +866,10 @@ static void softlockup_stop_all(void)
 	if (!softlockup_initialized)
 		return;

-	unhook_hardirq_events();
+	if (softlockup_irqtrace_initialized) {
+		unhook_hardirq_events();
+		softlockup_irqtrace_initialized = false;
+	}

 	for_each_cpu(cpu, &watchdog_allowed_mask)
 		smp_call_on_cpu(cpu, softlockup_stop_fn, NULL, false);
@@ -874,7 +887,10 @@ static void softlockup_start_all(void)
 {
 	int cpu;

-	hook_hardirq_events();
+	if (softlockup_irqtrace && !softlockup_irqtrace_initialized) {
+		hook_hardirq_events();
+		softlockup_irqtrace_initialized = true;
+	}

 	cpumask_copy(&watchdog_allowed_mask, &watchdog_cpumask);
 	for_each_cpu(cpu, &watchdog_allowed_mask)
@@ -1090,6 +1106,26 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
 	return err;
 }

+/*
+ * /proc/sys/kernel/softlockup_irqtrace
+ */
+int proc_softlockup_irqtrace(struct ctl_table *table, int write,
+			     void *buffer, size_t *lenp, loff_t *ppos)
+{
+	int err, old;
+
+	mutex_lock(&watchdog_mutex);
+
+	old = READ_ONCE(softlockup_irqtrace);
+	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	if (!err && write && old != READ_ONCE(softlockup_irqtrace))
+		proc_watchdog_update();
+
+	
mutex_unlock(&watchdog_mutex);
+	return err;
+}
+
 /*
  * The cpumask is the mask of possible cpus that the watchdog can run
  * on, not the mask of cpus it is actually running on. This allows the
@@ -1158,6 +1194,15 @@ static struct ctl_table watchdog_sysctls[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
+	{
+		.procname	= "softlockup_irqtrace",
+		.data		= &softlockup_irqtrace,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_softlockup_irqtrace,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 #ifdef CONFIG_SMP
 	{
 		.procname	= "softlockup_all_cpu_backtrace",
-- 
2.37.1 (Apple Git-137.1)