From: Zhen Lei
To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
    Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes
CC: Zhen Lei
Subject: [PATCH] rcu: Display registers of self-detected stall as far as possible
Date: Wed, 27 Jul 2022 14:09:29 +0800
Message-ID: <20220727060929.1149-1-thunder.leizhen@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On architectures that do not support NMIs, registers are not printed
when an RCU stall is self-detected, even though this information is
useful for analyzing the root cause of the stall. Fortunately, a
self-detected RCU stall is always detected in the tick interrupt
handler, so the registers of the interrupted context can be obtained
with get_irq_regs() and displayed with show_regs().

In addition, show_regs() unwinds the call trace starting from 'regs',
so the frames that belong to tick handling are omitted. Those frames
say nothing about the stalled code, and dropping them makes it easier
to focus on the problem.

This is an example on arm64:
[   27.501721] rcu: INFO: rcu_preempt self-detected stall on CPU
[   27.502238] rcu:     0-....: (1250 ticks this GP) idle=4f7/1/0x4000000000000000 softirq=2594/2594 fqs=619
[   27.502632]  (t=1251 jiffies g=2989 q=29 ncpus=4)
[   27.503845] CPU: 0 PID: 306 Comm: test0 Not tainted 5.19.0-rc7-00009-g1c1a6c29ff99-dirty #46
[   27.504732] Hardware name: linux,dummy-virt (DT)
[   27.504947] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   27.504998] pc : arch_counter_read+0x18/0x24
[   27.505301] lr : arch_counter_read+0x18/0x24
[   27.505328] sp : ffff80000b29bdf0
[   27.505345] x29: ffff80000b29bdf0 x28: 0000000000000000 x27: 0000000000000000
[   27.505475] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[   27.505553] x23: 0000000000001f40 x22: ffff800009849c48 x21: 000000065f871ae0
[   27.505627] x20: 00000000000025ec x19: ffff80000a6eb300 x18: ffffffffffffffff
[   27.505654] x17: 0000000000000001 x16: 0000000000000000 x15: ffff80000a6d0296
[   27.505681] x14: ffffffffffffffff x13: ffff80000a29bc18 x12: 0000000000000426
[   27.505709] x11: 0000000000000162 x10: ffff80000a2f3c18 x9 : ffff80000a29bc18
[   27.505736] x8 : 00000000ffffefff x7 : ffff80000a2f3c18 x6 : 00000000759bd013
[   27.505761] x5 : 01ffffffffffffff x4 : 0002dc6c00000000 x3 : 0000000000000017
[   27.505787] x2 : 00000000000025ec x1 : ffff80000b29bdf0 x0 : 0000000075a30653
[   27.505937] Call trace:
[   27.506002]  arch_counter_read+0x18/0x24
[   27.506171]  ktime_get+0x48/0xa0
[   27.506207]  test_task+0x70/0xf0
[   27.506227]  kthread+0x10c/0x110
[   27.506243]  ret_from_fork+0x10/0x20

The old output is as follows:
[   27.944550] rcu: INFO: rcu_preempt self-detected stall on CPU
[   27.944980] rcu:     0-....: (1249 ticks this GP) idle=cbb/1/0x4000000000000000 softirq=2610/2610 fqs=614
[   27.945407]  (t=1251 jiffies g=2681 q=28 ncpus=4)
[   27.945731] Task dump for CPU 0:
[   27.945844] task:test0           state:R  running task     stack:    0 pid:  306 ppid:     2 flags:0x0000000a
[   27.946073] Call trace:
[   27.946151]  dump_backtrace.part.0+0xc8/0xd4
[   27.946378]  show_stack+0x18/0x70
[   27.946405]  sched_show_task+0x150/0x180
[   27.946427]  dump_cpu_task+0x44/0x54
[   27.947193]  rcu_dump_cpu_stacks+0xec/0x130
[   27.947212]  rcu_sched_clock_irq+0xb18/0xef0
[   27.947231]  update_process_times+0x68/0xac
[   27.947248]  tick_sched_handle+0x34/0x60
[   27.947266]  tick_sched_timer+0x4c/0xa4
[   27.947281]  __hrtimer_run_queues+0x178/0x360
[   27.947295]  hrtimer_interrupt+0xe8/0x244
[   27.947309]  arch_timer_handler_virt+0x38/0x4c
[   27.947326]  handle_percpu_devid_irq+0x88/0x230
[   27.947342]  generic_handle_domain_irq+0x2c/0x44
[   27.947357]  gic_handle_irq+0x44/0xc4
[   27.947376]  call_on_irq_stack+0x2c/0x54
[   27.947415]  do_interrupt_handler+0x80/0x94
[   27.947431]  el1_interrupt+0x34/0x70
[   27.947447]  el1h_64_irq_handler+0x18/0x24
[   27.947462]  el1h_64_irq+0x64/0x68       <--- the above backtrace is worthless
[   27.947474]  arch_counter_read+0x18/0x24
[   27.947487]  ktime_get+0x48/0xa0
[   27.947501]  test_task+0x70/0xf0
[   27.947520]  kthread+0x10c/0x110
[   27.947538]  ret_from_fork+0x10/0x20

Signed-off-by: Zhen Lei
Reported-by: kernel test robot
---
 kernel/rcu/tree_stall.h | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index a001e1e7a99269c..fdc8a222b41881b 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -350,6 +350,21 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
 }
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
+static void rcu_dump_cpu_task(int cpu)
+{
+	if (cpu == smp_processor_id() && in_irq()) {
+		struct pt_regs *regs;
+
+		regs = get_irq_regs();
+		if (regs) {
+			show_regs(regs);
+			return;
+		}
+	}
+
+	dump_cpu_task(cpu);
+}
+
 /*
  * Dump stacks of all tasks running on stalled CPUs.  First try using
  * NMIs, but fall back to manual remote stack tracing on architectures
@@ -369,7 +384,7 @@ static void rcu_dump_cpu_stacks(void)
 				if (cpu_is_offline(cpu))
 					pr_err("Offline CPU %d blocking current GP.\n", cpu);
 				else if (!trigger_single_cpu_backtrace(cpu))
-					dump_cpu_task(cpu);
+					rcu_dump_cpu_task(cpu);
 			}
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
-- 
2.25.1