SYS_INFO_ALL_CPU_BT sends NMI backtrace request to
all CPUs, which dumps an extra backtrace on panic CPU.
Exclude panic CPU from SYS_INFO_ALL_CPU_BT cpu-mask.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
kernel/panic.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/panic.c b/kernel/panic.c
index 27747cecb1af..c08f2695cf42 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -335,9 +335,11 @@ void check_panic_on_warn(const char *origin)
 static void panic_other_cpus_shutdown(bool crash_kexec)
 {
 	if (panic_print & SYS_INFO_ALL_CPU_BT) {
+		unsigned int this_cpu = raw_smp_processor_id();
+
 		/* Temporary allow non-panic CPUs to write their backtraces. */
 		panic_triggering_all_cpu_backtrace = true;
-		trigger_all_cpu_backtrace();
+		trigger_allbutcpu_cpu_backtrace(this_cpu);
 		panic_triggering_all_cpu_backtrace = false;
 	}
--
2.50.1.565.gc32cd1483b-goog
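
The effect of the patch on the set of targeted CPUs can be illustrated with a small userspace sketch. This is not kernel code: cpu_mask_t, mask_all(), mask_all_but() and mask_test_cpu() are hypothetical names that stand in for the kernel's cpumask helpers, and a plain 64-bit word is assumed to be wide enough for the CPUs involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N set means CPU N is targeted. */
typedef uint64_t cpu_mask_t;

/* Mask covering CPUs 0..nr_cpus-1 (at most 64 CPUs in this sketch). */
static cpu_mask_t mask_all(unsigned int nr_cpus)
{
	return nr_cpus >= 64 ? UINT64_MAX : ((UINT64_C(1) << nr_cpus) - 1);
}

/* Same mask with one CPU cleared, i.e. "all but the panic CPU". */
static cpu_mask_t mask_all_but(unsigned int nr_cpus, unsigned int exclude_cpu)
{
	cpu_mask_t mask = mask_all(nr_cpus);

	if (exclude_cpu < 64)
		mask &= ~(UINT64_C(1) << exclude_cpu);
	return mask;
}

/* True if the given CPU would receive a backtrace request. */
static bool mask_test_cpu(cpu_mask_t mask, unsigned int cpu)
{
	return cpu < 64 && (mask & (UINT64_C(1) << cpu)) != 0;
}
```

With four CPUs and CPU 2 as the panic CPU, mask_all_but(4, 2) leaves CPUs 0, 1 and 3 set, so the panic CPU would no longer be asked for a second backtrace.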
On 2025-07-31, Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> SYS_INFO_ALL_CPU_BT sends NMI backtrace request to
> all CPUs, which dumps an extra backtrace on panic CPU.

Isn't this only true if CONFIG_DEBUG_BUGVERBOSE=y?

Also, the information is not the same. trigger_all_cpu_backtrace() will
also dump the registers. For CONFIG_DEBUG_BUGVERBOSE=y on the panic CPU,
only the stack is dumped.

John Ogness
On (25/07/31 09:15), John Ogness wrote:
> On 2025-07-31, Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> > SYS_INFO_ALL_CPU_BT sends NMI backtrace request to
> > all CPUs, which dumps an extra backtrace on panic CPU.
>
> Isn't this only true if CONFIG_DEBUG_BUGVERBOSE=y?

Are you referring to vpanic()->dump_stack()?

Another way to get backtrace on panic CPU is via BUG(), which routes
through die()->__die_body(), which prints registers, stack trace,
and so on, before it calls into panic(). This might be x86 specific,
though.

> Also, the information is not the same. trigger_all_cpu_backtrace() will
> also dump the registers. For CONFIG_DEBUG_BUGVERBOSE=y on the panic CPU,
> only the stack is dumped.

Hmm, it's getting complicated, probably isn't worth it then.
On 2025-07-31, Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> On (25/07/31 09:15), John Ogness wrote:
>> On 2025-07-31, Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
>> > SYS_INFO_ALL_CPU_BT sends NMI backtrace request to
>> > all CPUs, which dumps an extra backtrace on panic CPU.
>>
>> Isn't this only true if CONFIG_DEBUG_BUGVERBOSE=y?
>
> Are you referring to vpanic()->dump_stack()?

Yes.

> Another way to get backtrace on panic CPU is via BUG(), which routes
> through die()->__die_body(), which prints registers, stack trace,
> and so on, before it calls into panic(). This might be x86 specific,
> though.

So in that case you see 2 stack traces if CONFIG_DEBUG_BUGVERBOSE=y?

>> Also, the information is not the same. trigger_all_cpu_backtrace() will
>> also dump the registers. For CONFIG_DEBUG_BUGVERBOSE=y on the panic CPU,
>> only the stack is dumped.
>
> Hmm, it's getting complicated, probably isn't worth it then.

I think it is worth cleaning up, but it probably won't be such a simple
fix. All call paths of redundant stack trace printing should be
identified and then we can decide on a clean solution.

John
On Thu 2025-07-31 09:51:07, John Ogness wrote:
> On 2025-07-31, Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> > On (25/07/31 09:15), John Ogness wrote:
> >> On 2025-07-31, Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> >> > SYS_INFO_ALL_CPU_BT sends NMI backtrace request to
> >> > all CPUs, which dumps an extra backtrace on panic CPU.
> >>
> >> Isn't this only true if CONFIG_DEBUG_BUGVERBOSE=y?
> >
> > Are you referring to vpanic()->dump_stack()?
>
> Yes.
>
> > Another way to get backtrace on panic CPU is via BUG(), which routes
> > through die()->__die_body(), which prints registers, stack trace,
> > and so on, before it calls into panic(). This might be x86 specific,
> > though.
>
> So in that case you see 2 stack traces if CONFIG_DEBUG_BUGVERBOSE=y?
>
> >> Also, the information is not the same. trigger_all_cpu_backtrace() will
> >> also dump the registers. For CONFIG_DEBUG_BUGVERBOSE=y on the panic CPU,
> >> only the stack is dumped.

IMHO, this is actually not true, see the following code:

void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
				   int exclude_cpu,
				   void (*raise)(cpumask_t *mask))
{
[...]
	/*
	 * Don't try to send an NMI to this cpu; it may work on some
	 * architectures, but on others it may not, and we'll get
	 * information at least as useful just by doing a dump_stack() here.
	 * Note that nmi_cpu_backtrace(NULL) will clear the cpu bit.
	 */
	if (cpumask_test_cpu(this_cpu, to_cpumask(backtrace_mask)))
		nmi_cpu_backtrace(NULL);
[...]
}

, where

bool nmi_cpu_backtrace(struct pt_regs *regs)
{
[...]
		if (regs)
			show_regs(regs);
		else
			dump_stack();
[...]
}

So, I think that the following patch should not make it worse:

diff --git a/kernel/panic.c b/kernel/panic.c
index 72fcbb5a071b..dfbfe1ce7bfc 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -67,6 +67,7 @@ static unsigned int warn_limit __read_mostly;
 static bool panic_console_replay;
 
 bool panic_triggering_all_cpu_backtrace;
+bool panic_this_cpu_backtrace_printed;
 
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
@@ -328,6 +329,22 @@ void check_panic_on_warn(const char *origin)
 		      origin, limit);
 }
 
+static void panic_trigger_all_cpu_backtraces(void)
+{
+	/* Temporary allow non-panic CPUs to write their backtraces. */
+	panic_triggering_all_cpu_backtrace = true;
+
+	if (panic_this_cpu_backtrace_printed) {
+		int this_cpu = raw_smp_processor_id();
+
+		trigger_allbutcpu_cpu_backtrace(this_cpu);
+	} else {
+		trigger_all_cpu_backtrace();
+	}
+
+	panic_triggering_all_cpu_backtrace = false;
+}
+
 /*
  * Helper that triggers the NMI backtrace (if set in panic_print)
  * and then performs the secondary CPUs shutdown - we cannot have
@@ -335,12 +352,8 @@ void check_panic_on_warn(const char *origin)
  */
 static void panic_other_cpus_shutdown(bool crash_kexec)
 {
-	if (panic_print & SYS_INFO_ALL_CPU_BT) {
-		/* Temporary allow non-panic CPUs to write their backtraces. */
-		panic_triggering_all_cpu_backtrace = true;
-		trigger_all_cpu_backtrace();
-		panic_triggering_all_cpu_backtrace = false;
-	}
+	if (panic_print & SYS_INFO_ALL_CPU_BT)
+		panic_trigger_all_cpu_backtraces();
 
 	/*
 	 * Note that smp_send_stop() is the usual SMP shutdown function,
@@ -422,13 +435,15 @@ void vpanic(const char *fmt, va_list args)
 		buf[len - 1] = '\0';
 
 	pr_emerg("Kernel panic - not syncing: %s\n", buf);
-#ifdef CONFIG_DEBUG_BUGVERBOSE
 	/*
 	 * Avoid nested stack-dumping if a panic occurs during oops processing
 	 */
-	if (!test_taint(TAINT_DIE) && oops_in_progress <= 1)
+	if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
+		panic_this_cpu_backtrace_printed = true;
+	} else if (IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE)) {
 		dump_stack();
-#endif
+		panic_this_cpu_backtrace_printed = true;
+	}
 
 	/*
 	 * If kgdb is enabled, give it a chance to run before we stop all

> > Hmm, it's getting complicated, probably isn't worth it then.
>
> I think it is worth cleaning up, but it probably won't be such a simple
> fix. All call paths of redundant stack trace printing should be
> identified and then we can decide on a clean solution.

I feel that the check

+	if (test_taint(TAINT_DIE) || oops_in_progress > 1) {

is kind of a hack. It would be nice to make it cleaner.
But I am not sure how complicated it would be.

Anyway, I think that storing the information, whether the backtrace
was printed or not, into a global variable, is a step
in the right direction.

Best Regards,
Petr
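
The decision made by the proposed patch can be modeled in plain userspace C. The sketch below is not kernel code: both function names are hypothetical, and the three parameters are stand-ins for test_taint(TAINT_DIE), oops_in_progress and IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE). It only mirrors the if / else-if chain from the vpanic() hunk and the branch in panic_trigger_all_cpu_backtraces().

```c
#include <stdbool.h>

/*
 * Models the value that panic_this_cpu_backtrace_printed would take
 * after the vpanic() hunk has run.
 */
static bool this_cpu_backtrace_printed(bool taint_die, int oops_in_progress,
				       bool debug_bugverbose)
{
	if (taint_die || oops_in_progress > 1)
		return true;	/* treated as already printed on the oops path */
	if (debug_bugverbose)
		return true;	/* vpanic() calls dump_stack() itself */
	return false;		/* nothing printed for the panic CPU yet */
}

/*
 * True when the panic CPU should stay in the backtrace request, which
 * corresponds to the trigger_all_cpu_backtrace() branch. False
 * corresponds to trigger_allbutcpu_cpu_backtrace(this_cpu).
 */
static bool include_panic_cpu(bool taint_die, int oops_in_progress,
			      bool debug_bugverbose)
{
	return !this_cpu_backtrace_printed(taint_die, oops_in_progress,
					   debug_bugverbose);
}
```

Under this model the panic CPU is only included when neither the oops path nor CONFIG_DEBUG_BUGVERBOSE has produced a backtrace, for example include_panic_cpu(false, 0, false).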
On (25/08/12 15:03), Petr Mladek wrote:
> So, I think that the following patch should not make it worse:

From what I can tell it does the trick. Ran some tests, everything
seems to be fine.
On (25/07/31 09:51), John Ogness wrote:
> > Another way to get backtrace on panic CPU is via BUG(), which routes
> > through die()->__die_body(), which prints registers, stack trace,
> > and so on, before it calls into panic(). This might be x86 specific,
> > though.
>
> So in that case you see 2 stack traces if CONFIG_DEBUG_BUGVERBOSE=y?

Yes. Triggering BUG() with panic_print=0x40 generates two panic CPU
backtraces - one from die() trap handler and one from NMI:

[..]
<2>[ 44.003032] kernel BUG at fs/drop_caches.c:68!
<4>[ 44.003138] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
<4>[ 44.003302] CPU: 2 UID: 0 PID: 3560 Comm: bash Not tainted 6.12.24-kasan-00964-gcf04fce2879f-dirty #1 77a011f1de55cafdc697f1d21852e4a93167feea
<4>[ 44.003624] RIP: 0010:drop_caches_sysctl_handler+0xe5/0xf0
<4>[ 44.003732] Code: ...
<4>[ 44.003954] RSP: 0018:ffff888053cd7cd8 EFLAGS: 00010202
<4>[ 44.004058] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000002
<4>[ 44.004215] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000000
<4>[ 44.004369] RBP: ffff888053cd7cf0 R08: ffff888053cd7be8 R09: 0000000000000001
<4>[ 44.004461] R10: ffff888053cd7b00 R11: ffffffff888ce910 R12: ffff888053cd7db8
<4>[ 44.004617] R13: dffffc0000000000 R14: 0000000000000001 R15: 1ffffffff1ad5821
<4>[ 44.004773] FS: 000078fd06d9f740(0000) GS:ffff88810b100000(0000) knlGS:0000000000000000
<4>[ 44.004868] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 44.005022] CR2: 00005a31efb86c60 CR3: 00000001021f8000 CR4: 00000000003506f0
<4>[ 44.005178] Call Trace:
<4>[ 44.005268] <TASK>
<4>[ 44.005360] proc_sys_call_handler+0x34a/0x550
<4>[ 44.005467] vfs_write+0x76a/0xa80
<4>[ 44.005628] ? proc_sys_read+0x20/0x20
<4>[ 44.005734] ksys_write+0xb4/0x160
<4>[ 44.005835] do_syscall_64+0x6a/0xe0
<4>[ 44.006715] entry_SYSCALL_64_after_hwframe+0x55/0x5d
<4>[ 44.006876] RIP: 0033:0x78fd06ec1594
<4>[ 44.006973] Code: ...
<4>[ 44.007195] RSP: 002b:00007ffe87fcd618 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
<4>[ 44.007297] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 000078fd06ec1594
<4>[ 44.007451] RDX: 0000000000000002 RSI: 0000573b1ca40130 RDI: 0000000000000001
<4>[ 44.007626] RBP: 00007ffe87fcd640 R08: 0000000000000001 R09: 0000000000000001
<4>[ 44.007731] R10: 000078fd06f55820 R11: 0000000000000202 R12: 0000000000000002
<4>[ 44.007906] R13: 0000573b1ca40130 R14: 000078fd06f945c0 R15: 000078fd06f91f20
<4>[ 44.008093] </TASK>
<4>[ 44.008192] Modules linked in: [...]
<0>[ 44.017630] gsmi: Log Shutdown Reason 0x03
<4>[ 44.017886] ---[ end trace 0000000000000000 ]---
<4>[ 44.037806] RIP: 0010:drop_caches_sysctl_handler+0xe5/0xf0
<4>[ 44.037926] Code: ...
<4>[ 44.038078] RSP: 0018:ffff888053cd7cd8 EFLAGS: 00010202
<4>[ 44.038143] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000002
<4>[ 44.038249] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000000
<4>[ 44.038310] RBP: ffff888053cd7cf0 R08: ffff888053cd7be8 R09: 0000000000000001
<4>[ 44.038416] R10: ffff888053cd7b00 R11: ffffffff888ce910 R12: ffff888053cd7db8
<4>[ 44.038521] R13: dffffc0000000000 R14: 0000000000000001 R15: 1ffffffff1ad5821
<4>[ 44.038581] FS: 000078fd06d9f740(0000) GS:ffff88810b100000(0000) knlGS:0000000000000000
<4>[ 44.038688] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 44.038803] CR2: 00005a31efb86c60 CR3: 00000001021f8000 CR4: 00000000003506f0
<0>[ 44.038865] Kernel panic - not syncing: Fatal exception
<4>[ 44.038971] NMI backtrace for cpu 2
<4>[ 44.038978] CPU: 2 UID: 0 PID: 3560 Comm: bash Tainted: G D 6.12.24-kasan-00964-gcf04fce2879f-dirty #1 77a011f1de55cafdc697f1d21852e4a93167feea
<4>[ 44.038988] Tainted: [D]=DIE
<4>[ 44.038996] Call Trace:
<4>[ 44.039001] <TASK>
<4>[ 44.039005] nmi_cpu_backtrace+0x14c/0x1a0
<4>[ 44.039016] ? arch_trigger_cpumask_backtrace+0x20/0x20
<4>[ 44.039026] nmi_trigger_cpumask_backtrace+0xd8/0x1b0
<4>[ 44.039035] panic_other_cpus_shutdown+0x2d/0x80
<4>[ 44.039045] panic+0x199/0x450
<4>[ 44.039055] oops_end+0xb9/0xc0
<4>[ 44.039062] do_trap+0x10c/0x330
<4>[ 44.039089] handle_invalid_op+0x95/0xd0
<4>[ 44.039095] ? drop_caches_sysctl_handler+0xe5/0xf0
<4>[ 44.039102] exc_invalid_op+0x3c/0x50
<4>[ 44.039110] asm_exc_invalid_op+0x16/0x20
<4>[ 44.039118] RIP: 0010:drop_caches_sysctl_handler+0xe5/0xf0
<4>[ 44.039124] Code: ...
<4>[ 44.039129] RSP: 0018:ffff888053cd7cd8 EFLAGS: 00010202
<4>[ 44.039135] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000002
<4>[ 44.039139] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000000
<4>[ 44.039143] RBP: ffff888053cd7cf0 R08: ffff888053cd7be8 R09: 0000000000000001
<4>[ 44.039148] R10: ffff888053cd7b00 R11: ffffffff888ce910 R12: ffff888053cd7db8
<4>[ 44.039152] R13: dffffc0000000000 R14: 0000000000000001 R15: 1ffffffff1ad5821
<4>[ 44.039158] ? proc_dointvec_minmax+0xe0/0xe0
<4>[ 44.039168] ? drop_caches_sysctl_handler+0x15/0xf0
<4>[ 44.039174] proc_sys_call_handler+0x34a/0x550
<4>[ 44.039184] vfs_write+0x76a/0xa80
<4>[ 44.039191] ? proc_sys_read+0x20/0x20
<4>[ 44.039201] ksys_write+0xb4/0x160
<4>[ 44.039208] do_syscall_64+0x6a/0xe0
<4>[ 44.039259] entry_SYSCALL_64_after_hwframe+0x55/0x5d
<4>[ 44.039265] RIP: 0033:0x78fd06ec1594
<4>[ 44.039273] Code: ...
<4>[ 44.039278] RSP: 002b:00007ffe87fcd618 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
<4>[ 44.039285] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 000078fd06ec1594
<4>[ 44.039289] RDX: 0000000000000002 RSI: 0000573b1ca40130 RDI: 0000000000000001
<4>[ 44.039294] RBP: 00007ffe87fcd640 R08: 0000000000000001 R09: 0000000000000001
<4>[ 44.039298] R10: 000078fd06f55820 R11: 0000000000000202 R12: 0000000000000002
<4>[ 44.039302] R13: 0000573b1ca40130 R14: 000078fd06f945c0 R15: 000078fd06f91f20
<4>[ 44.039310] </TASK>
<6>[ 44.039314] Sending NMI from CPU 2 to CPUs 0-1,3:
[..]

panic() with DEBUG_BUGVERBOSE and panic_print=0x40, two backtraces
on the panic CPU:

[..]
<0>[ 45.149482] Kernel panic - not syncing: BOOM
<4>[ 45.149792] CPU: 1 UID: 0 PID: 3512 Comm: bash Not tainted 6.12.24-kasan-00964-gcf04fce2879f-dirty #1 221d6609d9c374a289b848042333fd4fa6f5bddd
<4>[ 45.150176] Call Trace:
<4>[ 45.150285] <TASK>
<4>[ 45.150394] panic+0x190/0x450
<4>[ 45.150529] drop_caches_sysctl_handler+0xb4/0xe0
<4>[ 45.150727] proc_sys_call_handler+0x34a/0x550
<4>[ 45.150858] vfs_write+0x76a/0xa80
<4>[ 45.150978] ? proc_sys_read+0x20/0x20
<4>[ 45.151182] ksys_write+0xb4/0x160
<4>[ 45.151303] do_syscall_64+0x6a/0xe0
<4>[ 45.154298] entry_SYSCALL_64_after_hwframe+0x55/0x5d
<4>[ 45.154490] RIP: 0033:0x797c7ab90594
<4>[ 45.154608] Code: ...
<4>[ 45.154866] RSP: 002b:00007fff4f3ae9f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
<4>[ 45.154998] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000797c7ab90594
<4>[ 45.155183] RDX: 0000000000000002 RSI: 0000577d00a03130 RDI: 0000000000000001
<4>[ 45.155365] RBP: 00007fff4f3aea20 R08: 0000000000000001 R09: 0000000000000001
<4>[ 45.155478] R10: 0000797c7ac24820 R11: 0000000000000202 R12: 0000000000000002
<4>[ 45.155689] R13: 0000577d00a03130 R14: 0000797c7ac635c0 R15: 0000797c7ac60f20
<4>[ 45.155884] </TASK>
<4>[ 45.155992] NMI backtrace for cpu 1
<4>[ 45.156005] CPU: 1 UID: 0 PID: 3512 Comm: bash Not tainted 6.12.24-kasan-00964-gcf04fce2879f-dirty #1 221d6609d9c374a289b848042333fd4fa6f5bddd
<4>[ 45.156042] Call Trace:
<4>[ 45.156054] <TASK>
<4>[ 45.156065] nmi_cpu_backtrace+0x14c/0x1a0
<4>[ 45.156094] ? arch_trigger_cpumask_backtrace+0x20/0x20
<4>[ 45.156122] nmi_trigger_cpumask_backtrace+0xd8/0x1b0
<4>[ 45.156149] panic_other_cpus_shutdown+0x2d/0x80
<4>[ 45.156176] panic+0x199/0x450
<4>[ 45.156206] drop_caches_sysctl_handler+0xb4/0xe0
<4>[ 45.156229] proc_sys_call_handler+0x34a/0x550
<4>[ 45.156259] vfs_write+0x76a/0xa80
<4>[ 45.156279] ? proc_sys_read+0x20/0x20
<4>[ 45.156313] ksys_write+0xb4/0x160
<4>[ 45.156338] do_syscall_64+0x6a/0xe0
<4>[ 45.156788] entry_SYSCALL_64_after_hwframe+0x55/0x5d
<4>[ 45.156810] RIP: 0033:0x797c7ab90594
<4>[ 45.156829] Code: ...
<4>[ 45.156846] RSP: 002b:00007fff4f3ae9f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
<4>[ 45.156869] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000797c7ab90594
<4>[ 45.156884] RDX: 0000000000000002 RSI: 0000577d00a03130 RDI: 0000000000000001
<4>[ 45.156899] RBP: 00007fff4f3aea20 R08: 0000000000000001 R09: 0000000000000001
<4>[ 45.156913] R10: 0000797c7ac24820 R11: 0000000000000202 R12: 0000000000000002
<4>[ 45.156927] R13: 0000577d00a03130 R14: 0000797c7ac635c0 R15: 0000797c7ac60f20
<4>[ 45.156954] </TASK>
<6>[ 45.156965] Sending NMI from CPU 1 to CPUs 0,2-3:
[..]