When a system running a real-time (PREEMPT_RT) kernel panics and triggers kdump,
the critical log messages (e.g., panic reason, stack traces) may fail to appear
on the serial console.

When kdump cannot be used properly, serial console logs are crucial,
whether for diagnosing kdump issues or troubleshooting the underlying problem.

This issue arises due to synchronization or deferred flushing of the printk buffer
in real-time contexts, where preemptible console locks or delayed workqueues prevent
timely log output before kexec transitions to the crash kernel.
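
The failure mode described above can be pictured with a small model. The sketch below is a hypothetical userspace illustration, not kernel code: every name in it (model_log, model_log_store(), model_console_flush(), model_pending()) is invented for this example, and it assumes a simple "store now, print later" scheme rather than the real printk ringbuffer. It shows the core of the report: records that were stored but not yet printed when control leaves the kernel never appear on the console, even though they are still in memory.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical userspace model, NOT kernel code. Records are stored in a
 * buffer immediately; a (modelled) deferred printer copies them to the
 * console later, if it gets the chance to run at all.
 */
#define MODEL_LOG_MAX 16

struct model_log {
	const char *rec[MODEL_LOG_MAX]; /* stored records, indexed by sequence */
	unsigned int head;              /* next sequence number to store */
	unsigned int console_seq;       /* next sequence number to print */
};

/* Store a record. In deferred mode nothing reaches the console yet. */
static void model_log_store(struct model_log *lb, const char *msg)
{
	if (lb->head < MODEL_LOG_MAX)
		lb->rec[lb->head++] = msg;
}

/* The deferred printer: print at most @budget pending records. */
static unsigned int model_console_flush(struct model_log *lb,
					unsigned int budget)
{
	unsigned int printed = 0;

	while (lb->console_seq < lb->head && printed < budget) {
		printf("%s\n", lb->rec[lb->console_seq++]);
		printed++;
	}
	return printed;
}

/* Records still in memory that the console has not shown yet. */
static unsigned int model_pending(const struct model_log *lb)
{
	return lb->head - lb->console_seq;
}
```

Storing three records and then letting the printer run with a budget of 1 leaves model_pending() at 2: if the handover to the crash kernel happens at that point, those two records exist only in memory and were never shown on the console.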
The test results are as follows:
[ T197] Kernel panic - not syncing: sysrq triggered crash
[ T197] Call trace:
[ T197] dump_backtrace+0x9c/0x120
[ T197] show_stack+0x1c/0x30
[ T197] dump_stack_lvl+0x34/0x88
[ T197] dump_stack+0x14/0x20
[ T197] panic+0x3c4/0x3f8
[ T197] sysrq_handle_crash+0x20/0x28
[ T197] __handle_sysrq+0xd4/0x1e0
[ T197] write_sysrq_trigger+0x88/0x108
[ T197] proc_reg_write+0x9c/0xf8
[ T197] vfs_write+0xf4/0x450
[ T197] ksys_write+0x70/0x100
[ T197] __arm64_sys_write+0x20/0x30
[ T197] invoke_syscall+0x48/0x110
[ T197] el0_svc_common.constprop.0+0x44/0xe8
[ T197] do_el0_svc+0x20/0x30
[ T197] el0_svc+0x24/0x88
[ T197] el0t_64_sync_handler+0xb8/0xc0
[ T197] el0t_64_sync+0x14c/0x150
[ T197] SMP: stopping secondary CPUs
[ T197] Starting crashdump kernel...
[ T197] Bye!

Signed-off-by: cuiguoqi <cuiguoqi@kylinos.cn>
---
 arch/arm64/kernel/machine_kexec.c | 4 ++++
 kernel/panic.c                    | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 6f121a0..66c7d90 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -24,6 +24,7 @@
 #include <asm/page.h>
 #include <asm/sections.h>
 #include <asm/trans_pgd.h>
+#include <linux/console.h>
 
 /**
  * kexec_image_info - For debugging output.
@@ -176,6 +177,9 @@ void machine_kexec(struct kimage *kimage)
 
 	pr_info("Bye!\n");
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_kexec_crash)
+		console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+
 	local_daif_mask();
 
 	/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 72fcbb5..e0ad0df 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -437,6 +437,8 @@ void vpanic(const char *fmt, va_list args)
 	 */
 	kgdb_panic(buf);
 
+	printk_legacy_allow_panic_sync();
+
 	/*
 	 * If we have crashed and we have a crash kernel loaded let it handle
 	 * everything else.
@@ -450,8 +452,6 @@ void vpanic(const char *fmt, va_list args)
 
 	panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
 
-	printk_legacy_allow_panic_sync();
-
 	/*
 	 * Run any panic handlers, including those that might need to
 	 * add information to the kmsg dump output.
--
2.7.4
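
The kernel/panic.c hunk above only moves printk_legacy_allow_panic_sync() ahead of __crash_kexec(). The sketch below is a hypothetical userspace model of that step order; it calls no kernel code, and model_vpanic() and model_trace are invented names. It assumes the default case in which _crash_kexec_post_notifiers is not set, and that a successful __crash_kexec() never returns because control passes to the crash kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical userspace model of the step order in vpanic(). The step
 * names mirror the functions touched by the patch; nothing here calls
 * kernel code.
 */
#define MAX_STEPS 8

struct model_trace {
	const char *step[MAX_STEPS];
	int n;
	bool legacy_allowed; /* set by the modelled allow step */
};

static void record(struct model_trace *t, const char *name)
{
	if (t->n < MAX_STEPS)
		t->step[t->n++] = name;
}

/*
 * @allow_early: true models the patched order (allow before __crash_kexec),
 *               false models the original order (allow after
 *               panic_other_cpus_shutdown).
 * @crash_kernel_loaded: when true, the modelled __crash_kexec() does not
 *               return, so the steps after it are never reached.
 */
static void model_vpanic(struct model_trace *t, bool allow_early,
			 bool crash_kernel_loaded)
{
	record(t, "kgdb_panic");
	if (allow_early) {
		record(t, "printk_legacy_allow_panic_sync");
		t->legacy_allowed = true;
	}
	record(t, "__crash_kexec");
	if (crash_kernel_loaded)
		return; /* control has passed to the crash kernel */
	record(t, "panic_other_cpus_shutdown");
	if (!allow_early) {
		record(t, "printk_legacy_allow_panic_sync");
		t->legacy_allowed = true;
	}
}
```

With the original order and a crash kernel loaded, the modelled trace stops after __crash_kexec and legacy_allowed stays false; with the patched order it is set before the handover. Without a crash kernel, both orders eventually set it. The replies below explain why reaching that step earlier is not free.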
On Thu 2025-08-07 19:22:47, cuiguoqi wrote:
> When a system running a real-time (PREEMPT_RT) kernel panics and triggers kdump,
> the critical log messages (e.g., panic reason, stack traces) may fail to appear
> on the serial console.

How did you find this problem, please? Were you investigating why a log was missing? Or was it just by reading the code? In other words, is this problem theoretical, or did you find it when debugging a real-life problem?

I ask because there is no ideal solution. This change might help in one situation and make it worse in other situations. See below.

> When kdump cannot be used properly, serial console logs are crucial,
> whether for diagnosing kdump issues or troubleshooting the underlying problem.
>
> This issue arises due to synchronization or deferred flushing of the printk buffer
> in real-time contexts, where preemptible console locks or delayed workqueues prevent
> timely log output before kexec transitions to the crash kernel.
>
> The test results are as follows:
> [ T197] Kernel panic - not syncing: sysrq triggered crash
> [ T197] Call trace:
> [ T197] dump_backtrace+0x9c/0x120
> [ T197] show_stack+0x1c/0x30
> [ T197] dump_stack_lvl+0x34/0x88
> [ T197] dump_stack+0x14/0x20
> [ T197] panic+0x3c4/0x3f8
> [ T197] sysrq_handle_crash+0x20/0x28
> [ T197] __handle_sysrq+0xd4/0x1e0
> [ T197] write_sysrq_trigger+0x88/0x108
> [ T197] proc_reg_write+0x9c/0xf8
> [ T197] vfs_write+0xf4/0x450
> [ T197] ksys_write+0x70/0x100
> [ T197] __arm64_sys_write+0x20/0x30
> [ T197] invoke_syscall+0x48/0x110
> [ T197] el0_svc_common.constprop.0+0x44/0xe8
> [ T197] do_el0_svc+0x20/0x30
> [ T197] el0_svc+0x24/0x88
> [ T197] el0t_64_sync_handler+0xb8/0xc0
> [ T197] el0t_64_sync+0x14c/0x150
> [ T197] SMP: stopping secondary CPUs
> [ T197] Starting crashdump kernel...
> [ T197] Bye!
>
> Signed-off-by: cuiguoqi <cuiguoqi@kylinos.cn>
> ---
>  arch/arm64/kernel/machine_kexec.c | 4 ++++
>  kernel/panic.c                    | 4 ++--
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> index 6f121a0..66c7d90 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -24,6 +24,7 @@
>  #include <asm/page.h>
>  #include <asm/sections.h>
>  #include <asm/trans_pgd.h>
> +#include <linux/console.h>
>
>  /**
>   * kexec_image_info - For debugging output.
> @@ -176,6 +177,9 @@ void machine_kexec(struct kimage *kimage)
>
>  	pr_info("Bye!\n");
>
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_kexec_crash)
> +		console_flush_on_panic(CONSOLE_FLUSH_PENDING);

IMHO, this is a bad idea.

console_flush_on_panic() is supposed to be called as the last attempt to flush the kernel messages when there is no other chance to see them.

console_flush_on_panic() ignores console_lock() because it might create a deadlock. This is why vpanic() calls debug_locks_off() first. But ignoring synchronization might obviously bring other problems and break the system in another way.

console_lock() should _not_ be ignored when we try to create crash_dump(). It would increase the risk that the crash_dump would fail. And crash_dump() is the preferred way to preserve the kernel messages in this code path.

> +
>  	local_daif_mask();
>
>  	/*
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 72fcbb5..e0ad0df 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -437,6 +437,8 @@ void vpanic(const char *fmt, va_list args)
>  	 */
>  	kgdb_panic(buf);
>
> +	printk_legacy_allow_panic_sync();

I do not like this either.

The commit message for the commit e35a8884270bae1 ("printk: Coordinate direct printing in panic") says that the primary purpose is to disable flushing legacy consoles when printing the backtrace by dump_stack(). This change looks OK from this POV.
But we wanted to delay this until after __crash_kexec() and panic_other_cpus_shutdown() because the legacy consoles are not safe in panic(). They ignore the internal spin locks after calling bust_spinlocks(1). This change would increase the risk that __crash_kexec() would fail.

Also, the legacy consoles are safer after stopping other CPUs.

IMPORTANT: The legacy consoles are blocked only when some "nbcon" console is registered. And nbcon consoles are never blocked. This guarantees that the messages are flushed on some consoles even before this call.

> +
>  	/*
>  	 * If we have crashed and we have a crash kernel loaded let it handle
>  	 * everything else.
> @@ -450,8 +452,6 @@ void vpanic(const char *fmt, va_list args)
>
>  	panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
>
> -	printk_legacy_allow_panic_sync();
> -
>  	/*
>  	 * Run any panic handlers, including those that might need to
>  	 * add information to the kmsg dump output.

Best Regards,
Petr
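
Petr's IMPORTANT note can be restated as a small decision table. The sketch below is a hypothetical model of that rule only, with invented names (model_consoles, model_legacy_prints(), and so on); it is not the kernel's actual console-selection logic and deliberately ignores any other reason a legacy console might not print directly.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the panic-time rule described in the review,
 * NOT the kernel's real decision code.
 */
struct model_consoles {
	bool have_nbcon;     /* at least one nbcon console registered */
	bool have_legacy;    /* at least one legacy console registered */
	bool legacy_allowed; /* models printk_legacy_allow_panic_sync() */
};

/* nbcon consoles are never blocked during panic. */
static bool model_nbcon_prints(const struct model_consoles *c)
{
	return c->have_nbcon;
}

/* Legacy consoles are held back only while an nbcon console exists. */
static bool model_legacy_prints(const struct model_consoles *c)
{
	if (!c->have_legacy)
		return false;
	return !c->have_nbcon || c->legacy_allowed;
}

/* True if panic messages reach at least one console right now. */
static bool model_some_console_prints(const struct model_consoles *c)
{
	return model_nbcon_prints(c) || model_legacy_prints(c);
}
```

For example, with both an nbcon and a legacy console registered and legacy_allowed still false, model_legacy_prints() returns false but model_some_console_prints() returns true, which is the guarantee Petr points out. With only a legacy console registered, nothing is held back in this model.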
From: Petr Mladek <pmladek@suse.com>

Hi Petr:

> How did you find this problem, please?
> Were you investigating why a log was missing?
> Or was it just by reading the code?

While I was developing on the Linux real-time kernel, the system crashed abnormally, and kdump failed to enter the second kernel to save dmesg and the vmcore. When this happens, the crash scene and some of the kexec handover logs are not printed, which to some extent reduces the efficiency of debugging and testing.

Motivation for the fix:

1. For RT kernels with kdump deployed, ensure that all relevant information, such as call stacks, is fully output to the serial port during the entire process from the panic to the transition to the second kernel. This makes debugging more efficient.
2. When kdump is not deployed, the call stack can be output directly in RT kernels, as shown below:

```c
vpanic {
+	printk_legacy_allow_panic_sync();
+	debug_locks_off();
+	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+	nbcon_atomic_flush_unsafe();
}
```

3. Therefore, I am currently wondering whether the issue lies with `debug_locks_off();` or with the logical placement of `printk_legacy_allow_panic_sync();`.

My understanding is that by the time we reach machine_kexec(kexec_crash_image); the other cores should already have been notified and shut down. Additionally, since this is clearly an emergency, flushing the log buffer to the terminal should not introduce further adverse effects.

I would greatly appreciate your insights and guidance on this matter.
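
Whether an extra flush in machine_kexec() is harmless comes down to the lock behaviour Petr describes for console_flush_on_panic(). The sketch below is a hypothetical userspace model contrasting a trylock-style flush with one that ignores the lock; model_console, model_flush_normal() and model_flush_on_panic() are invented names, and a lock owner that never releases the lock stands in for a stopped CPU.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, NOT kernel code: a console protected by a lock
 * whose owner may be a CPU that has already been stopped.
 */
struct model_console {
	bool lock_held;       /* owner may never release it */
	unsigned int pending; /* records waiting to be printed */
	unsigned int printed; /* records already printed */
};

/* Normal path: flush only if the lock can be taken (trylock style). */
static bool model_flush_normal(struct model_console *con)
{
	if (con->lock_held)
		return false; /* someone else owns the console: back off */
	con->lock_held = true;
	con->printed += con->pending;
	con->pending = 0;
	con->lock_held = false;
	return true;
}

/*
 * Panic path: print regardless of the lock, because waiting for an owner
 * that will never run again would deadlock. Nothing here protects
 * whatever the owner was in the middle of doing; that is the risk.
 */
static void model_flush_on_panic(struct model_console *con)
{
	con->printed += con->pending;
	con->pending = 0;
}
```

With lock_held set and three pending records, model_flush_normal() prints nothing, while model_flush_on_panic() prints all three and leaves the lock still marked as held. The model cannot show the cost, which is the point of Petr's objection: ignoring the lock trades the chance to see messages for the risk of disturbing state that the crash dump path may still depend on.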
Hi cuiguoqi!

I'm not an expert on the subject, but I hope this helps.

On Thu, 7 Aug 2025 19:22:47 +0800, cuiguoqi wrote:
>When a system running a real-time (PREEMPT_RT) kernel panics and triggers kdump,
>the critical log messages (e.g., panic reason, stack traces) may fail to appear
>on the serial console.
>
>When kdump cannot be used properly, serial console logs are crucial,
>whether for diagnosing kdump issues or troubleshooting the underlying problem.

The console not being flushed in the case of kexec should be expected, as described in [0]. It's about prioritizing kexec over serial output.

>This issue arises due to synchronization or deferred flushing of the printk buffer
>in real-time contexts, where preemptible console locks or delayed workqueues prevent
>timely log output before kexec transitions to the crash kernel.
>
> /**
>  * kexec_image_info - For debugging output.
>@@ -176,6 +177,9 @@ void machine_kexec(struct kimage *kimage)
>
> 	pr_info("Bye!\n");
>
>+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_kexec_crash)
>+		console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>+

Calling console_flush_on_panic() while trying to kexec will reduce its chance of success.

>diff --git a/kernel/panic.c b/kernel/panic.c
>index 72fcbb5..e0ad0df 100644
>--- a/kernel/panic.c
>+++ b/kernel/panic.c
>@@ -437,6 +437,8 @@ void vpanic(const char *fmt, va_list args)
> 	 */
> 	kgdb_panic(buf);
>
>+	printk_legacy_allow_panic_sync();
>+
> 	/*
> 	 * If we have crashed and we have a crash kernel loaded let it handle
> 	 * everything else.
>@@ -450,8 +452,6 @@ void vpanic(const char *fmt, va_list args)
>
> 	panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
>
>-	printk_legacy_allow_panic_sync();
>-

The ordering here should be kept, because we don't want CPUs other than the panicking one to flush legacy consoles.

Sincerely,
Ryo Takakura

[0] https://lore.kernel.org/lkml/847cagmjsx.fsf@jogness.linutronix.de/