From: Ryo Takakura <takakura@valinux.co.jp>
After panic, non-panicked CPU's has been unable to flush ringbuffer
while they can still write into it. This can affect CPU backtrace
triggered in panic only able to write into ringbuffer incapable of
flushing them.
Fix the issue by letting the panicked CPU handle the flushing of
ringbuffer right after non-panicked CPUs finished writing their
backtraces.
Signed-off-by: Ryo Takakura <takakura@valinux.co.jp>
---
include/linux/console.h | 1 +
kernel/panic.c | 1 +
kernel/printk/printk.c | 14 ++++++++++++++
3 files changed, 16 insertions(+)
diff --git a/include/linux/console.h b/include/linux/console.h
index 31a8f5b85..c7eb6f785 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -504,6 +504,7 @@ extern void console_unlock(void);
extern void console_conditional_schedule(void);
extern void console_unblank(void);
extern void console_flush_on_panic(enum con_flush_mode mode);
+extern void console_try_flush(void);
extern struct tty_driver *console_device(int *);
extern void console_stop(struct console *);
extern void console_start(struct console *);
diff --git a/kernel/panic.c b/kernel/panic.c
index 2a0449144..6519cc6bd 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -260,6 +260,7 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
panic_triggering_all_cpu_backtrace = true;
trigger_all_cpu_backtrace();
panic_triggering_all_cpu_backtrace = false;
+ console_try_flush();
}
/*
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c22b07049..517b8b02e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3284,6 +3284,20 @@ void console_flush_on_panic(enum con_flush_mode mode)
console_flush_all(false, &next_seq, &handover);
}
+/**
+ * console_try_flush - try to flush consoles when safe
+ *
+ * Context: Any, except for NMI.
+ */
+void console_try_flush(void)
+{
+ if (is_printk_legacy_deferred())
+ return;
+
+ if (console_trylock())
+ console_unlock();
+}
+
/*
* Return the console tty driver structure and its associated index
*/
--
2.34.1
On Mon 2024-08-12 16:29:31, takakura@valinux.co.jp wrote:
> From: Ryo Takakura <takakura@valinux.co.jp>
>
> After panic, non-panicked CPU's has been unable to flush ringbuffer
> while they can still write into it. This can affect CPU backtrace
> triggered in panic only able to write into ringbuffer incapable of
> flushing them.
I still think about this. The motivation is basically the same
as in the commit 5d5e4522a7f404d1a96fd ("printk: restore flushing
of NMI buffers on remote CPUs after NMI backtraces").
It would make sense to replace printk_trigger_flush() with
console_try_flush(). And the function should queue the irq
work when it can't be flushed directly, see below.
> Fix the issue by letting the panicked CPU handle the flushing of
> ringbuffer right after non-panicked CPUs finished writing their
> backtraces.
>
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -260,6 +260,7 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
> panic_triggering_all_cpu_backtrace = true;
> trigger_all_cpu_backtrace();
> panic_triggering_all_cpu_backtrace = false;
> + console_try_flush();
> }
>
> /*
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3284,6 +3284,20 @@ void console_flush_on_panic(enum con_flush_mode mode)
> console_flush_all(false, &next_seq, &handover);
> }
>
> +/**
> + * console_try_flush - try to flush consoles when safe
> + *
> + * Context: Any, except for NMI.
It is safe even in NMI. is_printk_legacy_deferred() would return true
in this case.
> + */
> +void console_try_flush(void)
> +{
> + if (is_printk_legacy_deferred())
> + return;
> +
> + if (console_trylock())
> + console_unlock();
> +}
I would do something like:
/**
* console_try_or_trigger_flush - try to flush consoles directly when
* safe or the trigger deferred flush.
*
* Context: Any
*/
void console_try_or_trigger_flush(void)
{
if (!is_printk_legacy_deferred() && console_trylock())
console_unlock();
else
defer_console_output();
}
and use it instead of printk_trigger_flush() in
nmi_trigger_cpumask_backtrace().
Well, I would postpone this patch after we finalize the patchset
adding con->write_atomic() callback. This patch depends on it anyway
via is_printk_legacy_deferred(). The patchset might also add
other wrappers for flushing consoles and we have to choose some
reasonable names. Or John could integrate this patch into the patchset.
Best Regards,
Petr
On 2024-08-13, Petr Mladek <pmladek@suse.com> wrote:
> I would do something like:
>
> /**
> * console_try_or_trigger_flush - try to flush consoles directly when
> * safe or the trigger deferred flush.
> *
> * Context: Any
> */
> void console_try_or_trigger_flush(void)
> {
> if (!is_printk_legacy_deferred() && console_trylock())
> console_unlock();
> else
> defer_console_output();
> }
>
> and use it instead of printk_trigger_flush() in
> nmi_trigger_cpumask_backtrace().
Just to be clear, you are talking about removing printk_trigger_flush()
entirely and instead provide the new console_try_or_trigger_flush()?
Which then also involves updating the call sites:
lib/nmi_backtrace.c:nmi_trigger_cpumask_backtrace()
arch/powerpc/kernel/watchdog.c:watchdog_timer_interrupt()
> Well, I would postpone this patch after we finalize the patchset
> adding con->write_atomic() callback. This patch depends on it anyway
> via is_printk_legacy_deferred(). The patchset might also add
> other wrappers for flushing consoles and we have to choose some
> reasonable names.
I agree. Let's finish up the atomic series and then we can worry about
this.
John
Hi Petr and John,
On 2024-08-19, John Ogness wrote:
>On 2024-08-13, Petr Mladek <pmladek@suse.com> wrote:
>> I would do something like:
>>
>> /**
>> * console_try_or_trigger_flush - try to flush consoles directly when
>> * safe or the trigger deferred flush.
>> *
>> * Context: Any
>> */
>> void console_try_or_trigger_flush(void)
>> {
>> if (!is_printk_legacy_deferred() && console_trylock())
>> console_unlock();
>> else
>> defer_console_output();
>> }
>>
>> and use it instead of printk_trigger_flush() in
>> nmi_trigger_cpumask_backtrace().
>
>Just to be clear, you are talking about removing printk_trigger_flush()
>entirely and instead provide the new console_try_or_trigger_flush()?
>Which then also involves updating the call sites:
>
>lib/nmi_backtrace.c:nmi_trigger_cpumask_backtrace()
>arch/powerpc/kernel/watchdog.c:watchdog_timer_interrupt()
>
Taking a look at [0], in addition to the mentioned call sites,
nbcon_device_release() will also be calling printk_trigger_flush()?
For nbcon_device_release(), I thought its better not to be replaced as
it calles for @legacy_off, in which case printk_trigger_flush() seems
more suitable as it always defers printing.
Also taking a look at the [1], for nmi_trigger_cpumask_backtrace(),
I thought that it will not comply with the syncing of
legacy_allow_panic_sync. I believe it will allow flushing of legacy consoles
before printk_legacy_allow_panic_sync() which is out of sync.
>> Well, I would postpone this patch after we finalize the patchset
>> adding con->write_atomic() callback. This patch depends on it anyway
>> via is_printk_legacy_deferred(). The patchset might also add
>> other wrappers for flushing consoles and we have to choose some
>> reasonable names.
>
>I agree. Let's finish up the atomic series and then we can worry about
>this.
>
Ok! I see that it can be better discussed after the atomic series.
>John
Sincerely,
Ryo Takakura
[0] https://lore.kernel.org/all/20240820063001.36405-31-john.ogness@linutronix.de/
[1] https://lore.kernel.org/all/20240820063001.36405-30-john.ogness@linutronix.de/
On 2024-08-21, takakura@valinux.co.jp wrote:
>>> /**
>>> * console_try_or_trigger_flush - try to flush consoles directly when
>>> * safe or the trigger deferred flush.
>>> *
>>> * Context: Any
>>> */
>>> void console_try_or_trigger_flush(void)
>>> {
>>> if (!is_printk_legacy_deferred() && console_trylock())
>>> console_unlock();
>>> else
>>> defer_console_output();
>>> }
>>>
>>> and use it instead of printk_trigger_flush() in
>>> nmi_trigger_cpumask_backtrace().
>>
>> Just to be clear, you are talking about removing
>> printk_trigger_flush() entirely and instead provide the new
>> console_try_or_trigger_flush()? Which then also involves updating
>> the call sites:
>>
>> lib/nmi_backtrace.c:nmi_trigger_cpumask_backtrace()
>> arch/powerpc/kernel/watchdog.c:watchdog_timer_interrupt()
>>
>
> Taking a look at [0], in addition to the mentioned call sites,
> nbcon_device_release() will also be calling printk_trigger_flush()?
> For nbcon_device_release(), I thought its better not to be replaced as
> it calles for @legacy_off, in which case printk_trigger_flush() seems
> more suitable as it always defers printing.
The @legacy_off logic would be within console_try_or_trigger_flush(),
so the result would be the same.
> Also taking a look at the [1], for nmi_trigger_cpumask_backtrace(),
> I thought that it will not comply with the syncing of
> legacy_allow_panic_sync. I believe it will allow flushing of legacy consoles
> before printk_legacy_allow_panic_sync() which is out of sync.
But isn't your patch also causing that violation?
printk_legacy_allow_panic_sync() performs a trylock/unlock. Isn't that
good enough?
John
Hi John,
Thanks for checking into what I pointed out!
On 2024-08-26, John Ogness wrote:
>On 2024-08-21, takakura@valinux.co.jp wrote:
>>>> /**
>>>> * console_try_or_trigger_flush - try to flush consoles directly when
>>>> * safe or the trigger deferred flush.
>>>> *
>>>> * Context: Any
>>>> */
>>>> void console_try_or_trigger_flush(void)
>>>> {
>>>> if (!is_printk_legacy_deferred() && console_trylock())
>>>> console_unlock();
>>>> else
>>>> defer_console_output();
>>>> }
>>>>
>>>> and use it instead of printk_trigger_flush() in
>>>> nmi_trigger_cpumask_backtrace().
>>>
>>> Just to be clear, you are talking about removing
>>> printk_trigger_flush() entirely and instead provide the new
>>> console_try_or_trigger_flush()? Which then also involves updating
>>> the call sites:
>>>
>>> lib/nmi_backtrace.c:nmi_trigger_cpumask_backtrace()
>>> arch/powerpc/kernel/watchdog.c:watchdog_timer_interrupt()
>>>
>>
>> Taking a look at [0], in addition to the mentioned call sites,
>> nbcon_device_release() will also be calling printk_trigger_flush()?
>> For nbcon_device_release(), I thought its better not to be replaced as
>> it calles for @legacy_off, in which case printk_trigger_flush() seems
>> more suitable as it always defers printing.
>
>The @legacy_off logic would be within console_try_or_trigger_flush(),
>so the result would be the same.
I see.
>> Also taking a look at the [1], for nmi_trigger_cpumask_backtrace(),
>> I thought that it will not comply with the syncing of
>> legacy_allow_panic_sync. I believe it will allow flushing of legacy consoles
>> before printk_legacy_allow_panic_sync() which is out of sync.
>
>But isn't your patch also causing that violation?
Yes, the patch I sent would be causing the violation...
Sorry I should have asked this earlier before sending patch. The question came up
after sending the patch.
>printk_legacy_allow_panic_sync() performs a trylock/unlock. Isn't that
>good enough?
Also sorry for being unclear here. I also think that
printk_legacy_allow_panic_sync()'s trylock/unlock is good enough.
My question was that if we were to call console_try_or_trigger_flush() in
panic, wouldn't we need to check @legacy_direct to avoid flushing
of legacy consoles.
Something like:
/**
* console_try_or_trigger_flush - try to flush consoles directly when
* safe or the trigger deferred flush.
*
* Context: Any
*/
void console_try_or_trigger_flush(void)
{
struct console_flush_type ft;
printk_get_console_flush_type(&ft);
if (ft.legacy_direct) {
if (console_trylock())
console_unlock();
} else {
defer_console_output();
}
}
>John
Sincerely,
Ryo Takakura
Hi Petr and John,
On Tue 2024-08-13 13:45, Petr Mladek wrote:
>On Mon 2024-08-12 16:29:31, takakura@valinux.co.jp wrote:
>> From: Ryo Takakura <takakura@valinux.co.jp>
>>
>> After panic, non-panicked CPU's has been unable to flush ringbuffer
>> while they can still write into it. This can affect CPU backtrace
>> triggered in panic only able to write into ringbuffer incapable of
>> flushing them.
>
>I still think about this. The motivation is basically the same
>as in the commit 5d5e4522a7f404d1a96fd ("printk: restore flushing
>of NMI buffers on remote CPUs after NMI backtraces").
>
>It would make sense to replace printk_trigger_flush() with
>console_try_flush(). And the function should queue the irq
>work when it can't be flushed directly, see below.
Yes, I agree.
>> Fix the issue by letting the panicked CPU handle the flushing of
>> ringbuffer right after non-panicked CPUs finished writing their
>> backtraces.
>>
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -260,6 +260,7 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>> panic_triggering_all_cpu_backtrace = true;
>> trigger_all_cpu_backtrace();
>> panic_triggering_all_cpu_backtrace = false;
>> + console_try_flush();
>> }
>>
>> /*
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -3284,6 +3284,20 @@ void console_flush_on_panic(enum con_flush_mode mode)
>> console_flush_all(false, &next_seq, &handover);
>> }
>>
>> +/**
>> + * console_try_flush - try to flush consoles when safe
>> + *
>> + * Context: Any, except for NMI.
>
>It is safe even in NMI. is_printk_legacy_deferred() would return true
>in this case.
I'm sorry... I didn't notice this. Thanks.
>> + */
>> +void console_try_flush(void)
>> +{
>> + if (is_printk_legacy_deferred())
>> + return;
>> +
>> + if (console_trylock())
>> + console_unlock();
>> +}
>
>I would do something like:
>
>/**
> * console_try_or_trigger_flush - try to flush consoles directly when
> * safe or the trigger deferred flush.
> *
> * Context: Any
> */
>void console_try_or_trigger_flush(void)
>{
> if (!is_printk_legacy_deferred() && console_trylock())
> console_unlock();
> else
> defer_console_output();
>}
>
>and use it instead of printk_trigger_flush() in
>nmi_trigger_cpumask_backtrace().
Yes, that sounds better to me as well!
>Well, I would postpone this patch after we finalize the patchset
>adding con->write_atomic() callback. This patch depends on it anyway
>via is_printk_legacy_deferred(). The patchset might also add
>other wrappers for flushing consoles and we have to choose some
>reasonable names. Or John could integrate this patch into the patchset.
Sure, I can resend one once the patchset is finalized!
Or would it be better if I just leave it to John so that it can be part of
the patchset?
>Best Regards,
>Petr
Sincerely,
Ryo Takakura
© 2016 - 2025 Red Hat, Inc.