[PATCH v2 1/2] Handle flushing of CPU backtraces during panic

takakura@valinux.co.jp posted 2 patches 1 year, 4 months ago
There is a newer version of this series
[PATCH v2 1/2] Handle flushing of CPU backtraces during panic
Posted by takakura@valinux.co.jp 1 year, 4 months ago
From: Ryo Takakura <takakura@valinux.co.jp>

After panic, non-panicked CPU's has been unable to flush ringbuffer 
while they can still write into it. This can affect CPU backtrace 
triggered in panic only able to write into ringbuffer incapable of 
flushing them.

Fix the issue by letting the panicked CPU handle the flushing of 
ringbuffer right after non-panicked CPUs finished writing their
backtraces.

Signed-off-by: Ryo Takakura <takakura@valinux.co.jp>
---
 kernel/panic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 7e2070925..f94923a63 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -252,8 +252,10 @@ void check_panic_on_warn(const char *origin)
  */
 static void panic_other_cpus_shutdown(bool crash_kexec)
 {
-	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
+	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
 		trigger_all_cpu_backtrace();
+		console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+	}
 
 	/*
 	 * Note that smp_send_stop() is the usual SMP shutdown function,
-- 
2.34.1
Re: [PATCH v2 1/2] Handle flushing of CPU backtraces during panic
Posted by John Ogness 1 year, 4 months ago
On 2024-08-03, takakura@valinux.co.jp wrote:
> From: Ryo Takakura <takakura@valinux.co.jp>
>
> After panic, non-panicked CPU's has been unable to flush ringbuffer 
> while they can still write into it. This can affect CPU backtrace 
> triggered in panic only able to write into ringbuffer incapable of 
> flushing them.

Right now, they cannot write to it. If you apply your second patch
before this one, then the above statement is true.

Perhaps the ordering of the two patches should be reversed?

Either way, for the series:

Reviewed-by: John Ogness <john.ogness@linutronix.de>
Re: [PATCH v2 1/2] Handle flushing of CPU backtraces during panic
Posted by takakura@valinux.co.jp 1 year, 4 months ago
Hi John and Petr,

On 2024-08-05, John Ogness wrote:
>On 2024-08-03, takakura@valinux.co.jp wrote:
>> From: Ryo Takakura <takakura@valinux.co.jp>
>>
>> After panic, non-panicked CPU's has been unable to flush ringbuffer 
>> while they can still write into it. This can affect CPU backtrace 
>> triggered in panic only able to write into ringbuffer incapable of 
>> flushing them.
>
>Right now, they cannot write to it. If you apply your second patch
>before this one, then the above statement is true.
>
>Perhaps the ordering of the two patches should be reversed?

Yes, that is true. Thanks!
I'll send the patches in the reverse order for next version. 

>Either way, for the series:
>
>Reviewed-by: John Ogness <john.ogness@linutronix.de>

Sincerely,
Ryo Takakura
Re: [PATCH v2 1/2] Handle flushing of CPU backtraces during panic
Posted by Petr Mladek 1 year, 4 months ago
On Sat 2024-08-03 17:12:30, takakura@valinux.co.jp wrote:
> From: Ryo Takakura <takakura@valinux.co.jp>
> 
> After panic, non-panicked CPU's has been unable to flush ringbuffer 
> while they can still write into it. This can affect CPU backtrace 
> triggered in panic only able to write into ringbuffer incapable of 
> flushing them.
> 
> Fix the issue by letting the panicked CPU handle the flushing of 
> ringbuffer right after non-panicked CPUs finished writing their
> backtraces.
> 
> Signed-off-by: Ryo Takakura <takakura@valinux.co.jp>
> ---
>  kernel/panic.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 7e2070925..f94923a63 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -252,8 +252,10 @@ void check_panic_on_warn(const char *origin)
>   */
>  static void panic_other_cpus_shutdown(bool crash_kexec)
>  {
> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
> +	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>  		trigger_all_cpu_backtrace();
> +		console_flush_on_panic(CONSOLE_FLUSH_PENDING);

Hmm, this is too dangerous.

console_flush_on_panic() is supposed to be called at the end on
panic() as the final desperate attempt to flush consoles.
It does not take console_lock(). It must not be called before
stopping non-panic() CPUs.

We would need to implement something like:

/**
 * console_try_flush - try to flush consoles when safe
 *
 * Context: Any, except for NMI.
 */
void console_try_flush(void)
{
	if (is_printk_legacy_deferred())
		return;

	if (console_trylock())
		console_unlock();
}

, where is_printk_legacy_deferred() is not yet in the mainline. It is a new
API proposed by the last version of a patchset adding adding write_atomic() callback,
see https://lore.kernel.org/all/20240804005138.3722656-24-john.ogness@linutronix.de/

Best Regards,
Petr
Re: [PATCH v2 1/2] Handle flushing of CPU backtraces during panic
Posted by takakura@valinux.co.jp 1 year, 4 months ago
Hi Petr and John,

On 2024-08-05, Petr Mladek wrote:
>On Sat 2024-08-03 17:12:30, takakura@valinux.co.jp wrote:
>> From: Ryo Takakura <takakura@valinux.co.jp>
>> 
>> After panic, non-panicked CPU's has been unable to flush ringbuffer 
>> while they can still write into it. This can affect CPU backtrace 
>> triggered in panic only able to write into ringbuffer incapable of 
>> flushing them.
>> 
>> Fix the issue by letting the panicked CPU handle the flushing of 
>> ringbuffer right after non-panicked CPUs finished writing their
>> backtraces.
>> 
>> Signed-off-by: Ryo Takakura <takakura@valinux.co.jp>
>> ---
>>  kernel/panic.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index 7e2070925..f94923a63 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -252,8 +252,10 @@ void check_panic_on_warn(const char *origin)
>>   */
>>  static void panic_other_cpus_shutdown(bool crash_kexec)
>>  {
>> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> +	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>>  		trigger_all_cpu_backtrace();
>> +		console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>
>Hmm, this is too dangerous.
>
>console_flush_on_panic() is supposed to be called at the end on
>panic() as the final desperate attempt to flush consoles.
>It does not take console_lock(). It must not be called before
>stopping non-panic() CPUs.

I see, yes, it is dangerous...  Thanks for pointing this out!

Just out of curiosity, if we only had nbcon consoles which disables 
acquiring console lock after panic as pointed out by John on [0], 
I guess in that case we will be able to call
console_flush_on_panic(CONSOLE_FLUSH_PENDING) in this context.

>We would need to implement something like:
>
>/**
> * console_try_flush - try to flush consoles when safe
> *
> * Context: Any, except for NMI.
> */
>void console_try_flush(void)
>{
>	if (is_printk_legacy_deferred())
>		return;
>
>	if (console_trylock())
>		console_unlock();
>}
>
>, where is_printk_legacy_deferred() is not yet in the mainline. It is a new
>API proposed by the last version of a patchset adding adding write_atomic() callback,
>see https://lore.kernel.org/all/20240804005138.3722656-24-john.ogness@linutronix.de/

Also thanks for potinting the direction.
I'll send the next version with console_try_flush() for flushing backtraces 
as suggested!

>Best Regards,
>Petr

Sincerely,
Ryo Takakura

[0] https://lore.kernel.org/lkml/875xsofl7i.fsf@jogness.linutronix.de/
Re: [PATCH v2 1/2] Handle flushing of CPU backtraces during panic
Posted by John Ogness 1 year, 4 months ago
On 2024-08-05, Petr Mladek <pmladek@suse.com> wrote:
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index 7e2070925..f94923a63 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -252,8 +252,10 @@ void check_panic_on_warn(const char *origin)
>>   */
>>  static void panic_other_cpus_shutdown(bool crash_kexec)
>>  {
>> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> +	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>>  		trigger_all_cpu_backtrace();
>> +		console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>
> Hmm, this is too dangerous.
>
> console_flush_on_panic() is supposed to be called at the end on
> panic() as the final desperate attempt to flush consoles.

Thanks for catching this. I keep forgetting that
console_flush_on_panic() is the legacy variant of
nbcon_atomic_flush_unsafe(), but is called much earlier.

John