[PATCH printk v2 08/18] printk: nbcon: Stop threads on shutdown/reboot

John Ogness posted 18 patches 1 year, 8 months ago
There is a newer version of this series
[PATCH printk v2 08/18] printk: nbcon: Stop threads on shutdown/reboot
Posted by John Ogness 1 year, 8 months ago
Register a syscore_ops shutdown function to stop all threaded
printers on shutdown/reboot. This allows printk to cleanly
transition back to atomic printing in order to provide a robust
mechanism for outputting the final messages.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/nbcon.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 4ab47cc014a3..f60d47b5db8a 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -18,6 +18,7 @@
 #include <linux/smp.h>
 #include <linux/stddef.h>
 #include <linux/string.h>
+#include <linux/syscore_ops.h>
 #include <linux/types.h>
 #include "internal.h"
 #include "printk_ringbuffer.h"
@@ -1703,3 +1704,33 @@ void nbcon_device_release(struct console *con)
 	console_srcu_read_unlock(cookie);
 }
 EXPORT_SYMBOL_GPL(nbcon_device_release);
+
+/**
+ * printk_kthread_shutdown - shutdown all threaded printers
+ *
+ * On system shutdown all threaded printers are stopped. This allows printk
+ * to transition back to atomic printing, thus providing a robust mechanism
+ * for the final shutdown/reboot messages to be output.
+ */
+static void printk_kthread_shutdown(void)
+{
+	struct console *con;
+
+	console_list_lock();
+	for_each_console(con) {
+		if (con->flags & CON_NBCON)
+			nbcon_kthread_stop(con);
+	}
+	console_list_unlock();
+}
+
+static struct syscore_ops printk_syscore_ops = {
+	.shutdown = printk_kthread_shutdown,
+};
+
+static int __init printk_init_ops(void)
+{
+	register_syscore_ops(&printk_syscore_ops);
+	return 0;
+}
+device_initcall(printk_init_ops);
-- 
2.39.2
Re: [PATCH printk v2 08/18] printk: nbcon: Stop threads on shutdown/reboot
Posted by Petr Mladek 1 year, 8 months ago
On Tue 2024-06-04 01:30:43, John Ogness wrote:
> Register a syscore_ops shutdown function to stop all threaded
> printers on shutdown/reboot. This allows printk to cleanly
> transition back to atomic printing in order to provide a robust
> mechanism for outputting the final messages.

It looks like a simple straightforward patch. But there are some
hidden questions.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1703,3 +1704,33 @@ void nbcon_device_release(struct console *con)
>  	console_srcu_read_unlock(cookie);
>  }
>  EXPORT_SYMBOL_GPL(nbcon_device_release);
> +
> +/**
> + * printk_kthread_shutdown - shutdown all threaded printers
> + *
> + * On system shutdown all threaded printers are stopped. This allows printk
> + * to transition back to atomic printing, thus providing a robust mechanism
> + * for the final shutdown/reboot messages to be output.
> + */
> +static void printk_kthread_shutdown(void)

I would rename this to "printk_shutdown_threads() because it
is symmetric to printk_setup_threads().


> +{
> +	struct console *con;
> +
> +	console_list_lock();

First ideas (how the waterfall of questions started):

    + I though that we should do here:

	    printk_threads_enabled = false;

      It would tell vprintk_emit() to call nbcon_atomic_flush_pending().
      And it would be symetric with printk_setup_threads().


    + Also I though whether to call it before or after stopping the kthreads.

      If we set the variable before stopping kthreads then the kthreads
      might try to flush the messages in parallel with the flush from
      vprintk_emit(). It should be safe because it would be serialized
      via the nbcon console context. But it would be the first scenario
      where this happen => not much tested.

      Well, we should start the direct flush before stopping kthreads.
      So that we could see the messages when the stopping fails.


But then I realized:

Ha, vprintk_emit() would call nbcon_atomic_flush_pending() anyway
because these notifiers are called with:

	system_state > SYSTEM_RUNNING

This makes me wonder whether we need to stop the kthreads at all.
How exactly does this make printk more robust during shutdown?

Well, there is one differece. The kthreads could not interferre with
the direct flush when they are stopped.

But we have the same problem also with suspend/resume. And we do not
stop the kthreads in this case.

Maybe, we should just add a check of

       system_state > SYSTEM_RUNNING

also into the nbcon_kthread_func(). Maybe, the kthread should not try
to flush the messages when they are flushed directly by
vprintk_emit().

But then there is the problem with "the current owner is responsible
for flushing everything". vprintk_emit() could not flush the messages
when nbcon_atomic_flush_pending() races with the kthread. And if
the kthread stops flushing then nobody would flush the rest
of the pending messages.

Maybe, the notifier could just call pr_flush() like in suspend_console().
Maybe it is not important to stop the kthreads.

But maybe, the messages get flushed before the kthread is stopped.
Also the kthread could not interfere with direct flush once stopped.
Mabye, this is the idea behind "more robust during suspend".

But why is suspend solved another way then?
Is the suspend less important than shutdown?

My opinion:

I am not against stopping the kthreads. But the mechanism of switching
between kthreads and direct flush should be the same for both suspend
and shutdown. And I am not sure if we need to stop the kthreads then.

> +	for_each_console(con) {
> +		if (con->flags & CON_NBCON)
> +			nbcon_kthread_stop(con);
> +	}
> +	console_list_unlock();
> +}
> +
> +static struct syscore_ops printk_syscore_ops = {
> +	.shutdown = printk_kthread_shutdown,
> +};
> +
> +static int __init printk_init_ops(void)
> +{
> +	register_syscore_ops(&printk_syscore_ops);
> +	return 0;
> +}
> +device_initcall(printk_init_ops);

I guess that "device_initcall" is cut&pasted ;-) IMHO, it does not
fit much. That said, I do not have strong opinion where
it belongs.

IMHO, the best solution would be to register the notifier in
printk_setup_threads() before we actually allow creating them.
(Later update: if we really need to stop them).

Best Regards,
Petr
Re: [PATCH printk v2 08/18] printk: nbcon: Stop threads on shutdown/reboot
Posted by John Ogness 1 year, 7 months ago
On 2024-06-17, Petr Mladek <pmladek@suse.com> wrote:
> First ideas (how the waterfall of questions started):
>
>     + I though that we should do here:
>
> 	    printk_threads_enabled = false;
>
>       It would tell vprintk_emit() to call
>       nbcon_atomic_flush_pending().  And it would be symetric with
>       printk_setup_threads().

This makes sense. I created @printk_threads_enabled to have a clean
startup of the threads. I did not consider recycling it for shutdown
purposes.

>     + Also I though whether to call it before or after stopping the
>       kthreads.
>
>       If we set the variable before stopping kthreads then the
>       kthreads might try to flush the messages in parallel with the
>       flush from vprintk_emit(). It should be safe because it would be
>       serialized via the nbcon console context. But it would be the
>       first scenario where this happen => not much tested.

It is safe. The scenario is not that unusual. Non-printing activities
can contend with thread via the port_lock wrapper quite often.

>       Well, we should start the direct flush before stopping kthreads.
>       So that we could see the messages when the stopping fails.
>
>
> But then I realized:
>
> Ha, vprintk_emit() would call nbcon_atomic_flush_pending() anyway
> because these notifiers are called with:
>
> 	system_state > SYSTEM_RUNNING
>
> This makes me wonder whether we need to stop the kthreads at all.

Well, that overlap was not intentional. There probably should be a flag
to signify the transition rather than just looking at @system_state.

> How exactly does this make printk more robust during shutdown?

Because by stopping them, the printing threads are guaranteed to go
silent before they suddenly freeze. Without the clean shutdown, I could
create shutdown/reboot scenarios where the printing thread would stop
mid-line. Then the atomic printing would never be able to finish because
it is a non-panic situation and the thread was frozen while holding the
context. The result: the user never sees the final printk messages.

> Well, there is one differece. The kthreads could not interferre with
> the direct flush when they are stopped.
>
> But we have the same problem also with suspend/resume. And we do not
> stop the kthreads in this case.

Suspend/resume is something different. Suspend needs to stop _all_
printing. I do not think it makes sense to shutdown threads for
this. The consoles becoming temporarily !usable() is enough.

For shutdown, the consoles are usable() the entire time. I just want the
threads to get out of the way before they freeze so that the user can
see all the messages on shutdown/reboot.

> Maybe, we should just add a check of
>
>        system_state > SYSTEM_RUNNING
>
> also into the nbcon_kthread_func(). Maybe, the kthread should not try
> to flush the messages when they are flushed directly by
> vprintk_emit().

It gets racy when we start relying on the contexts noticing some
variables. By racy, I mean there are scenarios where there are unprinted
records and no context is printing. I think it is easiest when the
following steps occur within a notifier:

1. notifier sets a flag to allow atomic printing from vprintk_emit()

2. notifier stops all threads (with the kthread_should_stop() moved
   inside the printing loop of nbcon_kthread_func())

3. notifier performs nbcon_atomic_flush_pending()

This guarantees that no messages are lost and that all get flushed.

>> +static int __init printk_init_ops(void)
>> +{
>> +	register_syscore_ops(&printk_syscore_ops);
>> +	return 0;
>> +}
>> +device_initcall(printk_init_ops);
>
> I guess that "device_initcall" is cut&pasted ;-) IMHO, it does not
> fit much. That said, I do not have strong opinion where
> it belongs.
>
> IMHO, the best solution would be to register the notifier in
> printk_setup_threads() before we actually allow creating them.

OK.

John Ogness
Re: [PATCH printk v2 08/18] printk: nbcon: Stop threads on shutdown/reboot
Posted by Petr Mladek 1 year, 7 months ago
On Tue 2024-06-25 22:02:30, John Ogness wrote:
> On 2024-06-17, Petr Mladek <pmladek@suse.com> wrote:
> > This makes me wonder whether we need to stop the kthreads at all.
> 
> > How exactly does this make printk more robust during shutdown?
> 
> Because by stopping them, the printing threads are guaranteed to go
> silent before they suddenly freeze. Without the clean shutdown, I could
> create shutdown/reboot scenarios where the printing thread would stop
> mid-line. Then the atomic printing would never be able to finish because
> it is a non-panic situation and the thread was frozen while holding the
> context. The result: the user never sees the final printk messages.

I see.

> > Well, there is one differece. The kthreads could not interferre with
> > the direct flush when they are stopped.
> >
> > But we have the same problem also with suspend/resume. And we do not
> > stop the kthreads in this case.
> 
> Suspend/resume is something different. Suspend needs to stop _all_
> printing. I do not think it makes sense to shutdown threads for
> this. The consoles becoming temporarily !usable() is enough.

Just to be sure that we are on the same page. I made a closer
look at the suspend code. I see 4 important milestones:

  1. suspend_console();

     The consoles are intentionally stopped after this point.
     It allows to suspend the related hardware, for example,
     the serial port.


  2. pm_sleep_disable_secondary_cpus();

     Other CPUs are stopped at this point. The kthread might still
     be theoretically scheduled.


  3. local_irq_disable();

     IRQs get disabled. Neither the hardware nor the kthread could
     work after this point.


  4. system_state = SYSTEM_SUSPEND;

     The global value is set. vprintk_emit() starts calling
     nbcon_atomic_flush_pending().


From some reasons, I thought that ordering was different and
the "system_state" was set much earlier. This is why I thought
that it would make sense to "stop" the kthreads.

But it seems that, in reality, the nbcon_atomic_flush_pending() in
vprintk_emit() would never do anything during suspend. The consoles
are already suspended (unusable) when the system_state gets set
to SYSTEM_SUSPEND.

It means that we do not need to synchronize the printk kthreads with
vprintk_emit() for suspend. The only important thing is
the pr_flush() in suspend_console().


> For shutdown, the consoles are usable() the entire time. I just want the
> threads to get out of the way before they freeze so that the user can
> see all the messages on shutdown/reboot.

I see what you mean now. The consoles are not suspended => they are
usable all the time.


> > Maybe, we should just add a check of
> >
> >        system_state > SYSTEM_RUNNING
> >
> > also into the nbcon_kthread_func(). Maybe, the kthread should not try
> > to flush the messages when they are flushed directly by
> > vprintk_emit().
> 
> It gets racy when we start relying on the contexts noticing some
> variables. By racy, I mean there are scenarios where there are unprinted
> records and no context is printing. I think it is easiest when the
> following steps occur within a notifier:
> 
> 1. notifier sets a flag to allow atomic printing from vprintk_emit()
> 
> 2. notifier stops all threads (with the kthread_should_stop() moved
>    inside the printing loop of nbcon_kthread_func())
> 
> 3. notifier performs nbcon_atomic_flush_pending()
> 
> This guarantees that no messages are lost and that all get flushed.

Makes sense.

Best Regards,
Petr