[PATCH] printk/console: Enable console kthreads only when there is no boot console left

Petr Mladek posted 1 patch 3 years, 10 months ago
There is a newer version of this series
kernel/printk/printk.c | 38 +++++++++++++++++++++++++-------------
1 file changed, 25 insertions(+), 13 deletions(-)
[PATCH] printk/console: Enable console kthreads only when there is no boot console left
Posted by Petr Mladek 3 years, 10 months ago
The console kthreads uncovered several races in console drivers.
All problems were in situation when a console was being properly
initialized and registered while an early console, using the same
port, was being used.

These problems are pretty hard to debug because they often result
into silent boot crashes. It would be nice to fix them but it
looks like a can of worms.

Prevent these problems by delaying the use of console kthreads
after all early consoles are gone. It might later be optimized.
But let's close this can of worms with a big hammer for now
so that they do not break first impression on the kthreads
that solve other real problems.

Link: https://lore.kernel.org/r/20220619204949.50d9154d@thinkpad
Link: https://lore.kernel.org/r/2a82eae7-a256-f70c-fd82-4e510750906e@samsung.com
Reported=by: Marek Behún <kabel@kernel.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b095fb5f5f61..c0c5e2b6b91d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3551,6 +3551,19 @@ void __init console_init(void)
 	}
 }
 
+static int __init printk_activate_kthreads(void)
+{
+	struct console *con;
+
+	console_lock();
+	printk_kthreads_available = true;
+	for_each_console(con)
+		printk_start_kthread(con);
+	console_unlock();
+
+	return 0;
+}
+
 /*
  * Some boot consoles access data that is in the init section and which will
  * be discarded after the initcalls have been run. To make sure that no code
@@ -3567,6 +3580,7 @@ void __init console_init(void)
  */
 static int __init printk_late_init(void)
 {
+	bool no_bootcon = true;
 	struct console *con;
 	int ret;
 
@@ -3588,7 +3602,10 @@ static int __init printk_late_init(void)
 			pr_warn("bootconsole [%s%d] uses init memory and must be disabled even before the real one is ready\n",
 				con->name, con->index);
 			unregister_console(con);
+			continue;
 		}
+
+		no_bootcon = false;
 	}
 	ret = cpuhp_setup_state_nocalls(CPUHP_PRINTK_DEAD, "printk:dead", NULL,
 					console_cpu_notify);
@@ -3597,23 +3614,18 @@ static int __init printk_late_init(void)
 					console_cpu_notify, NULL);
 	WARN_ON(ret < 0);
 	printk_sysctl_init();
-	return 0;
-}
-late_initcall(printk_late_init);
-
-static int __init printk_activate_kthreads(void)
-{
-	struct console *con;
 
-	console_lock();
-	printk_kthreads_available = true;
-	for_each_console(con)
-		printk_start_kthread(con);
-	console_unlock();
+	/*
+	 * Some console drivers are not ready to use the same port with
+	 * boot (early) and normal console in parallel. Stay on the safe
+	 * side and enable kthreads only when there is no boot console.
+	 */
+	if (no_bootcon)
+		printk_activate_kthreads();
 
 	return 0;
 }
-early_initcall(printk_activate_kthreads);
+late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
 /* If @con is specified, only wait for that console. Otherwise wait for all. */
-- 
2.35.3
Re: [PATCH] printk/console: Enable console kthreads only when there is no boot console left
Posted by John Ogness 3 years, 10 months ago
On 2022-06-20, Petr Mladek <pmladek@suse.com> wrote:
> The console kthreads uncovered several races in console drivers.
> All problems were in situation when a console was being properly
> initialized and registered while an early console, using the same
> port, was being used.
>
> These problems are pretty hard to debug because they often result
> into silent boot crashes. It would be nice to fix them but it
> looks like a can of worms.
>
> Prevent these problems by delaying the use of console kthreads
> after all early consoles are gone. It might later be optimized.
> But let's close this can of worms with a big hammer for now
> so that they do not break first impression on the kthreads
> that solve other real problems.

The problem is that there are multiple consoles accessing the same
hardware. The threaded console printing model did not consider this. The
commit message should clearly state the actual problem.

Since early printing does not use the kernel driver model, I do not see
a clear solution at this point. But since using early printing is not a
typical use, I am OK with this patch.

For the commit message I would suggest something like:

Threaded console printing does not take into consideration that boot
consoles may be accessing the same hardware as normal consoles and thus
must not be called in parallel.

Since it is currently not possible to identify which consoles are
accessing the same hardware, delay threaded console printing activation
until it is known that there are no boot consoles registered.

> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index b095fb5f5f61..c0c5e2b6b91d 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3597,23 +3614,18 @@ static int __init printk_late_init(void)
>  					console_cpu_notify, NULL);
>  	WARN_ON(ret < 0);
>  	printk_sysctl_init();
> -	return 0;
> -}
> -late_initcall(printk_late_init);
> -
> -static int __init printk_activate_kthreads(void)
> -{
> -	struct console *con;
>  
> -	console_lock();
> -	printk_kthreads_available = true;
> -	for_each_console(con)
> -		printk_start_kthread(con);
> -	console_unlock();
> +	/*
> +	 * Some console drivers are not ready to use the same port with
> +	 * boot (early) and normal console in parallel. Stay on the safe
> +	 * side and enable kthreads only when there is no boot console.
> +	 */

As with the commit message, this comment is not pointing out the real
issue. I suggest:

	/*
	 * Boot consoles may be accessing the same hardware as normal
	 * consoles and thus must not be called in parallel. Therefore
	 * only activate threaded console printing if it is known that
	 * there are no boot consoles registered.
	 */

> +	if (no_bootcon)
> +		printk_activate_kthreads();
>  
>  	return 0;
>  }
> -early_initcall(printk_activate_kthreads);
> +late_initcall(printk_late_init);
>  
>  #if defined CONFIG_PRINTK
>  /* If @con is specified, only wait for that console. Otherwise wait for all. */

John Ogness
Re: [PATCH] printk/console: Enable console kthreads only when there is no boot console left
Posted by Steven Rostedt 3 years, 9 months ago
On Tue, 21 Jun 2022 00:57:09 +0206
John Ogness <john.ogness@linutronix.de> wrote:

> Since early printing does not use the kernel driver model, I do not see
> a clear solution at this point. But since using early printing is not a
> typical use, I am OK with this patch.

Does this patch still work if you add the "keep" flag to the
earlyprintk kernel parameter?

-- Steve
Re: [PATCH] printk/console: Enable console kthreads only when there is no boot console left
Posted by Petr Mladek 3 years, 9 months ago
On Fri 2022-06-24 18:41:36, Steven Rostedt wrote:
> On Tue, 21 Jun 2022 00:57:09 +0206
> John Ogness <john.ogness@linutronix.de> wrote:
> 
> > Since early printing does not use the kernel driver model, I do not see
> > a clear solution at this point. But since using early printing is not a
> > typical use, I am OK with this patch.
> 
> Does this patch still work if you add the "keep" flag to the
> earlyprintk kernel parameter?

It works. If the boot console stays then the console kthreads will
never get enabled.

Well, I hope that we will be able to add the synchronization and remove
this constraint. I think that the port->lock should be usable in
most cases.

Best Regards,
Petr
Re: [PATCH] printk/console: Enable console kthreads only when there is no boot console left
Posted by Linus Torvalds 3 years, 10 months ago
On Mon, Jun 20, 2022 at 10:14 AM Petr Mladek <pmladek@suse.com> wrote:
>
> The console kthreads uncovered several races in console drivers.

I really want to make it clear that this was NOT some kind of "races
in drivers".

Console drivers may very well  have intentionally avoided taking locks
for console output, since the printk output was supposed to be
serialized by printk.

Don't try to make this some kind of "buggy drivers" thing. This is on
printk, not on anything else.

Assuming this solves all issues, I'm ok with this approach, but I
really want this to be clearly about printk being buggy, no "blame the
drivers" garbage.

And if there are other issues coming up, we revert the whole thing entirely.

Because printk is too important to play games with, and too important
to try to blame drivers.

               Linus
Re: [PATCH] printk/console: Enable console kthreads only when there is no boot console left
Posted by Petr Mladek 3 years, 10 months ago
On Mon 2022-06-20 14:10:20, Linus Torvalds wrote:
> On Mon, Jun 20, 2022 at 10:14 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > The console kthreads uncovered several races in console drivers.
> 
> I really want to make it clear that this was NOT some kind of "races
> in drivers".
>
> Console drivers may very well  have intentionally avoided taking locks
> for console output, since the printk output was supposed to be
> serialized by printk.
> 
> Don't try to make this some kind of "buggy drivers" thing. This is on
> printk, not on anything else.

OK, I see that uart_console_write() is used by
early_serial8250_write() without port->lock. It means that it is
racy against serial8250_console_write(). It might
cause problems reported by this thread. And you are
right that it has never been used in parallel before the kthreads.

But I believe that it might cause real problems. serial8250_console_write()
takes port->lock to get serialized against other operations on the
port. And there might be some when the same port is added as
a proper serial console.

Today I found that probe_baud() is called from
serial8250_console_setup() without port->lock. It does reads and
writes. I believe that it might break with the earlycon.

Also the commit 589f892ac8ef244e47c5a ("serial: meson:
acquire port->lock in startup()") fixes a race between
meson_serial_port_write() and meson_uart_startup(), where
meson_serial_port_write() is used by both early and proper
console driver. The problem was there even without kthreads.
They just made it more visible.

My colleagues familiar with ARM told me that they heard about
boot freezes with early consoles before threads. The kthreads
allow to reproduce and fix them. In the end, they make the early
consoles more reliable.


> Assuming this solves all issues, I'm ok with this approach, but I
> really want this to be clearly about printk being buggy, no "blame the
> drivers" garbage.
>
> And if there are other issues coming up, we revert the whole thing entirely.
> 
> Because printk is too important to play games with, and too important
> to try to blame drivers.

I take printk() really seriously. And I definitely do not want
to wave out problems as others problem.

I do not want to release 5.19 with broken printk(). But the kthreads
solve real bugs where printk() put the system into knees. I want
to invest much more time on improving them and fixing related
problems. Unfortunately, linux-next was not able to catch
the recently reported problems and we were not able to fix them
in advance.

All the recent fixes were generic and should make printk() with
kthreads much more reliable. I can't be sure if it will be enough.
I could only say that I am going to fix any new ones.

Of course, if people continue reporting problems, we would need
to revert it for 5.19. But I would really like to give it another
chance later.

Best Regards,
Petr
Re: [PATCH] printk/console: Enable console kthreads only when there is no boot console left
Posted by Marek Behún 3 years, 10 months ago
On Mon, 20 Jun 2022 17:14:27 +0200
Petr Mladek <pmladek@suse.com> wrote:

> The console kthreads uncovered several races in console drivers.
> All problems were in situation when a console was being properly
> initialized and registered while an early console, using the same
> port, was being used.
> 
> These problems are pretty hard to debug because they often result
> into silent boot crashes. It would be nice to fix them but it
> looks like a can of worms.
> 
> Prevent these problems by delaying the use of console kthreads
> after all early consoles are gone. It might later be optimized.
> But let's close this can of worms with a big hammer for now
> so that they do not break first impression on the kthreads
> that solve other real problems.
> 
> Link: https://lore.kernel.org/r/20220619204949.50d9154d@thinkpad
> Link: https://lore.kernel.org/r/2a82eae7-a256-f70c-fd82-4e510750906e@samsung.com
> Reported=by: Marek Behún <kabel@kernel.org>
> Signed-off-by: Petr Mladek <pmladek@suse.com>

This solves the issue. You have a typo in the reported-by tag
('=' instead of '-').

Tested-by: Marek Behún <kabel@kernel.org>
Re: [PATCH] printk/console: Enable console kthreads only when there is no boot console left
Posted by Petr Mladek 3 years, 10 months ago
On Mon 2022-06-20 17:14:30, Petr Mladek wrote:
> The console kthreads uncovered several races in console drivers.
> All problems were in situation when a console was being properly
> initialized and registered while an early console, using the same
> port, was being used.
> 
> These problems are pretty hard to debug because they often result
> into silent boot crashes. It would be nice to fix them but it
> looks like a can of worms.
> 
> Prevent these problems by delaying the use of console kthreads
> after all early consoles are gone. It might later be optimized.
> But let's close this can of worms with a big hammer for now
> so that they do not break first impression on the kthreads
> that solve other real problems.

Marek, could you please try if this patch solves the problem
for you?

Best Regards,
Petr