[PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING

John Ogness posted 19 patches 1 year, 4 months ago
There is a newer version of this series
[PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING
Posted by John Ogness 1 year, 4 months ago
A follow-up change adds pr_flush() to console unregistration.
However, with boot consoles unregistration can happen very
early if there are also regular consoles registering as well.
In this case the pr_flush() is not important because the
regular console manually flushes the boot consoles before
unregistering them.

Allow pr_flush() to fail if @system_state has not yet reached
SYSTEM_SCHEDULING. This avoids might_sleep() and msleep()
explosions that would otherwise occur.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 641c2a8b0a09..39db56a32c5e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3946,6 +3946,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 	u64 diff;
 	u64 seq;
 
+	/* Sorry, pr_flush() will not work this early. */
+	if (system_state < SYSTEM_SCHEDULING)
+		return false;
+
 	might_sleep();
 
 	seq = prb_next_reserve_seq(prb);
-- 
2.39.2
Re: [PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING
Posted by Petr Mladek 1 year, 4 months ago
On Mon 2024-07-22 19:25:25, John Ogness wrote:
> A follow-up change adds pr_flush() to console unregistration.
> However, with boot consoles unregistration can happen very
> early if there are also regular consoles registering as well.
> In this case the pr_flush() is not important because the
> regular console manually flushes the boot consoles before
> unregistering them.

It is not much clear what the "manual flush" means. Is it
the console_flush_all() in get_init_console_seq()?

I would write something like:

<proposal>
In this case the pr_flush() is not important because all consoles
are flushed when checking the initial console sequence number.
</proposal>

> Allow pr_flush() to fail if @system_state has not yet reached
> SYSTEM_SCHEDULING. This avoids might_sleep() and msleep()
> explosions that would otherwise occur.

What do you mean with the explosion, please?
Does it add some warnings into the log?
Does it cause extra delays?

> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> ---
>  kernel/printk/printk.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 641c2a8b0a09..39db56a32c5e 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3946,6 +3946,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
>  	u64 diff;
>  	u64 seq;
>  
> +	/* Sorry, pr_flush() will not work this early. */
> +	if (system_state < SYSTEM_SCHEDULING)
> +		return false;
> +
>  	might_sleep();
>  
>  	seq = prb_next_reserve_seq(prb);

The change seems acceptable. I just do not completely understand the
motivation.

Best Regards,
Petr
Re: [PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING
Posted by John Ogness 1 year, 4 months ago
On 2024-07-26, Petr Mladek <pmladek@suse.com> wrote:
> On Mon 2024-07-22 19:25:25, John Ogness wrote:
>> A follow-up change adds pr_flush() to console unregistration.
>> However, with boot consoles unregistration can happen very
>> early if there are also regular consoles registering as well.
>> In this case the pr_flush() is not important because the
>> regular console manually flushes the boot consoles before
>> unregistering them.
>
> It is not much clear what the "manual flush" means. Is it
> the console_flush_all() in get_init_console_seq()?
>
> I would write something like:
>
> <proposal>
> In this case the pr_flush() is not important because all consoles
> are flushed when checking the initial console sequence number.
> </proposal>

Yes, clearer. Thanks.

>> Allow pr_flush() to fail if @system_state has not yet reached
>> SYSTEM_SCHEDULING. This avoids might_sleep() and msleep()
>> explosions that would otherwise occur.
>
> What do you mean with the explosion, please?
> Does it add some warnings into the log?
> Does it cause extra delays?

I am certain that I could trigger a splat with might_sleep() using some
configured preemption mode. But now I cannot reproduce it. However, with
msleep() it is easy:

[    0.436739][    T0] printk: legacy console [ttyS0] enabled
[    0.439820][    T0] printk: legacy bootconsole [earlyser0] disabled
[    0.446822][    T0] BUG: scheduling while atomic: swapper/0/0/0x00000002
[    0.450491][    T0] 1 lock held by swapper/0/0:
[    0.457897][    T0]  #0: ffffffff82ae5f88 (console_mutex){+.+.}-{4:4}, at: console_list_lock+0x20/0x70
[    0.463141][    T0] Modules linked in:
[    0.465307][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.10.0-rc1+ #372
[    0.469394][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[    0.474402][    T0] Call Trace:
[    0.476246][    T0]  <TASK>
[    0.481473][    T0]  dump_stack_lvl+0x93/0xb0
[    0.483949][    T0]  dump_stack+0x10/0x20
[    0.486256][    T0]  __schedule_bug+0x68/0x90
[    0.488753][    T0]  __schedule+0xb9b/0xd80
[    0.491179][    T0]  ? lock_release+0xb5/0x270
[    0.493732][    T0]  schedule+0x43/0x170
[    0.495998][    T0]  schedule_timeout+0xc5/0x1e0
[    0.498634][    T0]  ? __pfx_process_timeout+0x10/0x10
[    0.501522][    T0]  ? msleep+0x13/0x50
[    0.503728][    T0]  msleep+0x3c/0x50
[    0.505847][    T0]  __pr_flush.constprop.0.isra.0+0x56/0x500
[    0.509050][    T0]  ? _printk+0x58/0x80
[    0.511332][    T0]  ? lock_is_held_type+0x9c/0x110
[    0.514106][    T0]  unregister_console_locked+0xe1/0x450
[    0.517144][    T0]  register_console+0x509/0x620
[    0.519827][    T0]  ? __pfx_univ8250_console_init+0x10/0x10
[    0.523042][    T0]  univ8250_console_init+0x24/0x40
[    0.525845][    T0]  console_init+0x43/0x210
[    0.528280][    T0]  start_kernel+0x493/0x980
[    0.530773][    T0]  x86_64_start_reservations+0x18/0x30
[    0.533755][    T0]  x86_64_start_kernel+0xae/0xc0
[    0.536473][    T0]  common_startup_64+0x12c/0x138
[    0.539210][    T0]  </TASK>

And then the kernel goes into an infinite loop complaining about:

1. releasing a pinned lock
2. unpinning an unpinned lock
3. bad: scheduling from the idle thread!
4. goto 1

John
Re: [PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING
Posted by Petr Mladek 1 year, 4 months ago
On Fri 2024-07-26 16:51:47, John Ogness wrote:
> On 2024-07-26, Petr Mladek <pmladek@suse.com> wrote:
> > On Mon 2024-07-22 19:25:25, John Ogness wrote:
> >> A follow-up change adds pr_flush() to console unregistration.
> >> However, with boot consoles unregistration can happen very
> >> early if there are also regular consoles registering as well.
> >> In this case the pr_flush() is not important because the
> >> regular console manually flushes the boot consoles before
> >> unregistering them.
> >
> > It is not much clear what the "manual flush" means. Is it
> > the console_flush_all() in get_init_console_seq()?
> >
> > I would write something like:
> >
> > <proposal>
> > In this case the pr_flush() is not important because all consoles
> > are flushed when checking the initial console sequence number.
> > </proposal>
> 
> Yes, clearer. Thanks.
> 
> >> Allow pr_flush() to fail if @system_state has not yet reached
> >> SYSTEM_SCHEDULING. This avoids might_sleep() and msleep()
> >> explosions that would otherwise occur.
> >
> > What do you mean with the explosion, please?
> > Does it add some warnings into the log?
> > Does it cause extra delays?
> 
> I am certain that I could trigger a splat with might_sleep() using some
> configured preemption mode. But now I cannot reproduce it. However, with
> msleep() it is easy:
> 
> [    0.436739][    T0] printk: legacy console [ttyS0] enabled
> [    0.439820][    T0] printk: legacy bootconsole [earlyser0] disabled
> [    0.446822][    T0] BUG: scheduling while atomic: swapper/0/0/0x00000002

It complains about sheduling while atomic.

> [    0.450491][    T0] 1 lock held by swapper/0/0:
> [    0.457897][    T0]  #0: ffffffff82ae5f88 (console_mutex){+.+.}-{4:4}, at: console_list_lock+0x20/0x70

But it is under a mutex so that scheduling should be possible.

It is weird.

> [    0.463141][    T0] Modules linked in:
> [    0.465307][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.10.0-rc1+ #372
> [    0.469394][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> [    0.474402][    T0] Call Trace:
> [    0.476246][    T0]  <TASK>
> [    0.481473][    T0]  dump_stack_lvl+0x93/0xb0
> [    0.483949][    T0]  dump_stack+0x10/0x20
> [    0.486256][    T0]  __schedule_bug+0x68/0x90
> [    0.488753][    T0]  __schedule+0xb9b/0xd80
> [    0.491179][    T0]  ? lock_release+0xb5/0x270
> [    0.493732][    T0]  schedule+0x43/0x170
> [    0.495998][    T0]  schedule_timeout+0xc5/0x1e0
> [    0.498634][    T0]  ? __pfx_process_timeout+0x10/0x10
> [    0.501522][    T0]  ? msleep+0x13/0x50
> [    0.503728][    T0]  msleep+0x3c/0x50
> [    0.505847][    T0]  __pr_flush.constprop.0.isra.0+0x56/0x500
> [    0.509050][    T0]  ? _printk+0x58/0x80
> [    0.511332][    T0]  ? lock_is_held_type+0x9c/0x110
> [    0.514106][    T0]  unregister_console_locked+0xe1/0x450
> [    0.517144][    T0]  register_console+0x509/0x620
> [    0.519827][    T0]  ? __pfx_univ8250_console_init+0x10/0x10
> [    0.523042][    T0]  univ8250_console_init+0x24/0x40
> [    0.525845][    T0]  console_init+0x43/0x210
> [    0.528280][    T0]  start_kernel+0x493/0x980
> [    0.530773][    T0]  x86_64_start_reservations+0x18/0x30
> [    0.533755][    T0]  x86_64_start_kernel+0xae/0xc0
> [    0.536473][    T0]  common_startup_64+0x12c/0x138
> [    0.539210][    T0]  </TASK>
> 
> And then the kernel goes into an infinite loop complaining about:
> 
> 1. releasing a pinned lock
> 2. unpinning an unpinned lock
> 3. bad: scheduling from the idle thread!
> 4. goto 1

Hmm, I have reproduced it as well. I do not understand it much.
But yeah, this is early during the boot and some things does
not work as expected.

I do not see any better solution. I am fine with the patch.

Well, it might we worth adding the above backtrace to the commit
message so that people know what we have seen.

Best Regards,
Petr