[v1] printk: Release console_lock between printing records in legacy thread

[PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread

Posted by Andrew Murray 2 weeks, 3 days ago

The legacy printer kthread uses console_lock and
__console_flush_and_unlock to flush records to the console. This
approach results in the console_lock being held for the entire
duration of a flush. This can result in large waiting times for
those waiting for console_lock especially where there is a large
volume of records or where the console is slow (e.g. serial). This
contention is observed during boot, as the call to filp_open in
console_on_rootfs will delay progression to userspace until any
in-flight flush is completed.

Let's instead use __console_flush_unlocked which releases and
reacquires console_lock between records.

Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
---
 kernel/printk/printk.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 75a3c47e9c0e645a3198c5f56e47df2a8d1871e6..53daab5cdee537c2ff55702104e495005352db1b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3301,6 +3301,46 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 	return any_usable;
 }
 
+/*
+ * Print out all remaining records to all consoles.
+ *
+ * @next_seq is set to the sequence number after the last available record.
+ * The value is valid only when this function returns true. It means that all
+ * usable consoles are completely flushed.
+ *
+ * @handover will be set to true if a printk waiter has taken over the
+ * console_lock, in which case the caller is no longer holding the
+ * console_lock. Otherwise it is set to false.
+ *
+ * Returns true when there was at least one usable console and all messages
+ * were flushed to all usable consoles. A returned false informs the caller
+ * that everything was not flushed (either there were no usable consoles or
+ * another context has taken over printing or it is a panic situation and this
+ * is not the panic CPU). Regardless the reason, the caller should assume it
+ * is not useful to immediately try again.
+ */
+static bool console_flush_all_unlocked(u64 *next_seq, bool *handover)
+{
+	bool any_usable;
+	bool any_progress;
+
+	*next_seq = 0;
+	*handover = false;
+
+	do {
+		console_lock();
+		any_progress = console_flush_one_record(true, next_seq, handover, &any_usable);
+
+		if (*handover)
+			return false;
+
+		__console_unlock();
+
+	} while (any_progress);
+
+	return any_usable;
+}
+
 static void __console_flush_and_unlock(void)
 {
 	bool do_cond_resched;
@@ -3346,6 +3386,17 @@ static void __console_flush_and_unlock(void)
 	} while (prb_read_valid(prb, next_seq, NULL) && console_trylock());
 }
 
+static void __console_flush_unlocked(void)
+{
+	bool handover;
+	bool flushed;
+	u64 next_seq;
+
+	do {
+		flushed = console_flush_all_unlocked(&next_seq, &handover);
+	} while (flushed && !handover && prb_read_valid(prb, next_seq, NULL));
+}
+
 /**
  * console_unlock - unblock the legacy console subsystem from printing
  *
@@ -3676,8 +3727,7 @@ static int legacy_kthread_func(void *unused)
 		if (kthread_should_stop())
 			break;
 
-		console_lock();
-		__console_flush_and_unlock();
+		__console_flush_unlocked();
 	}
 
 	return 0;

-- 
2.34.1

Re: [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread

Posted by Petr Mladek 1 week, 6 days ago

On Mon 2025-09-15 13:43:06, Andrew Murray wrote:
> The legacy printer kthread uses console_lock and
> __console_flush_and_unlock to flush records to the console. This
> approach results in the console_lock being held for the entire
> duration of a flush. This can result in large waiting times for
> those waiting for console_lock especially where there is a large
> volume of records or where the console is slow (e.g. serial). This
> contention is observed during boot, as the call to filp_open in
> console_on_rootfs will delay progression to userspace until any
> in-flight flush is completed.

It would be great to add here the boot logs from the cover letter
so that the real life numbers are stored in the git log.

> Let's instead use __console_flush_unlocked which releases and
> reacquires console_lock between records.

It seems that the patch does the right thing.
I would just suggest some refactoring, see below.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3301,6 +3301,46 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
>  	return any_usable;
>  }
>  
> +/*
> + * Print out all remaining records to all consoles.
> + *
> + * @next_seq is set to the sequence number after the last available record.
> + * The value is valid only when this function returns true. It means that all
> + * usable consoles are completely flushed.
> + *
> + * @handover will be set to true if a printk waiter has taken over the
> + * console_lock, in which case the caller is no longer holding the
> + * console_lock. Otherwise it is set to false.
> + *
> + * Returns true when there was at least one usable console and all messages
> + * were flushed to all usable consoles. A returned false informs the caller
> + * that everything was not flushed (either there were no usable consoles or
> + * another context has taken over printing or it is a panic situation and this
> + * is not the panic CPU). Regardless the reason, the caller should assume it
> + * is not useful to immediately try again.
> + */
> +static bool console_flush_all_unlocked(u64 *next_seq, bool *handover)
> +{
> +	bool any_usable;
> +	bool any_progress;
> +
> +	*next_seq = 0;
> +	*handover = false;
> +
> +	do {
> +		console_lock();
> +		any_progress = console_flush_one_record(true, next_seq, handover, &any_usable);
> +
> +		if (*handover)
> +			return false;
> +
> +		__console_unlock();
> +
> +	} while (any_progress);
> +
> +	return any_usable;
> +}

This is yet another console_flush() API with a complicated semantic.
(return value, parameters). It is not bad. But it would be nice
if we did not need it ;-)

>  static void __console_flush_and_unlock(void)
>  {
>  	bool do_cond_resched;
> @@ -3346,6 +3386,17 @@ static void __console_flush_and_unlock(void)
>  	} while (prb_read_valid(prb, next_seq, NULL) && console_trylock());
>  }
>  
> +static void __console_flush_unlocked(void)
> +{
> +	bool handover;
> +	bool flushed;
> +	u64 next_seq;
> +
> +	do {
> +		flushed = console_flush_all_unlocked(&next_seq, &handover);
> +	} while (flushed && !handover && prb_read_valid(prb, next_seq, NULL));
> +}

The semantic of this function is not much clear from the name.
IMHO, it would be a puzzle for people who would try to understand
the code in the future.

> +
>  /**
>   * console_unlock - unblock the legacy console subsystem from printing
>   *
> @@ -3676,8 +3727,7 @@ static int legacy_kthread_func(void *unused)
>  		if (kthread_should_stop())
>  			break;
>  
> -		console_lock();
> -		__console_flush_and_unlock();
> +		__console_flush_unlocked();

IMHO, it would be pretty hard to understand what it does by trying to
understand all the layers of the code.

It might be better to open code it. And I would take inspiration
in the nbcon_kthread_func().

I played with the code and came up with:

static int legacy_kthread_func(void *unused)
{
	bool any_progress;

wait_for_event:
	wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());

	do {
		bool any_usable;
		bool handover;
		u64 next_seq;

		if (kthread_should_stop())
			return 0;

		console_lock();
		any_progress = console_flush_one_record(true, &next_seq, &handover, &any_usable);
		if (!handover)
			__console_unlock();

		/*
		 * There is no need to check whether there is any usable
		 * console in this context or whether there are pending
		 * messages. It is checked by legacy_kthread_should_wakeup()
		 * anyway. And console_lock() will never succeed again if
		 * there was panic in progress.
		 */
	} while (any_progress);

	goto wait_for_event;
}

What do you think, please?

Best Regards,
Petr

Re: [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread

Posted by Andrew Murray 1 week, 6 days ago

On Thu, 18 Sept 2025 at 17:27, Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2025-09-15 13:43:06, Andrew Murray wrote:
> > The legacy printer kthread uses console_lock and
> > __console_flush_and_unlock to flush records to the console. This
> > approach results in the console_lock being held for the entire
> > duration of a flush. This can result in large waiting times for
> > those waiting for console_lock especially where there is a large
> > volume of records or where the console is slow (e.g. serial). This
> > contention is observed during boot, as the call to filp_open in
> > console_on_rootfs will delay progression to userspace until any
> > in-flight flush is completed.
>
> It would be great to add here the boot logs from the cover letter
> so that the real life numbers are stored in the git log.

No problem, I'll add them to the next revision.


>
> > Let's instead use __console_flush_unlocked which releases and
> > reacquires console_lock between records.
>
> It seems that the patch does the right thing.
> I would just suggest some refactoring, see below.
>
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -3301,6 +3301,46 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
> >       return any_usable;
> >  }
> >
> > +/*
> > + * Print out all remaining records to all consoles.
> > + *
> > + * @next_seq is set to the sequence number after the last available record.
> > + * The value is valid only when this function returns true. It means that all
> > + * usable consoles are completely flushed.
> > + *
> > + * @handover will be set to true if a printk waiter has taken over the
> > + * console_lock, in which case the caller is no longer holding the
> > + * console_lock. Otherwise it is set to false.
> > + *
> > + * Returns true when there was at least one usable console and all messages
> > + * were flushed to all usable consoles. A returned false informs the caller
> > + * that everything was not flushed (either there were no usable consoles or
> > + * another context has taken over printing or it is a panic situation and this
> > + * is not the panic CPU). Regardless the reason, the caller should assume it
> > + * is not useful to immediately try again.
> > + */
> > +static bool console_flush_all_unlocked(u64 *next_seq, bool *handover)
> > +{
> > +     bool any_usable;
> > +     bool any_progress;
> > +
> > +     *next_seq = 0;
> > +     *handover = false;
> > +
> > +     do {
> > +             console_lock();
> > +             any_progress = console_flush_one_record(true, next_seq, handover, &any_usable);
> > +
> > +             if (*handover)
> > +                     return false;
> > +
> > +             __console_unlock();
> > +
> > +     } while (any_progress);
> > +
> > +     return any_usable;
> > +}
>
> This is yet another console_flush() API with a complicated semantic.
> (return value, parameters). It is not bad. But it would be nice
> if we did not need it ;-)

Yes, my approach/rationale was to replace the chain of calls from the
__console_flush_and_unlock call in legacy_kthread_func to functions
that don't require console_lock. Thus __console_flush_and_unlock
became __console_flush_unlocked, and console_flush_all became
console_flush_all_unlocked. I hoped I could then change existing users
of the locked versions over to the unlocked versions and perhaps
eliminate the locked versions. However, given that hasn't worked out,
open-coding as you suggest seems like a better approach.


>
> >  static void __console_flush_and_unlock(void)
> >  {
> >       bool do_cond_resched;
> > @@ -3346,6 +3386,17 @@ static void __console_flush_and_unlock(void)
> >       } while (prb_read_valid(prb, next_seq, NULL) && console_trylock());
> >  }
> >
> > +static void __console_flush_unlocked(void)
> > +{
> > +     bool handover;
> > +     bool flushed;
> > +     u64 next_seq;
> > +
> > +     do {
> > +             flushed = console_flush_all_unlocked(&next_seq, &handover);
> > +     } while (flushed && !handover && prb_read_valid(prb, next_seq, NULL));
> > +}
>
> The semantic of this function is not much clear from the name.
> IMHO, it would be a puzzle for people who would try to understand
> the code in the future.
>
> > +
> >  /**
> >   * console_unlock - unblock the legacy console subsystem from printing
> >   *
> > @@ -3676,8 +3727,7 @@ static int legacy_kthread_func(void *unused)
> >               if (kthread_should_stop())
> >                       break;
> >
> > -             console_lock();
> > -             __console_flush_and_unlock();
> > +             __console_flush_unlocked();
>
> IMHO, it would be pretty hard to understand what it does by trying to
> understand all the layers of the code.
>
> It might be better to open code it. And I would take inspiration
> in the nbcon_kthread_func().

Yes, I agree.


>
> I played with the code and came up with:
>
> static int legacy_kthread_func(void *unused)
> {
>         bool any_progress;
>
> wait_for_event:
>         wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());
>
>         do {
>                 bool any_usable;
>                 bool handover;
>                 u64 next_seq;
>
>                 if (kthread_should_stop())
>                         return 0;

This changes the behaviour from the existing legacy_kthread_func. Thus
allowing the thread to exit mid way through printing remaining
records, whereas previously the whole set of unprinted records would
first be printed. But that's probably a good thing.


>
>                 console_lock();
>                 any_progress = console_flush_one_record(true, &next_seq, &handover, &any_usable);
>                 if (!handover)
>                         __console_unlock();
>
>                 /*
>                  * There is no need to check whether there is any usable
>                  * console in this context or whether there are pending
>                  * messages. It is checked by legacy_kthread_should_wakeup()
>                  * anyway.

Ah, I hadn't realised this, thus the prb_read_valid call in my loop
isn't required. Thanks.


> And console_lock() will never succeed again if
>                  * there was panic in progress.
>                  */
>         } while (any_progress);
>
>         goto wait_for_event;
> }
>
> What do you think, please?
>
> Best Regards,
> Petr

I'll respin this series.

Thanks,

Andrew Murray

Re: [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread

Posted by John Ogness 1 week, 3 days ago

On 2025-09-19, Andrew Murray <amurray@thegoodpenguin.co.uk> wrote:
>> I played with the code and came up with:
>>
>> static int legacy_kthread_func(void *unused)
>> {
>>         bool any_progress;
>>
>> wait_for_event:
>>         wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());
>>
>>         do {
>>                 bool any_usable;
>>                 bool handover;
>>                 u64 next_seq;
>>
>>                 if (kthread_should_stop())
>>                         return 0;
>
> This changes the behaviour from the existing legacy_kthread_func. Thus
> allowing the thread to exit mid way through printing remaining
> records, whereas previously the whole set of unprinted records would
> first be printed. But that's probably a good thing.

It does not matter. kthread_should_stop() will only return true from
printk_kthreads_check_locked() when @have_legacy_console and
@have_boot_console are both false. That means that whatever legacy or
boot consoles there were, they are now unregistered, and were already
flushed from within their unregister_console_locked().

John

Re: [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread

Posted by Petr Mladek 1 week, 2 days ago

On Mon 2025-09-22 15:33:12, John Ogness wrote:
> On 2025-09-19, Andrew Murray <amurray@thegoodpenguin.co.uk> wrote:
> >> I played with the code and came up with:
> >>
> >> static int legacy_kthread_func(void *unused)
> >> {
> >>         bool any_progress;
> >>
> >> wait_for_event:
> >>         wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());
> >>
> >>         do {
> >>                 bool any_usable;
> >>                 bool handover;
> >>                 u64 next_seq;
> >>
> >>                 if (kthread_should_stop())
> >>                         return 0;
> >
> > This changes the behaviour from the existing legacy_kthread_func. Thus
> > allowing the thread to exit mid way through printing remaining
> > records, whereas previously the whole set of unprinted records would
> > first be printed. But that's probably a good thing.

Good catch! I admit that I just copied this from nbcon_kthread_func()
and it looked reasonable.

> It does not matter. kthread_should_stop() will only return true from
> printk_kthreads_check_locked() when @have_legacy_console and
> @have_boot_console are both false. That means that whatever legacy or
> boot consoles there were, they are now unregistered, and were already
> flushed from within their unregister_console_locked().

Yup. I would keep it in the do/while loop to keep it consistent with
the nbcon kthread.

Best Regards,
Petr

Re: [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread

Posted by Andrew Murray 5 days, 3 hours ago

On Tue, 23 Sept 2025 at 15:22, Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2025-09-22 15:33:12, John Ogness wrote:
> > On 2025-09-19, Andrew Murray <amurray@thegoodpenguin.co.uk> wrote:
> > >> I played with the code and came up with:
> > >>
> > >> static int legacy_kthread_func(void *unused)
> > >> {
> > >>         bool any_progress;
> > >>
> > >> wait_for_event:
> > >>         wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());
> > >>
> > >>         do {
> > >>                 bool any_usable;
> > >>                 bool handover;
> > >>                 u64 next_seq;
> > >>
> > >>                 if (kthread_should_stop())
> > >>                         return 0;
> > >
> > > This changes the behaviour from the existing legacy_kthread_func. Thus
> > > allowing the thread to exit mid way through printing remaining
> > > records, whereas previously the whole set of unprinted records would
> > > first be printed. But that's probably a good thing.
>
> Good catch! I admit that I just copied this from nbcon_kthread_func()
> and it looked reasonable.
>
> > It does not matter. kthread_should_stop() will only return true from
> > printk_kthreads_check_locked() when @have_legacy_console and
> > @have_boot_console are both false. That means that whatever legacy or
> > boot consoles there were, they are now unregistered, and were already
> > flushed from within their unregister_console_locked().
>
> Yup. I would keep it in the do/while loop to keep it consistent with
> the nbcon kthread.

Thanks for the feedback, the only change I've made in my v2 from
Petr's code is to set handover to false in the loop. Otherwise if
there are no usable consoles, the then uninitialised variable may lead
to incorrectly calling or not calling __console_unlock.

Thanks,

Andrew Murray