[PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation

John Ogness posted 19 patches 1 year, 4 months ago
There is a newer version of this series
[PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
Posted by John Ogness 1 year, 4 months ago
Once the kthread is running and available
(i.e. @printk_kthreads_running is set), the kthread becomes
responsible for flushing any pending messages which are added
in NBCON_PRIO_NORMAL context. Namely the legacy
console_flush_all() and device_release() no longer flush the
console. And nbcon_atomic_flush_pending() used by
nbcon_cpu_emergency_exit() no longer flushes messages added
after the emergency messages.

The console context is safe when used by the kthread only when
one of the following conditions are true:

  1. Other caller acquires the console context with
     NBCON_PRIO_NORMAL with preemption disabled. It will
     release the context before rescheduling.

  2. Other caller acquires the console context with
     NBCON_PRIO_NORMAL under the device_lock.

  3. The kthread is the only context which acquires the console
     with NBCON_PRIO_NORMAL.

This is satisfied for all atomic printing call sites:

nbcon_legacy_emit_next_record() (#1)

nbcon_atomic_flush_pending_con() (#1)

nbcon_device_release() (#2)

It is even double guaranteed when @printk_kthreads_running
is set because then _only_ the kthread will print for
NBCON_PRIO_NORMAL. (#3)

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h |  6 +++++-
 kernel/printk/nbcon.c    | 13 ++++++------
 kernel/printk/printk.c   | 46 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index bb02788acc7c..66321836c3fe 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -190,11 +190,13 @@ extern bool legacy_allow_panic_sync;
 /**
  * struct console_flush_type - Define how to flush the consoles
  * @nbcon_atomic:	Flush directly using nbcon_atomic() callback
+ * @nbcon_offload:	Offload flush to printer thread
  * @legacy_direct:	Call the legacy loop in this context
  * @legacy_offload:	Offload the legacy loop into IRQ
  */
 struct console_flush_type {
 	bool	nbcon_atomic;
+	bool	nbcon_offload;
 	bool	legacy_direct;
 	bool	legacy_offload;
 };
@@ -220,7 +222,9 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft,
 				ft->legacy_direct = true;
 		}
 
-		if (have_nbcon_console && !have_boot_console)
+		if (printk_kthreads_running)
+			ft->nbcon_offload = true;
+		else if (have_nbcon_console && !have_boot_console)
 			ft->nbcon_atomic = true;
 		break;
 
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 233ab8f90fef..8cf9e9e8c6e4 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1511,10 +1511,10 @@ static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
 
 	/*
 	 * If flushing was successful but more records are available, this
-	 * context must flush those remaining records because there is no
-	 * other context that will do it.
+	 * context must flush those remaining records if the printer thread
+	 * is not available do it.
 	 */
-	printk_get_console_flush_type(&ft, false);
+	printk_get_console_flush_type(&ft, true);
 	if (ft.nbcon_atomic &&
 	    prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
 		stop_seq = prb_next_reserve_seq(prb);
@@ -1809,10 +1809,11 @@ void nbcon_device_release(struct console *con)
 
 	/*
 	 * This context must flush any new records added while the console
-	 * was locked. The console_srcu_read_lock must be taken to ensure
-	 * the console is usable throughout flushing.
+	 * was locked if the printer thread is not available to do it. The
+	 * console_srcu_read_lock must be taken to ensure the console is
+	 * usable throughout flushing.
 	 */
-	printk_get_console_flush_type(&ft, false);
+	printk_get_console_flush_type(&ft, true);
 	cookie = console_srcu_read_lock();
 	if (ft.nbcon_atomic &&
 	    console_is_usable(con, console_srcu_read_flags(con), true) &&
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 71e946a8c5fa..620c02b069bc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2382,6 +2382,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 	if (ft.nbcon_atomic)
 		nbcon_atomic_flush_pending();
 
+	if (ft.nbcon_offload)
+		nbcon_wake_kthreads();
+
 	if (!defer_legacy && ft.legacy_direct) {
 		/*
 		 * The caller may be holding system-critical or
@@ -2680,6 +2683,7 @@ void suspend_console(void)
 
 void resume_console(void)
 {
+	struct console_flush_type ft;
 	struct console *con;
 
 	if (!console_suspend_enabled)
@@ -2697,6 +2701,10 @@ void resume_console(void)
 	 */
 	synchronize_srcu(&console_srcu);
 
+	printk_get_console_flush_type(&ft, true);
+	if (ft.nbcon_offload)
+		nbcon_wake_kthreads();
+
 	pr_flush(1000, true);
 }
 
@@ -3007,6 +3015,7 @@ static inline void printk_kthreads_check_locked(void) { }
  */
 static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
 {
+	struct console_flush_type ft;
 	bool any_usable = false;
 	struct console *con;
 	bool any_progress;
@@ -3018,12 +3027,21 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 	do {
 		any_progress = false;
 
+		printk_get_console_flush_type(&ft, true);
+
 		cookie = console_srcu_read_lock();
 		for_each_console_srcu(con) {
 			short flags = console_srcu_read_flags(con);
 			u64 printk_seq;
 			bool progress;
 
+			/*
+			 * console_flush_all() is only for legacy consoles when
+			 * the nbcon consoles have their printer threads.
+			 */
+			if ((flags & CON_NBCON) && ft.nbcon_offload)
+				continue;
+
 			if (!console_is_usable(con, flags, !do_cond_resched))
 				continue;
 			any_usable = true;
@@ -3334,9 +3352,28 @@ EXPORT_SYMBOL(console_stop);
 
 void console_start(struct console *console)
 {
+	struct console_flush_type ft;
+	short flags;
+	int cookie;
+
 	console_list_lock();
 	console_srcu_write_flags(console, console->flags | CON_ENABLED);
 	console_list_unlock();
+
+	/*
+	 * Ensure that all SRCU list walks have completed. The related
+	 * printing context must be able to see it is enabled so that
+	 * it is guaranteed to wake up and resume printing.
+	 */
+	synchronize_srcu(&console_srcu);
+
+	printk_get_console_flush_type(&ft, true);
+	cookie = console_srcu_read_lock();
+	flags = console_srcu_read_flags(console);
+	if ((flags & CON_NBCON) && ft.nbcon_offload)
+		nbcon_kthread_wake(console);
+	console_srcu_read_unlock(cookie);
+
 	__pr_flush(console, 1000, true);
 }
 EXPORT_SYMBOL(console_start);
@@ -4102,8 +4139,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 			 * that they make forward progress, so only increment
 			 * @diff for usable consoles.
 			 */
-			if (!console_is_usable(c, flags, true))
+			if (!console_is_usable(c, flags, true) &&
+			    !console_is_usable(c, flags, false)) {
 				continue;
+			}
 
 			if (flags & CON_NBCON) {
 				printk_seq = nbcon_seq_read(c);
@@ -4605,8 +4644,13 @@ EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
  */
 void console_try_replay_all(void)
 {
+	struct console_flush_type ft;
+
+	printk_get_console_flush_type(&ft, true);
 	if (console_trylock()) {
 		__console_rewind_all();
+		if (ft.nbcon_offload)
+			nbcon_wake_kthreads();
 		/* Consoles are flushed as part of console_unlock(). */
 		console_unlock();
 	}
-- 
2.39.2
Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
Posted by Petr Mladek 1 year, 4 months ago
On Mon 2024-07-22 19:25:31, John Ogness wrote:
> Once the kthread is running and available
> (i.e. @printk_kthreads_running is set), the kthread becomes
> responsible for flushing any pending messages which are added
> in NBCON_PRIO_NORMAL context. Namely the legacy
> console_flush_all() and device_release() no longer flush the
> console. And nbcon_atomic_flush_pending() used by
> nbcon_cpu_emergency_exit() no longer flushes messages added
> after the emergency messages.
> 
> The console context is safe when used by the kthread only when
> one of the following conditions are true:
> 
>   1. Other caller acquires the console context with
>      NBCON_PRIO_NORMAL with preemption disabled. It will
>      release the context before rescheduling.
> 
>   2. Other caller acquires the console context with
>      NBCON_PRIO_NORMAL under the device_lock.
> 
>   3. The kthread is the only context which acquires the console
>      with NBCON_PRIO_NORMAL.
> 
> This is satisfied for all atomic printing call sites:
> 
> nbcon_legacy_emit_next_record() (#1)
> 
> nbcon_atomic_flush_pending_con() (#1)
> 
> nbcon_device_release() (#2)
> 
> It is even double guaranteed when @printk_kthreads_running
> is set because then _only_ the kthread will print for
> NBCON_PRIO_NORMAL. (#3)
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -4102,8 +4139,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
>  			 * that they make forward progress, so only increment
>  			 * @diff for usable consoles.
>  			 */
> -			if (!console_is_usable(c, flags, true))
> +			if (!console_is_usable(c, flags, true) &&
> +			    !console_is_usable(c, flags, false)) {

This looks weird. nbcon console can't make progress when
"write_atomic" is not implemented and the kthreads are not
running.

I should be:

			if (!((console_is_usable(c, flags, true)) ||
			      (console_is_usable(c, flags, false) && printk_kthreads_running))) {

That said. Do we really want to support nbcon consoles without
@write_atomic() callback?

It would make things easier when both callbacks are mandatory.

>  				continue;
> +			}
>  
>  			if (flags & CON_NBCON) {
>  				printk_seq = nbcon_seq_read(c);

Best Regards,
Petr
Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
Posted by John Ogness 1 year, 4 months ago
On 2024-07-31, Petr Mladek <pmladek@suse.com> wrote:
>> @@ -4102,8 +4139,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
>>  			 * that they make forward progress, so only increment
>>  			 * @diff for usable consoles.
>>  			 */
>> -			if (!console_is_usable(c, flags, true))
>> +			if (!console_is_usable(c, flags, true) &&
>> +			    !console_is_usable(c, flags, false)) {
>
> This looks weird. nbcon console can't make progress when
> "write_atomic" is not implemented and the kthreads are not
> running.
>
> I should be:
>
> 			if (!((console_is_usable(c, flags, true)) ||
> 			      (console_is_usable(c, flags, false) && printk_kthreads_running))) {

I would prefer to have the printk_kthreads_running check within
console_is_usable() for the !use_atomic case.

> That said. Do we really want to support nbcon consoles without
> @write_atomic() callback?

We must. Graphic consoles will not be able to implement
write_atomic(). Network and USB consoles probably will not be able to
implement it either.

John
Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
Posted by Petr Mladek 1 year, 4 months ago
On Wed 2024-07-31 17:31:02, John Ogness wrote:
> On 2024-07-31, Petr Mladek <pmladek@suse.com> wrote:
> >> @@ -4102,8 +4139,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
> >>  			 * that they make forward progress, so only increment
> >>  			 * @diff for usable consoles.
> >>  			 */
> >> -			if (!console_is_usable(c, flags, true))
> >> +			if (!console_is_usable(c, flags, true) &&
> >> +			    !console_is_usable(c, flags, false)) {
> >
> > This looks weird. nbcon console can't make progress when
> > "write_atomic" is not implemented and the kthreads are not
> > running.
> >
> > I should be:
> >
> > 			if (!((console_is_usable(c, flags, true)) ||
> > 			      (console_is_usable(c, flags, false) && printk_kthreads_running))) {
> 
> I would prefer to have the printk_kthreads_running check within
> console_is_usable() for the !use_atomic case.

Makes sense.

> > That said. Do we really want to support nbcon consoles without
> > @write_atomic() callback?
> 
> We must. Graphic consoles will not be able to implement
> write_atomic(). Network and USB consoles probably will not be able to
> implement it either.

I see.

Best Regards,
Petr
Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
Posted by Petr Mladek 1 year, 4 months ago
On Thu 2024-08-01 11:36:53, Petr Mladek wrote:
> On Wed 2024-07-31 17:31:02, John Ogness wrote:
> > On 2024-07-31, Petr Mladek <pmladek@suse.com> wrote:
> > >> @@ -4102,8 +4139,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
> > >>  			 * that they make forward progress, so only increment
> > >>  			 * @diff for usable consoles.
> > >>  			 */
> > >> -			if (!console_is_usable(c, flags, true))
> > >> +			if (!console_is_usable(c, flags, true) &&
> > >> +			    !console_is_usable(c, flags, false)) {
> > >
> > > This looks weird. nbcon console can't make progress when
> > > "write_atomic" is not implemented and the kthreads are not
> > > running.
> > >
> > > I should be:
> > >
> > > 			if (!((console_is_usable(c, flags, true)) ||
> > > 			      (console_is_usable(c, flags, false) && printk_kthreads_running))) {
> > 
> > I would prefer to have the printk_kthreads_running check within
> > console_is_usable() for the !use_atomic case.
> 
> Makes sense.

I have realized right after sending the reply that it actually did
not make sense.

We try to use con->write_thread also in

 + console_flush_all()
   + nbcon_legacy_emit_next_record()
     + nbcon_emit_one()
       + nbcon_emit_next_record()

when (@do_cond_resched == true)   =>   (@use_atomic == false)

=> __pr_flush() actually should be able to flush the messages using
   con->write_thread() even when the kthreads are not running.


Result:

IMHO, we should not check @printk_kthreads_running at all after all.

Sigh, it might make sense to document the design from a top level
view somewhere. We should do it at the end before we forget
the details.

Best Regards,
Petr
Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
Posted by kernel test robot 1 year, 4 months ago
Hi John,

kernel test robot noticed the following build errors:

[auto build test ERROR on b18703ea7157f62f02eb0ceb11f6fa0138e90adc]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Ogness/printk-nbcon-Clarify-nbcon_get_default_prio-context/20240723-015154
base:   b18703ea7157f62f02eb0ceb11f6fa0138e90adc
patch link:    https://lore.kernel.org/r/20240722171939.3349410-12-john.ogness%40linutronix.de
patch subject: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
config: x86_64-buildonly-randconfig-002-20240723 (https://download.01.org/0day-ci/archive/20240723/202407231117.SngBfEcI-lkp@intel.com/config)
compiler: gcc-10 (Ubuntu 10.5.0-1ubuntu1) 10.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240723/202407231117.SngBfEcI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407231117.SngBfEcI-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from kernel/printk/printk.c:62:
   kernel/printk/internal.h: In function 'printk_get_console_flush_type':
   kernel/printk/internal.h:219:26: error: implicit declaration of function 'is_printk_deferred'; did you mean '_printk_deferred'? [-Werror=implicit-function-declaration]
     219 |    if (prefer_offload || is_printk_deferred())
         |                          ^~~~~~~~~~~~~~~~~~
         |                          _printk_deferred
   kernel/printk/printk.c: In function 'resume_console':
>> kernel/printk/printk.c:2706:3: error: implicit declaration of function 'nbcon_wake_kthreads'; did you mean 'irq_wake_thread'? [-Werror=implicit-function-declaration]
    2706 |   nbcon_wake_kthreads();
         |   ^~~~~~~~~~~~~~~~~~~
         |   irq_wake_thread
   cc1: some warnings being treated as errors


vim +2706 kernel/printk/printk.c

  2683	
  2684	void resume_console(void)
  2685	{
  2686		struct console_flush_type ft;
  2687		struct console *con;
  2688	
  2689		if (!console_suspend_enabled)
  2690			return;
  2691	
  2692		console_list_lock();
  2693		for_each_console(con)
  2694			console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
  2695		console_list_unlock();
  2696	
  2697		/*
  2698		 * Ensure that all SRCU list walks have completed. All printing
  2699		 * contexts must be able to see they are no longer suspended so
  2700		 * that they are guaranteed to wake up and resume printing.
  2701		 */
  2702		synchronize_srcu(&console_srcu);
  2703	
  2704		printk_get_console_flush_type(&ft, true);
  2705		if (ft.nbcon_offload)
> 2706			nbcon_wake_kthreads();
  2707	
  2708		pr_flush(1000, true);
  2709	}
  2710	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
Posted by John Ogness 1 year, 4 months ago
On 2024-07-23, kernel test robot <lkp@intel.com> wrote:
>    kernel/printk/printk.c: In function 'resume_console':
>>> kernel/printk/printk.c:2706:3: error: implicit declaration of function 'nbcon_wake_kthreads'; did you mean 'irq_wake_thread'? [-Werror=implicit-function-declaration]
>     2706 |   nbcon_wake_kthreads();
>          |   ^~~~~~~~~~~~~~~~~~~
>          |   irq_wake_thread
>    cc1: some warnings being treated as errors

Thanks lkp.

For v4 I will fix this and also rename nbcon_wake_kthreads() to
nbcon_kthreads_wake() so that it matches the naming scheme:

nbcon_kthread_wake()
legacy_kthread_wake()

John Ogness
preffer_ofload param: was: Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
Posted by Petr Mladek 1 year, 4 months ago
On Mon 2024-07-22 19:25:31, John Ogness wrote:
> Once the kthread is running and available
> (i.e. @printk_kthreads_running is set), the kthread becomes
> responsible for flushing any pending messages which are added
> in NBCON_PRIO_NORMAL context. Namely the legacy
> console_flush_all() and device_release() no longer flush the
> console. And nbcon_atomic_flush_pending() used by
> nbcon_cpu_emergency_exit() no longer flushes messages added
> after the emergency messages.
> 
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -190,11 +190,13 @@ extern bool legacy_allow_panic_sync;
>  /**
>   * struct console_flush_type - Define how to flush the consoles
>   * @nbcon_atomic:	Flush directly using nbcon_atomic() callback
> + * @nbcon_offload:	Offload flush to printer thread
>   * @legacy_direct:	Call the legacy loop in this context
>   * @legacy_offload:	Offload the legacy loop into IRQ
>   */
>  struct console_flush_type {
>  	bool	nbcon_atomic;
> +	bool	nbcon_offload;
>  	bool	legacy_direct;
>  	bool	legacy_offload;
>  };
> @@ -220,7 +222,9 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft,
>  				ft->legacy_direct = true;
>  		}
>  
> -		if (have_nbcon_console && !have_boot_console)
> +		if (printk_kthreads_running)
> +			ft->nbcon_offload = true;
> +		else if (have_nbcon_console && !have_boot_console)
>  			ft->nbcon_atomic = true;
>  		break;
>  
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 233ab8f90fef..8cf9e9e8c6e4 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1511,10 +1511,10 @@ static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
>  
>  	/*
>  	 * If flushing was successful but more records are available, this
> -	 * context must flush those remaining records because there is no
> -	 * other context that will do it.
> +	 * context must flush those remaining records if the printer thread
> +	 * is not available do it.
>  	 */
> -	printk_get_console_flush_type(&ft, false);
> +	printk_get_console_flush_type(&ft, true);

Hmm, it is a bit weird that we change the value even though it does
not affect the behavior. The parameter @prefer_offload affects only
the legacy loop.

It is even more confusing because ft.legacy_offload and
ft.nbcon_offload have difference motivation:

  + legacy_offload is used when the direct flush is not allowed/possible
  + nbcon_offload is used when allowed/possible

I am not 100% sure why I added the @prefer_offload parameter. I think
that it was for situations when we wanted to call the legacy loop
independently on whether direct or offload was chosen. I think
that it was for __wake_up_klogd() called from
defer_console_output().

Anyway, this is cscope output in emacs after applying this patchset:

<paste>
Finding symbol: printk_get_console_flush_type

Database directory: /prace/kernel/linux-printk/
-------------------------------------------------------------------------------
*** kernel/printk/internal.h:
printk_get_console_flush_type[225] static inline void printk_get_console_flush_type(struct console_flush_type *ft, bool prefer_offload)

*** kernel/printk/nbcon.c:
nbcon_atomic_flush_pending_con[1548] printk_get_console_flush_type(&ft, true);
nbcon_device_release[1848]     printk_get_console_flush_type(&ft, true);

*** kernel/printk/printk.c:
printk_legacy_allow_panic_sync[2343] printk_get_console_flush_type(&ft, false);
vprintk_emit[2381]             printk_get_console_flush_type(&ft, false);
resume_console[2706]           printk_get_console_flush_type(&ft, true);
console_cpu_notify[2729]       printk_get_console_flush_type(&ft, false);
console_flush_all[3102]        printk_get_console_flush_type(&ft, true);
console_unlock[3223]           printk_get_console_flush_type(&ft, false);
console_flush_on_panic[3375]   printk_get_console_flush_type(&ft, false);
console_start[3453]            printk_get_console_flush_type(&ft, true);
legacy_kthread_should_wakeup[3484] printk_get_console_flush_type(&ft, true);
__pr_flush[4290]               printk_get_console_flush_type(&ft, false);
__pr_flush[4302]               printk_get_console_flush_type(&ft, false);
__wake_up_klogd[4466]          printk_get_console_flush_type(&ft, true);
console_try_replay_all[4851]   printk_get_console_flush_type(&ft, true);
-------------------------------------------------------------------------------
</paste>

Now, the parameter @prefer_offload makes a difference only
when it is set to "true" and "ft.legacy_offload" value is
later used. It reduces the above list to:

*** kernel/printk/printk.c:
resume_console[2706]           printk_get_console_flush_type(&ft, true);
console_start[3453]            printk_get_console_flush_type(&ft, true);
__wake_up_klogd[4466]          printk_get_console_flush_type(&ft, true);
console_try_replay_all[4851]   printk_get_console_flush_type(&ft, true);

IMHO, __wake_up_klogd() is the only location where we really need
to know if there are any messages for the legacy loop, for example,
when called from printk_deferred().

It should not be needed in other situations because it there
is always __pr_flush() or console_unlock() which would flush
the legacy consoles directly anyway.

=> I suggest to

1. Remove @prefer_offload parameter from printk_get_console_flush_type

2. Update __wake_up_klogd() to check both ft.legacy_offload and
   ft.legacy_direct, like:

	printk_get_console_flush_type(&ft);
	if (!ft.legacy_offload && !ft.legacy_direct)
		val &= ~PRINTK_PENDING_OUTPUT;


NOTE: I actually suggested to use in vprintk_emit():

	printk_get_console_flush_type(&ft, deffer_legacy);

      But it can be done even without this parameter. Like it
      is done in this version of the patchset.

Best Regards,
Petr
Re: preffer_ofload param: was: Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
Posted by John Ogness 1 year, 4 months ago
On 2024-07-31, Petr Mladek <pmladek@suse.com> wrote:
>> @@ -1511,10 +1511,10 @@ static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
>>  
>>  	/*
>>  	 * If flushing was successful but more records are available, this
>> -	 * context must flush those remaining records because there is no
>> -	 * other context that will do it.
>> +	 * context must flush those remaining records if the printer thread
>> +	 * is not available do it.
>>  	 */
>> -	printk_get_console_flush_type(&ft, false);
>> +	printk_get_console_flush_type(&ft, true);
>
> Hmm, it is a bit weird that we change the value even though it does
> not affect the behavior. The parameter @prefer_offload affects only
> the legacy loop.

For nbcon consoles, prefer_offload is really only annotating the
intentions. In this case, nbcon_atomic_flush_pending_con() is interested
in knowing if kthread printer is available, thus using
prefer_offload=true.

If the query yields ft.nbcon_atomic set, it means that the kthread
printer is _not_ available (nbcon_atomic and nbcon_offload are
exclusive) and it can (and must) handle its flushing responsibilities
directly (immediately, using the atomic callbacks).

You might ask, why does it not check ft.nbcon_offload? Although that
would tell it that the kthread is not available, it does not imply that
atomic printing is allowed. In order to see if atomic printing is
allowed, it would need to check ft.nbcon_atomic. And since the two are
exclusive, really it is enough just to check ft.nbcon_atomic. If
ft.nbcon_atomic is not set, either the kthread is available or there is
nothing the nbcon console can do about it anyway (for example, it must
rely on the legacy loop to handle it).

I suppose it is wrong that nbcon_atomic_flush_pending_con(false) will
set ft.nbcon_offload if the kthreads are available. I would fix that.

> IMHO, __wake_up_klogd() is the only location where we really need
> to know if there are any messages for the legacy loop, for example,
> when called from printk_deferred().
>
> It should not be needed in other situations because it there
> is always __pr_flush() or console_unlock() which would flush
> the legacy consoles directly anyway.
>
> => I suggest to
>
> 1. Remove @prefer_offload parameter from printk_get_console_flush_type
>
> 2. Update __wake_up_klogd() to check both ft.legacy_offload and
>    ft.legacy_direct, like:
>
> 	printk_get_console_flush_type(&ft);
> 	if (!ft.legacy_offload && !ft.legacy_direct)
> 		val &= ~PRINTK_PENDING_OUTPUT;
>
>
> NOTE: I actually suggested to use in vprintk_emit():
>
> 	printk_get_console_flush_type(&ft, deffer_legacy);
>
>       But it can be done even without this parameter. Like it
>       is done in this version of the patchset.

I understand what you are saying. But I don't like it. :-)

It would mean that functions only interested in offloading must check
direct. And that direct and offload are no longer exclusive. IMHO this
is a hack. The whole point of printk_get_console_flush_type() is so that
the flusher does not need any special code to figure out what to do.

If the flusher is only interested in offloaded flushing capabilities, it
should be able to query that. We could wrap things into macros to make
it clearer, but it might get a bit ugly with the efficience (depending
on how well compilers can optimize the macro usage):

#define is_nbcon_offload_available() ({			\
	struct console_flush_type ft;			\
	printk_get_console_flush_type(&ft, true);	\
	ft.nbcon_offload;				\
})

#define is_nbcon_atomic_available() ({			\
	struct console_flush_type ft;			\
	printk_get_console_flush_type(&ft, false);	\
	ft.nbcon_atomic;				\
})

And then this code looks like:

if (prb_read_valid(prb, nbcon_seq_read(con), NULL)) &&
    !is_nbcon_offload_available() &&
    is_nbcon_atomic_available()) {
	/* flush directly using atomic callback */
}

I have backported the printk_get_console_flush_type() macro to the
atomic series for v7. I would like to keep @prefer_offload and I will
try to add comments to clarify why the different query types are used.

John
Re: preffer_ofload param: was: Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
Posted by Petr Mladek 1 year, 4 months ago
On Thu 2024-08-01 16:28:28, John Ogness wrote:
> On 2024-07-31, Petr Mladek <pmladek@suse.com> wrote:
> >> @@ -1511,10 +1511,10 @@ static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
> >>  
> >>  	/*
> >>  	 * If flushing was successful but more records are available, this
> >> -	 * context must flush those remaining records because there is no
> >> -	 * other context that will do it.
> >> +	 * context must flush those remaining records if the printer thread
> >> +	 * is not available do it.
> >>  	 */
> >> -	printk_get_console_flush_type(&ft, false);
> >> +	printk_get_console_flush_type(&ft, true);
> >
> > Hmm, it is a bit weird that we change the value even though it does
> > not affect the behavior. The parameter @prefer_offload affects only
> > the legacy loop.
> 
> For nbcon consoles, prefer_offload is really only annotating the
> intentions. In this case, nbcon_atomic_flush_pending_con() is interested
> in knowing if kthread printer is available, thus using
> prefer_offload=true.
> 
> If the query yields ft.nbcon_atomic set, it means that the kthread
> printer is _not_ available (nbcon_atomic and nbcon_offload are
> exclusive) and it can (and must) handle its flushing responsibilities
> directly (immediately, using the atomic callbacks).
> 
> You might ask, why does it not check ft.nbcon_offload? Although that
> would tell it that the kthread is not available, it does not imply that
> atomic printing is allowed. In order to see if atomic printing is
> allowed, it would need to check ft.nbcon_atomic. And since the two are
> exclusive, really it is enough just to check ft.nbcon_atomic. If
> ft.nbcon_atomic is not set, either the kthread is available or there is
> nothing the nbcon console can do about it anyway (for example, it must
> rely on the legacy loop to handle it).

Where exactly do you need prefer offload of the legacy consoles?

Do need to "prefer offload" or "force offload" in these situations?

Note: We are talking about "legacy consoles" and "legacy approach"
which is:

   Legacy approach: "Prefer direct flush when possible"

Also note that "force_offload" is usually detected automatically via
the context: is_printk_deferred() check.

IMHO, we need to explicitely "force_offload" only in printk_deferred()
where it is passed to vprintk_emit() via LOGLEVEL_SCHED.

IMHO, we could get rid of this hack and simply do something like:

  int vprintk_deferred(const char *fmt, va_list args)
{
	preemption_disable();
	printk_deferred_enter();

	return vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, fmt, args);

	printk_deferred_exit();
	preemption_enable();
}

> I suppose it is wrong that nbcon_atomic_flush_pending_con(false) will
> set ft.nbcon_offload if the kthreads are available. I would fix that.

This will not work in vprintk_emit(). We need to use there

   nbcon_atomic_flush_pending_con(false)

because legacy consoles should be handled directly when possible
(legacy approach).

But nbcon consoles should be offloaded to the kthread when possible
(new approach).

The new approach is acceptable "only" for nbcon consoles because
they are synchronized by the nbcon context. Namely, they allow
to take over the ownership and flush the messages directly
in emergency or panic context.

In terms of approach:

  nbcon approach: "Prefer offload when possible"

=> for nbcon consoles we would need "prefer_direct" or
   "force_direct" parameter.

> > IMHO, __wake_up_klogd() is the only location where we really need
> > to know if there are any messages for the legacy loop, for example,
> > when called from printk_deferred().
> >
> > It should not be needed in other situations because it there
> > is always __pr_flush() or console_unlock() which would flush
> > the legacy consoles directly anyway.
> >
> > => I suggest to
> >
> > 1. Remove @prefer_offload parameter from printk_get_console_flush_type
> >
> > 2. Update __wake_up_klogd() to check both ft.legacy_offload and
> >    ft.legacy_direct, like:
> >
> > 	printk_get_console_flush_type(&ft);
> > 	if (!ft.legacy_offload && !ft.legacy_direct)
> > 		val &= ~PRINTK_PENDING_OUTPUT;
> >
> >
> > NOTE: I actually suggested to use in vprintk_emit():
> >
> > 	printk_get_console_flush_type(&ft, deffer_legacy);
> >
> >       But it can be done even without this parameter. Like it
> >       is done in this version of the patchset.
> 
> I understand what you are saying. But I don't like it. :-)
> 
> It would mean that functions only interested in offloading must check
> direct. And that direct and offload are no longer exclusive. IMHO this
> is a hack. The whole point of printk_get_console_flush_type() is so that
> the flusher does not need any special code to figure out what to do.

I agree.

> If the flusher is only interested in offloaded flushing capabilities, it
> should be able to query that. We could wrap things into macros to make
> it clearer, but it might get a bit ugly with the efficience (depending
> on how well compilers can optimize the macro usage):
> 
> #define is_nbcon_offload_available() ({			\
> 	struct console_flush_type ft;			\
> 	printk_get_console_flush_type(&ft, true);	\
> 	ft.nbcon_offload;				\
> })
> 
> #define is_nbcon_atomic_available() ({			\
> 	struct console_flush_type ft;			\
> 	printk_get_console_flush_type(&ft, false);	\
> 	ft.nbcon_atomic;				\
> })
> 
> And then this code looks like:
> 
> if (prb_read_valid(prb, nbcon_seq_read(con), NULL)) &&
>     !is_nbcon_offload_available() &&
>     is_nbcon_atomic_available()) {
> 	/* flush directly using atomic callback */
> }

This is crazy. But where exactly do you need this?


> I have backported the printk_get_console_flush_type() macro to the
> atomic series for v7. I would like to keep @prefer_offload and I will
> try to add comments to clarify why the different query types are used.

Please, reconsider this.

I believe that the parameter "prefer_offload" adds more harm than good
because:

   + It is a non-sense for nbcon consoles. They always prefer offload
     except for emergency and panic. But emergency and panic is
     handled transparently by nbcon_get_default_prio().

   + It is confusing even for legacy consoles after introducing the
     kthread. There will be three types of offload:

	+ do console_lock()/unlock() in IRQ work
	+ wake kthread
	+ wake kthread in IRQ work

In fact, the meaning is rather "can_t_call_scheduler_code_a_safe_way".

Best Regards,
Petr
Re: preffer_ofload param: was: Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
Posted by John Ogness 1 year, 4 months ago
On 2024-08-01, Petr Mladek <pmladek@suse.com> wrote:
> I believe that the parameter "prefer_offload" adds more harm than good
> because:
>
>    + It is a non-sense for nbcon consoles. They always prefer offload
>      except for emergency and panic. But emergency and panic is
>      handled transparently by nbcon_get_default_prio().
>
>    + It is confusing even for legacy consoles after introducing the
>      kthread. There will be three types of offload:
>
> 	+ do console_lock()/unlock() in IRQ work
> 	+ wake kthread
> 	+ wake kthread in IRQ work

I think the confusion comes from my intention of the function. I wanted
a caller to use it as:

"Tell me how to flush."

This requires input from the caller to know some information about what
the caller's intentions are.

If I change the function so that a caller uses it as:

"Tell me what flush mechanisms are available to me."

Then the function does not need to know the caller's intentions. It only
needs to know the caller's state, and that information is readily
available via global/per-cpu variables.

I will drop the @prefer_offload argument, simplifying the function to
only provide a list of available flush options. The caller will then
decide itself which option it wants to use. I believe this aligns with
your intentions as well.

John
Re: preffer_ofload param: was: Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
Posted by Petr Mladek 1 year, 4 months ago
On Fri 2024-08-02 09:35:17, John Ogness wrote:
> On 2024-08-01, Petr Mladek <pmladek@suse.com> wrote:
> > I believe that the parameter "prefer_offload" adds more harm than good
> > because:
> >
> >    + It is a non-sense for nbcon consoles. They always prefer offload
> >      except for emergency and panic. But emergency and panic is
> >      handled transparently by nbcon_get_default_prio().
> >
> >    + It is confusing even for legacy consoles after introducing the
> >      kthread. There will be three types of offload:
> >
> > 	+ do console_lock()/unlock() in IRQ work
> > 	+ wake kthread
> > 	+ wake kthread in IRQ work
> 
> I think the confusion comes from my intention of the function. I wanted
> a caller to use it as:
> 
> "Tell me how to flush."

Yes, I understand it the same way.

> This requires input from the caller to know some information about what
> the caller's intentions are.

Does it really need it?
Where?

Please, show me two examples where the function is not able to make
the right decision by global variables?

I know about __wake_up_klogd(). And it is only because
printk_deferred() requires the offload via @level parameter
instead of the per-CPU @printk_context.

Is there any other?

Let me repeat:

    + The parameter makes a difference only when it is "true".
    + The parameter affects only legacy loop.

Let me check where the parameter is true in the current code:

 + nbcon_atomic_flush_pending_con[1548]
    + No effect! The function ignores legacy consoles.

 + nbcon_device_release[1848]
    + No effect! The function ignores legacy consoles.

 + resume_console[2706]
    + Wrong! The function should flush the legacy
	consoles directly by default. Otherwise, we would
	change the legacy behavior.

 + console_flush_all[3102]        printk_get_console_flush_type(&ft, true);
    + No effect! The function ignores legacy consoles.
    + Confusing! The function flushes legacy consoles directly!

  + console_start[3453]
    + Wrong! The function should flush the legacy
      consoles directly by default. Otherwise, we would
	change the legacy behavior.

  + legacy_kthread_should_wakeup[3484]
    + Not needed! The function is called only when
	force_legacy_kthread() == true.

  + __wake_up_klogd[4466]
    + This is the only location where it makes a difference.
    + We could simply check both legacy_direct && legacy_offload
      here.
    + Yes, it is a hack. But it is needed only because of
      printk_deferred() which is already hacky by using
      LOGLEVEL_SCHED.

  + console_try_replay_all[4851]
    + Wrong! It is called under console_lock() => the function will
	flush legacy consoles directly anyway.


> If I change the function so that a caller uses it as:
> 
> "Tell me what flush mechanisms are available to me."

No, this will make the code complicated again.

> Then the function does not need to know the caller's intentions. It only
> needs to know the caller's state, and that information is readily
> available via global/per-cpu variables.
>
> I will drop the @prefer_offload argument, simplifying the function to
> only provide a list of available flush options. The caller will then
> decide itself which option it wants to use. I believe this aligns with
> your intentions as well.

No. Please, keep the original meaning.

Best Regards,
Petr