There are many call sites where console flushing occurs.
Depending on the system state and types of consoles, the flush
methods to use are different. A flush call site generally must
consider:
@have_boot_console
@have_nbcon_console
@have_legacy_console
@legacy_allow_panic_sync
is_printk_preferred()
and take into account the current CPU state:
NBCON_PRIO_NORMAL
NBCON_PRIO_EMERGENCY
NBCON_PRIO_PANIC
in order to decide if it should:
flush nbcon directly via atomic_write() callback
flush legacy directly via console_unlock
flush legacy via offload to irq_work
All of these call sites use their own logic to make this
decision, which is complicated and error prone. Especially
later when two more flush methods will be introduced:
flush nbcon via offload to kthread
flush legacy via offload to kthread
Introduce a new internal struct console_flush_type that
specifies the flush method(s) that are available for a
particular call site to use.
Introduce a helper function to fill out console_flush_type to
be used for flushing call sites.
In many system states it is acceptable to flush legacy directly
via console_unlock or via offload to irq_work. The caller must
decide which of these methods it prefers.
Replace the logic of all flushing call sites to use the new
helper.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
kernel/printk/internal.h | 65 +++++++++++++++++++++++++++++++++++++++
kernel/printk/nbcon.c | 25 ++++++++++++---
kernel/printk/printk.c | 66 ++++++++++++++++++----------------------
3 files changed, 114 insertions(+), 42 deletions(-)
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index dbda90f0dc08..58eb2a58b07b 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -48,6 +48,7 @@ struct printk_ringbuffer;
struct dev_printk_info;
extern struct printk_ringbuffer *prb;
+extern bool legacy_allow_panic_sync;
__printf(4, 0)
int vprintk_store(int facility, int level,
@@ -156,6 +157,70 @@ static inline bool console_is_usable(struct console *con, short flags) { return
#endif /* CONFIG_PRINTK */
extern bool have_boot_console;
+extern bool have_nbcon_console;
+extern bool have_legacy_console;
+
+/**
+ * struct console_flush_type - Define available console flush methods
+ * @nbcon_atomic: Flush directly using nbcon_atomic() callback
+ * @legacy_direct: Call the legacy loop in this context
+ * @legacy_offload: Offload the legacy loop into IRQ
+ *
+ * Note that the legacy loop also flushes the nbcon consoles.
+ */
+struct console_flush_type {
+ bool nbcon_atomic;
+ bool legacy_direct;
+ bool legacy_offload;
+};
+
+/*
+ * Identify which console flushing methods are available to the context of
+ * the caller. The caller can then decide which of the available flushing
+ * methods it will use.
+ */
+static inline void printk_get_console_flush_type(struct console_flush_type *ft)
+{
+ memset(ft, 0, sizeof(*ft));
+
+ switch (nbcon_get_default_prio()) {
+ case NBCON_PRIO_NORMAL:
+ if (have_nbcon_console && !have_boot_console)
+ ft->nbcon_atomic = true;
+
+ if (have_legacy_console || have_boot_console) {
+ ft->legacy_offload = true;
+ if (!is_printk_legacy_deferred())
+ ft->legacy_direct = true;
+ }
+ break;
+
+ case NBCON_PRIO_PANIC:
+ /*
+ * In panic, the nbcon consoles will directly print. But
+ * only allowed if there are no boot consoles.
+ */
+ if (have_nbcon_console && !have_boot_console)
+ ft->nbcon_atomic = true;
+
+ /*
+ * In panic, if nbcon atomic printing occurs, the legacy
+ * consoles must remain silent until explicitly allowed.
+ * Also, legacy consoles cannot print when deferred. However,
+ * console_flush_on_panic() will flush them anyway, even if
+ * unsafe.
+ */
+ if ((legacy_allow_panic_sync || !ft->nbcon_atomic) &&
+ !is_printk_legacy_deferred()) {
+ ft->legacy_direct = true;
+ }
+ break;
+
+ default:
+ WARN_ON_ONCE(1);
+ break;
+ }
+}
extern struct printk_buffers printk_shared_pbufs;
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index afdb16c1c733..9e13327b4fe3 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1153,6 +1153,7 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
bool allow_unsafe_takeover)
{
+ struct console_flush_type ft;
unsigned long flags;
int err;
@@ -1186,8 +1187,16 @@ static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
* other context that will do it.
*/
if (prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
- stop_seq = prb_next_reserve_seq(prb);
- goto again;
+ printk_get_console_flush_type(&ft);
+ /*
+ * If nbcon_atomic flushing is not available, this printing
+ * must be occurring via the legacy loop. Let that loop be
+ * responsible for flushing the new records.
+ */
+ if (ft.nbcon_atomic) {
+ stop_seq = prb_next_reserve_seq(prb);
+ goto again;
+ }
}
}
@@ -1344,6 +1353,7 @@ EXPORT_SYMBOL_GPL(nbcon_device_try_acquire);
void nbcon_device_release(struct console *con)
{
struct nbcon_context *ctxt = &ACCESS_PRIVATE(con, nbcon_device_ctxt);
+ struct console_flush_type ft;
int cookie;
if (!nbcon_context_exit_unsafe(ctxt))
@@ -1359,12 +1369,17 @@ void nbcon_device_release(struct console *con)
cookie = console_srcu_read_lock();
if (console_is_usable(con, console_srcu_read_flags(con)) &&
prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
- if (!have_boot_console) {
+ /*
+ * If nbcon_atomic flushing is not available, fallback to
+ * using the legacy loop.
+ */
+ printk_get_console_flush_type(&ft);
+ if (ft.nbcon_atomic) {
__nbcon_atomic_flush_pending_con(con, prb_next_reserve_seq(prb), false);
- } else if (!is_printk_legacy_deferred()) {
+ } else if (ft.legacy_direct) {
if (console_trylock())
console_unlock();
- } else {
+ } else if (ft.legacy_offload) {
printk_trigger_flush();
}
}
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 480c0993abd5..6bed91b5a55b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -468,14 +468,14 @@ static DEFINE_MUTEX(syslog_lock);
* present, it is necessary to perform the console lock/unlock dance
* whenever console flushing should occur.
*/
-static bool have_legacy_console;
+bool have_legacy_console;
/*
* Specifies if an nbcon console is registered. If nbcon consoles are present,
* synchronous printing of legacy consoles will not occur during panic until
* the backtrace has been stored to the ringbuffer.
*/
-static bool have_nbcon_console;
+bool have_nbcon_console;
/*
* Specifies if a boot console is registered. If boot consoles are present,
@@ -485,14 +485,6 @@ static bool have_nbcon_console;
*/
bool have_boot_console;
-/*
- * Specifies if the console lock/unlock dance is needed for console
- * printing. If @have_boot_console is true, the nbcon consoles will
- * be printed serially along with the legacy consoles because nbcon
- * consoles cannot print simultaneously with boot consoles.
- */
-#define printing_via_unlock (have_legacy_console || have_boot_console)
-
#ifdef CONFIG_PRINTK
DECLARE_WAIT_QUEUE_HEAD(log_wait);
/* All 3 protected by @syslog_lock. */
@@ -2332,7 +2324,7 @@ int vprintk_store(int facility, int level,
return ret;
}
-static bool legacy_allow_panic_sync;
+bool legacy_allow_panic_sync;
/*
* This acts as a one-way switch to allow legacy consoles to print from
@@ -2341,9 +2333,12 @@ static bool legacy_allow_panic_sync;
*/
void printk_legacy_allow_panic_sync(void)
{
+ struct console_flush_type ft;
+
legacy_allow_panic_sync = true;
- if (printing_via_unlock && !is_printk_legacy_deferred()) {
+ printk_get_console_flush_type(&ft);
+ if (ft.legacy_direct) {
if (console_trylock())
console_unlock();
}
@@ -2353,8 +2348,8 @@ asmlinkage int vprintk_emit(int facility, int level,
const struct dev_printk_info *dev_info,
const char *fmt, va_list args)
{
- bool do_trylock_unlock = printing_via_unlock;
- bool defer_legacy = !do_trylock_unlock;
+ struct console_flush_type ft;
+ bool do_trylock_unlock;
int printed_len;
/* Suppress unimportant messages after panic happens */
@@ -2369,34 +2364,23 @@ asmlinkage int vprintk_emit(int facility, int level,
if (other_cpu_in_panic())
return 0;
+ printk_get_console_flush_type(&ft);
+
/* If called from the scheduler, we can not call up(). */
if (level == LOGLEVEL_SCHED) {
level = LOGLEVEL_DEFAULT;
do_trylock_unlock = false;
- defer_legacy = true;
+ } else {
+ do_trylock_unlock = ft.legacy_direct;
}
printk_delay(level);
printed_len = vprintk_store(facility, level, dev_info, fmt, args);
- if (have_nbcon_console && !have_boot_console) {
+ if (ft.nbcon_atomic)
nbcon_atomic_flush_pending();
- /*
- * In panic, the legacy consoles are not allowed to print from
- * the printk calling context unless explicitly allowed. This
- * gives the safe nbcon consoles a chance to print out all the
- * panic messages first. This restriction only applies if
- * there are nbcon consoles registered and they are allowed to
- * flush.
- */
- if (this_cpu_in_panic() && !legacy_allow_panic_sync) {
- do_trylock_unlock = false;
- defer_legacy = false;
- }
- }
-
if (do_trylock_unlock) {
/*
* The caller may be holding system-critical or
@@ -2417,7 +2401,7 @@ asmlinkage int vprintk_emit(int facility, int level,
preempt_enable();
}
- if (defer_legacy)
+ if (!do_trylock_unlock && ft.legacy_offload)
defer_console_output();
else
wake_up_klogd();
@@ -2776,10 +2760,14 @@ void resume_console(void)
*/
static int console_cpu_notify(unsigned int cpu)
{
- if (!cpuhp_tasks_frozen && printing_via_unlock) {
- /* If trylock fails, someone else is doing the printing */
- if (console_trylock())
- console_unlock();
+ struct console_flush_type ft;
+
+ if (!cpuhp_tasks_frozen) {
+ printk_get_console_flush_type(&ft);
+ if (ft.legacy_direct) {
+ if (console_trylock())
+ console_unlock();
+ }
}
return 0;
}
@@ -3304,6 +3292,7 @@ static void __console_rewind_all(void)
*/
void console_flush_on_panic(enum con_flush_mode mode)
{
+ struct console_flush_type ft;
bool handover;
u64 next_seq;
@@ -3327,7 +3316,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
if (mode == CONSOLE_REPLAY_ALL)
__console_rewind_all();
- if (!have_boot_console)
+ printk_get_console_flush_type(&ft);
+ if (ft.nbcon_atomic)
nbcon_atomic_flush_pending();
/* Flush legacy consoles once allowed, even when dangerous. */
@@ -3991,6 +3981,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
{
unsigned long timeout_jiffies = msecs_to_jiffies(timeout_ms);
unsigned long remaining_jiffies = timeout_jiffies;
+ struct console_flush_type ft;
struct console *c;
u64 last_diff = 0;
u64 printk_seq;
@@ -4004,7 +3995,8 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
seq = prb_next_reserve_seq(prb);
/* Flush the consoles so that records up to @seq are printed. */
- if (printing_via_unlock) {
+ printk_get_console_flush_type(&ft);
+ if (ft.legacy_direct) {
console_lock();
console_unlock();
}
--
2.39.2
Hi,
I have spent a lot of time on this patch. I got lost and rewrote
the comments several times. I hope that they make sense after all.
On Sun 2024-08-04 02:57:33, John Ogness wrote:
> There are many call sites where console flushing occur.
> Depending on the system state and types of consoles, the flush
> methods to use are different. A flush call site generally must
> consider:
>
> Introduce a new internal struct console_flush_type that
> specifies the flush method(s) that are available for a
> particular call site to use.
>
> Introduce a helper function to fill out console_flush_type to
> be used for flushing call sites.
>
> In many system states it is acceptable to flush legacy directly
> via console_unlock or via offload to irq_work. The caller must
> decide which of these methods it prefers.
>
> Replace the logic of all flushing call sites to use the new
> helper.
>
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -156,6 +157,70 @@ static inline bool console_is_usable(struct console *con, short flags) { return
> #endif /* CONFIG_PRINTK */
>
> extern bool have_boot_console;
> +extern bool have_nbcon_console;
> +extern bool have_legacy_console;
> +
> +/**
> + * struct console_flush_type - Define available console flush methods
> + * @nbcon_atomic: Flush directly using nbcon_atomic() callback
> + * @legacy_direct: Call the legacy loop in this context
> + * @legacy_offload: Offload the legacy loop into IRQ
> + *
> + * Note that the legacy loop also flushes the nbcon consoles.
> + */
> +struct console_flush_type {
> + bool nbcon_atomic;
> + bool legacy_direct;
> + bool legacy_offload;
> +};
> +
> +/*
> + * Identify which console flushing methods are available to the context of
> + * the caller. The caller can then decide which of the available flushing
> + * methods it will use.
> + */
> +static inline void printk_get_console_flush_type(struct console_flush_type *ft)
> +{
> + memset(ft, 0, sizeof(*ft));
> +
> + switch (nbcon_get_default_prio()) {
> + case NBCON_PRIO_NORMAL:
> + if (have_nbcon_console && !have_boot_console)
> + ft->nbcon_atomic = true;
> +
> + if (have_legacy_console || have_boot_console) {
> + ft->legacy_offload = true;
> + if (!is_printk_legacy_deferred())
> + ft->legacy_direct = true;
> + }
I would suggest to change the semantic and set the _preferred_
flush method instead of an _available_ one. Something like:
/* Legacy consoles are flushed directly when possible. */
if (have_legacy_console || have_boot_console) {
if (!is_printk_legacy_deferred()) {
ft->legacy_direct = true;
} else {
ft->legacy_offload = true;
}
}
It would simplify vprintk_emit() in this patch. Also it might keep
nbcon_atomic_flush_pending_con() simple after adding the printk
thread. See below.
IMHO, it will not make any difference in other situations.
I personally prefer the "set preferred flush method" semantic. It
helps to make sure that all callers behave consistently. I mean that
this function would define the flush rules in "all" situations.
That said, I do not resist on it. The code looks reasonable also
with the current semantic. And it is rather easy to review
all callers. Also I might miss something. I still do not see the full
picture (patches adding the kthread).
> + break;
> +
> + case NBCON_PRIO_PANIC:
> + /*
> + * In panic, the nbcon consoles will directly print. But
> + * only allowed if there are no boot consoles.
> + */
> + if (have_nbcon_console && !have_boot_console)
> + ft->nbcon_atomic = true;
> +
> + /*
> + * In panic, if nbcon atomic printing occurs, the legacy
> + * consoles must remain silent until explicitly allowed.
> + * Also, legacy consoles cannot print when deferred. However,
> + * console_flush_on_panic() will flush them anyway, even if
> + * unsafe.
> + */
> + if ((legacy_allow_panic_sync || !ft->nbcon_atomic) &&
> + !is_printk_legacy_deferred()) {
> + ft->legacy_direct = true;
We should check here whether the legacy loop is needed at all.
I would write something like:
if (have_legacy_console || have_boot_console) {
/*
* Same decision as in PRIO_NORMAL. But we are interested
only in whether we could flush directly.
*/
if (!is_printk_legacy_deferred()) {
ft->legacy_direct = true;
}
/* PRIO_PANIC specific part: Legacy consoles remain silent... */
if (ft->nbcon_atomic && !legacy_allow_panic_sync)
ft->legacy_direct = false;
> + }
> + break;
> +
> + default:
> + WARN_ON_ONCE(1);
> + break;
> + }
> +}
>
> extern struct printk_buffers printk_shared_pbufs;
>
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1186,8 +1187,16 @@ static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
> * other context that will do it.
> */
> if (prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
> - stop_seq = prb_next_reserve_seq(prb);
> - goto again;
> + printk_get_console_flush_type(&ft);
> + /*
> + * If nbcon_atomic flushing is not available, this printing
> + * must be occurring via the legacy loop. Let that loop be
> + * responsible for flushing the new records.
> + */
The comment is a bit confusing. It is not clear who is responsible for
calling the legacy loop.
My understanding is that the caller of this function is responsible
for calling the legacy loop. At least this is the case now. The
function is called only from vprintk_emit() and
console_flush_on_panic() and they both call the legacy loop when
needed.
I would rather remove this comment. Instead, I would make it clear
in the function description that: "The caller is responsible for
flushing the console via the legacy loop when needed."
> + if (ft.nbcon_atomic) {
> + stop_seq = prb_next_reserve_seq(prb);
> + goto again;
> + }
BTW: I wonder how this code would look like after adding the printk
threads. We should do "goto again" only when ft.nbcon_atomic
is the preferred (only available) flush type for nbcon consoles.
IMHO, it is another reason to change the semantic.
> }
> }
>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2369,34 +2364,23 @@ asmlinkage int vprintk_emit(int facility, int level,
> if (other_cpu_in_panic())
> return 0;
>
> + printk_get_console_flush_type(&ft);
It is a nice trick to call printk_get_console_flush_type() this early.
It allows hacking the result when processing the hacky LOGLEVEL_SCHED ;-)
> +
> /* If called from the scheduler, we can not call up(). */
> if (level == LOGLEVEL_SCHED) {
> level = LOGLEVEL_DEFAULT;
> do_trylock_unlock = false;
> - defer_legacy = true;
> + } else {
> + do_trylock_unlock = ft.legacy_direct;
> }
We could hack the @ft structure directly here:
if (level == LOGLEVEL_SCHED) {
level = LOGLEVEL_DEFAULT;
ft.legacy_offload |= ft.legacy_direct;
ft.legacy_direct = false;
}
>
> printk_delay(level);
>
> printed_len = vprintk_store(facility, level, dev_info, fmt, args);
>
> - if (have_nbcon_console && !have_boot_console) {
> + if (ft.nbcon_atomic)
> nbcon_atomic_flush_pending();
>
> - /*
> - * In panic, the legacy consoles are not allowed to print from
> - * the printk calling context unless explicitly allowed. This
> - * gives the safe nbcon consoles a chance to print out all the
> - * panic messages first. This restriction only applies if
> - * there are nbcon consoles registered and they are allowed to
> - * flush.
> - */
> - if (this_cpu_in_panic() && !legacy_allow_panic_sync) {
> - do_trylock_unlock = false;
> - defer_legacy = false;
> - }
> - }
> -
> if (do_trylock_unlock) {
Then, with the "preferred flush method" semantic, we could use here:
if (ft.legacy_direct) {
> /*
> * The caller may be holding system-critical or
> @@ -2417,7 +2401,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> preempt_enable();
> }
>
> - if (defer_legacy)
> + if (!do_trylock_unlock && ft.legacy_offload)
and here:
if (ft.legacy_offload)
> defer_console_output();
> else
> wake_up_klogd();
> @@ -2776,10 +2760,14 @@ void resume_console(void)
> */
> static int console_cpu_notify(unsigned int cpu)
> {
> - if (!cpuhp_tasks_frozen && printing_via_unlock) {
> - /* If trylock fails, someone else is doing the printing */
> - if (console_trylock())
> - console_unlock();
> + struct console_flush_type ft;
> +
> + if (!cpuhp_tasks_frozen) {
> + printk_get_console_flush_type(&ft);
> + if (ft.legacy_direct) {
> + if (console_trylock())
> + console_unlock();
Why do we actually call only the legacy loop here?
IMHO, we should also do
if (ft.nbcon_atomic)
nbcon_atomic_flush_pending();
> + }
> }
> return 0;
> }
> @@ -3327,7 +3316,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
> if (mode == CONSOLE_REPLAY_ALL)
> __console_rewind_all();
>
> - if (!have_boot_console)
> + printk_get_console_flush_type(&ft);
> + if (ft.nbcon_atomic)
> nbcon_atomic_flush_pending();
I would use "ft.legacy_direct" also below for the decision about
the legacy loop:
- if (legacy_allow_panic_sync)
+ if (ft.legacy_direct)
console_flush_all(false, &next_seq, &handover);
>
> /* Flush legacy consoles once allowed, even when dangerous. */
Best Regards,
Petr
On 2024-08-07, Petr Mladek <pmladek@suse.com> wrote:
> I would suggest to change the semantic and set the _preferred_
> flush method instead of an _available_ one.
I will need to evaluate this closely. I worry that the caller needs to
understand how the helper function is choosing the preference. For
example, at the end you make a suggestion that is broken with this
suggested change.
>> + if (ft.nbcon_atomic) {
>> + stop_seq = prb_next_reserve_seq(prb);
>> + goto again;
>> + }
>
> BTW: I wonder how this code would look like after adding the printk
> threads. We should do "goto again" only when ft.nbcon_atomic
> is the preferred (only available) flush type for nbcon consoles.
if (ft.nbcon_offload) {
...
} else if (ft.nbcon_atomic) {
...
}
> IMHO, it is another reason to change the semantic.
The caller does not need to rely on the helper "choosing" the right
one. I understand your point that: It is easier for the caller when we
can blindly rely on the helper to choose for us. But I worry that if we
ever adjust the helper, we might break various call sites that blindly
rely on the helper making a certain choice. If the helper's job is only
to say what is possible, then I would worry less for the future when we
may need to adjust the helper.
>> + printk_get_console_flush_type(&ft);
>
> It is a nice trick to call printk_get_console_flush_type() this early.
> I allows to hack the result when processing the hacky LOGLEVEL_SCHED ;-)
>
>> +
>> /* If called from the scheduler, we can not call up(). */
>> if (level == LOGLEVEL_SCHED) {
>> level = LOGLEVEL_DEFAULT;
>> do_trylock_unlock = false;
>> - defer_legacy = true;
>> + } else {
>> + do_trylock_unlock = ft.legacy_direct;
>> }
>
> We could hack the @ft structure directly here:
>
> if (level == LOGLEVEL_SCHED) {
> level = LOGLEVEL_DEFAULT;
> ft.legacy_offload |= ft.legacy_direct;
> ft.legacy_direct = false;
> }
The hack seems a bit complicated to me. Especially when the helper is
choosing preferred methods. I will think about it.
>> + if (!cpuhp_tasks_frozen) {
>> + printk_get_console_flush_type(&ft);
>> + if (ft.legacy_direct) {
>> + if (console_trylock())
>> + console_unlock();
>
> Why do we actually call only the legacy loop here?
> IMHO, we should also do
>
> if (ft.nbcon_atomic)
> nbcon_atomic_flush_pending();
Atomic consoles do not care if a CPU was online or not. I can add this,
but I expect there is nothing for the atomic console to flush. And when
threading is added, we would need the extra code to avoid atomic
flushing:
if (!ft.nbcon_offload && ft.nbcon_atomic)
nbcon_atomic_flush_pending();
>> @@ -3327,7 +3316,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
>> if (mode == CONSOLE_REPLAY_ALL)
>> __console_rewind_all();
>>
>> - if (!have_boot_console)
>> + printk_get_console_flush_type(&ft);
>> + if (ft.nbcon_atomic)
>> nbcon_atomic_flush_pending();
>
> I would use "ft.legacy_direct" also below for the decision about
> the legacy loop:
>
> - if (legacy_allow_panic_sync)
> + if (ft.legacy_direct)
> console_flush_all(false, &next_seq, &handover);
No, because it would mean the console is not flushed if the CPU is in
the deferred state. That is why I added an extra comment in the helper
saying that console_flush_on_panic() will _always_ flush directly.
I thought about adding that extra logic into the helper, but it really
isn't possible. @legacy_allow_panic_sync does not matter if there are no
nbcon consoles. So somehow the helper would need to know that CPU is in
the deferred state, but now it is allowed to do direct printing.
So it seemed more straight forward to have console_flush_on_panic() not
care about what is allowed (for legacy). It is going to flush directly
no matter what.
I will reconsider your suggestions about the helper and also compare the
end result at the call sites (also with threading changes applied) to
see what looks simpler to maintain.
John
On Wed 2024-08-07 16:17:51, John Ogness wrote:
> On 2024-08-07, Petr Mladek <pmladek@suse.com> wrote:
> > I would suggest to change the semantic and set the _preferred_
> > flush method instead of an _available_ one.
>
> I will need to evaluate this closely. I worry that the caller needs to
> understands how the helper function is choosing the preference. For
> example, at the end you make a suggestion that is broken with this
> suggested change.
I see. Well, console_flush_on_panic() is special. It is the last
resort. It actually ignores even the "what is available/allowed"
semantic.
> >> + if (ft.nbcon_atomic) {
> >> + stop_seq = prb_next_reserve_seq(prb);
> >> + goto again;
> >> + }
> >
> > BTW: I wonder how this code would look like after adding the printk
> > threads. We should do "goto again" only when ft.nbcon_atomic
> > is the preferred (only available) flush type for nbcon consoles.
>
> if (ft.nbcon_offload) {
> ...
> } else if (ft.nbcon_atomic) {
> ...
> }
>
> > IMHO, it is another reason to change the semantic.
>
> The caller does not need to rely on the helper "choosing" the right
> one. I understand your point that: It is easier for the caller when we
> can blindly rely on the helper to choose for us. But I worry that if we
> ever adjust the helper, we might break various call sites that blindly
> rely on the helper making a certain choice. If the helper's job is only
> to say what is possible, then I would worry less for the future when we
> may need to adjust the helper.
In the ideal world, the helper should tell the caller what has to be
done to flush pending messages on all consoles. The helper makes
the decision using the global variables where the variables define:
+ type of registered consoles
+ NBCON_PRIO
+ deferred context
+ possible or forced offload to kthreads
IMHO, it depends on how many callers are happy with the proposed
solution.
> >> + printk_get_console_flush_type(&ft);
> >
> > It is a nice trick to call printk_get_console_flush_type() this early.
> > I allows to hack the result when processing the hacky LOGLEVEL_SCHED ;-)
> >
> >> +
> >> /* If called from the scheduler, we can not call up(). */
> >> if (level == LOGLEVEL_SCHED) {
> >> level = LOGLEVEL_DEFAULT;
> >> do_trylock_unlock = false;
> >> - defer_legacy = true;
> >> + } else {
> >> + do_trylock_unlock = ft.legacy_direct;
> >> }
> >
> > We could hack the @ft structure directly here:
> >
> > if (level == LOGLEVEL_SCHED) {
> > level = LOGLEVEL_DEFAULT;
> > ft.legacy_offload |= ft.legacy_direct;
> > ft.legacy_direct = false;
> > }
>
> The hack seems a bit complicated to me. Especially when the helper is
> choosing preferred methods. I will think about it.
The hack converts legacy_direct -> legacy_offload.
> >> + if (!cpuhp_tasks_frozen) {
> >> + printk_get_console_flush_type(&ft);
> >> + if (ft.legacy_direct) {
> >> + if (console_trylock())
> >> + console_unlock();
> >
> > Why do we actually call only the legacy loop here?
> > IMHO, we should also do
> >
> > if (ft.nbcon_atomic)
> > nbcon_atomic_flush_pending();
>
> Atomic consoles do not care if a CPU was online or not. I can add this,
> but I expect there is nothing for the atomic console to flush.
console_cpu_notify() has been added by the commit 034260d6779087431
("printk: fix delayed messages from CPU hotplug events"). It
is related to the check
cpu_online(smp_processor_id()) == 0
which is still called in console_is_usable() even for nbcon consoles.
IMHO, it means that nbcon_atomic_flush_pending() might not be able
to flush the messages when called from vprintk_emit() on a CPU which
is just being hot-plugged.
> And when
> threading is added, we would need the extra code to avoid atomic
> flushing:
>
> if (!ft.nbcon_offload && ft.nbcon_atomic)
> nbcon_atomic_flush_pending();
This extra change won't be needed when printk_get_console_flush_type(&ft)
uses the "set the preferred flush type" semantic ;-)
> >> @@ -3327,7 +3316,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
> >> if (mode == CONSOLE_REPLAY_ALL)
> >> __console_rewind_all();
> >>
> >> - if (!have_boot_console)
> >> + printk_get_console_flush_type(&ft);
> >> + if (ft.nbcon_atomic)
> >> nbcon_atomic_flush_pending();
> >
> > I would use "ft.legacy_direct" also below for the decision about
> > the legacy loop:
> >
> > - if (legacy_allow_panic_sync)
> > + if (ft.legacy_direct)
> > console_flush_all(false, &next_seq, &handover);
>
> No, because it would mean the console is not flushed if the CPU is in
> the deferred state. That is why I added an extra comment in the helper
> saying that console_flush_on_panic() will _always_ flush directly.
>
> I thought about adding that extra logic into the helper, but it really
> isn't possible. @legacy_allow_panic_sync does not matter if there are no
> nbcon consoles. So somehow the helper would need to know that CPU is in
> the deferred state, but now it is allowed to do direct printing.
>
> So it seemed more straight forward to have console_flush_on_panic() not
> care about what is allowed (for legacy). It is going to flush directly
> no matter what.
Makes sense. console_flush_on_panic() is special. It is the last
resort. And it does things which are not safe/allowed.
> I will reconsider your suggestions about the helper and also compare the
> end result at the call sites (also with threading changes applied) to
> see what looks simpler to maintain.
Sigh, this is a kind of "bike shedding" discussion. It makes some
sense as long as it helps to find bugs and simplify the logic.
Feel free to stop it when you think that it is no longer worth
the effort.
Best Regards,
Petr
© 2016 - 2025 Red Hat, Inc.