[PATCH printk v2 11/18] printk: nbcon: Show replay message on takeover

John Ogness posted 18 patches 1 year, 8 months ago
There is a newer version of this series
[PATCH printk v2 11/18] printk: nbcon: Show replay message on takeover
Posted by John Ogness 1 year, 8 months ago
An emergency or panic context can takeover console ownership
while the current owner was printing a printk message. The
atomic printer will re-print the message that the previous
owner was printing. However, this can look confusing to the
user and may even seem as though a message was lost.

  [3430014.1
  [3430014.181123] usb 1-2: Product: USB Audio

Add a new field @nbcon_prev_seq to struct console to track
the sequence number to print that was assigned to the previous
console owner. If this matches the sequence number to print
that the current owner is assigned, then a takeover must have
occurred. In this case, print an additional message to inform
the user that the previous message is being printed again.

  [3430014.1
  ** replaying previous printk message **
  [3430014.181123] usb 1-2: Product: USB Audio

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 include/linux/console.h  |  3 +++
 kernel/printk/internal.h |  1 +
 kernel/printk/nbcon.c    | 24 ++++++++++++++++++++++++
 kernel/printk/printk.c   | 19 +++++++++++++++++++
 4 files changed, 47 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index adcc1a74aeee..9927f08ac054 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -325,6 +325,7 @@ struct nbcon_write_context {
  * @nbcon_state:	State for nbcon consoles
  * @nbcon_seq:		Sequence number of the next record for nbcon to print
  * @nbcon_device_ctxt:	Context available for non-printing operations
+ * @nbcon_prev_seq:	Seq num the previous nbcon owner was assigned to print
  * @pbufs:		Pointer to nbcon private buffer
  * @kthread:		Printer kthread for this console
  * @rcuwait:		RCU-safe wait object for @kthread waking
@@ -445,6 +446,8 @@ struct console {
 	atomic_t		__private nbcon_state;
 	atomic_long_t		__private nbcon_seq;
 	struct nbcon_context	__private nbcon_device_ctxt;
+	atomic_long_t           __private nbcon_prev_seq;
+
 	struct printk_buffers	*pbufs;
 	struct task_struct	*kthread;
 	struct rcuwait		rcuwait;
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 5d4722310850..9e027e08918d 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -233,4 +233,5 @@ bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
 
 #ifdef CONFIG_PRINTK
 void console_prepend_dropped(struct printk_message *pmsg, unsigned long dropped);
+void console_prepend_replay(struct printk_message *pmsg);
 #endif
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 480a0ced2708..8c92b076c745 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -866,6 +866,7 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
 	unsigned long con_dropped;
 	struct nbcon_state cur;
 	unsigned long dropped;
+	unsigned long ulseq;
 
 	/*
 	 * The printk buffers are filled within an unsafe section. This
@@ -891,6 +892,28 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
 	if (dropped && !is_extended)
 		console_prepend_dropped(&pmsg, dropped);
 
+	/*
+	 * If the previous owner was assigned the same record, this context
+	 * has taken over ownership and is replaying the record. Prepend a
+	 * message to let the user know the record is replayed.
+	 */
+	ulseq = atomic_long_read(&ACCESS_PRIVATE(con, nbcon_prev_seq));
+	if (__ulseq_to_u64seq(prb, ulseq) == pmsg.seq) {
+		console_prepend_replay(&pmsg);
+	} else {
+		/*
+		 * Ensure this context is still the owner before trying to
+		 * update @nbcon_prev_seq. Otherwise the value in @ulseq may
+		 * not be from the previous owner.
+		 */
+		nbcon_state_read(con, &cur);
+		if (!nbcon_context_can_proceed(ctxt, &cur))
+			return false;
+
+		atomic_long_try_cmpxchg(&ACCESS_PRIVATE(con, nbcon_prev_seq), &ulseq,
+					__u64seq_to_ulseq(pmsg.seq));
+	}
+
 	if (!nbcon_context_exit_unsafe(ctxt))
 		return false;
 
@@ -1631,6 +1654,7 @@ void nbcon_init(struct console *con, u64 init_seq)
 	rcuwait_init(&con->rcuwait);
 	init_irq_work(&con->irq_work, nbcon_irq_work);
 	nbcon_seq_force(con, init_seq);
+	atomic_long_set(&ACCESS_PRIVATE(con, nbcon_prev_seq), -1UL);
 	nbcon_state_set(con, &state);
 	nbcon_kthread_create(con);
 }
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fcbf794a0aaf..cf0d612329bf 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2867,6 +2867,25 @@ void console_prepend_dropped(struct printk_message *pmsg, unsigned long dropped)
 	__console_prepend_scratch(pmsg, len);
 }
 
+/*
+ * Prepend the message in @pmsg->pbufs->outbuf with a "replay message".
+ * @pmsg->outbuf_len is updated appropriately.
+ *
+ * @pmsg is the printk message to prepend.
+ */
+void console_prepend_replay(struct printk_message *pmsg)
+{
+	struct printk_buffers *pbufs = pmsg->pbufs;
+	const size_t scratchbuf_sz = sizeof(pbufs->scratchbuf);
+	char *scratchbuf = &pbufs->scratchbuf[0];
+	size_t len;
+
+	len = scnprintf(scratchbuf, scratchbuf_sz,
+			"** replaying previous printk message **\n");
+
+	__console_prepend_scratch(pmsg, len);
+}
+
 /*
  * Read and format the specified record (or a later record if the specified
  * record is not available).
-- 
2.39.2
Re: [PATCH printk v2 11/18] printk: nbcon: Show replay message on takeover
Posted by Petr Mladek 1 year, 7 months ago
On Tue 2024-06-04 01:30:46, John Ogness wrote:
> An emergency or panic context can takeover console ownership
> while the current owner was printing a printk message. The
> atomic printer will re-print the message that the previous
> owner was printing. However, this can look confusing to the
> user and may even seem as though a message was lost.
> 
>   [3430014.1
>   [3430014.181123] usb 1-2: Product: USB Audio
> 
> Add a new field @nbcon_prev_seq to struct console to track
> the sequence number to print that was assigned to the previous
> console owner. If this matches the sequence number to print
> that the current owner is assigned, then a takeover must have
> occurred. In this case, print an additional message to inform
> the user that the previous message is being printed again.
> 
>   [3430014.1
>   ** replaying previous printk message **
>   [3430014.181123] usb 1-2: Product: USB Audio
> 
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -891,6 +892,28 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
>  	if (dropped && !is_extended)
>  		console_prepend_dropped(&pmsg, dropped);
>  
> +	/*
> +	 * If the previous owner was assigned the same record, this context
> +	 * has taken over ownership and is replaying the record. Prepend a
> +	 * message to let the user know the record is replayed.
> +	 */
> +	ulseq = atomic_long_read(&ACCESS_PRIVATE(con, nbcon_prev_seq));
> +	if (__ulseq_to_u64seq(prb, ulseq) == pmsg.seq) {
> +		console_prepend_replay(&pmsg);
> +	} else {
> +		/*
> +		 * Ensure this context is still the owner before trying to
> +		 * update @nbcon_prev_seq. Otherwise the value in @ulseq may
> +		 * not be from the previous owner.
> +		 */
> +		nbcon_state_read(con, &cur);
> +		if (!nbcon_context_can_proceed(ctxt, &cur))
> +			return false;

I would like to better understand the motivation here.
I guess that this pattern might be repeated in the future.

For example, it might bring the question why we do not do
the same in nbcon_seq_try_update().

OK, this is called in context where the takeover is unsafe.
It could race only with nbcon_atomic_flush_unsafe()
which is called at the very end of panic().

So we have:

   + The race is a corner case.

   + It could happen only once beause non-panic() CPUs should
     not be able to acquire the console context any more.

   + The check reduces the race window but it does not prevent
     the race.

Do we care about these races?


BTW: In this particular case, the race looks innocent. It might
     happen only once. And the cmpxchg would fail in this case.


I would replace the check with a comment:

		/*
		 * Only nbcon_atomic_flush_unsafe() could take over this
		 * context. It would be the only context which could move
		 * forward from now on. The try_cmpxchg() would fail in
		 * this case. It should succeed and be safe in all other
		 * scenarios.
		 */
> +
> +		atomic_long_try_cmpxchg(&ACCESS_PRIVATE(con, nbcon_prev_seq), &ulseq,
> +					__u64seq_to_ulseq(pmsg.seq));
> +	}
> +
>  	if (!nbcon_context_exit_unsafe(ctxt))
>  		return false;
>  

Othrewise, it looks good.

Best Regards,
Petr