[PATCH printk v7 24/35] printk: nbcon: Flush new records on device_release()

John Ogness posted 35 patches 1 year, 4 months ago
There is a newer version of this series
[PATCH printk v7 24/35] printk: nbcon: Flush new records on device_release()
Posted by John Ogness 1 year, 4 months ago
There may be new records that were added while a driver was
holding the nbcon context for non-printing purposes. These
new records must be flushed by the nbcon_device_release()
context because no other context will do it.

If boot consoles are registered, the legacy loop is used
(either direct or per irq_work) to handle the flushing.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h |  2 ++
 kernel/printk/nbcon.c    | 20 ++++++++++++++++++++
 kernel/printk/printk.c   |  2 +-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 84706c1c934b..7679e18f24b3 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -153,6 +153,8 @@ static inline bool console_is_usable(struct console *con, short flags) { return
 
 #endif /* CONFIG_PRINTK */
 
+extern bool have_boot_console;
+
 extern struct printk_buffers printk_shared_pbufs;
 
 /**
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index d09c084c9af4..269aeed18064 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1326,10 +1326,30 @@ EXPORT_SYMBOL_GPL(nbcon_device_try_acquire);
 void nbcon_device_release(struct console *con)
 {
 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(con, nbcon_device_ctxt);
+	int cookie;
 
 	if (!nbcon_context_exit_unsafe(ctxt))
 		return;
 
 	nbcon_context_release(ctxt);
+
+	/*
+	 * This context must flush any new records added while the console
+	 * was locked. The console_srcu_read_lock must be taken to ensure
+	 * the console is usable throughout flushing.
+	 */
+	cookie = console_srcu_read_lock();
+	if (console_is_usable(con, console_srcu_read_flags(con)) &&
+	    prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
+		if (!have_boot_console) {
+			__nbcon_atomic_flush_pending_con(con, prb_next_reserve_seq(prb));
+		} else if (!is_printk_legacy_deferred()) {
+			if (console_trylock())
+				console_unlock();
+		} else {
+			printk_trigger_flush();
+		}
+	}
+	console_srcu_read_unlock(cookie);
 }
 EXPORT_SYMBOL_GPL(nbcon_device_release);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5f5fc5df2f09..d54eae0d06e9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -469,7 +469,7 @@ static DEFINE_MUTEX(syslog_lock);
  * the console lock. This is because boot consoles and nbcon consoles may
  * have mapped the same hardware.
  */
-static bool have_boot_console;
+bool have_boot_console;
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
-- 
2.39.2
Re: [PATCH printk v7 24/35] printk: nbcon: Flush new records on device_release()
Posted by Petr Mladek 1 year, 4 months ago
On Sun 2024-08-04 02:57:27, John Ogness wrote:
> There may be new records that were added while a driver was
> holding the nbcon context for non-printing purposes. These
> new records must be flushed by the nbcon_device_release()
> context because no other context will do it.
> 
> If boot consoles are registered, the legacy loop is used
> (either direct or per irq_work) to handle the flushing.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

It makes some sense and seems to work. I do not know how to
make it better.

Reviewed-by: Petr Mladek <pmladek@suse.com>

But it makes me nervous a bit, see below:

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1326,10 +1326,30 @@ EXPORT_SYMBOL_GPL(nbcon_device_try_acquire);
>  void nbcon_device_release(struct console *con)
>  {
>  	struct nbcon_context *ctxt = &ACCESS_PRIVATE(con, nbcon_device_ctxt);
> +	int cookie;
>  
>  	if (!nbcon_context_exit_unsafe(ctxt))
>  		return;
>  
>  	nbcon_context_release(ctxt);
> +
> +	/*
> +	 * This context must flush any new records added while the console
> +	 * was locked. The console_srcu_read_lock must be taken to ensure
> +	 * the console is usable throughout flushing.
> +	 */
> +	cookie = console_srcu_read_lock();
> +	if (console_is_usable(con, console_srcu_read_flags(con)) &&
> +	    prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
> +		if (!have_boot_console) {
> +			__nbcon_atomic_flush_pending_con(con, prb_next_reserve_seq(prb));
> +		} else if (!is_printk_legacy_deferred()) {
> +			if (console_trylock())
> +				console_unlock();

nbcon_device_release() is going to be called in uart_port_unlock*()
still under the port->lock.

=> It smells with a potential deadlock. The console_flush_all() in
   console_unlock() might want to flush this console under the
   port->lock as well.

   And it almost happens because nbcon_legacy_emit_next_record()
   might eventually take con->device_lock() when called in
   a task context.

   It will not happen here because this code uses console_trylock()
   which would set @console_may_schedule to false.

Anyway, it would look more safe when the flush was done after releasing
the port->lock.

I still have to think about this.

> +		} else {
> +			printk_trigger_flush();
> +		}
> +	}
> +	console_srcu_read_unlock(cookie);
>  }
>  EXPORT_SYMBOL_GPL(nbcon_device_release);

Best Regards,
Petr
Re: [PATCH printk v7 24/35] printk: nbcon: Flush new records on device_release()
Posted by John Ogness 1 year, 4 months ago
On 2024-08-05, Petr Mladek <pmladek@suse.com> wrote:
>> +	/*
>> +	 * This context must flush any new records added while the console
>> +	 * was locked. The console_srcu_read_lock must be taken to ensure
>> +	 * the console is usable throughout flushing.
>> +	 */
>> +	cookie = console_srcu_read_lock();
>> +	if (console_is_usable(con, console_srcu_read_flags(con)) &&
>> +	    prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
>> +		if (!have_boot_console) {
>> +			__nbcon_atomic_flush_pending_con(con, prb_next_reserve_seq(prb));
>> +		} else if (!is_printk_legacy_deferred()) {
>> +			if (console_trylock())
>> +				console_unlock();
>
> nbcon_device_release() is going to be called in uart_port_unlock*()
> still under the port->lock.
>
> => It smells with a potential deadlock. The console_flush_all() in
>    console_unlock() might want to flush this console under the
>    port->lock as well.
>
>    And it almost happens because nbcon_legacy_emit_next_record()
>    might eventually take con->device_lock() when called in
>    a task context.
>
>    It will not happen here because this code uses console_trylock()
>    which would set @console_may_schedule to false.

Exactly. That is an important point. We must never try to invoke the
write_thread() callback while holding a spinlock.

> Anyway, it would look more safe when the flush was done after releasing
> the port->lock.

Even then we could never invoke the write_thread() callback because the
caller may be holding other spinlocks. If we cannot safely call
console_lock(), we cannot take the device lock. The atomic callback must
be used and that means the port lock cannot be involved in the
console_trylock().

John
Re: [PATCH printk v7 24/35] printk: nbcon: Flush new records on device_release()
Posted by Petr Mladek 1 year, 4 months ago
On Wed 2024-08-07 03:21:57, John Ogness wrote:
> On 2024-08-05, Petr Mladek <pmladek@suse.com> wrote:
> >> +	/*
> >> +	 * This context must flush any new records added while the console
> >> +	 * was locked. The console_srcu_read_lock must be taken to ensure
> >> +	 * the console is usable throughout flushing.
> >> +	 */
> >> +	cookie = console_srcu_read_lock();
> >> +	if (console_is_usable(con, console_srcu_read_flags(con)) &&
> >> +	    prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
> >> +		if (!have_boot_console) {
> >> +			__nbcon_atomic_flush_pending_con(con, prb_next_reserve_seq(prb));
> >> +		} else if (!is_printk_legacy_deferred()) {
> >> +			if (console_trylock())
> >> +				console_unlock();
> >
> > nbcon_device_release() is going to be called in uart_port_unlock*()
> > still under the port->lock.
> >
> > => It smells with a potential deadlock. The console_flush_all() in
> >    console_unlock() might want to flush this console under the
> >    port->lock as well.
> >
> >    And it almost happens because nbcon_legacy_emit_next_record()
> >    might eventually take con->device_lock() when called in
> >    a task context.
> >
> >    It will not happen here because this code uses console_trylock()
> >    which would set @console_may_schedule to false.
> 
> Exactly. That is an important point. We must never try to invoke the
> write_thread() callback while holding a spinlock.
> 
> > Anyway, it would look more safe when the flush was done after releasing
> > the port->lock.
> 
> Even then we could never invoke the write_thread() callback because the
> caller may be holding other spinlocks. If we cannot safely call
> console_lock(), we cannot take the device lock. The atomic callback must
> be used and that means the port lock cannot be involved in the
> console_trylock().

It makes sense. But it is not obvious.

I actually thought about using con->device_lock() lock around
nbcon_legacy_emit_next_record(). It would help to synchronize
the legacy loop against nbcon_device_lock()/nbcon_device_release().
It won't be needed to call the legacy loop in
nbcon_device_release().

But it was bad idea. There are more reasons to avoid taking
con->device_lock() after console_trylock():

  1. con->device_lock() might be a sleeping lock in the future. [*]
     It actually already is a sleeping lock in RT. And
     console_trylock() might be used in printk() in any context.

  2. The less locks we take the more safe printk() will be in
     various contexts and situations, especially in panic().


[*] I still have to wire this into my mental model. I keep forgetting it.

    It would be nice to have a documentation summarizing the main
    ideas and describing the printk() design. I hope that we will
    do it one day.

Best Regards,
Petr