[v2] Handle NBCON consoles on KDB

[PATCH v2 2/3] printk: nbcon: Introduce KDB helpers

Posted by Marcos Paulo de Souza 6 months ago

These helpers will be used when calling console->write_atomic on
KDB code in the next patch. It's basically the same implementaion
as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
acquiring the context.

For release, differently from nbcon_device_release, we don't need to
flush the console, since all CPUs are stopped when KDB is active.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 include/linux/console.h |  6 ++++++
 kernel/printk/nbcon.c   | 26 ++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index 67af483574727c00eea1d5a1eacc994755c92607..b34c5a0b86303e2fb4583fa467d8be43761cf756 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -605,6 +605,9 @@ extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
 extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
 extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
 extern void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt);
+extern bool nbcon_kdb_try_acquire(struct console *con,
+				  struct nbcon_write_context *wctxt);
+extern void nbcon_kdb_release(struct nbcon_write_context *wctxt);
 
 /*
  * Check if the given console is currently capable and allowed to print
@@ -654,6 +657,9 @@ static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return
 static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
 static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
 static inline void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt) { }
+static inline bool nbcon_kdb_try_acquire(struct console *con,
+					 struct nbcon_write_context *wctxt) { return false; }
+static inline void nbcon_kdb_release(struct console *con) { }
 static inline bool console_is_usable(struct console *con, short flags,
 				     bool use_atomic) { return false; }
 #endif
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 646801813415f0abe40cabf2f28ca9e30664f028..79d8c7437806119ad9787ddc48382dc2c86c23c3 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1855,3 +1855,29 @@ void nbcon_device_release(struct console *con)
 	console_srcu_read_unlock(cookie);
 }
 EXPORT_SYMBOL_GPL(nbcon_device_release);
+
+bool nbcon_kdb_try_acquire(struct console *con,
+			   struct nbcon_write_context *wctxt)
+{
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	memset(ctxt, 0, sizeof(*ctxt));
+	ctxt->console = con;
+	ctxt->prio    = NBCON_PRIO_EMERGENCY;
+
+	if (!nbcon_context_try_acquire(ctxt, false))
+		return false;
+
+	if (!nbcon_context_enter_unsafe(ctxt))
+		return false;
+
+	return true;
+}
+
+void nbcon_kdb_release(struct nbcon_write_context *wctxt)
+{
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	nbcon_context_exit_unsafe(ctxt);
+	nbcon_context_release(ctxt);
+}

-- 
2.50.0

Re: [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers

Posted by Petr Mladek 5 months, 2 weeks ago

On Mon 2025-08-11 10:32:46, Marcos Paulo de Souza wrote:
> These helpers will be used when calling console->write_atomic on
> KDB code in the next patch. It's basically the same implementaion
> as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
> acquiring the context.
> 
> For release, differently from nbcon_device_release, we don't need to
> flush the console, since all CPUs are stopped when KDB is active.

I thought this when we were discussion the code, especially the
comment in

static void kdb_msg_write(const char *msg, int msg_len)
{
[...]

	/*
	 * The console_srcu_read_lock() only provides safe console list
	 * traversal. The use of the ->write() callback relies on all other
	 * CPUs being stopped at the moment and console drivers being able to
	 * handle reentrance when @oops_in_progress is set.

But I see that kdb_msg_write() is called from vkdb_printf() and
there is the following synchronization:

int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
{
[...]

	/* Serialize kdb_printf if multiple cpus try to write at once.
	 * But if any cpu goes recursive in kdb, just print the output,
	 * even if it is interleaved with any other text.
	 */
	local_irq_save(flags);
	this_cpu = smp_processor_id();
	for (;;) {
		old_cpu = cmpxchg(&kdb_printf_cpu, -1, this_cpu);
		if (old_cpu == -1 || old_cpu == this_cpu)
			break;

		cpu_relax();
	}

It suggests that the code might be used when other CPUs are still
running.

And for example, kgdb_panic(buf) is called in vpanic() before
the other CPUs are stopped.

My opinion:

It might make sense to flush the console after all. But probably
only the particular console, see below.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1855,3 +1855,29 @@ void nbcon_device_release(struct console *con)
>  	console_srcu_read_unlock(cookie);
>  }
>  EXPORT_SYMBOL_GPL(nbcon_device_release);
> +

The function must be called only in the very specific kdb
context, so it would deserve a comment. Inspired by
nbcon_device_try_acquire():

/**
 * nbcon_kdb_try_acquire - Try to acquire nbcon console, enter unsafe
 *				section, and initialized nbcon write context
 * @con:	The nbcon console to acquire
 * @wctxt:	The nbcon write context to be used on success
 *
 * Context:	Under console_srcu_read_lock() for emiting a single kdb message
 *		using the given con->write_atomic() callback. Can be called
 *		only when the console is usable at the moment.
 *
 * Return:	True if the console was acquired. False otherwise.
 *
 * kdb emits messages on consoles registered for printk() without
 * storing them into the ring buffer. It has to acquire the console
 * ownerhip so that is could call con->write_atomic() callback a safe way.
 *
 * This function acquires the nbcon console using priority NBCON_PRIO_EMERGENCY
 * and marks it unsafe for handover/takeover.
 */

> +bool nbcon_kdb_try_acquire(struct console *con,
> +			   struct nbcon_write_context *wctxt)
> +{
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +
> +	memset(ctxt, 0, sizeof(*ctxt));
> +	ctxt->console = con;
> +	ctxt->prio    = NBCON_PRIO_EMERGENCY;
> +
> +	if (!nbcon_context_try_acquire(ctxt, false))
> +		return false;
> +
> +	if (!nbcon_context_enter_unsafe(ctxt))
> +		return false;
> +
> +	return true;
> +}
> +

It deserves a comment as well, see below:

> +void nbcon_kdb_release(struct nbcon_write_context *wctxt)
> +{
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +
> +	nbcon_context_exit_unsafe(ctxt);
> +	nbcon_context_release(ctxt);

I agree with John that the _release() should be called only when
exit_unsafe() succeeded.

Also it might make sense to flush the console. I would do something
like:

/**
 * nbcon_kdb_release - Exit unsafe section and release the nbcon console
 *
 * @wctxt:	The nbcon write context intialized by a succesful
 *	nbcon_kdb_try_acquire()
 *
 * Context:	Under console_srcu_read_lock() for emiting a single kdb message
 *		using the given con->write_atomic() callback. Can be called
 *		only when the console is usable at the moment.
 */
void nbcon_kdb_release(struct nbcon_write_context *wctxt)
{
	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);

	nbcon_context_exit_unsafe(ctxt);
	nbcon_context_release(ctxt);

	if (!nbcon_context_exit_unsafe(ctxt))
		return;

	nbcon_context_release(ctxt);

	/*
	 * Flush any new printk() messages added when the console was blocked.
	 * Only the console used by the given write context was	blocked.
	 * The console was locked only when the write_atomic() callback
	 * was usable.
	 */
	__nbcon_atomic_flush_pending_con(ctxt->console, prb_next_reserve_seq(prb), false);

Best Regards,
Petr

Re: [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers

Posted by Marcos Paulo de Souza 5 months, 1 week ago

On Thu, 2025-08-28 at 17:26 +0200, Petr Mladek wrote:
> On Mon 2025-08-11 10:32:46, Marcos Paulo de Souza wrote:
> > These helpers will be used when calling console->write_atomic on
> > KDB code in the next patch. It's basically the same implementaion
> > as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
> > acquiring the context.
> > 
> > For release, differently from nbcon_device_release, we don't need
> > to
> > flush the console, since all CPUs are stopped when KDB is active.
> 
> I thought this when we were discussion the code, especially the
> comment in
> 
> static void kdb_msg_write(const char *msg, int msg_len)
> {
> [...]
> 
> 	/*
> 	 * The console_srcu_read_lock() only provides safe console
> list
> 	 * traversal. The use of the ->write() callback relies on
> all other
> 	 * CPUs being stopped at the moment and console drivers
> being able to
> 	 * handle reentrance when @oops_in_progress is set.
> 
> 
> But I see that kdb_msg_write() is called from vkdb_printf() and
> there is the following synchronization:
> 
> int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
> {
> [...]
> 
> 	/* Serialize kdb_printf if multiple cpus try to write at
> once.
> 	 * But if any cpu goes recursive in kdb, just print the
> output,
> 	 * even if it is interleaved with any other text.
> 	 */
> 	local_irq_save(flags);
> 	this_cpu = smp_processor_id();
> 	for (;;) {
> 		old_cpu = cmpxchg(&kdb_printf_cpu, -1, this_cpu);
> 		if (old_cpu == -1 || old_cpu == this_cpu)
> 			break;
> 
> 		cpu_relax();
> 	}
> 
> It suggests that the code might be used when other CPUs are still
> running.
> 
> And for example, kgdb_panic(buf) is called in vpanic() before
> the other CPUs are stopped.
> 
> 
> My opinion:
> 
> It might make sense to flush the console after all. But probably
> only the particular console, see below.

Thanks for being so meticulous and double checking the solution.

> 
> > --- a/kernel/printk/nbcon.c
> > +++ b/kernel/printk/nbcon.c
> > @@ -1855,3 +1855,29 @@ void nbcon_device_release(struct console
> > *con)
> >  	console_srcu_read_unlock(cookie);
> >  }
> >  EXPORT_SYMBOL_GPL(nbcon_device_release);
> > +
> 
> The function must be called only in the very specific kdb
> context, so it would deserve a comment. Inspired by
> nbcon_device_try_acquire():

Yeah... I planned to add documentation on the new functions, but
somehow I forgot until now. I'll take you suggestions and add to the
functions, thanks a lot!

> 
> /**
>  * nbcon_kdb_try_acquire - Try to acquire nbcon console, enter unsafe
>  *				section, and initialized nbcon write
> context
>  * @con:	The nbcon console to acquire
>  * @wctxt:	The nbcon write context to be used on success
>  *
>  * Context:	Under console_srcu_read_lock() for emiting a single
> kdb message
>  *		using the given con->write_atomic() callback. Can be
> called
>  *		only when the console is usable at the moment.
>  *
>  * Return:	True if the console was acquired. False otherwise.
>  *
>  * kdb emits messages on consoles registered for printk() without
>  * storing them into the ring buffer. It has to acquire the console
>  * ownerhip so that is could call con->write_atomic() callback a safe
> way.
>  *
>  * This function acquires the nbcon console using priority
> NBCON_PRIO_EMERGENCY
>  * and marks it unsafe for handover/takeover.
>  */
> 
> > +bool nbcon_kdb_try_acquire(struct console *con,
> > +			   struct nbcon_write_context *wctxt)
> > +{
> > +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> > +
> > +	memset(ctxt, 0, sizeof(*ctxt));
> > +	ctxt->console = con;
> > +	ctxt->prio    = NBCON_PRIO_EMERGENCY;
> > +
> > +	if (!nbcon_context_try_acquire(ctxt, false))
> > +		return false;
> > +
> > +	if (!nbcon_context_enter_unsafe(ctxt))
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> 
> It deserves a comment as well, see below:
> 
> > +void nbcon_kdb_release(struct nbcon_write_context *wctxt)
> > +{
> > +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> > +
> > +	nbcon_context_exit_unsafe(ctxt);
> > +	nbcon_context_release(ctxt);
> 
> I agree with John that the _release() should be called only when
> exit_unsafe() succeeded.

Done.

> 
> Also it might make sense to flush the console. I would do something
> like:
> 
> 
> /**
>  * nbcon_kdb_release - Exit unsafe section and release the nbcon
> console
>  *
>  * @wctxt:	The nbcon write context intialized by a succesful
>  *	nbcon_kdb_try_acquire()
>  *
>  * Context:	Under console_srcu_read_lock() for emiting a single
> kdb message
>  *		using the given con->write_atomic() callback. Can be
> called
>  *		only when the console is usable at the moment.
>  */
> void nbcon_kdb_release(struct nbcon_write_context *wctxt)
> {
> 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> 
> 	nbcon_context_exit_unsafe(ctxt);
> 	nbcon_context_release(ctxt);
> 
> 	if (!nbcon_context_exit_unsafe(ctxt))
> 		return;
> 
> 	nbcon_context_release(ctxt);
> 
> 	/*
> 	 * Flush any new printk() messages added when the console
> was blocked.
> 	 * Only the console used by the given write context
> was	blocked.
> 	 * The console was locked only when the write_atomic()
> callback
> 	 * was usable.
> 	 */
> 	__nbcon_atomic_flush_pending_con(ctxt->console,
> prb_next_reserve_seq(prb), false);

Fixed locally, thanks a lot!

> 
> 
> Best Regards,
> Petr

Re: [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers

Posted by John Ogness 5 months, 2 weeks ago

On 2025-08-11, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
> These helpers will be used when calling console->write_atomic on
> KDB code in the next patch. It's basically the same implementaion
> as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
> acquiring the context.
>
> For release, differently from nbcon_device_release, we don't need to
> flush the console, since all CPUs are stopped when KDB is active.
>
> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> ---
>  include/linux/console.h |  6 ++++++
>  kernel/printk/nbcon.c   | 26 ++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 67af483574727c00eea1d5a1eacc994755c92607..b34c5a0b86303e2fb4583fa467d8be43761cf756 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -605,6 +605,9 @@ extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
>  extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
>  extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
>  extern void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt);
> +extern bool nbcon_kdb_try_acquire(struct console *con,
> +				  struct nbcon_write_context *wctxt);
> +extern void nbcon_kdb_release(struct nbcon_write_context *wctxt);
>  
>  /*
>   * Check if the given console is currently capable and allowed to print
> @@ -654,6 +657,9 @@ static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return
>  static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
>  static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
>  static inline void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt) { }
> +static inline bool nbcon_kdb_try_acquire(struct console *con,
> +					 struct nbcon_write_context *wctxt) { return false; }
> +static inline void nbcon_kdb_release(struct console *con) { }
>  static inline bool console_is_usable(struct console *con, short flags,
>  				     bool use_atomic) { return false; }
>  #endif
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 646801813415f0abe40cabf2f28ca9e30664f028..79d8c7437806119ad9787ddc48382dc2c86c23c3 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1855,3 +1855,29 @@ void nbcon_device_release(struct console *con)
>  	console_srcu_read_unlock(cookie);
>  }
>  EXPORT_SYMBOL_GPL(nbcon_device_release);
> +
> +bool nbcon_kdb_try_acquire(struct console *con,
> +			   struct nbcon_write_context *wctxt)
> +{
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +
> +	memset(ctxt, 0, sizeof(*ctxt));
> +	ctxt->console = con;
> +	ctxt->prio    = NBCON_PRIO_EMERGENCY;
> +
> +	if (!nbcon_context_try_acquire(ctxt, false))
> +		return false;
> +
> +	if (!nbcon_context_enter_unsafe(ctxt))
> +		return false;
> +
> +	return true;
> +}
> +
> +void nbcon_kdb_release(struct nbcon_write_context *wctxt)
> +{
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +
> +	nbcon_context_exit_unsafe(ctxt);
> +	nbcon_context_release(ctxt);

If nbcon_context_exit_unsafe() fails, the current task is no longer the
owner and thus a release is not needed. I would prefer:

	if (nbcon_context_exit_unsafe(ctxt))
		nbcon_context_release(ctxt);

or

	if (!nbcon_context_exit_unsafe(ctxt))
		return;

	nbcon_context_release(ctxt);

You can find this same pattern in nbcon_device_release().

John

Re: [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers

Posted by Marcos Paulo de Souza 5 months, 1 week ago

On Tue, 2025-08-26 at 16:36 +0206, John Ogness wrote:
> On 2025-08-11, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
> > These helpers will be used when calling console->write_atomic on
> > KDB code in the next patch. It's basically the same implementaion
> > as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
> > acquiring the context.
> > 
> > For release, differently from nbcon_device_release, we don't need
> > to
> > flush the console, since all CPUs are stopped when KDB is active.
> > 
> > Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> > ---
> >  include/linux/console.h |  6 ++++++
> >  kernel/printk/nbcon.c   | 26 ++++++++++++++++++++++++++
> >  2 files changed, 32 insertions(+)
> > 
> > diff --git a/include/linux/console.h b/include/linux/console.h
> > index
> > 67af483574727c00eea1d5a1eacc994755c92607..b34c5a0b86303e2fb4583fa46
> > 7d8be43761cf756 100644
> > --- a/include/linux/console.h
> > +++ b/include/linux/console.h
> > @@ -605,6 +605,9 @@ extern bool nbcon_can_proceed(struct
> > nbcon_write_context *wctxt);
> >  extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
> >  extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
> >  extern void nbcon_reacquire_nobuf(struct nbcon_write_context
> > *wctxt);
> > +extern bool nbcon_kdb_try_acquire(struct console *con,
> > +				  struct nbcon_write_context
> > *wctxt);
> > +extern void nbcon_kdb_release(struct nbcon_write_context *wctxt);
> >  
> >  /*
> >   * Check if the given console is currently capable and allowed to
> > print
> > @@ -654,6 +657,9 @@ static inline bool nbcon_can_proceed(struct
> > nbcon_write_context *wctxt) { return
> >  static inline bool nbcon_enter_unsafe(struct nbcon_write_context
> > *wctxt) { return false; }
> >  static inline bool nbcon_exit_unsafe(struct nbcon_write_context
> > *wctxt) { return false; }
> >  static inline void nbcon_reacquire_nobuf(struct
> > nbcon_write_context *wctxt) { }
> > +static inline bool nbcon_kdb_try_acquire(struct console *con,
> > +					 struct
> > nbcon_write_context *wctxt) { return false; }
> > +static inline void nbcon_kdb_release(struct console *con) { }
> >  static inline bool console_is_usable(struct console *con, short
> > flags,
> >  				     bool use_atomic) { return
> > false; }
> >  #endif
> > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> > index
> > 646801813415f0abe40cabf2f28ca9e30664f028..79d8c7437806119ad9787ddc4
> > 8382dc2c86c23c3 100644
> > --- a/kernel/printk/nbcon.c
> > +++ b/kernel/printk/nbcon.c
> > @@ -1855,3 +1855,29 @@ void nbcon_device_release(struct console
> > *con)
> >  	console_srcu_read_unlock(cookie);
> >  }
> >  EXPORT_SYMBOL_GPL(nbcon_device_release);
> > +
> > +bool nbcon_kdb_try_acquire(struct console *con,
> > +			   struct nbcon_write_context *wctxt)
> > +{
> > +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> > +
> > +	memset(ctxt, 0, sizeof(*ctxt));
> > +	ctxt->console = con;
> > +	ctxt->prio    = NBCON_PRIO_EMERGENCY;
> > +
> > +	if (!nbcon_context_try_acquire(ctxt, false))
> > +		return false;
> > +
> > +	if (!nbcon_context_enter_unsafe(ctxt))
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +void nbcon_kdb_release(struct nbcon_write_context *wctxt)
> > +{
> > +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> > +
> > +	nbcon_context_exit_unsafe(ctxt);
> > +	nbcon_context_release(ctxt);
> 
> If nbcon_context_exit_unsafe() fails, the current task is no longer
> the
> owner and thus a release is not needed. I would prefer:
> 
> 	if (nbcon_context_exit_unsafe(ctxt))
> 		nbcon_context_release(ctxt);
> 
> or
> 
> 	if (!nbcon_context_exit_unsafe(ctxt))
> 		return;
> 
> 	nbcon_context_release(ctxt);
> 
> You can find this same pattern in nbcon_device_release().

Thanks for spotting this issue. Fixed localy.

> 
> John

[PATCH v2 1/3] printk: nbcon: Export console_is_usable
[PATCH v2 2/3] printk: nbcon: Introduce KDB helpers
[PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles