[v3] Handle NBCON consoles on KDB

[PATCH v3 2/4] printk: nbcon: Introduce KDB helpers

Posted by Marcos Paulo de Souza 1 month ago

These helpers will be used when calling console->write_atomic on
KDB code in the next patch. It's basically the same implementaion
as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
acquiring the context.

For release we need to flush the console, since some messages could be
added before the context was acquired, as KDB emits the messages using
con->{write,write_atomic} instead of storing them on the ring buffer.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 include/linux/console.h |  6 +++++
 kernel/printk/nbcon.c   | 66 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index 67af483574727c00eea1d5a1eacc994755c92607..b34c5a0b86303e2fb4583fa467d8be43761cf756 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -605,6 +605,9 @@ extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
 extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
 extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
 extern void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt);
+extern bool nbcon_kdb_try_acquire(struct console *con,
+				  struct nbcon_write_context *wctxt);
+extern void nbcon_kdb_release(struct nbcon_write_context *wctxt);
 
 /*
  * Check if the given console is currently capable and allowed to print
@@ -654,6 +657,9 @@ static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return
 static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
 static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
 static inline void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt) { }
+static inline bool nbcon_kdb_try_acquire(struct console *con,
+					 struct nbcon_write_context *wctxt) { return false; }
+static inline void nbcon_kdb_release(struct console *con) { }
 static inline bool console_is_usable(struct console *con, short flags,
 				     bool use_atomic) { return false; }
 #endif
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 646801813415f0abe40cabf2f28ca9e30664f028..ff218e95a505fd10521c2c4dfb00ad5ec5773953 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1855,3 +1855,69 @@ void nbcon_device_release(struct console *con)
 	console_srcu_read_unlock(cookie);
 }
 EXPORT_SYMBOL_GPL(nbcon_device_release);
+
+/**
+ * nbcon_kdb_try_acquire - Try to acquire nbcon console, enter unsafe
+ *				section, and initialized nbcon write context
+ * @con:	The nbcon console to acquire
+ * @wctxt:	The nbcon write context to be used on success
+ *
+ * Context:	Under console_srcu_read_lock() for emiting a single kdb message
+ *		using the given con->write_atomic() callback. Can be called
+ *		only when the console is usable at the moment.
+ *
+ * Return:	True if the console was acquired. False otherwise.
+ *
+ * kdb emits messages on consoles registered for printk() without
+ * storing them into the ring buffer. It has to acquire the console
+ * ownerhip so that it could call con->write_atomic() callback a safe way.
+ *
+ * This function acquires the nbcon console using priority NBCON_PRIO_EMERGENCY
+ * and marks it unsafe for handover/takeover.
+ */
+bool nbcon_kdb_try_acquire(struct console *con,
+			   struct nbcon_write_context *wctxt)
+{
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	memset(ctxt, 0, sizeof(*ctxt));
+	ctxt->console = con;
+	ctxt->prio    = NBCON_PRIO_EMERGENCY;
+
+	if (!nbcon_context_try_acquire(ctxt, false))
+		return false;
+
+	if (!nbcon_context_enter_unsafe(ctxt))
+		return false;
+
+	return true;
+}
+
+/**
+ * nbcon_kdb_release - Exit unsafe section and release the nbcon console
+ *
+ * @wctxt:	The nbcon write context initialized by a successful
+ *		nbcon_kdb_try_acquire()
+ *
+ * Context:	Under console_srcu_read_lock() for emiting a single kdb message
+ *		using the given con->write_atomic() callback. Can be called
+ *		only when the console is usable at the moment.
+ */
+void nbcon_kdb_release(struct nbcon_write_context *wctxt)
+{
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	if (!nbcon_context_exit_unsafe(ctxt))
+		return;
+
+	nbcon_context_release(ctxt);
+
+	/*
+	 * Flush any new printk() messages added when the console was blocked.
+	 * Only the console used by the given write context was	blocked.
+	 * The console was locked only when the write_atomic() callback
+	 * was usable.
+	 */
+	__nbcon_atomic_flush_pending_con(ctxt->console,
+					 prb_next_reserve_seq(prb), false);
+}

-- 
2.50.0

Re: [PATCH v3 2/4] printk: nbcon: Introduce KDB helpers

Posted by Petr Mladek 3 weeks, 6 days ago

On Tue 2025-09-02 15:33:53, Marcos Paulo de Souza wrote:
> These helpers will be used when calling console->write_atomic on
> KDB code in the next patch. It's basically the same implementaion
> as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
> acquiring the context.
> 
> For release we need to flush the console, since some messages could be
> added before the context was acquired, as KDB emits the messages using
> con->{write,write_atomic} instead of storing them on the ring buffer.

I am a bit confused by the last paragraph. It is a very long sentence.

Sigh, I wanted to propose a simple and clear alternative. But I ended
in a rabbit hole and with a rather complex text:

<proposal>
The atomic flush in the release function is questionable. vkdb_printf()
is primary called only when other CPUs are quiescent in kdb_main_loop()
and do not call the classic printk(). But, for example, the
write_atomic() callback might print debug messages. Or there is
one kdb_printf() called in kgdb_panic() before other CPUs are
quiescent. So the flush might be useful. Especially, when
the kdb code fails to quiescent the CPUs and returns early.

Let's keep it simple and just call __nbcon_atomic_flush_pending_con().
It uses write_atomic() callback which is used by the locked kdb code
anyway.

The legacy loop (console_trylock()/console_unlock()) is not
usable in kdb context.

It might make sense to trigger the flush via the printk kthread.
But it would not work in panic() where is the only known kdb_printf()
called when other CPUs are not quiescent. So, it does not look
worth it.
</proposal>

What do you think?

My opinion:

Honestly, I think that the flush is not much important because
it will most offten have nothing to do.

I am just not sure whether it is better to have it there
or avoid it. It might be better to remove it after all.
And just document the decision.

Best Regards,
Petr

Re: [PATCH v3 2/4] printk: nbcon: Introduce KDB helpers

Posted by John Ogness 3 weeks, 3 days ago

On 2025-09-05, Petr Mladek <pmladek@suse.com> wrote:
> On Tue 2025-09-02 15:33:53, Marcos Paulo de Souza wrote:
>> These helpers will be used when calling console->write_atomic on
>> KDB code in the next patch. It's basically the same implementaion
>> as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
>> acquiring the context.
>> 
>> For release we need to flush the console, since some messages could be
>> added before the context was acquired, as KDB emits the messages using
>> con->{write,write_atomic} instead of storing them on the ring buffer.
>
> I am a bit confused by the last paragraph. It is a very long sentence.
>
> Sigh, I wanted to propose a simple and clear alternative. But I ended
> in a rabbit hole and with a rather complex text:
>
> <proposal>
> The atomic flush in the release function is questionable. vkdb_printf()
> is primary called only when other CPUs are quiescent in kdb_main_loop()
> and do not call the classic printk(). But, for example, the
> write_atomic() callback might print debug messages. Or there is
> one kdb_printf() called in kgdb_panic() before other CPUs are
> quiescent. So the flush might be useful. Especially, when
> the kdb code fails to quiescent the CPUs and returns early.
>
> Let's keep it simple and just call __nbcon_atomic_flush_pending_con().
> It uses write_atomic() callback which is used by the locked kdb code
> anyway.
>
> The legacy loop (console_trylock()/console_unlock()) is not
> usable in kdb context.
>
> It might make sense to trigger the flush via the printk kthread.
> But it would not work in panic() where is the only known kdb_printf()
> called when other CPUs are not quiescent. So, it does not look
> worth it.
> </proposal>
>
> What do you think?
>
> My opinion:
>
> Honestly, I think that the flush is not much important because
> it will most offten have nothing to do.
>
> I am just not sure whether it is better to have it there
> or avoid it. It might be better to remove it after all.
> And just document the decision.

IMHO keeping the flush is fine. There are cases where there might be
something to print. And since a printing kthread will get no chance to
print as long as kdb is alive, we should have kdb flushing that
console.

Note that this is the only console that will actually see the new
messages immediately as all the other CPUs and quiesced. For this reason
we probably want to use __nbcon_atomic_flush_pending() to try to flush
_all_ the consoles.

As to the last paragraph of the commit message, I would keep it simple:

After release try to flush all consoles since there may be a backlog of
messages in the ringbuffer. The kthread console printers do not get a
chance to run while kdb is active.

John

Re: [PATCH v3 2/4] printk: nbcon: Introduce KDB helpers

Posted by Petr Mladek 3 weeks, 2 days ago

On Mon 2025-09-08 14:15:08, John Ogness wrote:
> On 2025-09-05, Petr Mladek <pmladek@suse.com> wrote:
> > On Tue 2025-09-02 15:33:53, Marcos Paulo de Souza wrote:
> >> These helpers will be used when calling console->write_atomic on
> >> KDB code in the next patch. It's basically the same implementaion
> >> as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
> >> acquiring the context.
> >> 
> >> For release we need to flush the console, since some messages could be
> >> added before the context was acquired, as KDB emits the messages using
> >> con->{write,write_atomic} instead of storing them on the ring buffer.
> >
> > I am a bit confused by the last paragraph. It is a very long sentence.
> >
> > Sigh, I wanted to propose a simple and clear alternative. But I ended
> > in a rabbit hole and with a rather complex text:
> >
> > <proposal>
> > The atomic flush in the release function is questionable. vkdb_printf()
> > is primary called only when other CPUs are quiescent in kdb_main_loop()
> > and do not call the classic printk(). But, for example, the
> > write_atomic() callback might print debug messages. Or there is
> > one kdb_printf() called in kgdb_panic() before other CPUs are
> > quiescent. So the flush might be useful. Especially, when
> > the kdb code fails to quiescent the CPUs and returns early.
> >
> > Let's keep it simple and just call __nbcon_atomic_flush_pending_con().
> > It uses write_atomic() callback which is used by the locked kdb code
> > anyway.
> >
> > The legacy loop (console_trylock()/console_unlock()) is not
> > usable in kdb context.
> >
> > It might make sense to trigger the flush via the printk kthread.
> > But it would not work in panic() where is the only known kdb_printf()
> > called when other CPUs are not quiescent. So, it does not look
> > worth it.
> > </proposal>
> >
> > What do you think?
> >
> > My opinion:
> >
> > Honestly, I think that the flush is not much important because
> > it will most offten have nothing to do.
> >
> > I am just not sure whether it is better to have it there
> > or avoid it. It might be better to remove it after all.
> > And just document the decision.
> 
> IMHO keeping the flush is fine. There are cases where there might be
> something to print. And since a printing kthread will get no chance to
> print as long as kdb is alive, we should have kdb flushing that
> console.
> 
> Note that this is the only console that will actually see the new
> messages immediately as all the other CPUs and quiesced.

I do not understand this argument. IMHO, this new
try_acquire()/release() API should primary flush only
the console which was (b)locked by this API.

It will be called in kdb_msg_write() which tries to write
to all registered consoles. So the other nbcon consoles will
get flushed when the try_acquire() succeeds on them. And the
legacy conosles were never flushed.

> For this reason
> we probably want to use __nbcon_atomic_flush_pending() to try to flush
> _all_ the consoles.

I would prefer to keep __nbcon_atomic_flush_pending_con().
I mean to flush only the console which was blocked.

Note that we would need to increment oops_in_progress if we wanted
to flush legacy consoles in this context... which would spread
the mess into nbcon code...

> As to the last paragraph of the commit message, I would keep it simple:
> 
> After release try to flush all consoles since there may be a backlog of
> messages in the ringbuffer. The kthread console printers do not get a
> chance to run while kdb is active.

I like this text.

Best Regards,
Petr

Re: [PATCH v3 2/4] printk: nbcon: Introduce KDB helpers

Posted by John Ogness 3 weeks, 2 days ago

On 2025-09-09, Petr Mladek <pmladek@suse.com> wrote:
>> > Honestly, I think that the flush is not much important because
>> > it will most offten have nothing to do.
>> >
>> > I am just not sure whether it is better to have it there
>> > or avoid it. It might be better to remove it after all.
>> > And just document the decision.
>> 
>> IMHO keeping the flush is fine. There are cases where there might be
>> something to print. And since a printing kthread will get no chance to
>> print as long as kdb is alive, we should have kdb flushing that
>> console.
>> 
>> Note that this is the only console that will actually see the new
>> messages immediately as all the other CPUs and quiesced.
>
> I do not understand this argument. IMHO, this new
> try_acquire()/release() API should primary flush only
> the console which was (b)locked by this API.
>
> It will be called in kdb_msg_write() which tries to write
> to all registered consoles. So the other nbcon consoles will
> get flushed when the try_acquire() succeeds on them. And the
> legacy conosles were never flushed.

Right. I oversaw that it acquires each of the nbcon's.

> I would prefer to keep __nbcon_atomic_flush_pending_con().
> I mean to flush only the console which was blocked.

Agreed.

>> After release try to flush all consoles since there may be a backlog of
>> messages in the ringbuffer. The kthread console printers do not get a
>> chance to run while kdb is active.
>
> I like this text.

OK, but then change it to talk only about the one console.

After release try to flush the console since there may be a backlog of
messages in the ringbuffer. The kthread console printers do not get a
chance to run while kdb is active.

John