[PATCH v4 2/2] tty/sysrq: Dump printk ring buffer messages via sysrq

Sreenath Vijayan posted 2 patches 1 year, 10 months ago
[PATCH v4 2/2] tty/sysrq: Dump printk ring buffer messages via sysrq
Posted by Sreenath Vijayan 1 year, 10 months ago
When terminal is unresponsive, one cannot use dmesg to view printk
ring buffer messages. Also, syslog services may be disabled,
to check the messages after a reboot, especially on embedded systems.
In this scenario, dump the printk ring buffer messages via sysrq
by pressing sysrq+D.

Signed-off-by: Sreenath Vijayan <sreenath.vijayan@sony.com>
Signed-off-by: Shimoyashiki Taichi <taichi.shimoyashiki@sony.com>
---
 Documentation/admin-guide/sysrq.rst |  2 ++
 drivers/tty/sysrq.c                 | 20 +++++++++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysrq.rst b/Documentation/admin-guide/sysrq.rst
index 2f2e5bd440f9..c634e8b4cea2 100644
--- a/Documentation/admin-guide/sysrq.rst
+++ b/Documentation/admin-guide/sysrq.rst
@@ -161,6 +161,8 @@ Command	    Function
             will be printed to your console. (``0``, for example would make
             it so that only emergency messages like PANICs or OOPSes would
             make it to your console.)
+
+``D``	    Dump the printk ring buffer
 =========== ===================================================================
 
 Okay, so what can I use them for?
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 02217e3c916b..365f7fa145f0 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -450,6 +450,24 @@ static const struct sysrq_key_op sysrq_unrt_op = {
 	.enable_mask	= SYSRQ_ENABLE_RTNICE,
 };
 
+static void dmesg_dump_callback(struct work_struct *work)
+{
+	dump_printk_buffer();
+}
+
+static DECLARE_WORK(sysrq_dmesg_work, dmesg_dump_callback);
+
+static void sysrq_handle_dmesg_dump(u8 key)
+{
+	queue_work(system_unbound_wq, &sysrq_dmesg_work);
+}
+static struct sysrq_key_op sysrq_dmesg_dump_op = {
+	.handler        = sysrq_handle_dmesg_dump,
+	.help_msg       = "dump-dmesg(D)",
+	.action_msg     = "Dump dmesg",
+	.enable_mask    = SYSRQ_ENABLE_DUMP,
+};
+
 /* Key Operations table and lock */
 static DEFINE_SPINLOCK(sysrq_key_table_lock);
 
@@ -505,7 +523,7 @@ static const struct sysrq_key_op *sysrq_key_table[62] = {
 	NULL,				/* A */
 	NULL,				/* B */
 	NULL,				/* C */
-	NULL,				/* D */
+	&sysrq_dmesg_dump_op,		/* D */
 	NULL,				/* E */
 	NULL,				/* F */
 	NULL,				/* G */
-- 
2.43.0
Re: [PATCH v4 2/2] tty/sysrq: Dump printk ring buffer messages via sysrq
Posted by Petr Mladek 1 year, 10 months ago
On Thu 2024-02-01 13:12:41, Sreenath Vijayan wrote:
> When terminal is unresponsive, one cannot use dmesg to view printk
> ring buffer messages. Also, syslog services may be disabled,
> to check the messages after a reboot, especially on embedded systems.
> In this scenario, dump the printk ring buffer messages via sysrq
> by pressing sysrq+D.

I would use sysrq-R and say that it replays the kernel log on
consoles.

The word "dump" is ambiguous. People might thing that it calls
dmesg dumpers.

Also the messages would be shown on the terminal only when
console_loglevel is set to show all messages. This is done
in __handle_sysrq(). But it is not done in the workqueue
context.

Finally, the commit message should explain why workqueues are used
and what are the limitations. Something like:

<add>
The log is replayed using workqueues. The reason is that it has to
be done a safe way (in compare with panic context).

This also means that the sysrq won't have the desired effect
when the system is in so bad state that workqueues do not
make any progress.
</add>

Another reason might be that we do not want to do it in
an interrupt context. But this reason is questionable.
Many other sysrq commands do a complicate work and
print many messages as well.

Another reason is that the function need to use console_lock()
which can't be called in IRQ context. Maybe, we should use
console_trylock() instead.

The function would replay the messages only when console_trylock()
succeeds. Users could repeat the sysrq when it fails.

Idea:

Using console_trylock() actually might be more reliable than
workqueues. console_trylock() might fail repeatably when:

    + the console_lock() owner is stuck. But workqueues would fail
      in this case as well.

    + there is a flood of messages. In this case, replaying
      the log would not help much.

Another advantage is that the consoles would be flushed
in sysrq context with the manipulated console_loglevel.

Best Regards,
Petr
Re: [PATCH v4 2/2] tty/sysrq: Dump printk ring buffer messages via sysrq
Posted by Sreenath Vijayan 1 year, 10 months ago
On Wed, Feb 07, 2024 at 04:09:34PM +0100, Petr Mladek wrote:
> On Thu 2024-02-01 13:12:41, Sreenath Vijayan wrote:
> > When terminal is unresponsive, one cannot use dmesg to view printk
> > ring buffer messages. Also, syslog services may be disabled,
> > to check the messages after a reboot, especially on embedded systems.
> > In this scenario, dump the printk ring buffer messages via sysrq
> > by pressing sysrq+D.
> 
> I would use sysrq-R and say that it replays the kernel log on
> consoles.
> 
> The word "dump" is ambiguous. People might thing that it calls
> dmesg dumpers.
>

Ok noted. We proposed sysrq-D as it is an alternative to dmesg
command and might be easier to remember.
 
> Also the messages would be shown on the terminal only when
> console_loglevel is set to show all messages. This is done
> in __handle_sysrq(). But it is not done in the workqueue
> context.
> 

Yes, the initial implementation was using write() of consoles
so the messages would be shown irrespective of the console log
level. The current implementation depends on the console log
level but many other sysrq keys dump the messages at KERN_INFO
level. In my understanding, __handle_sysrq() dumps only the
sysrq header at the manipulated loglevel. It restores original
loglevel before calling callback function for the key.
If console_loglevel is set to show KERN_INFO messages, it would
dump most of the important printk messages in our case. Also the
loglevel can be modified using sysrq itself now.

> Finally, the commit message should explain why workqueues are used
> and what are the limitations. Something like:
> 
> <add>
> The log is replayed using workqueues. The reason is that it has to
> be done a safe way (in compare with panic context).
> 
> This also means that the sysrq won't have the desired effect
> when the system is in so bad state that workqueues do not
> make any progress.
> </add>
> 
> Another reason might be that we do not want to do it in
> an interrupt context. But this reason is questionable.
> Many other sysrq commands do a complicate work and
> print many messages as well.
>

Noted. Will add this if we proceed with workqueue implementation.
 
> Another reason is that the function need to use console_lock()
> which can't be called in IRQ context. Maybe, we should use
> console_trylock() instead.
> 
> The function would replay the messages only when console_trylock()
> succeeds. Users could repeat the sysrq when it fails.
> 
> Idea:
> 
> Using console_trylock() actually might be more reliable than
> workqueues. console_trylock() might fail repeatably when:
> 
>     + the console_lock() owner is stuck. But workqueues would fail
>       in this case as well.
> 
>     + there is a flood of messages. In this case, replaying
>       the log would not help much.
> 
> Another advantage is that the consoles would be flushed
> in sysrq context with the manipulated console_loglevel.
> 
> Best Regards,
> Petr

Yes, this seems to work well from interrupt context when the
console lock owner is not stuck. We can also manipulate
the console_loglevel. Something like this:

//in printk.c
void console_replay_all(void)
{
       if (console_trylock()) {
               __console_rewind_all();
               console_unlock();
       }
}

//in sysrq.c
static void sysrq_handle_dmesg_dump(u8 key)
{
       int orig_log_level = console_loglevel;
       console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
       console_replay_all();
       console_loglevel = orig_log_level;
}


The downside I see is that the user may have to hit the
key multiple times or give up trying if the console lock
owner is busy at the time of key press. This information
should probably be updated in the documentation.

Please let me know your opinion.

Regards,
Sreenath
Re: [PATCH v4 2/2] tty/sysrq: Dump printk ring buffer messages via sysrq
Posted by Sreenath Vijayan 1 year, 10 months ago
On Wed, Feb 07, 2024 at 04:09:34PM +0100, Petr Mladek wrote:
> On Thu 2024-02-01 13:12:41, Sreenath Vijayan wrote:
> > When terminal is unresponsive, one cannot use dmesg to view printk
> > ring buffer messages. Also, syslog services may be disabled,
> > to check the messages after a reboot, especially on embedded systems.
> > In this scenario, dump the printk ring buffer messages via sysrq
> > by pressing sysrq+D.
> 
> I would use sysrq-R and say that it replays the kernel log on
> consoles.
> 
> The word "dump" is ambiguous. People might thing that it calls
> dmesg dumpers.
>

Ok noted. We proposed sysrq-D as it is an alternative to dmesg
command and might be easier to remember.
 
> Also the messages would be shown on the terminal only when
> console_loglevel is set to show all messages. This is done
> in __handle_sysrq(). But it is not done in the workqueue
> context.
>

Yes, the initial implementation was using write() of consoles
so the messages would be shown irrespective of the console log
level. The current implementation depends on the console log
level but many other sysrq keys dump the messages at KERN_INFO
level. In my understanding, __handle_sysrq() dumps only the
sysrq header at the manipulated loglevel. It restores original
loglevel before calling callback function for the key.
If console_loglevel is set to show KERN_INFO messages, it would
dump most of the important printk messages in our case. Also the
loglevel can be modified using sysrq itself now.
 
> Finally, the commit message should explain why workqueues are used
> and what are the limitations. Something like:
> 
> <add>
> The log is replayed using workqueues. The reason is that it has to
> be done a safe way (in compare with panic context).
> 
> This also means that the sysrq won't have the desired effect
> when the system is in so bad state that workqueues do not
> make any progress.
> </add>
> 
> Another reason might be that we do not want to do it in
> an interrupt context. But this reason is questionable.
> Many other sysrq commands do a complicate work and
> print many messages as well.
>

Noted. Will add this if we proceed with workqueue implementation.
 
> Another reason is that the function need to use console_lock()
> which can't be called in IRQ context. Maybe, we should use
> console_trylock() instead.
> 
> The function would replay the messages only when console_trylock()
> succeeds. Users could repeat the sysrq when it fails.
> 
> Idea:
> 
> Using console_trylock() actually might be more reliable than
> workqueues. console_trylock() might fail repeatably when:
> 
>     + the console_lock() owner is stuck. But workqueues would fail
>       in this case as well.
> 
>     + there is a flood of messages. In this case, replaying
>       the log would not help much.
> 
> Another advantage is that the consoles would be flushed
> in sysrq context with the manipulated console_loglevel.
> 
> Best Regards,
> Petr

Yes, this seems to work well from interrupt context when the
console lock owner is not stuck. We can also manipulate
the console_loglevel. Something like this:

//in printk.c
void console_replay_all(void)
{
       if (console_trylock()) {
               __console_rewind_all();
               console_unlock();
       }
}

//in sysrq.c
static void sysrq_handle_dmesg_dump(u8 key)
{
       int orig_log_level = console_loglevel;
       console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
       console_replay_all();
       console_loglevel = orig_log_level;
}


The downside I see is that the user may have to hit the
key multiple times or give up trying if the console lock
owner is busy at the time of key press. This information
should probably be updated in the documentation.

Please let me know your opinion.

Regards,
Sreenath
Re: [PATCH v4 2/2] tty/sysrq: Dump printk ring buffer messages via sysrq
Posted by John Ogness 1 year, 9 months ago
On 2024-02-14, Sreenath Vijayan <sreenath.vijayan@sony.com> wrote:
> //in printk.c
> void console_replay_all(void)
> {
>        if (console_trylock()) {
>                __console_rewind_all();
>                console_unlock();
>        }
> }
>
> //in sysrq.c
> static void sysrq_handle_dmesg_dump(u8 key)
> {
>        int orig_log_level = console_loglevel;
>        console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
>        console_replay_all();
>        console_loglevel = orig_log_level;
> }

Note that there is already a sysrq 0-9 to change the loglevel. That may
be more appropriate.

John
Re: [PATCH v4 2/2] tty/sysrq: Dump printk ring buffer messages via sysrq
Posted by Sreenath Vijayan 1 year, 9 months ago
On Wed, Feb 14, 2024 at 04:42:36PM +0530, Sreenath Vijayan wrote:
> On Wed, Feb 07, 2024 at 04:09:34PM +0100, Petr Mladek wrote:
> > Idea:
> > 
> > Using console_trylock() actually might be more reliable than
> > workqueues. console_trylock() might fail repeatably when:
> > 
> >     + the console_lock() owner is stuck. But workqueues would fail
> >       in this case as well.
> > 
> >     + there is a flood of messages. In this case, replaying
> >       the log would not help much.
> > 
> > Another advantage is that the consoles would be flushed
> > in sysrq context with the manipulated console_loglevel.
> > 
> > Best Regards,
> > Petr
> 
> Yes, this seems to work well from interrupt context when the
> console lock owner is not stuck. We can also manipulate
> the console_loglevel. Something like this:
> 
> //in printk.c
> void console_replay_all(void)
> {
>        if (console_trylock()) {
>                __console_rewind_all();
>                console_unlock();
>        }
> }
> 
> //in sysrq.c
> static void sysrq_handle_dmesg_dump(u8 key)
> {
>        int orig_log_level = console_loglevel;
>        console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
>        console_replay_all();
>        console_loglevel = orig_log_level;
> }
> 
> 
> The downside I see is that the user may have to hit the
> key multiple times or give up trying if the console lock
> owner is busy at the time of key press. This information
> should probably be updated in the documentation.
> 
> Please let me know your opinion.
> 
> Regards,
> Sreenath

Hi,

Kindly let me know your opinion on the suggested changes.

Regards,
Sreenath
Re: [PATCH v4 2/2] tty/sysrq: Dump printk ring buffer messages via sysrq
Posted by Greg KH 1 year, 10 months ago
On Wed, Feb 07, 2024 at 04:09:34PM +0100, Petr Mladek wrote:
> On Thu 2024-02-01 13:12:41, Sreenath Vijayan wrote:
> > When terminal is unresponsive, one cannot use dmesg to view printk
> > ring buffer messages. Also, syslog services may be disabled,
> > to check the messages after a reboot, especially on embedded systems.
> > In this scenario, dump the printk ring buffer messages via sysrq
> > by pressing sysrq+D.
> 
> I would use sysrq-R and say that it replays the kernel log on
> consoles.
> 
> The word "dump" is ambiguous. People might thing that it calls
> dmesg dumpers.
> 
> Also the messages would be shown on the terminal only when
> console_loglevel is set to show all messages. This is done
> in __handle_sysrq(). But it is not done in the workqueue
> context.
> 
> Finally, the commit message should explain why workqueues are used
> and what are the limitations. Something like:
> 
> <add>
> The log is replayed using workqueues. The reason is that it has to
> be done a safe way (in compare with panic context).
> 
> This also means that the sysrq won't have the desired effect
> when the system is in so bad state that workqueues do not
> make any progress.
> </add>
> 
> Another reason might be that we do not want to do it in
> an interrupt context. But this reason is questionable.
> Many other sysrq commands do a complicate work and
> print many messages as well.
> 
> Another reason is that the function need to use console_lock()
> which can't be called in IRQ context. Maybe, we should use
> console_trylock() instead.
> 
> The function would replay the messages only when console_trylock()
> succeeds. Users could repeat the sysrq when it fails.
> 
> Idea:
> 
> Using console_trylock() actually might be more reliable than
> workqueues. console_trylock() might fail repeatably when:
> 
>     + the console_lock() owner is stuck. But workqueues would fail
>       in this case as well.
> 
>     + there is a flood of messages. In this case, replaying
>       the log would not help much.
> 
> Another advantage is that the consoles would be flushed
> in sysrq context with the manipulated console_loglevel.

I just remembered all the rt-changes coming down the pipe for
consoles/printk, is this going to mess with that?

And in thinking about it, the workqueue is a worry, sysrq is only
usually hit if you have a lockup, and this isn't going to work well
here, if at all, in that situation.

So when this option fails when people need it the most, perhaps it's not
worth adding?  When else would people want to use it?

thanks,

greg k-h
Re: [PATCH v4 2/2] tty/sysrq: Dump printk ring buffer messages via sysrq
Posted by John Ogness 1 year, 10 months ago
On 2024-02-08, Greg KH <gregkh@linuxfoundation.org> wrote:
> I just remembered all the rt-changes coming down the pipe for
> consoles/printk, is this going to mess with that?

It will not mess with the changes because we will continue to support
the legacy consoles anyway.

> So when this option fails when people need it the most, perhaps it's not
> worth adding?  When else would people want to use it?

The feature could be massively improved once the rt-changes (atomic
consoles) become available.

Petr also brought up valid points about this feature (such as the
loglevel) that should be considered. We should clarify what exactly we
want this feature to do. The actual implementation is the easy part.

John
Re: [PATCH v4 2/2] tty/sysrq: Dump printk ring buffer messages via sysrq
Posted by John Ogness 1 year, 10 months ago
On 2024-02-01, Sreenath Vijayan <sreenath.vijayan@sony.com> wrote:
> When terminal is unresponsive, one cannot use dmesg to view printk
> ring buffer messages. Also, syslog services may be disabled,
> to check the messages after a reboot, especially on embedded systems.
> In this scenario, dump the printk ring buffer messages via sysrq
> by pressing sysrq+D.
>
> Signed-off-by: Sreenath Vijayan <sreenath.vijayan@sony.com>
> Signed-off-by: Shimoyashiki Taichi <taichi.shimoyashiki@sony.com>

Reviewed-by: John Ogness <john.ogness@linutronix.de>