[PATCH v3] softlockup: serialized softlockup's log

lizhe.67@bytedance.com posted 1 patch 2 years ago
kernel/watchdog.c | 3 +++
1 file changed, 3 insertions(+)
[PATCH v3] softlockup: serialized softlockup's log
Posted by lizhe.67@bytedance.com 2 years ago
From: Li Zhe <lizhe.67@bytedance.com>

If multiple CPUs trigger softlockup at the same time with
'softlockup_all_cpu_backtrace=0', the softlockup's logs will appear
staggeredly in dmesg, which will affect the viewing of the logs for
developer. Since the code path for outputting softlockup logs is not
a kernel hotspot and the performance requirements for the code are
not strict, locks are used to serialize the softlockup log output to
improve the readability of the logs.

Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
Changelogs:

v1->v2:
- define the lock outside the scope of function
- add precondition 'softlockup_all_cpu_backtrace=0' in commit message

v2->v3:
- define the lock inside the scope of function

 kernel/watchdog.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5cd6d4e26915..bf30a6fac665 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -448,6 +448,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	struct pt_regs *regs = get_irq_regs();
 	int duration;
 	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
+	static DEFINE_SPINLOCK(watchdog_output_lock);
 
 	if (!watchdog_enabled)
 		return HRTIMER_NORESTART;
@@ -514,6 +515,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		/* Start period for the next softlockup warning. */
 		update_report_ts();
 
+		spin_lock(&watchdog_output_lock);
 		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
 			smp_processor_id(), duration,
 			current->comm, task_pid_nr(current));
@@ -523,6 +525,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			show_regs(regs);
 		else
 			dump_stack();
+		spin_unlock(&watchdog_output_lock);
 
 		if (softlockup_all_cpu_backtrace) {
 			trigger_allbutcpu_cpu_backtrace(smp_processor_id());
-- 
2.20.1
Re: [PATCH v3] softlockup: serialized softlockup's log
Posted by Doug Anderson 2 years ago
Hi,

On Thu, Nov 23, 2023 at 12:41 AM <lizhe.67@bytedance.com> wrote:
>
> From: Li Zhe <lizhe.67@bytedance.com>
>
> If multiple CPUs trigger softlockup at the same time with
> 'softlockup_all_cpu_backtrace=0', the softlockup's logs will appear
> staggeredly in dmesg, which will affect the viewing of the logs for
> developer. Since the code path for outputting softlockup logs is not
> a kernel hotspot and the performance requirements for the code are
> not strict, locks are used to serialize the softlockup log output to
> improve the readability of the logs.
>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> ---
> Changelogs:
>
> v1->v2:
> - define the lock outside the scope of function
> - add precondition 'softlockup_all_cpu_backtrace=0' in commit message
>
> v2->v3:
> - define the lock inside the scope of function
>
>  kernel/watchdog.c | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Douglas Anderson <dianders@chromium.org>
Re: [PATCH v3] softlockup: serialized softlockup's log
Posted by Petr Mladek 2 years ago
On Thu 2023-11-23 16:40:22, lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
> 
> If multiple CPUs trigger softlockup at the same time with
> 'softlockup_all_cpu_backtrace=0', the softlockup's logs will appear
> staggeredly in dmesg, which will affect the viewing of the logs for
> developer. Since the code path for outputting softlockup logs is not
> a kernel hotspot and the performance requirements for the code are
> not strict, locks are used to serialize the softlockup log output to
> improve the readability of the logs.
> 
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>

I do not feel fully comfortable with adding a lock into a code path
which reports system lockups. There might already be a deadlock on
the system and yet another lock would not make things easier.

On the other hand, the added spinlock looks pretty safe:

   + It synchronizes only watchdog_timer_fn() calls against each other.
     watchdog_timer_fn() could not be nested.

   + The locked code seems to be synchronized only by RCU
     and does not wait for other CPUs to finish something.


I haven't found any real deadlock scenario. Feel free
to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr