From: Li Zhe <lizhe.67@bytedance.com>
If multiple CPUs trigger softlockup at the same time with
'softlockup_all_cpu_backtrace=0', the softlockup logs will appear
interleaved in dmesg, which makes them hard for developers to read.
Since the code path for outputting softlockup logs is not a kernel
hotspot and its performance requirements are not strict, a lock is
used to serialize the softlockup log output and improve the
readability of the logs.
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
Changelogs:
v1->v2:
- define the lock outside the function scope
- add precondition 'softlockup_all_cpu_backtrace=0' in commit message
kernel/watchdog.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5cd6d4e26915..6c80dd8ff24b 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -28,6 +28,8 @@
 #include <linux/kvm_para.h>
 
 static DEFINE_MUTEX(watchdog_mutex);
+/* This lock is used to prevent concurrent actions of softlockup output logs */
+static DEFINE_SPINLOCK(watchdog_output_lock);
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HARDLOCKUP_DETECTOR_SPARC64)
 # define WATCHDOG_HARDLOCKUP_DEFAULT	1
@@ -514,6 +516,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		/* Start period for the next softlockup warning. */
 		update_report_ts();
 
+		spin_lock(&watchdog_output_lock);
 		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
 			 smp_processor_id(), duration,
 			 current->comm, task_pid_nr(current));
@@ -523,6 +526,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			show_regs(regs);
 		else
 			dump_stack();
+		spin_unlock(&watchdog_output_lock);
 
 		if (softlockup_all_cpu_backtrace) {
 			trigger_allbutcpu_cpu_backtrace(smp_processor_id());
--
2.20.1
On Wed, 22 Nov 2023 18:02:12 +0800 lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
>
> If multiple CPUs trigger softlockup at the same time with
> 'softlockup_all_cpu_backtrace=0', the softlockup logs will appear
> interleaved in dmesg, which makes them hard for developers to read.
> Since the code path for outputting softlockup logs is not a kernel
> hotspot and its performance requirements are not strict, a lock is
> used to serialize the softlockup log output and improve the
> readability of the logs.
Seems reasonable, but...
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -28,6 +28,8 @@
> #include <linux/kvm_para.h>
>
> static DEFINE_MUTEX(watchdog_mutex);
> +/* This lock is used to prevent concurrent actions of softlockup output logs */
> +static DEFINE_SPINLOCK(watchdog_output_lock);
It would be a little neater to reduce the scope of this - move the
definition into that little code block in watchdog_timer_fn() where it
is actually used.
> #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HARDLOCKUP_DETECTOR_SPARC64)
> # define WATCHDOG_HARDLOCKUP_DEFAULT 1
> @@ -514,6 +516,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> /* Start period for the next softlockup warning. */
> update_report_ts();
>
> + spin_lock(&watchdog_output_lock);
The hrtimer callout function is called from [soft]irq context, yes?
Doesn't lockdep get upset when we take a spinlock in such a context?
> pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> smp_processor_id(), duration,
> current->comm, task_pid_nr(current));
> @@ -523,6 +526,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> show_regs(regs);
> else
> dump_stack();
> + spin_unlock(&watchdog_output_lock);
>
> if (softlockup_all_cpu_backtrace) {
> trigger_allbutcpu_cpu_backtrace(smp_processor_id());
On Wed, 22 Nov 2023 14:05:21 -0800 akpm@linux-foundation.org wrote:
>> From: Li Zhe <lizhe.67@bytedance.com>
>>
>> If multiple CPUs trigger softlockup at the same time with
>> 'softlockup_all_cpu_backtrace=0', the softlockup logs will appear
>> interleaved in dmesg, which makes them hard for developers to read.
>> Since the code path for outputting softlockup logs is not a kernel
>> hotspot and its performance requirements are not strict, a lock is
>> used to serialize the softlockup log output and improve the
>> readability of the logs.
>
>Seems reasonable, but...
>
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -28,6 +28,8 @@
>> #include <linux/kvm_para.h>
>>
>> static DEFINE_MUTEX(watchdog_mutex);
>> +/* This lock is used to prevent concurrent actions of softlockup output logs */
>> +static DEFINE_SPINLOCK(watchdog_output_lock);
>
>It would be a little neater to reduce the scope of this - move the
>definition into that little code block in watchdog_timer_fn() where it
>is actually used.
Yes. For this patch, it is more appropriate to put the definition in
watchdog_timer_fn(); it can be moved out in a subsequent patch if
necessary. I will send a v3 patch that moves it into
watchdog_timer_fn(). Thanks for your advice.
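[Editorial note: for illustration, here is a rough sketch of what moving
the definition into watchdog_timer_fn() could look like. This is a
sketch only, not the actual v3 patch; note that a function-local static
spinlock is still a single instance shared by all CPUs, so the
serialization behavior is unchanged.]

static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
	/* ... existing softlockup detection logic ... */

	/* Start period for the next softlockup warning. */
	update_report_ts();

	{
		/* Function scope, but still one instance shared by all CPUs. */
		static DEFINE_SPINLOCK(watchdog_output_lock);

		spin_lock(&watchdog_output_lock);
		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
			 smp_processor_id(), duration,
			 current->comm, task_pid_nr(current));
		/* ... print_modules(), show_regs()/dump_stack(), etc. ... */
		spin_unlock(&watchdog_output_lock);
	}

	/* ... */
	return HRTIMER_RESTART;
}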
>> #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HARDLOCKUP_DETECTOR_SPARC64)
>> # define WATCHDOG_HARDLOCKUP_DEFAULT 1
>> @@ -514,6 +516,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>> /* Start period for the next softlockup warning. */
>> update_report_ts();
>>
>> + spin_lock(&watchdog_output_lock);
>
>The hrtimer callout function is called from [soft]irq context, yes?
Yes.
>Doesn't lockdep get upset when we take a spinlock in such a context?
My test results are the same as Doug Anderson's; things seemed OK.
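[Editorial note: for background, the situation lockdep actually flags is
the same lock being taken both with IRQs enabled in process context and
again from hardirq context. A hypothetical sketch (demo_lock is
illustrative, not from the patch):

static DEFINE_SPINLOCK(demo_lock);

/* Process context with IRQs enabled: lockdep records demo_lock as
 * having been taken in an IRQ-unsafe way. */
spin_lock(&demo_lock);
/* ... */
spin_unlock(&demo_lock);

/* Hardirq context (e.g. a timer callback): taking the same lock here
 * makes lockdep report an "inconsistent lock state", because the
 * interrupt could have fired while the lock was held above. */
spin_lock(&demo_lock);
/* ... */
spin_unlock(&demo_lock);

Since watchdog_output_lock is only ever taken from the watchdog hrtimer
callback, that mixed usage never occurs, which matches the clean lockdep
runs reported in this thread.]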
>
>> pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>> smp_processor_id(), duration,
>> current->comm, task_pid_nr(current));
>> @@ -523,6 +526,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>> show_regs(regs);
>> else
>> dump_stack();
>> + spin_unlock(&watchdog_output_lock);
>>
>> if (softlockup_all_cpu_backtrace) {
>> trigger_allbutcpu_cpu_backtrace(smp_processor_id());
Hi,

On Wed, Nov 22, 2023 at 2:05 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 22 Nov 2023 18:02:12 +0800 lizhe.67@bytedance.com wrote:
>
> > From: Li Zhe <lizhe.67@bytedance.com>
> >
> > If multiple CPUs trigger softlockup at the same time with
> > 'softlockup_all_cpu_backtrace=0', the softlockup logs will appear
> > interleaved in dmesg, which makes them hard for developers to read.
> > Since the code path for outputting softlockup logs is not a kernel
> > hotspot and its performance requirements are not strict, a lock is
> > used to serialize the softlockup log output and improve the
> > readability of the logs.
>
> Seems reasonable, but...
>
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -28,6 +28,8 @@
> >  #include <linux/kvm_para.h>
> >
> >  static DEFINE_MUTEX(watchdog_mutex);
> > +/* This lock is used to prevent concurrent actions of softlockup output logs */
> > +static DEFINE_SPINLOCK(watchdog_output_lock);
>
> It would be a little neater to reduce the scope of this - move the
> definition into that little code block in watchdog_timer_fn() where it
> is actually used.

That's where Li Zhe had it in v1. I actually requested it be moved to
a larger scope since I think we'll want a future patch that also uses
the same lock for hardlockup, and I was thinking of sending that out
after Li Zhe's patch lands. Specifically, I've seen cases on arm64,
when pseudo-NMI isn't enabled, where we can end up with a soft lockup
triggering while we're waiting for a backtrace of a hardlockup (since
we wait 10 seconds if a CPU isn't responding to the request to
backtrace it).

I've also been testing hardlockup (with pseudo-NMI enabled) and I
notice similar problems with hardlockup itself. Specifically, I'm
looking at output currently that looks like this:

[ 102.619905] lkdtm: Performing direct entry HARDLOCKUP
[ 112.037900] Watchdog detected hard LOCKUP on cpu 1
[ 112.038031] Watchdog detected hard LOCKUP on cpu 7
[ 112.042872] Modules linked in:
[ 112.047807] Modules linked in:
...

...but I'd also be OK with limiting the scope until I send my later
patch, so up to you.

> >  #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HARDLOCKUP_DETECTOR_SPARC64)
> >  # define WATCHDOG_HARDLOCKUP_DEFAULT	1
> > @@ -514,6 +516,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >  		/* Start period for the next softlockup warning. */
> >  		update_report_ts();
> >
> > +		spin_lock(&watchdog_output_lock);
>
> The hrtimer callout function is called from [soft]irq context, yes?
> Doesn't lockdep get upset when we take a spinlock in such a context?

I don't think so, but I could certainly be wrong. As far as I knew:

* spin_lock_irqsave / spin_unlock_irqrestore: Always safe to call,
whether from normal context or IRQ context. Saves whether IRQs were
disabled or not, then disables them, then restores them to how they
were before.

* spin_lock_irq / spin_unlock_irq: Safe to call as long as you're
_not_ in IRQ context (because spin_lock_irq disables IRQs and
spin_unlock_irq unconditionally re-enables them).

* spin_lock / spin_unlock: Safe to call as long as you _are_ in IRQ
context. Being in IRQ context means IRQs are already disabled, so the
function doesn't need to disable/re-enable them. Also safe to call if
no users of this spinlock ever run in IRQ context.

The reason for all that complexity is specifically to avoid:

1. spin_lock(&my_lock);
2. IRQ fires off
3. In IRQ context: spin_lock(&my_lock);

...because that would make us spin forever.
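[Editorial note: a compact sketch of the three variants described above.
demo_lock and the wrapper functions are illustrative, not from the
patch.]

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);

void callable_from_any_context(void)
{
	unsigned long flags;

	/* Saves the current IRQ state, disables IRQs, and restores the
	 * saved state on unlock. */
	spin_lock_irqsave(&demo_lock, flags);
	/* ... critical section ... */
	spin_unlock_irqrestore(&demo_lock, flags);
}

void callable_from_process_context_only(void)
{
	/* Unconditionally disables IRQs on lock and unconditionally
	 * re-enables them on unlock. */
	spin_lock_irq(&demo_lock);
	/* ... critical section ... */
	spin_unlock_irq(&demo_lock);
}

void callable_from_irq_context(void)
{
	/* IRQs are already off here, so nothing needs to be saved or
	 * restored; this is the variant the patch uses in the watchdog
	 * hrtimer callback. */
	spin_lock(&demo_lock);
	/* ... critical section ... */
	spin_unlock(&demo_lock);
}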
Just to be sure, I just tried with lockdep and things seemed OK.
Specifically, I did:

sysctl -w kernel.softlockup_panic=0
sysctl -w kernel.softlockup_all_cpu_backtrace=0
for i in $(seq 3); do
  echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT&
done

I got no lockdep splats and things seemed synchronized. Of course, the
system was in pretty bad shape because the "provoke-crash" module never
gets out of its soft lockup, but that's par for the course and why on
our systems we run with kernel.softlockup_panic=1. I guess that also
means I'd be happy with this now:

Tested-by: Douglas Anderson <dianders@chromium.org>
On Wed, 22 Nov 2023 15:22:12 -0800, dianders@chromium.org wrote:

>> On Wed, 22 Nov 2023 18:02:12 +0800 lizhe.67@bytedance.com wrote:
>>
>> > From: Li Zhe <lizhe.67@bytedance.com>
>> >
>> > If multiple CPUs trigger softlockup at the same time with
>> > 'softlockup_all_cpu_backtrace=0', the softlockup logs will appear
>> > interleaved in dmesg, which makes them hard for developers to read.
>> > Since the code path for outputting softlockup logs is not a kernel
>> > hotspot and its performance requirements are not strict, a lock is
>> > used to serialize the softlockup log output and improve the
>> > readability of the logs.
>>
>> Seems reasonable, but...
>>
>> > --- a/kernel/watchdog.c
>> > +++ b/kernel/watchdog.c
>> > @@ -28,6 +28,8 @@
>> >  #include <linux/kvm_para.h>
>> >
>> >  static DEFINE_MUTEX(watchdog_mutex);
>> > +/* This lock is used to prevent concurrent actions of softlockup output logs */
>> > +static DEFINE_SPINLOCK(watchdog_output_lock);
>>
>> It would be a little neater to reduce the scope of this - move the
>> definition into that little code block in watchdog_timer_fn() where it
>> is actually used.
>
> That's where Li Zhe had it in v1. I actually requested it be moved to
> a larger scope since I think we'll want a future patch that also uses
> the same lock for hardlockup, and I was thinking of sending that out
> after Li Zhe's patch lands. Specifically, I've seen cases on arm64,
> when pseudo-NMI isn't enabled, where we can end up with a soft lockup
> triggering while we're waiting for a backtrace of a hardlockup (since
> we wait 10 seconds if a CPU isn't responding to the request to
> backtrace it).

In my opinion, both definition methods are OK, but for the original
patch it is more appropriate to put the definition inside the function.
I will send a v3 patch. Looking forward to your patches for the
hardlockup.

> I've also been testing hardlockup (with pseudo-NMI enabled) and I
> notice similar problems with hardlockup itself. Specifically, I'm
> looking at output currently that looks like this:
>
> [ 102.619905] lkdtm: Performing direct entry HARDLOCKUP
> [ 112.037900] Watchdog detected hard LOCKUP on cpu 1
> [ 112.038031] Watchdog detected hard LOCKUP on cpu 7
> [ 112.042872] Modules linked in:
> [ 112.047807] Modules linked in:
> ...
>
> ...but I'd also be OK with limiting the scope until I send my later
> patch, so up to you.
>
>> >  #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HARDLOCKUP_DETECTOR_SPARC64)
>> >  # define WATCHDOG_HARDLOCKUP_DEFAULT	1
>> > @@ -514,6 +516,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>> >  		/* Start period for the next softlockup warning. */
>> >  		update_report_ts();
>> >
>> > +		spin_lock(&watchdog_output_lock);
>>
>> The hrtimer callout function is called from [soft]irq context, yes?
>> Doesn't lockdep get upset when we take a spinlock in such a context?
>
> I don't think so, but I could certainly be wrong. As far as I knew:
>
> * spin_lock_irqsave / spin_unlock_irqrestore: Always safe to call,
> whether from normal context or IRQ context. Saves whether IRQs were
> disabled or not, then disables them, then restores them to how they
> were before.
>
> * spin_lock_irq / spin_unlock_irq: Safe to call as long as you're
> _not_ in IRQ context (because spin_lock_irq disables IRQs and
> spin_unlock_irq unconditionally re-enables them).
>
> * spin_lock / spin_unlock: Safe to call as long as you _are_ in IRQ
> context. Being in IRQ context means IRQs are already disabled, so the
> function doesn't need to disable/re-enable them. Also safe to call if
> no users of this spinlock ever run in IRQ context.
>
> The reason for all that complexity is specifically to avoid:
>
> 1. spin_lock(&my_lock);
> 2. IRQ fires off
> 3. In IRQ context: spin_lock(&my_lock);
>
> ...because that would make us spin forever.

I couldn't agree with you more.

> Just to be sure, I just tried with lockdep and things seemed OK.
> Specifically, I did:
>
> sysctl -w kernel.softlockup_panic=0
> sysctl -w kernel.softlockup_all_cpu_backtrace=0
> for i in $(seq 3); do
>   echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT&
> done
>
> I got no lockdep splats and things seemed synchronized. Of course, the
> system was in pretty bad shape because the "provoke-crash" module never
> gets out of its soft lockup, but that's par for the course and why on
> our systems we run with kernel.softlockup_panic=1. I guess that also
> means I'd be happy with this now:

I also did the same test, and things seemed OK.

> Tested-by: Douglas Anderson <dianders@chromium.org>

Thank you very much for your test!
Hi,

On Wed, Nov 22, 2023 at 2:02 AM <lizhe.67@bytedance.com> wrote:
>
> From: Li Zhe <lizhe.67@bytedance.com>
>
> If multiple CPUs trigger softlockup at the same time with
> 'softlockup_all_cpu_backtrace=0', the softlockup logs will appear
> interleaved in dmesg, which makes them hard for developers to read.
> Since the code path for outputting softlockup logs is not a kernel
> hotspot and its performance requirements are not strict, a lock is
> used to serialize the softlockup log output and improve the
> readability of the logs.
>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> ---
> Changelogs:
>
> v1->v2:
> - define the lock outside the function scope
> - add precondition 'softlockup_all_cpu_backtrace=0' in commit message
>
> kernel/watchdog.c | 4 ++++
> 1 file changed, 4 insertions(+)

This seems like a useful improvement to me and the code looks good.

Reviewed-by: Douglas Anderson <dianders@chromium.org>