The watchdog may detect a false positive softlockup because softlockup
detection is stopped only after watchdog_thresh has been updated. The
problem can be described as follows:
# Assume the previous watchdog_thresh is 60, so the hrtimer fires every
# 24s (the sample period is 2 * watchdog_thresh / 5 seconds).
echo 10 > /proc/sys/kernel/watchdog_thresh (User space)
|
+------>+ update watchdog_thresh (We are in kernel now)
|
|
+------>+ watchdog hrtimer (irq context: detect softlockup)
|
|
+-------+
|
|
+ softlockup_stop_all
As shown above, there is a window between updating watchdog_thresh and
softlockup_stop_all. If a timer fires during this window, it still runs
at the old 24s interval but evaluates the new, smaller detection window
(2 * watchdog_thresh = 20s), so a false positive softlockup is
reported. To fix this problem, stage the new value in a shadow variable
and write it back to watchdog_thresh only after softlockup_stop_all
has completed.
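
The resulting ordering can be sketched as follows. This is a simplified
excerpt of the patched __lockup_detector_reconfigure() with explanatory
comments added; the exact shape of the surrounding function may differ
between kernel versions:

static void __lockup_detector_reconfigure(void)
{
	cpus_read_lock();
	watchdog_hardlockup_stop();

	/*
	 * Stop the per-CPU softlockup hrtimers first. Until this
	 * returns, stale timers may still fire at the interval derived
	 * from the old watchdog_thresh, so the old value must remain
	 * visible to them.
	 */
	softlockup_stop_all();

	/*
	 * All old timers are quiesced; it is now safe to publish the
	 * value the user wrote via sysctl (staged in
	 * watchdog_thresh_shadow by proc_watchdog_thresh()).
	 */
	watchdog_thresh = READ_ONCE(watchdog_thresh_shadow);

	set_sample_period();
	lockup_detector_update_enable();
	if (watchdog_enabled && watchdog_thresh)
		softlockup_start_all();

	watchdog_hardlockup_start();
	cpus_read_unlock();
}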
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
---
kernel/watchdog.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 9fa2af9dbf2c..80e5a77e92fd 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -47,6 +47,7 @@ int __read_mostly watchdog_user_enabled = 1;
static int __read_mostly watchdog_hardlockup_user_enabled = WATCHDOG_HARDLOCKUP_DEFAULT;
static int __read_mostly watchdog_softlockup_user_enabled = 1;
int __read_mostly watchdog_thresh = 10;
+static int __read_mostly watchdog_thresh_shadow;
static int __read_mostly watchdog_hardlockup_available;
struct cpumask watchdog_cpumask __read_mostly;
@@ -876,6 +877,7 @@ static void __lockup_detector_reconfigure(void)
watchdog_hardlockup_stop();
softlockup_stop_all();
+ watchdog_thresh = READ_ONCE(watchdog_thresh_shadow);
set_sample_period();
lockup_detector_update_enable();
if (watchdog_enabled && watchdog_thresh)
@@ -1035,10 +1037,12 @@ static int proc_watchdog_thresh(const struct ctl_table *table, int write,
mutex_lock(&watchdog_mutex);
- old = READ_ONCE(watchdog_thresh);
+ watchdog_thresh_shadow = READ_ONCE(watchdog_thresh);
+
+ old = watchdog_thresh_shadow;
err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
- if (!err && write && old != READ_ONCE(watchdog_thresh))
+ if (!err && write && old != READ_ONCE(watchdog_thresh_shadow))
proc_watchdog_update();
mutex_unlock(&watchdog_mutex);
@@ -1080,7 +1084,7 @@ static const struct ctl_table watchdog_sysctls[] = {
},
{
.procname = "watchdog_thresh",
- .data = &watchdog_thresh,
+ .data = &watchdog_thresh_shadow,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_watchdog_thresh,
--
2.34.1
On Wed, 16 Apr 2025 01:39:22 +0000 Luo Gengkun <luogengkun@huaweicloud.com> wrote:

> The watchdog may dectect false positive of softlockup because of stop
> softlockup after update watchdog_thresh. The problem can be described as
> follow:
>
> # We asuume previous watchdog_thresh is 60, so the timer is coming every
> # 24s.
> echo 10 > /proc/sys/kernel/watchdog_thresh (User space)
> |
> +------>+ update watchdog_thresh (We are in kernel now)
> |
> |
> +------>+ watchdog hrtimer (irq context: detect softlockup)
> |
> |
> +-------+
> |
> |
> + softlockup_stop_all
>
> As showed above, there is a window between update watchdog_thresh and
> softlockup_stop_all. During this window, if a timer is coming, a false
> positive of softlockup will happen. To fix this problem, use a shadow
> variable to store the new value and write back to watchdog_thresh after
> softlockup_stop_all.

Changelog is a bit hard to follow. I asked gemini.google.com to clean
it up and it produced this:

: The watchdog may detect a false positive softlockup due to stopping the
: softlockup detection after updating `watchdog_thresh`. The problem can
: be described as follows:
:
: ```
: # Assume the previous watchdog_thresh is 60, so the timer triggers every 24 seconds.
: echo 10 > /proc/sys/kernel/watchdog_thresh (User space)
: |
: +------>+ Update watchdog_thresh (Kernel space)
: |
: |
: +------>+ Watchdog hrtimer (irq context: detect softlockup)
: |
: |
: +-------+
: |
: |
: + softlockup_stop_all
: ```
:
: As shown above, there is a window between updating `watchdog_thresh`
: and `softlockup_stop_all`. During this window, if a timer triggers, a
: false positive softlockup can occur. To fix this problem, a shadow
: variable should be used to store the new value, and this value should
: be written back to `watchdog_thresh` only after `softlockup_stop_all`
: has completed.

I don't know how accurate this is - please check&fix it and consider
incorporating the result?

> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -47,6 +47,7 @@ int __read_mostly watchdog_user_enabled = 1;
>  static int __read_mostly watchdog_hardlockup_user_enabled = WATCHDOG_HARDLOCKUP_DEFAULT;
>  static int __read_mostly watchdog_softlockup_user_enabled = 1;
>  int __read_mostly watchdog_thresh = 10;
> +static int __read_mostly watchdog_thresh_shadow;
>  static int __read_mostly watchdog_hardlockup_available;
>
>  struct cpumask watchdog_cpumask __read_mostly;
> @@ -876,6 +877,7 @@ static void __lockup_detector_reconfigure(void)
>  	watchdog_hardlockup_stop();
>
>  	softlockup_stop_all();
> +	watchdog_thresh = READ_ONCE(watchdog_thresh_shadow);

I expect a reader of this code will wonder "what's all this
watchdog_thresh_shadow stuff". Can you please add a few small comments
to explain why we're doing this?
On 2025/4/16 10:31, Andrew Morton wrote:
> On Wed, 16 Apr 2025 01:39:22 +0000 Luo Gengkun <luogengkun@huaweicloud.com> wrote:
>
>> The watchdog may dectect false positive of softlockup because of stop
>> softlockup after update watchdog_thresh. The problem can be described as
>> follow:
>>
>> # We asuume previous watchdog_thresh is 60, so the timer is coming every
>> # 24s.
>> echo 10 > /proc/sys/kernel/watchdog_thresh (User space)
>> |
>> +------>+ update watchdog_thresh (We are in kernel now)
>> |
>> |
>> +------>+ watchdog hrtimer (irq context: detect softlockup)
>> |
>> |
>> +-------+
>> |
>> |
>> + softlockup_stop_all
>>
>> As showed above, there is a window between update watchdog_thresh and
>> softlockup_stop_all. During this window, if a timer is coming, a false
>> positive of softlockup will happen. To fix this problem, use a shadow
>> variable to store the new value and write back to watchdog_thresh after
>> softlockup_stop_all.
>>
> Changelog is a bit hard to follow. I asked gemini.google.com to clean
> it up and it produced this:
>
> : The watchdog may detect a false positive softlockup due to stopping the
> : softlockup detection after updating `watchdog_thresh`. The problem can
> : be described as follows:
> :
> : ```
> : # Assume the previous watchdog_thresh is 60, so the timer triggers every 24 seconds.
> : echo 10 > /proc/sys/kernel/watchdog_thresh (User space)
> : |
> : +------>+ Update watchdog_thresh (Kernel space)
> : |
> : |
> : +------>+ Watchdog hrtimer (irq context: detect softlockup)
> : |
> : |
> : +-------+
> : |
> : |
> : + softlockup_stop_all
> : ```
> :
> : As shown above, there is a window between updating `watchdog_thresh`
> : and `softlockup_stop_all`. During this window, if a timer triggers, a
> : false positive softlockup can occur. To fix this problem, a shadow
> : variable should be used to store the new value, and this value should
> : be written back to `watchdog_thresh` only after `softlockup_stop_all`
> : has completed.
>
> I don't know how accurate this is - please check&fix it and consider
> incorporating the result?
>
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -47,6 +47,7 @@ int __read_mostly watchdog_user_enabled = 1;
>>  static int __read_mostly watchdog_hardlockup_user_enabled = WATCHDOG_HARDLOCKUP_DEFAULT;
>>  static int __read_mostly watchdog_softlockup_user_enabled = 1;
>>  int __read_mostly watchdog_thresh = 10;
>> +static int __read_mostly watchdog_thresh_shadow;
>>  static int __read_mostly watchdog_hardlockup_available;
>>
>>  struct cpumask watchdog_cpumask __read_mostly;
>> @@ -876,6 +877,7 @@ static void __lockup_detector_reconfigure(void)
>>  	watchdog_hardlockup_stop();
>>
>>  	softlockup_stop_all();
>> +	watchdog_thresh = READ_ONCE(watchdog_thresh_shadow);
> I expect a reader of this code will wonder "what's all this
> watchdog_thresh_shadow stuff". Can you please add a few small comments
> to explain why we're doing this?

The following testcase may help to understand this problem.

---------------------------------------------
echo RT_RUNTIME_SHARE > /sys/kernel/debug/sched/features
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
echo 0 > /sys/kernel/debug/sched/fair_server/cpu3/runtime
echo 60 > /proc/sys/kernel/watchdog_thresh
taskset -c 3 chrt -r 99 ./test &
echo 10 > /proc/sys/kernel/watchdog_thresh &
---------------------------------------------

This testcase first disables RT throttling, then sets watchdog_thresh
to 60 and runs an RT task on cpu3.

The final command blocks: smp_call_on_cpu() queues its work on
kworker/3, which the scheduler cannot pick because of the running
real-time thread. After a while a softlockup is detected on cpu3. The
reason is that watchdog_thresh has already been changed to 10, but
watchdog_timer_fn still fires every 24 seconds, because kworker/3 never
got a chance to stop the old hrtimer on cpu3.

This case shows that updating watchdog_thresh before doing
softlockup_stop_all() is unsafe: even without an RT task on cpu3, a
timer irq arriving in that window produces a false positive softlockup
report.

What this patch does is first store the new threshold in
watchdog_thresh_shadow, which acts as a staging variable, and write it
back to watchdog_thresh only after softlockup_stop_all() has finished.
This way, no more false positives are reported.
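
For illustration, the sysctl write path with the kind of explanatory
comments the review asks for might look like this. It is a simplified
sketch based on the hunks in the patch above, not the final committed
code:

static int proc_watchdog_thresh(const struct ctl_table *table, int write,
				void *buffer, size_t *lenp, loff_t *ppos)
{
	int err, old;

	mutex_lock(&watchdog_mutex);

	/*
	 * The sysctl table points at watchdog_thresh_shadow, not at
	 * watchdog_thresh, so a user-space write only updates the
	 * shadow copy here. Resync the shadow with the live value
	 * first so reads and no-op writes see the current threshold.
	 */
	watchdog_thresh_shadow = READ_ONCE(watchdog_thresh);

	old = watchdog_thresh_shadow;
	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);

	/*
	 * The live watchdog_thresh is updated from the shadow copy in
	 * __lockup_detector_reconfigure(), only after
	 * softlockup_stop_all() has quiesced the old hrtimers.
	 */
	if (!err && write && old != READ_ONCE(watchdog_thresh_shadow))
		proc_watchdog_update();

	mutex_unlock(&watchdog_mutex);
	return err;
}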
On Wed, 16 Apr 2025 14:44:52 +0800 Luo Gengkun <luogengkun@huaweicloud.com> wrote:

> The following testcase may help to understand this problem.

OK. Please

a) include this info in the v2 changelog

b) consider whether you believe the fix should be backported into
   -stable kernels and if so, add a Cc: <stable@vger.kernel.org> and,
   if appropriate, add a Fixes: target

c) Place a couple of comments in the code to help readers understand
   why we're doing this

d) try to make the changelog a little less puzzling ;)

e) retest and resend!

Thanks.