From: Mayank Rungta via B4 Relay
Date: Thu, 12 Mar 2026 16:22:06 -0700
Subject: [PATCH v2 5/5] doc: watchdog: Document buddy detector
Message-Id: <20260312-hardlockup-watchdog-fixes-v2-5-45bd8a0cc7ed@google.com>
References: <20260312-hardlockup-watchdog-fixes-v2-0-45bd8a0cc7ed@google.com>
In-Reply-To: <20260312-hardlockup-watchdog-fixes-v2-0-45bd8a0cc7ed@google.com>
To: Petr Mladek, Jinchao Wang, Yunhui Cui, Stephane Eranian, Ian Rogers,
 Li Huafei, Feng Tang, Max Kellermann, Jonathan Corbet, Douglas Anderson,
 Andrew Morton, Florian Delizy, Shuah Khan
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Mayank Rungta
Reply-To: mrungta@google.com

From: Mayank Rungta

The current documentation generalizes the hardlockup detector as primarily
NMI-perf-based and lacks details on the SMP "Buddy" detector.

Update the documentation to add a detailed description of the Buddy
detector, and also restructure the "Implementation" section to explicitly
separate "Softlockup Detector", "Hardlockup Detector (NMI/Perf)", and
"Hardlockup Detector (Buddy)".

Clarify that the softlockup hrtimer acts as the heartbeat generator for
both hardlockup mechanisms and centralize the configuration details in a
"Frequency and Heartbeats" section.

Reviewed-by: Douglas Anderson
Signed-off-by: Mayank Rungta
---
 Documentation/admin-guide/lockup-watchdogs.rst | 149 +++++++++++++++++--------
 1 file changed, 101 insertions(+), 48 deletions(-)

diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst
index 1b374053771f..7ae7ce3abd2c 100644
--- a/Documentation/admin-guide/lockup-watchdogs.rst
+++ b/Documentation/admin-guide/lockup-watchdogs.rst
@@ -30,22 +30,23 @@ timeout is set through the confusingly named "kernel.panic" sysctl),
 to cause the system to reboot automatically after a specified amount
 of time.
 
+Configuration
+=============
+
+A kernel knob is provided that allows administrators to configure
+this period. The "watchdog_thresh" parameter (default 10 seconds)
+controls the threshold. The right value for a particular environment
+is a trade-off between fast response to lockups and detection overhead.
+
 Implementation
 ==============
 
-The soft and hard lockup detectors are built on top of the hrtimer and
-perf subsystems, respectively. A direct consequence of this is that,
-in principle, they should work in any architecture where these
-subsystems are present.
+The soft lockup detector is built on top of the hrtimer subsystem.
+The hard lockup detector is built on top of the perf subsystem
+(on architectures that support it) or uses an SMP "buddy" system.
 
-A periodic hrtimer runs to generate interrupts and kick the watchdog
-job. An NMI perf event is generated every "watchdog_thresh"
-(compile-time initialized to 10 and configurable through sysctl of the
-same name) seconds to check for hardlockups. If any CPU in the system
-does not receive any hrtimer interrupt during that time the
-'hardlockup detector' (the handler for the NMI perf event) will
-generate a kernel warning or call panic, depending on the
-configuration.
+Softlockup Detector
+-------------------
 
 The watchdog job runs in a stop scheduling thread that updates a
 timestamp every time it is scheduled. If that timestamp is not updated
@@ -55,53 +56,105 @@ will dump useful debug information to the system log, after which it
 will call panic if it was instructed to do so or resume execution of
 other kernel code.
 
-The period of the hrtimer is 2*watchdog_thresh/5, which means it has
-two or three chances to generate an interrupt before the hardlockup
-detector kicks in.
+Frequency and Heartbeats
+------------------------
+
+The hrtimer used by the softlockup detector serves a dual purpose:
+it detects softlockups, and it also generates the interrupts
+(heartbeats) that the hardlockup detectors use to verify CPU liveness.
+
+The period of this hrtimer is 2*watchdog_thresh/5. This means the
+hrtimer has two or three chances to generate an interrupt before the
+NMI hardlockup detector kicks in.
+
+Hardlockup Detector (NMI/Perf)
+------------------------------
+
+On architectures that support NMI (Non-Maskable Interrupt) perf events,
+a periodic NMI is generated every "watchdog_thresh" seconds.
+
+If any CPU in the system does not receive any hrtimer interrupt
+(heartbeat) during the "watchdog_thresh" window, the 'hardlockup
+detector' (the handler for the NMI perf event) will generate a kernel
+warning or call panic.
+
+**Detection Overhead (NMI):**
+
+The time to detect a lockup can vary depending on when the lockup
+occurs relative to the NMI check window. Examples below assume a watchdog_thresh of 10.
+
+* **Best Case:** The lockup occurs just before the first heartbeat is
+  due. The detector will notice the missing hrtimer interrupt almost
+  immediately during the next check.
+
+  ::
+
+    Time 100.0: cpu 1 heartbeat
+    Time 100.1: hardlockup_check, cpu1 stores its state
+    Time 103.9: Hard Lockup on cpu1
+    Time 104.0: cpu 1 heartbeat never comes
+    Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+
+    Time to detection: ~6 seconds
+
+* **Worst Case:** The lockup occurs shortly after a valid interrupt
+  (heartbeat) which itself happened just after the NMI check. The next
+  NMI check sees that the interrupt count has changed (due to that one
+  heartbeat), assumes the CPU is healthy, and resets the baseline. The
+  lockup is only detected at the subsequent check.
+
+  ::
+
+    Time 100.0: hardlockup_check, cpu1 stores its state
+    Time 100.1: cpu 1 heartbeat
+    Time 100.2: Hard Lockup on cpu1
+    Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
+    Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
 
-As explained above, a kernel knob is provided that allows
-administrators to configure the period of the hrtimer and the perf
-event. The right value for a particular environment is a trade-off
-between fast response to lockups and detection overhead.
+    Time to detection: ~20 seconds
 
-Detection Overhead
-------------------
+Hardlockup Detector (Buddy)
+---------------------------
 
-The hardlockup detector checks for lockups using a periodic NMI perf
-event. This means the time to detect a lockup can vary depending on
-when the lockup occurs relative to the NMI check window.
+On architectures or configurations where NMI perf events are not
+available (or disabled), the kernel may use the "buddy" hardlockup
+detector. This mechanism requires SMP (Symmetric Multi-Processing).
 
-**Best Case:**
-In the best case scenario, the lockup occurs just before the first
-heartbeat is due. The detector will notice the missing hrtimer
-interrupt almost immediately during the next check.
+In this mode, each CPU is assigned a "buddy" CPU to monitor. The
+monitoring CPU runs its own hrtimer (the same one used for softlockup
+detection) and checks if the buddy CPU's hrtimer interrupt count has
+increased.
 
-::
+To ensure timeliness and avoid false positives, the buddy system performs
+checks at every hrtimer interval (2*watchdog_thresh/5, which is 4 seconds
+by default). It uses a missed-interrupt threshold of 3. If the buddy's
+interrupt count has not changed for 3 consecutive checks, it is assumed
+that the buddy CPU is hardlocked (interrupts disabled). The monitoring
+CPU will then trigger the hardlockup response (warning or panic).
 
-  Time 100.0: cpu 1 heartbeat
-  Time 100.1: hardlockup_check, cpu1 stores its state
-  Time 103.9: Hard Lockup on cpu1
-  Time 104.0: cpu 1 heartbeat never comes
-  Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+**Detection Overhead (Buddy):**
 
-  Time to detection: ~6 seconds
+With a default check interval of 4 seconds (watchdog_thresh = 10):
 
-**Worst Case:**
-In the worst case scenario, the lockup occurs shortly after a valid
-interrupt (heartbeat) which itself happened just after the NMI check.
-The next NMI check sees that the interrupt count has changed (due to
-that one heartbeat), assumes the CPU is healthy, and resets the
-baseline. The lockup is only detected at the subsequent check.
+* **Best case:** Lockup occurs just before a check.
+  Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
+* **Worst case:** Lockup occurs just after a check.
+  Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
 
-::
+**Limitations of the Buddy Detector:**
 
-  Time 100.0: hardlockup_check, cpu1 stores its state
-  Time 100.1: cpu 1 heartbeat
-  Time 100.2: Hard Lockup on cpu1
-  Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
-  Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+1. **All-CPU Lockup:** If all CPUs lock up simultaneously, the buddy
+   detector cannot detect the condition because the monitoring CPUs
+   are also frozen.
+2. **Stack Traces:** Unlike the NMI detector, the buddy detector
+   cannot directly interrupt the locked CPU to grab a stack trace.
+   It relies on architecture-specific mechanisms (like NMI backtrace
+   support) to try and retrieve the status of the locked CPU. If
+   such support is missing, the log may only show that a lockup
+   occurred without providing the locked CPU's stack.
 
-  Time to detection: ~20 seconds
+Watchdog Core Exclusion
+=======================
 
 By default, the watchdog runs on all online cores. However, on a
 kernel configured with NO_HZ_FULL, by default the watchdog runs only
-- 
2.53.0.851.ga537e3e6e9-goog
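
Note for readers checking the numbers: the latency figures quoted in the
new text (4 second hrtimer period, ~6s/~20s for the NMI detector, ~8s/~12s
for the buddy detector) can be reproduced with the small Python sketch
below. It is only arithmetic on the values stated in the patch, not kernel
code. Every function name in it is hypothetical, and the NMI best-case
formula (watchdog_thresh minus one hrtimer period) is an assumption that
generalizes the single worked example, which ignores the ~0.1s offsets
shown in the timelines.

```python
"""Latency arithmetic for the watchdog timings described in the patch.

All names below are hypothetical helpers for illustration; none of them
exist in the kernel.
"""

# Missed-interrupt threshold quoted in the buddy detector description.
BUDDY_MISS_THRESHOLD = 3


def hrtimer_period(watchdog_thresh=10.0):
    """Heartbeat period in seconds: 2*watchdog_thresh/5."""
    return 2 * watchdog_thresh / 5


def buddy_detection_window(watchdog_thresh=10.0):
    """Return (best, worst) seconds until the buddy detector fires.

    Best case: the lockup happens just before a check, so the first
    unchanged check costs ~0s and the remaining two cost one period each.
    Worst case: the lockup happens just after a check, so all three
    unchanged checks cost a full period.
    """
    period = hrtimer_period(watchdog_thresh)
    best = (BUDDY_MISS_THRESHOLD - 1) * period
    worst = BUDDY_MISS_THRESHOLD * period
    return best, worst


def nmi_detection_window(watchdog_thresh=10.0):
    """Return approximate (best, worst) seconds for the NMI detector.

    Assumption: best case is a lockup just before the first heartbeat
    after a check (thresh - period); worst case is a lockup just after a
    heartbeat that itself followed a check (about 2 * thresh).
    """
    period = hrtimer_period(watchdog_thresh)
    best = watchdog_thresh - period
    worst = 2 * watchdog_thresh
    return best, worst


if __name__ == "__main__":
    thresh = 10.0
    print("hrtimer period:", hrtimer_period(thresh))
    print("buddy (best, worst):", buddy_detection_window(thresh))
    print("NMI   (best, worst):", nmi_detection_window(thresh))
```

Changing `thresh` shows how both detectors scale linearly with
watchdog_thresh, which is the trade-off between response time and overhead
that the new "Configuration" section describes.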