From: Mayank Rungta
Date: Thu, 12 Mar 2026 16:22:03 -0700
Subject: [PATCH v2 2/5] watchdog: Update saved interrupts during check
Message-Id: <20260312-hardlockup-watchdog-fixes-v2-2-45bd8a0cc7ed@google.com>
To: Petr Mladek, Jinchao Wang, Yunhui Cui, Stephane Eranian, Ian Rogers, Li Huafei, Feng Tang, Max Kellermann, Jonathan Corbet, Douglas Anderson, Andrew Morton, Florian Delizy, Shuah Khan
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Mayank Rungta

Currently, after arch_touch_nmi_watchdog() is called, the next
watchdog_hardlockup_check() returns early and skips updating
hrtimer_interrupts_saved. This leads to stale comparisons and delayed
lockup detection.

I found this issue because the serial console on our system is fairly
chatty. For example, the 8250 console driver frequently calls
touch_nmi_watchdog() via console_write(). If a CPU locks up after a
timer interrupt but before the next watchdog check, we see the
following sequence:

  * watchdog_hardlockup_check() saves the counter (e.g., 1000)
  * Timer runs and updates the counter (1001)
  * touch_nmi_watchdog() is called
  * CPU locks up
  * 10s pass: check() notices touch, returns early, skips update
  * 10s pass: check() saves counter (1001)
  * 10s pass: check() finally detects lockup

This delays detection to 30 seconds. With this fix, we detect the
lockup in 20 seconds.

Reviewed-by: Douglas Anderson
Signed-off-by: Mayank Rungta
Reviewed-by: Petr Mladek
---
 kernel/watchdog.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4c5b47495745..431c540bd035 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -159,21 +159,28 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
 	per_cpu(watchdog_hardlockup_touched, cpu) = true;
 }
 
-static bool is_hardlockup(unsigned int cpu)
+static void watchdog_hardlockup_update(unsigned int cpu)
 {
 	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
 
-	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
-		return true;
-
 	/*
 	 * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
 	 * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
 	 * written/read by a single CPU.
 	 */
 	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
+}
+
+static bool is_hardlockup(unsigned int cpu)
+{
+	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
+
+	if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint) {
+		watchdog_hardlockup_update(cpu);
+		return false;
+	}
 
-	return false;
+	return true;
 }
 
 static void watchdog_hardlockup_kick(void)
@@ -191,6 +198,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	unsigned long flags;
 
 	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
+		watchdog_hardlockup_update(cpu);
 		per_cpu(watchdog_hardlockup_touched, cpu) = false;
 		return;
 	}
-- 
2.53.0.851.ga537e3e6e9-goog