From nobody Mon Feb 9 09:23:04 2026
From: Breno Leitao
Date: Fri, 06 Feb 2026 03:18:01 -0800
Subject: [PATCH 1/2] workqueue: add time-based panic for stalls
Message-Id: <20260206-wqstall_panic_time-v1-1-f2a21d5d87a4@debian.org>
References: <20260206-wqstall_panic_time-v1-0-f2a21d5d87a4@debian.org>
In-Reply-To: <20260206-wqstall_panic_time-v1-0-f2a21d5d87a4@debian.org>
To: Tejun Heo, Lai Jiangshan
Cc: linux-kernel@vger.kernel.org, osandov@osandov.com, rneu@meta.com,
 Breno Leitao, kernel-team@meta.com

Add a new module parameter, 'panic_on_stall_time', that triggers a panic
when a single workqueue stall has lasted for at least the specified
number of seconds.

Unlike 'panic_on_stall', which counts accumulated stall events, this
parameter triggers on the duration of one continuous stall. This is
useful for catching truly stuck workqueues rather than an accumulation
of transient stalls.

Usage:
  workqueue.panic_on_stall_time=120

This panics if any workqueue pool has been stalled for 120 seconds or
more. The stall duration is measured from the pool's last progress
timestamp (pool_ts), which accounts for legitimate system stalls.

Signed-off-by: Breno Leitao
---
 Documentation/admin-guide/kernel-parameters.txt |  8 ++++++++
 kernel/workqueue.c                              | 22 ++++++++++++++++++----
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1058f2a6d6a8c..a2953cf6c4038 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -8373,6 +8373,14 @@ Kernel parameters
 
 			The default is 0, which disables the panic on stall.
 
+	workqueue.panic_on_stall_time=
+			Panic when a workqueue stall has been continuous for
+			the specified number of seconds. Unlike panic_on_stall
+			which counts accumulated stall events, this triggers
+			based on the duration of a single continuous stall.
+
+			The default is 0, which disables the time-based panic.
+
 	workqueue.cpu_intensive_thresh_us=
 			Per-cpu work items which run for longer than this
 			threshold are automatically considered CPU intensive
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 253311af47c6d..6f63899dd6317 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7508,6 +7508,10 @@ static DEFINE_PER_CPU(unsigned long, wq_watchdog_touched_cpu) = INITIAL_JIFFIES;
 static unsigned int wq_panic_on_stall;
 module_param_named(panic_on_stall, wq_panic_on_stall, uint, 0644);
 
+static unsigned int wq_panic_on_stall_time;
+module_param_named(panic_on_stall_time, wq_panic_on_stall_time, uint, 0644);
+MODULE_PARM_DESC(panic_on_stall_time, "Panic if stall exceeds this many seconds (0=disabled)");
+
 /*
  * Show workers that might prevent the processing of pending work items.
  * The only candidates are CPU-bound workers in the running state.
@@ -7559,7 +7563,12 @@ static void show_cpu_pools_hogs(void)
 	rcu_read_unlock();
 }
 
-static void panic_on_wq_watchdog(void)
+/*
+ * It triggers a panic in two scenarios: when the total number of stalls
+ * exceeds a threshold, and when a stall lasts longer than
+ * wq_panic_on_stall_time
+ */
+static void panic_on_wq_watchdog(unsigned int stall_time_sec)
 {
 	static unsigned int wq_stall;
 
@@ -7567,6 +7576,8 @@ static void panic_on_wq_watchdog(void)
 		wq_stall++;
 		BUG_ON(wq_stall >= wq_panic_on_stall);
 	}
+
+	BUG_ON(wq_panic_on_stall_time && stall_time_sec >= wq_panic_on_stall_time);
 }
 
 static void wq_watchdog_reset_touched(void)
@@ -7581,10 +7592,12 @@ static void wq_watchdog_reset_touched(void)
 static void wq_watchdog_timer_fn(struct timer_list *unused)
 {
 	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
+	unsigned int max_stall_time = 0;
 	bool lockup_detected = false;
 	bool cpu_pool_stall = false;
 	unsigned long now = jiffies;
 	struct worker_pool *pool;
+	unsigned int stall_time;
 	int pi;
 
 	if (!thresh)
@@ -7618,14 +7631,15 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		/* did we stall? */
 		if (time_after(now, ts + thresh)) {
 			lockup_detected = true;
+			stall_time = jiffies_to_msecs(now - pool_ts) / 1000;
+			max_stall_time = max(max_stall_time, stall_time);
 			if (pool->cpu >= 0 && !(pool->flags & POOL_BH)) {
 				pool->cpu_stall = true;
 				cpu_pool_stall = true;
 			}
 			pr_emerg("BUG: workqueue lockup - pool");
 			pr_cont_pool_info(pool);
-			pr_cont(" stuck for %us!\n",
-				jiffies_to_msecs(now - pool_ts) / 1000);
+			pr_cont(" stuck for %us!\n", stall_time);
 		}
 
 
@@ -7638,7 +7652,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		show_cpu_pools_hogs();
 
 	if (lockup_detected)
-		panic_on_wq_watchdog();
+		panic_on_wq_watchdog(max_stall_time);
 
 	wq_watchdog_reset_touched();
 	mod_timer(&wq_watchdog_timer, jiffies + thresh);
-- 
2.47.3

From nobody Mon Feb 9 09:23:04 2026
From: Breno Leitao
Date: Fri, 06 Feb 2026 03:18:02 -0800
Subject: [PATCH 2/2] workqueue: replace BUG_ON with panic in panic_on_wq_watchdog
Message-Id: <20260206-wqstall_panic_time-v1-2-f2a21d5d87a4@debian.org>
References: <20260206-wqstall_panic_time-v1-0-f2a21d5d87a4@debian.org>
In-Reply-To: <20260206-wqstall_panic_time-v1-0-f2a21d5d87a4@debian.org>
To: Tejun Heo, Lai Jiangshan
Cc: linux-kernel@vger.kernel.org, osandov@osandov.com, rneu@meta.com,
 Breno Leitao, kernel-team@meta.com

Replace BUG_ON() with panic() in panic_on_wq_watchdog(). This is not a
bug condition but a deliberate panic that the user requested via module
parameters in order to crash the system for debugging.

Using panic() instead of BUG_ON() makes this intent clearer and reports
which threshold was exceeded along with the actual values, making it
easier to diagnose the stall condition from crash dumps.

Signed-off-by: Breno Leitao
---
 kernel/workqueue.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6f63899dd6317..754833b383fe9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7574,10 +7574,14 @@ static void panic_on_wq_watchdog(unsigned int stall_time_sec)
 
 	if (wq_panic_on_stall) {
 		wq_stall++;
-		BUG_ON(wq_stall >= wq_panic_on_stall);
+		if (wq_stall >= wq_panic_on_stall)
+			panic("workqueue: %u stall(s) exceeded threshold %u\n",
+			      wq_stall, wq_panic_on_stall);
 	}
 
-	BUG_ON(wq_panic_on_stall_time && stall_time_sec >= wq_panic_on_stall_time);
+	if (wq_panic_on_stall_time && stall_time_sec >= wq_panic_on_stall_time)
+		panic("workqueue: stall lasted %us, exceeding threshold %us\n",
+		      stall_time_sec, wq_panic_on_stall_time);
 }
 
 static void wq_watchdog_reset_touched(void)
-- 
2.47.3
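
Reviewer's note (not part of either patch): the series ends up with two
independent panic conditions, one count-based and one duration-based. The
following is a minimal user-space sketch of that decision logic, meant only
as a reading aid. The struct wq_panic_policy and the function
wq_policy_should_panic() are hypothetical names that do not exist in the
kernel; the sketch returns true where the patched panic_on_wq_watchdog()
would call panic().

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the policy in panic_on_wq_watchdog().
 * None of these names exist in the kernel; they are for illustration only.
 */
struct wq_panic_policy {
	unsigned int panic_on_stall;      /* models workqueue.panic_on_stall, 0 = disabled */
	unsigned int panic_on_stall_time; /* models workqueue.panic_on_stall_time, 0 = disabled */
	unsigned int stall_count;         /* models the function-static wq_stall counter */
};

/*
 * Call once per watchdog pass that detected a lockup, passing the longest
 * stall (in seconds) seen across all pools in that pass.
 * Returns true where the patched kernel code would panic.
 */
bool wq_policy_should_panic(struct wq_panic_policy *p, unsigned int max_stall_sec)
{
	/* Count-based check: every detected lockup pass bumps the counter. */
	if (p->panic_on_stall) {
		p->stall_count++;
		if (p->stall_count >= p->panic_on_stall)
			return true;
	}

	/* Duration-based check: one continuous stall reached the limit. */
	return p->panic_on_stall_time &&
	       max_stall_sec >= p->panic_on_stall_time;
}
```

With only panic_on_stall_time=120 set, a pass reporting a 119 second stall
returns false and a pass reporting 120 seconds returns true, matching the
">=" comparison in the patch. Because the watchdog timer is re-armed with
mod_timer(&wq_watchdog_timer, jiffies + thresh), the real check only runs
once per watchdog period, so the panic fires on the first pass at or after
the configured duration rather than at the exact second.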