From: Breno Leitao
Date: Thu, 05 Mar 2026 08:15:40 -0800
Subject: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260305-wqstall_start-at-v2-4-b60863ee0899@debian.org>
References: <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org>
In-Reply-To: <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org>
To: Tejun Heo, Lai Jiangshan, Andrew Morton
Cc: linux-kernel@vger.kernel.org, Omar Sandoval, Song Liu,
 Danielle Costantino, kasan-dev@googlegroups.com, Petr Mladek,
 kernel-team@meta.com, Breno Leitao
X-Mailer: b4 0.15-dev-363b9

show_cpu_pool_hog() only prints workers whose task is currently running
on the CPU (task_is_running()). This misses workers that are busy
processing a work item but are sleeping or blocked, for example a
worker that clears PF_WQ_WORKER and enters wait_event_idle(). Such a
worker still occupies a pool slot and prevents progress, yet produces
an empty backtrace section in the watchdog output.

This is happening on real arm64 systems, where toggle_allocation_gate()
IPIs every single CPU in the machine (which lacks NMI), causing
workqueue stalls that show empty backtraces because
toggle_allocation_gate() is sleeping in wait_event_idle().

Remove the task_is_running() filter so every in-flight worker in the
pool's busy_hash is dumped. The busy_hash is protected by pool->lock,
which is already held.

Signed-off-by: Breno Leitao
Acked-by: Song Liu
---
 kernel/workqueue.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 56d8af13843f8..09b9ad78d566c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7583,9 +7583,9 @@ MODULE_PARM_DESC(panic_on_stall_time, "Panic if stall exceeds this many seconds
 
 /*
  * Show workers that might prevent the processing of pending work items.
- * The only candidates are CPU-bound workers in the running state.
- * Pending work items should be handled by another idle worker
- * in all other situations.
+ * A busy worker that is not running on the CPU (e.g. sleeping in
+ * wait_event_idle() with PF_WQ_WORKER cleared) can stall the pool just as
+ * effectively as a CPU-bound one, so dump every in-flight worker.
  */
 static void show_cpu_pool_hog(struct worker_pool *pool)
 {
@@ -7596,19 +7596,17 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
 	raw_spin_lock_irqsave(&pool->lock, irq_flags);
 
 	hash_for_each(pool->busy_hash, bkt, worker, hentry) {
-		if (task_is_running(worker->task)) {
-			/*
-			 * Defer printing to avoid deadlocks in console
-			 * drivers that queue work while holding locks
-			 * also taken in their write paths.
-			 */
-			printk_deferred_enter();
+		/*
+		 * Defer printing to avoid deadlocks in console
+		 * drivers that queue work while holding locks
+		 * also taken in their write paths.
+		 */
+		printk_deferred_enter();
 
-			pr_info("pool %d:\n", pool->id);
-			sched_show_task(worker->task);
+		pr_info("pool %d:\n", pool->id);
+		sched_show_task(worker->task);
 
-			printk_deferred_exit();
-		}
+		printk_deferred_exit();
 	}
 
 	raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
@@ -7619,7 +7617,7 @@ static void show_cpu_pools_hogs(void)
 	struct worker_pool *pool;
 	int pi;
 
-	pr_info("Showing backtraces of running workers in stalled CPU-bound worker pools:\n");
+	pr_info("Showing backtraces of busy workers in stalled CPU-bound worker pools:\n");
 
 	rcu_read_lock();
 
-- 
2.47.3
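
As a rough illustration of what dropping the task_is_running() check
changes, here is a minimal userspace model of the loop in
show_cpu_pool_hog(). It is only a sketch under stated assumptions:
struct model_worker, struct model_pool and model_show_pool_hog() are
hypothetical names invented for this example, a plain array stands in
for pool->busy_hash, a boolean stands in for task_is_running(), and
locking and printk deferral are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_worker {
	const char *name;	/* label used in the dump */
	bool running;		/* models task_is_running(worker->task) */
};

struct model_pool {
	int id;
	const struct model_worker *busy;	/* models pool->busy_hash */
	size_t nr_busy;
};

/*
 * Dump the busy workers of one pool and return how many were printed.
 * only_running == true models the old filter; false models the
 * behaviour after the patch, where every busy worker is dumped.
 */
static size_t model_show_pool_hog(const struct model_pool *pool,
				  bool only_running)
{
	size_t shown = 0;

	assert(pool != NULL);

	for (size_t i = 0; i < pool->nr_busy; i++) {
		const struct model_worker *w = &pool->busy[i];

		/* Old behaviour: sleeping or blocked workers are skipped. */
		if (only_running && !w->running)
			continue;

		printf("pool %d: %s (%s)\n", pool->id, w->name,
		       w->running ? "running" : "sleeping");
		shown++;
	}

	return shown;
}
```

For a pool with one running and two sleeping busy workers,
model_show_pool_hog(&pool, true) reports only the running one, while
model_show_pool_hog(&pool, false) reports all three. A pool whose busy
workers are all sleeping produces nothing under the old filter, which
corresponds to the empty backtrace section described above.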