From: Breno Leitao <leitao@debian.org>
Date: Thu, 05 Mar 2026 08:15:41 -0800
Subject: [PATCH v2 5/5] workqueue: Add stall detector sample module
Message-Id: <20260305-wqstall_start-at-v2-5-b60863ee0899@debian.org>
References: <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org>
In-Reply-To: <20260305-wqstall_start-at-v2-0-b60863ee0899@debian.org>
To: Tejun Heo, Lai Jiangshan, Andrew Morton
Cc: linux-kernel@vger.kernel.org, Omar Sandoval, Song Liu,
 Danielle Costantino, kasan-dev@googlegroups.com, Petr Mladek,
 kernel-team@meta.com, Breno Leitao

Add a sample module under samples/workqueue/stall_detector/ that
reproduces a workqueue stall caused by PF_WQ_WORKER misuse.

The module queues two work items on the same per-CPU pool, then clears
PF_WQ_WORKER and sleeps in wait_event_idle(), hiding from the concurrency
manager and stalling the second work item indefinitely.

This is useful for testing the workqueue watchdog stall diagnostics.

Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Song Liu
---
 samples/workqueue/stall_detector/Makefile   |  1 +
 samples/workqueue/stall_detector/wq_stall.c | 98 +++++++++++++++++++++++++++++
 2 files changed, 99 insertions(+)

diff --git a/samples/workqueue/stall_detector/Makefile b/samples/workqueue/stall_detector/Makefile
new file mode 100644
index 0000000000000..8849e85e95bb9
--- /dev/null
+++ b/samples/workqueue/stall_detector/Makefile
@@ -0,0 +1 @@
+obj-m += wq_stall.o
diff --git a/samples/workqueue/stall_detector/wq_stall.c b/samples/workqueue/stall_detector/wq_stall.c
new file mode 100644
index 0000000000000..6f4a497b18814
--- /dev/null
+++ b/samples/workqueue/stall_detector/wq_stall.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * wq_stall - Test module for the workqueue stall detector.
+ *
+ * Deliberately creates a workqueue stall so the watchdog fires and
+ * prints diagnostic output. Useful for verifying that the stall
+ * detector correctly identifies stuck workers and produces useful
+ * backtraces.
+ *
+ * The stall is triggered by clearing PF_WQ_WORKER before sleeping,
+ * which hides the worker from the concurrency manager. A second
+ * work item queued on the same pool then sits in the worklist with
+ * no worker available to process it.
+ *
+ * After ~30s the workqueue watchdog fires:
+ *   BUG: workqueue lockup - pool cpus=N ...
+ *
+ * Build:
+ *   make -C M=samples/workqueue/stall_detector modules
+ *
+ * Copyright (c) 2026 Meta Platforms, Inc. and affiliates.
+ * Copyright (c) 2026 Breno Leitao <leitao@debian.org>
+ */
+
+#include <linux/module.h>
+#include <linux/workqueue.h>
+#include <linux/wait.h>
+#include <linux/atomic.h>
+#include <linux/sched.h>
+
+static DECLARE_WAIT_QUEUE_HEAD(stall_wq_head);
+static atomic_t wake_condition = ATOMIC_INIT(0);
+static struct work_struct stall_work1;
+static struct work_struct stall_work2;
+
+static void stall_work2_fn(struct work_struct *work)
+{
+	pr_info("wq_stall: second work item finally ran\n");
+}
+
+static void stall_work1_fn(struct work_struct *work)
+{
+	pr_info("wq_stall: first work item running on cpu %d\n",
+		raw_smp_processor_id());
+
+	/*
+	 * Queue second item while we're still counted as running
+	 * (pool->nr_running > 0). Since schedule_work() on a per-CPU
+	 * workqueue targets raw_smp_processor_id(), item 2 lands on the
+	 * same pool. __queue_work -> kick_pool -> need_more_worker()
+	 * sees nr_running > 0 and does NOT wake a new worker.
+	 */
+	schedule_work(&stall_work2);
+
+	/*
+	 * Hide from the workqueue concurrency manager. Without
+	 * PF_WQ_WORKER, schedule() won't call wq_worker_sleeping(),
+	 * so nr_running is never decremented and no replacement
+	 * worker is created. Item 2 stays stuck in pool->worklist.
+	 */
+	current->flags &= ~PF_WQ_WORKER;
+
+	pr_info("wq_stall: entering wait_event_idle (PF_WQ_WORKER cleared)\n");
+	pr_info("wq_stall: expect 'BUG: workqueue lockup' in ~30-60s\n");
+	wait_event_idle(stall_wq_head, atomic_read(&wake_condition) != 0);
+
+	/* Restore so process_one_work() cleanup works correctly */
+	current->flags |= PF_WQ_WORKER;
+	pr_info("wq_stall: woke up, PF_WQ_WORKER restored\n");
+}
+
+static int __init wq_stall_init(void)
+{
+	pr_info("wq_stall: loading\n");
+
+	INIT_WORK(&stall_work1, stall_work1_fn);
+	INIT_WORK(&stall_work2, stall_work2_fn);
+	schedule_work(&stall_work1);
+
+	return 0;
+}
+
+static void __exit wq_stall_exit(void)
+{
+	pr_info("wq_stall: unloading\n");
+	atomic_set(&wake_condition, 1);
+	wake_up(&stall_wq_head);
+	flush_work(&stall_work1);
+	flush_work(&stall_work2);
+	pr_info("wq_stall: all work flushed, module unloaded\n");
+}
+
+module_init(wq_stall_init);
+module_exit(wq_stall_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Reproduce workqueue stall caused by PF_WQ_WORKER misuse");
+MODULE_AUTHOR("Breno Leitao <leitao@debian.org>");
-- 
2.47.3
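
For readers who want to reason about the accounting without loading the
module, the nr_running bookkeeping described in the comments of
stall_work1_fn() can be approximated in plain userspace C. This is only a
rough sketch and not kernel code: struct pool_model and every model_*
function are hypothetical names invented for illustration, and the model
assumes a single pool with one busy worker and a replacement worker always
available to be woken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for one per-CPU worker pool. */
struct pool_model {
	int nr_running;	/* workers the concurrency manager counts as running */
	int nr_pending;	/* work items waiting in the worklist */
};

/* Mirrors the idea of need_more_worker(): pending work and nobody running. */
static bool model_need_more_worker(const struct pool_model *p)
{
	return p->nr_pending > 0 && p->nr_running == 0;
}

/* Queue one item; a replacement worker picks it up only if one is needed. */
static void model_queue_work(struct pool_model *p)
{
	p->nr_pending++;
	if (model_need_more_worker(p))
		p->nr_pending--;	/* woken worker consumes the item */
}

/* Models the sleeping hook that runs only while the worker flag is set. */
static void model_worker_sleeping(struct pool_model *p)
{
	assert(p->nr_running > 0);
	p->nr_running--;
	if (model_need_more_worker(p))
		p->nr_pending--;	/* replacement worker runs the item */
}

/*
 * Replays the sequence from the sample: item 1 is running, it queues
 * item 2, then goes to sleep. Returns true if item 2 is left pending.
 */
static bool model_second_item_stalls(bool flag_cleared)
{
	struct pool_model pool = { .nr_running = 1, .nr_pending = 0 };

	model_queue_work(&pool);	/* nr_running > 0, so nobody is woken */

	if (!flag_cleared)
		model_worker_sleeping(&pool);	/* hook runs, item 2 handed off */
	/* With the flag cleared the hook is skipped and nr_running stays 1. */

	return pool.nr_pending > 0;
}
```

In this model, model_second_item_stalls(true) leaves the second item
pending, while model_second_item_stalls(false) hands it to a replacement
worker, which is the difference the sample module relies on to provoke the
watchdog.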