From: Song Liu
To: linux-kernel@vger.kernel.org
Cc: tj@kernel.org, jiangshanlai@gmail.com, leitao@debian.org, pmladek@suse.com, kernel-team@meta.com, puranjay@kernel.org, Song Liu
Subject: [PATCH v2] workqueue: Fix false positive stall reports
Date: Sat, 21 Mar 2026 20:30:45 -0700
Message-ID: <20260322033045.3405807-1-song@kernel.org>

On weakly ordered architectures (e.g., arm64), the lockless check in
wq_watchdog_timer_fn() can observe a reordering between the worklist
insertion and the last_progress_ts update. Specifically, the watchdog
can see a non-empty worklist (from a list_add) while reading a stale
last_progress_ts value, causing a false positive stall report.

This was confirmed by reading pool->last_progress_ts again with
pool->lock held in wq_watchdog_timer_fn():

  workqueue watchdog: pool 7 false positive detected!
  lockless_ts=4784580465 locked_ts=4785033728 diff=453263ms
  worklist_empty=0

To avoid slowing down the hot path (queue_work() etc.), recheck
last_progress_ts with pool->lock held only when the lockless check
suggests a stall. This eliminates the false positive with minimal
overhead.

While at it, remove two extra empty lines in wq_watchdog_timer_fn().
Assisted-by: claude-code:claude-opus-4-6
Signed-off-by: Song Liu
Acked-by: Song Liu
---
v1 -> v2:
- Use scoped_guard() instead of manual raw_spin_lock/unlock (Tejun)
- Drop READ_ONCE() for pool->last_progress_ts under pool->lock (Tejun)
- Expand comment with reordering scenario and function names (Tejun)
---
 kernel/workqueue.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b77119d71641..ff97b705f25e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7699,8 +7699,28 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		else
 			ts = touched;
 
-		/* did we stall? */
+		/*
+		 * Did we stall?
+		 *
+		 * Do a lockless check first. On weakly ordered
+		 * architectures, the lockless check can observe a
+		 * reordering between worklist insert_work() and
+		 * last_progress_ts update from __queue_work(). Since
+		 * __queue_work() is a much hotter path than the timer
+		 * function, we handle false positive here by reading
+		 * last_progress_ts again with pool->lock held.
+		 */
 		if (time_after(now, ts + thresh)) {
+			scoped_guard(raw_spinlock_irqsave, &pool->lock) {
+				pool_ts = pool->last_progress_ts;
+				if (time_after(pool_ts, touched))
+					ts = pool_ts;
+				else
+					ts = touched;
+			}
+			if (!time_after(now, ts + thresh))
+				continue;
+
 			lockup_detected = true;
 			stall_time = jiffies_to_msecs(now - pool_ts) / 1000;
 			max_stall_time = max(max_stall_time, stall_time);
@@ -7712,8 +7732,6 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n", stall_time);
 		}
-
-
 	}
 
 	if (lockup_detected)
-- 
2.52.0