show_cpu_pool_hog() only prints workers whose task is currently running
on the CPU (task_is_running()). This misses workers that are busy
processing a work item but are sleeping or blocked — for example, a
worker that clears PF_WQ_WORKER and enters wait_event_idle(). Such a
worker still occupies a pool slot and prevents progress, yet produces
an empty backtrace section in the watchdog output.
This is happening on real arm64 systems, where
toggle_allocation_gate() IPIs every single CPU in the machine (which
lacks NMI), causing workqueue stalls that show empty backtraces because
toggle_allocation_gate() is sleeping in wait_event_idle().
Remove the task_is_running() filter so every in-flight worker in the
pool's busy_hash is dumped. The busy_hash is protected by pool->lock,
which is already held.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
kernel/workqueue.c | 28 +++++++++++++---------------
1 file changed, 13 insertions(+), 15 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 56d8af13843f8..09b9ad78d566c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7583,9 +7583,9 @@ MODULE_PARM_DESC(panic_on_stall_time, "Panic if stall exceeds this many seconds
/*
* Show workers that might prevent the processing of pending work items.
- * The only candidates are CPU-bound workers in the running state.
- * Pending work items should be handled by another idle worker
- * in all other situations.
+ * A busy worker that is not running on the CPU (e.g. sleeping in
+ * wait_event_idle() with PF_WQ_WORKER cleared) can stall the pool just as
+ * effectively as a CPU-bound one, so dump every in-flight worker.
*/
static void show_cpu_pool_hog(struct worker_pool *pool)
{
@@ -7596,19 +7596,17 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
raw_spin_lock_irqsave(&pool->lock, irq_flags);
hash_for_each(pool->busy_hash, bkt, worker, hentry) {
- if (task_is_running(worker->task)) {
- /*
- * Defer printing to avoid deadlocks in console
- * drivers that queue work while holding locks
- * also taken in their write paths.
- */
- printk_deferred_enter();
+ /*
+ * Defer printing to avoid deadlocks in console
+ * drivers that queue work while holding locks
+ * also taken in their write paths.
+ */
+ printk_deferred_enter();
- pr_info("pool %d:\n", pool->id);
- sched_show_task(worker->task);
+ pr_info("pool %d:\n", pool->id);
+ sched_show_task(worker->task);
- printk_deferred_exit();
- }
+ printk_deferred_exit();
}
raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
@@ -7619,7 +7617,7 @@ static void show_cpu_pools_hogs(void)
struct worker_pool *pool;
int pi;
- pr_info("Showing backtraces of running workers in stalled CPU-bound worker pools:\n");
+ pr_info("Showing backtraces of busy workers in stalled CPU-bound worker pools:\n");
rcu_read_lock();
--
2.47.3
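[Editor's aside: the behavioral change in the patch above reduces to a change of dumping predicate, which can be sketched in userspace. This is an illustrative model only; `struct worker`, `busy_hash`, and `task_is_running()` are kernel internals, modeled here with a plain flag.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a busy worker; "running" models task_is_running(). */
struct worker {
    bool running;
};

/* Pre-patch behavior: only workers currently on a CPU are dumped. */
static int dumped_before_patch(const struct worker *busy, size_t n)
{
    int dumped = 0;

    for (size_t i = 0; i < n; i++)
        if (busy[i].running)
            dumped++;   /* sched_show_task() would run here */
    return dumped;
}

/* Post-patch behavior: every in-flight worker in busy_hash is dumped,
 * including workers sleeping inside their work item. */
static int dumped_after_patch(const struct worker *busy, size_t n)
{
    (void)busy;
    return (int)n;
}
```

With two busy workers that are both sleeping, the old predicate dumps nothing (the empty backtrace section the commit message complains about), while the patched one dumps both.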
On Thu 2026-03-05 08:15:40, Breno Leitao wrote:
> show_cpu_pool_hog() only prints workers whose task is currently running
> on the CPU (task_is_running()). This misses workers that are busy
> processing a work item but are sleeping or blocked — for example, a
> worker that clears PF_WQ_WORKER and enters wait_event_idle().

IMHO, it is misleading. AFAIK, workers clear PF_WQ_WORKER flag only
when they are going to die. They never do so when going to sleep.

> Such a
> worker still occupies a pool slot and prevents progress, yet produces
> an empty backtrace section in the watchdog output.
>
> This is happening on real arm64 systems, where
> toggle_allocation_gate() IPIs every single CPU in the machine (which
> lacks NMI), causing workqueue stalls that show empty backtraces because
> toggle_allocation_gate() is sleeping in wait_event_idle().

The wait_event_idle() called in toggle_allocation_gate() should not
cause a stall. The scheduler should call wq_worker_sleeping(tsk)
and wake up another idle worker. It should guarantee the progress.

> Remove the task_is_running() filter so every in-flight worker in the
> pool's busy_hash is dumped. The busy_hash is protected by pool->lock,
> which is already held.

As I explained in reply to the cover letter, sleeping workers should
not block forward progress. It seems that in this case, the system was
not able to wake up the other idle worker or it was the last idle
worker and was not able to fork a new one.

IMHO, we should warn about this when there is no running worker.
It might be more useful than printing backtraces of the sleeping
workers because they likely did not cause the problem.

I believe that the problem, in this particular situation, is that
the system can't schedule or fork new processes. It might help
to warn about it and maybe show backtrace of the currently
running process on the stalled CPU.

Anyway, I think we could do better here. And blindly printing
backtraces from all workers would do more harm than good in most
situations.

Best Regards,
Petr
On Thu, Mar 12, 2026 at 06:03:03PM +0100, Petr Mladek wrote:
> On Thu 2026-03-05 08:15:40, Breno Leitao wrote:
> > show_cpu_pool_hog() only prints workers whose task is currently running
> > on the CPU (task_is_running()). This misses workers that are busy
> > processing a work item but are sleeping or blocked — for example, a
> > worker that clears PF_WQ_WORKER and enters wait_event_idle().
>
> IMHO, it is misleading. AFAIK, workers clear PF_WQ_WORKER flag only
> when they are going to die. They never do so when going to sleep.
>
> > Such a
> > worker still occupies a pool slot and prevents progress, yet produces
> > an empty backtrace section in the watchdog output.
> >
> > This is happening on real arm64 systems, where
> > toggle_allocation_gate() IPIs every single CPU in the machine (which
> > lacks NMI), causing workqueue stalls that show empty backtraces because
> > toggle_allocation_gate() is sleeping in wait_event_idle().
>
> The wait_event_idle() called in toggle_allocation_gate() should not
> cause a stall. The scheduler should call wq_worker_sleeping(tsk)
> and wake up another idle worker. It should guarantee the progress.
>
> > Remove the task_is_running() filter so every in-flight worker in the
> > pool's busy_hash is dumped. The busy_hash is protected by pool->lock,
> > which is already held.
>
> As I explained in reply to the cover letter, sleeping workers should
> not block forward progress. It seems that in this case, the system was
> not able to wake up the other idle worker or it was the last idle
> worker and was not able to fork a new one.
>
> IMHO, we should warn about this when there is no running worker.
> It might be more useful than printing backtraces of the sleeping
> workers because they likely did not cause the problem.
>
> I believe that the problem, in this particular situation, is that
> the system can't schedule or fork new processes. It might help
> to warn about it and maybe show backtrace of the currently
> running process on the stalled CPU.
Do you mean checking if pool->busy_hash is empty, and then warning?
Commit fc36ad49ce7160907bcbe4f05c226595611ac293
Author: Breno Leitao <leitao@debian.org>
Date: Fri Mar 13 05:35:02 2026 -0700
workqueue: warn when stalled pool has no running workers
When the workqueue watchdog detects a pool stall and the pool's
busy_hash is empty (no workers executing any work item), print a
diagnostic warning with the pool state and trigger a backtrace of
the currently running task on the stalled CPU.
Signed-off-by: Breno Leitao <leitao@debian.org>
Suggested-by: Petr Mladek <pmladek@suse.com>
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6ee52ba9b14f7..d538067754123 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7655,6 +7655,17 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
raw_spin_lock_irqsave(&pool->lock, irq_flags);
+ if (hash_empty(pool->busy_hash)) {
+ raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
+
+ pr_info("pool %d: no running workers, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
+ pool->id, pool->cpu,
+ idle_cpu(pool->cpu) ? "idle" : "busy",
+ pool->nr_workers, pool->nr_idle);
+ trigger_single_cpu_backtrace(pool->cpu);
+ return;
+ }
+
hash_for_each(pool->busy_hash, bkt, worker, hentry) {
if (task_is_running(worker->task)) {
/*
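[Editor's aside: the early-return check in the hunk above can be sketched in userspace as follows. All fields are toy stand-ins, with `nr_busy == 0` standing in for `hash_empty(pool->busy_hash)`.]

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy pool: nr_busy == 0 models hash_empty(pool->busy_hash). */
struct toy_pool {
    int id, cpu, nr_workers, nr_idle, nr_busy;
};

/* Returns true when the diagnostic path was taken, i.e. the pool had
 * no worker executing any work item at all. */
static bool warn_if_no_busy_workers(const struct toy_pool *pool)
{
    if (pool->nr_busy != 0)
        return false;   /* normal path: iterate the busy workers instead */

    printf("pool %d: no running workers, cpu=%d (nr_workers=%d nr_idle=%d)\n",
           pool->id, pool->cpu, pool->nr_workers, pool->nr_idle);
    return true;
}
```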
On Fri 2026-03-13 05:57:59, Breno Leitao wrote:
> On Thu, Mar 12, 2026 at 06:03:03PM +0100, Petr Mladek wrote:
> > On Thu 2026-03-05 08:15:40, Breno Leitao wrote:
> > > show_cpu_pool_hog() only prints workers whose task is currently running
> > > on the CPU (task_is_running()). This misses workers that are busy
> > > processing a work item but are sleeping or blocked — for example, a
> > > worker that clears PF_WQ_WORKER and enters wait_event_idle().
> >
> > IMHO, it is misleading. AFAIK, workers clear PF_WQ_WORKER flag only
> > when they are going to die. They never do so when going to sleep.
> >
> > > Such a
> > > worker still occupies a pool slot and prevents progress, yet produces
> > > an empty backtrace section in the watchdog output.
> > >
> > > This is happening on real arm64 systems, where
> > > toggle_allocation_gate() IPIs every single CPU in the machine (which
> > > lacks NMI), causing workqueue stalls that show empty backtraces because
> > > toggle_allocation_gate() is sleeping in wait_event_idle().
> >
> > The wait_event_idle() called in toggle_allocation_gate() should not
> > cause a stall. The scheduler should call wq_worker_sleeping(tsk)
> > and wake up another idle worker. It should guarantee the progress.
> >
> > > Remove the task_is_running() filter so every in-flight worker in the
> > > pool's busy_hash is dumped. The busy_hash is protected by pool->lock,
> > > which is already held.
> >
> > As I explained in reply to the cover letter, sleeping workers should
> > not block forward progress. It seems that in this case, the system was
> > not able to wake up the other idle worker or it was the last idle
> > worker and was not able to fork a new one.
> >
> > IMHO, we should warn about this when there is no running worker.
> > It might be more useful than printing backtraces of the sleeping
> > workers because they likely did not cause the problem.
> >
> > I believe that the problem, in this particular situation, is that
> > the system can't schedule or fork new processes. It might help
> > to warn about it and maybe show backtrace of the currently
> > running process on the stalled CPU.
>
> Do you mean checking if pool->busy_hash is empty, and then warning?
>
> Commit fc36ad49ce7160907bcbe4f05c226595611ac293
> Author: Breno Leitao <leitao@debian.org>
> Date: Fri Mar 13 05:35:02 2026 -0700
>
> workqueue: warn when stalled pool has no running workers
>
> When the workqueue watchdog detects a pool stall and the pool's
> busy_hash is empty (no workers executing any work item), print a
> diagnostic warning with the pool state and trigger a backtrace of
> the currently running task on the stalled CPU.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Suggested-by: Petr Mladek <pmladek@suse.com>
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 6ee52ba9b14f7..d538067754123 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -7655,6 +7655,17 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
>
> raw_spin_lock_irqsave(&pool->lock, irq_flags);
>
> + if (hash_empty(pool->busy_hash)) {
This would print it only when there is no in-flight work.
But I think that the problem is when there is no worker in
the running state. There should always be one to guarantee
forward progress.
I took inspiration from your patch. This is what comes to my mind
on top of the current master (printing only running workers):
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index aeaec79bc09c..a044c7e42139 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7588,12 +7588,15 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
{
struct worker *worker;
unsigned long irq_flags;
+ bool found_running;
int bkt;
raw_spin_lock_irqsave(&pool->lock, irq_flags);
+ found_running = false;
hash_for_each(pool->busy_hash, bkt, worker, hentry) {
if (task_is_running(worker->task)) {
+ found_running = true;
/*
* Defer printing to avoid deadlocks in console
* drivers that queue work while holding locks
@@ -7609,6 +7612,19 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
}
raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
+
+ if (!found_running) {
+ pr_info("pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
+ pool->id, pool->cpu,
+ idle_cpu(pool->cpu) ? "idle" : "busy",
+ pool->nr_workers, pool->nr_idle);
+ pr_info("The pool might have troubles to wake up another idle worker.\n");
+ if (pool->manager) {
+ pr_info("Backtrace of the pool manager:\n");
+ sched_show_task(pool->manager->task);
+ }
+ trigger_single_cpu_backtrace(pool->cpu);
+ }
}
static void show_cpu_pools_hogs(void)
Warning: The code is not safe. We would need to add some synchronization
of the pool->manager pointer.
Even better might be to print state and backtrace of the process
which was woken by kick_pool() when the last running worker
went to sleep.
Motivation: AFAIK, if there is pending work in a CPU-bound workqueue,
then at least one worker in the related worker pool should be in
"task_is_running()" state to guarantee forward progress.

If we find the running worker then it will likely be the culprit:
either it has been running for too long, or it is the last idle
worker and fails to create a new one.

If there is no worker in the running state then there is likely a
problem in the core workqueue code, or some work item shot the
workqueue in the foot. Either way, we might need to print much
more detail to nail it down.
Best Regards,
Petr
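[Editor's aside: Petr's scan-then-warn logic above reduces to the following userspace sketch. All names are toy stand-ins; the real hunk additionally dumps each running worker's backtrace under pool->lock.]

```c
#include <stdbool.h>
#include <stddef.h>

struct worker {
    bool running;   /* models task_is_running(worker->task) */
};

/* Mirrors the found_running flag in the proposed hunk: true when at
 * least one busy worker is in the running state. */
static bool pool_has_running_worker(const struct worker *busy, size_t n)
{
    bool found_running = false;

    for (size_t i = 0; i < n; i++) {
        if (busy[i].running) {
            found_running = true;
            /* the kernel hunk prints this worker's backtrace here */
        }
    }
    return found_running;
}
```

Note that an empty busy set also yields `found_running == false`, so this check subsumes the earlier `hash_empty()` variant: both "no in-flight work" and "in-flight work but nobody running" take the warning path.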
Hello Petr,
On Fri, Mar 13, 2026 at 05:27:40PM +0100, Petr Mladek wrote:
> I took inspiration from your patch. This is what comes to my mind
> on top of the current master (printing only running workers):
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index aeaec79bc09c..a044c7e42139 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -7588,12 +7588,15 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
> {
> struct worker *worker;
> unsigned long irq_flags;
> + bool found_running;
> int bkt;
>
> raw_spin_lock_irqsave(&pool->lock, irq_flags);
>
> + found_running = false;
> hash_for_each(pool->busy_hash, bkt, worker, hentry) {
> if (task_is_running(worker->task)) {
> + found_running = true;
> /*
> * Defer printing to avoid deadlocks in console
> * drivers that queue work while holding locks
> @@ -7609,6 +7612,19 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
> }
>
> raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
> +
> + if (!found_running) {
> + pr_info("pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
> + pool->id, pool->cpu,
> + idle_cpu(pool->cpu) ? "idle" : "busy",
> + pool->nr_workers, pool->nr_idle);
> + pr_info("The pool might have troubles to wake up another idle worker.\n");
> + if (pool->manager) {
> + pr_info("Backtrace of the pool manager:\n");
> + sched_show_task(pool->manager->task);
> + }
> + trigger_single_cpu_backtrace(pool->cpu);
> + }
> }
>
> static void show_cpu_pools_hogs(void)
>
>
> Warning: The code is not safe. We would need to add some synchronization
> of the pool->manager pointer.
>
> Even better might be to print state and backtrace of the process
> which was woken by kick_pool() when the last running worker
> went to sleep.
I agree. We should probably store the last woken worker in the worker_pool
structure and print it later.
I've spent some time verifying that the locking and lifecycle management are
correct. While I'm not completely certain, I believe it's getting closer. An
extra pair of eyes would be helpful.
This is the new version of this patch:
commit feccca7e696ead3272669ee4d4dc02b6946d0faf
Author: Breno Leitao <leitao@debian.org>
Date: Mon Mar 16 09:47:09 2026 -0700
workqueue: print diagnostic info when no worker is in running state
show_cpu_pool_busy_workers() iterates over busy workers but gives no
feedback when none are found in running state, which is a key indicator
that a pool may be stuck — unable to wake an idle worker to process
pending work.
Add a diagnostic message when no running workers are found, reporting
pool id, CPU, idle state, and worker counts. Also trigger a single-CPU
backtrace for the stalled CPU.
To identify the task most likely responsible for the stall, add
last_woken_worker (L: pool->lock) to worker_pool and record it in
kick_pool() just before wake_up_process(). This captures the idle
worker that was kicked to take over when the last running worker went to
sleep; if the pool is now stuck with no running worker, that task is the
prime suspect and its backtrace is dumped.
Using struct worker * rather than struct task_struct * avoids any
lifetime concern: workers are only destroyed via set_worker_dying()
which requires pool->lock, and set_worker_dying() clears
last_woken_worker when the dying worker matches. show_cpu_pool_busy_workers()
holds pool->lock while calling sched_show_task(), so last_woken_worker
is either NULL or points to a live worker with a valid task. More
precisely, set_worker_dying() clears last_woken_worker before setting
WORKER_DIE, so a non-NULL last_woken_worker means the kthread has not
yet exited and worker->task is still alive.
The pool info message is printed inside pool->lock using
printk_deferred_enter/exit, the same pattern used by the existing
busy-worker loop, to avoid deadlocks with console drivers that queue
work while holding locks also taken in their write paths.
trigger_single_cpu_backtrace() is called after releasing the lock.
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b77119d71641a..38aebf4514c03 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -217,6 +217,7 @@ struct worker_pool {
/* L: hash of busy workers */
struct worker *manager; /* L: purely informational */
+ struct worker *last_woken_worker; /* L: last worker woken by kick_pool() */
struct list_head workers; /* A: attached workers */
struct ida worker_ida; /* worker IDs for task name */
@@ -1295,6 +1296,9 @@ static bool kick_pool(struct worker_pool *pool)
}
}
#endif
+ /* Track the last idle worker woken, used for stall diagnostics. */
+ pool->last_woken_worker = worker;
+
wake_up_process(p);
return true;
}
@@ -2902,6 +2906,13 @@ static void set_worker_dying(struct worker *worker, struct list_head *list)
pool->nr_workers--;
pool->nr_idle--;
+ /*
+ * Clear last_woken_worker if it points to this worker, so that
+ * show_cpu_pool_busy_workers() cannot dereference a freed worker.
+ */
+ if (pool->last_woken_worker == worker)
+ pool->last_woken_worker = NULL;
+
worker->flags |= WORKER_DIE;
list_move(&worker->entry, list);
@@ -7582,20 +7593,58 @@ module_param_named(panic_on_stall_time, wq_panic_on_stall_time, uint, 0644);
MODULE_PARM_DESC(panic_on_stall_time, "Panic if stall exceeds this many seconds (0=disabled)");
/*
- * Show workers that might prevent the processing of pending work items.
- * A busy worker that is not running on the CPU (e.g. sleeping in
- * wait_event_idle() with PF_WQ_WORKER cleared) can stall the pool just as
- * effectively as a CPU-bound one, so dump every in-flight worker.
+ * Report that a pool has no worker in running state, which is a sign that the
+ * pool may be stuck. Print pool info. Must be called with pool->lock held and
+ * inside a printk_deferred_enter/exit region.
+ */
+static void show_pool_no_running_worker(struct worker_pool *pool)
+{
+ lockdep_assert_held(&pool->lock);
+
+ printk_deferred_enter();
+ pr_info("pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
+ pool->id, pool->cpu,
+ idle_cpu(pool->cpu) ? "idle" : "busy",
+ pool->nr_workers, pool->nr_idle);
+ pr_info("The pool might have trouble waking an idle worker.\n");
+ /*
+ * last_woken_worker and its task are valid here: set_worker_dying()
+ * clears it under pool->lock before setting WORKER_DIE, so if
+ * last_woken_worker is non-NULL the kthread has not yet exited and
+ * worker->task is still alive.
+ */
+ if (pool->last_woken_worker) {
+ pr_info("Backtrace of last woken worker:\n");
+ sched_show_task(pool->last_woken_worker->task);
+ } else {
+ pr_info("Last woken worker empty\n");
+ }
+ printk_deferred_exit();
+}
+
+/*
+ * Show running workers that might prevent the processing of pending work items.
+ * If no running worker is found, the pool may be stuck waiting for an idle
+ * worker to be woken, so report the pool state and the last woken worker.
*/
static void show_cpu_pool_busy_workers(struct worker_pool *pool)
{
struct worker *worker;
unsigned long irq_flags;
- int bkt;
+ bool found_running = false;
+ int cpu, bkt;
raw_spin_lock_irqsave(&pool->lock, irq_flags);
+ /* Snapshot cpu inside the lock to safely use it after unlock. */
+ cpu = pool->cpu;
+
hash_for_each(pool->busy_hash, bkt, worker, hentry) {
+ /* Skip workers that are not actively running on the CPU. */
+ if (!task_is_running(worker->task))
+ continue;
+
+ found_running = true;
/*
* Defer printing to avoid deadlocks in console
* drivers that queue work while holding locks
@@ -7609,7 +7658,23 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
printk_deferred_exit();
}
+ /*
+ * If no running worker was found, the pool is likely stuck. Print pool
+ * state and the backtrace of the last woken worker, which is the prime
+ * suspect for the stall.
+ */
+ if (!found_running)
+ show_pool_no_running_worker(pool);
+
raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
+
+ /*
+ * Trigger a backtrace on the stalled CPU to capture what it is
+ * currently executing. Called after releasing the lock to avoid
+ * any potential issues with NMI delivery.
+ */
+ if (!found_running)
+ trigger_single_cpu_backtrace(cpu);
}
static void show_cpu_pools_busy_workers(void)
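[Editor's aside: the lifetime argument in the commit message boils down to two invariants that can be sketched, and unit-tested, in isolation. `toy_kick_pool()` and `toy_set_worker_dying()` below are toy models of the kernel functions; the pool->lock serialization is assumed rather than modeled.]

```c
#include <stddef.h>

struct toy_worker { int id; };

struct toy_pool {
    /* In the kernel this field is protected by pool->lock. */
    struct toy_worker *last_woken_worker;
};

/* Model of kick_pool(): record the worker just before waking it. */
static void toy_kick_pool(struct toy_pool *pool, struct toy_worker *w)
{
    pool->last_woken_worker = w;
    /* wake_up_process(worker->task) happens here in the kernel */
}

/* Model of set_worker_dying(): clear the pointer before the worker can
 * be freed, so a reader under the same lock never sees a stale worker. */
static void toy_set_worker_dying(struct toy_pool *pool, struct toy_worker *w)
{
    if (pool->last_woken_worker == w)
        pool->last_woken_worker = NULL;
}
```

A death of an unrelated worker leaves the recorded pointer intact; only the death of the recorded worker clears it, which is exactly why a non-NULL `last_woken_worker` is safe to dereference under pool->lock.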
On Wed 2026-03-18 04:31:08, Breno Leitao wrote:
> On Fri, Mar 13, 2026 at 05:27:40PM +0100, Petr Mladek wrote:
> I agree. We should probably store the last woken worker in the worker_pool
> structure and print it later.
>
> I've spent some time verifying that the locking and lifecycle management are
> correct. While I'm not completely certain, I believe it's getting closer. An
> extra pair of eyes would be helpful.
>
> This is the new version of this patch:
>
> commit feccca7e696ead3272669ee4d4dc02b6946d0faf
> Author: Breno Leitao <leitao@debian.org>
> Date: Mon Mar 16 09:47:09 2026 -0700
>
> workqueue: print diagnostic info when no worker is in running state
>
> show_cpu_pool_busy_workers() iterates over busy workers but gives no
> feedback when none are found in running state, which is a key indicator
> that a pool may be stuck — unable to wake an idle worker to process
> pending work.
>
> Add a diagnostic message when no running workers are found, reporting
> pool id, CPU, idle state, and worker counts. Also trigger a single-CPU
> backtrace for the stalled CPU.
>
> To identify the task most likely responsible for the stall, add
> last_woken_worker (L: pool->lock) to worker_pool and record it in
> kick_pool() just before wake_up_process(). This captures the idle
> worker that was kicked to take over when the last running worker went to
> sleep; if the pool is now stuck with no running worker, that task is the
> prime suspect and its backtrace is dumped.
>
> Using struct worker * rather than struct task_struct * avoids any
> lifetime concern: workers are only destroyed via set_worker_dying()
> which requires pool->lock, and set_worker_dying() clears
> last_woken_worker when the dying worker matches. show_cpu_pool_busy_workers()
> holds pool->lock while calling sched_show_task(), so last_woken_worker
> is either NULL or points to a live worker with a valid task. More
> precisely, set_worker_dying() clears last_woken_worker before setting
> WORKER_DIE, so a non-NULL last_woken_worker means the kthread has not
> yet exited and worker->task is still alive.
>
> The pool info message is printed inside pool->lock using
> printk_deferred_enter/exit, the same pattern used by the existing
> busy-worker loop, to avoid deadlocks with console drivers that queue
> work while holding locks also taken in their write paths.
> trigger_single_cpu_backtrace() is called after releasing the lock.
>
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Breno Leitao <leitao@debian.org>
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index b77119d71641a..38aebf4514c03 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -7582,20 +7593,58 @@ module_param_named(panic_on_stall_time, wq_panic_on_stall_time, uint, 0644);
> MODULE_PARM_DESC(panic_on_stall_time, "Panic if stall exceeds this many seconds (0=disabled)");
>
> /*
> - * Show workers that might prevent the processing of pending work items.
> - * A busy worker that is not running on the CPU (e.g. sleeping in
> - * wait_event_idle() with PF_WQ_WORKER cleared) can stall the pool just as
> - * effectively as a CPU-bound one, so dump every in-flight worker.
> + * Report that a pool has no worker in running state, which is a sign that the
> + * pool may be stuck. Print pool info. Must be called with pool->lock held and
> + * inside a printk_deferred_enter/exit region.
> + */
> +static void show_pool_no_running_worker(struct worker_pool *pool)
> +{
> + lockdep_assert_held(&pool->lock);
> +
> + printk_deferred_enter();
> + pr_info("pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
> + pool->id, pool->cpu,
> + idle_cpu(pool->cpu) ? "idle" : "busy",
> + pool->nr_workers, pool->nr_idle);
> + pr_info("The pool might have trouble waking an idle worker.\n");
> + /*
> + * last_woken_worker and its task are valid here: set_worker_dying()
> + * clears it under pool->lock before setting WORKER_DIE, so if
> + * last_woken_worker is non-NULL the kthread has not yet exited and
> + * worker->task is still alive.
> + */
> + if (pool->last_woken_worker) {
> + pr_info("Backtrace of last woken worker:\n");
> + sched_show_task(pool->last_woken_worker->task);
> + } else {
> + pr_info("Last woken worker empty\n");
This is a bit ambiguous. It sounds like the worker is idle.
I would write something like:
pr_info("There is no info about the last woken worker\n");
pr_info("Missing info about the last woken worker.\n");
> + }
> + printk_deferred_exit();
> +}
> +
Otherwise, I like this patch.
I still wonder what might be the reason that there is no worker
in the running state. Let's see if this patch brings some useful info.
One more idea. It might be useful to store a timestamp when the last
worker was woken. And then print either the timestamp or delta.
It would help to make sure that kick_pool() was really called
during the reported stall.
Best Regards,
Petr
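[Editor's aside: Petr's timestamp idea can be sketched as follows. `jiffies` is modeled by a plain unsigned counter; unsigned subtraction handles wraparound, the same property jiffies arithmetic relies on.]

```c
#include <limits.h>

/* Age of the last wake-up, in ticks. Unsigned subtraction yields the
 * correct delta even after the counter wraps past its maximum. */
static unsigned long last_woken_age(unsigned long now, unsigned long last_woken)
{
    return now - last_woken;
}
```

The watchdog would record the counter in kick_pool() next to last_woken_worker and print the age in the stall report, confirming whether kick_pool() actually ran during the reported stall window.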
On Wed, Mar 18, 2026 at 04:11:54PM +0100, Petr Mladek wrote:
> On Wed 2026-03-18 04:31:08, Breno Leitao wrote:
> > On Fri, Mar 13, 2026 at 05:27:40PM +0100, Petr Mladek wrote:
> > I agree. We should probably store the last woken worker in the worker_pool
> > structure and print it later.
> >
> > I've spent some time verifying that the locking and lifecycle management are
> > correct. While I'm not completely certain, I believe it's getting closer. An
> > extra pair of eyes would be helpful.
> >
> > This is the new version of this patch:
> >
> > commit feccca7e696ead3272669ee4d4dc02b6946d0faf
> > Author: Breno Leitao <leitao@debian.org>
> > Date: Mon Mar 16 09:47:09 2026 -0700
> >
> > workqueue: print diagnostic info when no worker is in running state
> >
> > show_cpu_pool_busy_workers() iterates over busy workers but gives no
> > feedback when none are found in running state, which is a key indicator
> > that a pool may be stuck — unable to wake an idle worker to process
> > pending work.
> >
> > Add a diagnostic message when no running workers are found, reporting
> > pool id, CPU, idle state, and worker counts. Also trigger a single-CPU
> > backtrace for the stalled CPU.
> >
> > To identify the task most likely responsible for the stall, add
> > last_woken_worker (L: pool->lock) to worker_pool and record it in
> > kick_pool() just before wake_up_process(). This captures the idle
> > worker that was kicked to take over when the last running worker went to
> > sleep; if the pool is now stuck with no running worker, that task is the
> > prime suspect and its backtrace is dumped.
> >
> > Using struct worker * rather than struct task_struct * avoids any
> > lifetime concern: workers are only destroyed via set_worker_dying()
> > which requires pool->lock, and set_worker_dying() clears
> > last_woken_worker when the dying worker matches. show_cpu_pool_busy_workers()
> > holds pool->lock while calling sched_show_task(), so last_woken_worker
> > is either NULL or points to a live worker with a valid task. More
> > precisely, set_worker_dying() clears last_woken_worker before setting
> > WORKER_DIE, so a non-NULL last_woken_worker means the kthread has not
> > yet exited and worker->task is still alive.
> >
> > The pool info message is printed inside pool->lock using
> > printk_deferred_enter/exit, the same pattern used by the existing
> > busy-worker loop, to avoid deadlocks with console drivers that queue
> > work while holding locks also taken in their write paths.
> > trigger_single_cpu_backtrace() is called after releasing the lock.
> >
> > Suggested-by: Petr Mladek <pmladek@suse.com>
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index b77119d71641a..38aebf4514c03 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -7582,20 +7593,58 @@ module_param_named(panic_on_stall_time, wq_panic_on_stall_time, uint, 0644);
> > MODULE_PARM_DESC(panic_on_stall_time, "Panic if stall exceeds this many seconds (0=disabled)");
> >
> > /*
> > - * Show workers that might prevent the processing of pending work items.
> > - * A busy worker that is not running on the CPU (e.g. sleeping in
> > - * wait_event_idle() with PF_WQ_WORKER cleared) can stall the pool just as
> > - * effectively as a CPU-bound one, so dump every in-flight worker.
> > + * Report that a pool has no worker in running state, which is a sign that the
> > + * pool may be stuck. Print pool info. Must be called with pool->lock held and
> > + * inside a printk_deferred_enter/exit region.
> > + */
> > +static void show_pool_no_running_worker(struct worker_pool *pool)
> > +{
> > + lockdep_assert_held(&pool->lock);
> > +
> > + printk_deferred_enter();
> > + pr_info("pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
> > + pool->id, pool->cpu,
> > + idle_cpu(pool->cpu) ? "idle" : "busy",
> > + pool->nr_workers, pool->nr_idle);
> > + pr_info("The pool might have trouble waking an idle worker.\n");
> > + /*
> > + * last_woken_worker and its task are valid here: set_worker_dying()
> > + * clears it under pool->lock before setting WORKER_DIE, so if
> > + * last_woken_worker is non-NULL the kthread has not yet exited and
> > + * worker->task is still alive.
> > + */
> > + if (pool->last_woken_worker) {
> > + pr_info("Backtrace of last woken worker:\n");
> > + sched_show_task(pool->last_woken_worker->task);
> > + } else {
> > + pr_info("Last woken worker empty\n");
>
> This is a bit ambiguous. It sounds like the worker is idle.
> I would write something like:
>
> pr_info("There is no info about the last woken worker\n");
> pr_info("Missing info about the last woken worker.\n");
>
> > + }
> > + printk_deferred_exit();
> > +}
> > +
>
> Otherwise, I like this patch.
>
> I still wonder what might be the reason that there is no worker
> in the running state. Let's see if this patch brings some useful info.
>
> One more idea. It might be useful to store a timestamp when the last
> worker was woken. And then print either the timestamp or delta.
> It would help to make sure that kick_pool() was really called
> during the reported stall.
Ack. Below is the patch I will deploy in production; let's see
how useful it is.
commit c78b175971888da3c2ae6d84971e9beb01269a92
Author: Breno Leitao <leitao@debian.org>
Date: Mon Mar 16 09:47:09 2026 -0700
workqueue: print diagnostic info when no worker is in running state
show_cpu_pool_busy_workers() iterates over busy workers but gives no
feedback when none are found in running state, which is a key indicator
that a pool may be stuck — unable to wake an idle worker to process
pending work.
Add a diagnostic message when no running workers are found, reporting
pool id, CPU, idle state, and worker counts. Also trigger a single-CPU
backtrace for the stalled CPU.
To identify the task most likely responsible for the stall, add
last_woken_worker and last_woken_tstamp (both L: pool->lock) to
worker_pool and record them in kick_pool() just before
wake_up_process(). This captures the idle worker that was kicked to
take over when the last running worker went to sleep; if the pool is
now stuck with no running worker, that task is the prime suspect and
its backtrace is dumped along with how long ago it was woken.
Using struct worker * rather than struct task_struct * avoids any
lifetime concern: workers are only destroyed via set_worker_dying()
which requires pool->lock, and set_worker_dying() clears
last_woken_worker when the dying worker matches. show_cpu_pool_busy_workers()
holds pool->lock while calling sched_show_task(), so last_woken_worker
is either NULL or points to a live worker with a valid task. More
precisely, set_worker_dying() clears last_woken_worker before setting
WORKER_DIE, so a non-NULL last_woken_worker means the kthread has not
yet exited and worker->task is still alive.
The pool info message is printed inside pool->lock using
printk_deferred_enter/exit, the same pattern used by the existing
busy-worker loop, to avoid deadlocks with console drivers that queue
work while holding locks also taken in their write paths.
trigger_single_cpu_backtrace() is called after releasing the lock.
Sample output from a stall triggered by the wq_stall test:
pool 174: no worker in running state, cpu=43 is idle (nr_workers=2 nr_idle=1)
The pool might have trouble waking an idle worker.
Last worker woken 48977 ms ago:
task:kworker/43:1 state:I stack:0 pid:631 tgid:631 ppid:2
Call Trace:
<stack trace>
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b77119d71641a..f8b1741824117 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -217,6 +217,8 @@ struct worker_pool {
/* L: hash of busy workers */
struct worker *manager; /* L: purely informational */
+ struct worker *last_woken_worker; /* L: last worker woken by kick_pool() */
+ unsigned long last_woken_tstamp; /* L: timestamp of last kick_pool() wake */
struct list_head workers; /* A: attached workers */
struct ida worker_ida; /* worker IDs for task name */
@@ -1295,6 +1297,10 @@ static bool kick_pool(struct worker_pool *pool)
}
}
#endif
+ /* Track the last idle worker woken, used for stall diagnostics. */
+ pool->last_woken_worker = worker;
+ pool->last_woken_tstamp = jiffies;
+
wake_up_process(p);
return true;
}
@@ -2902,6 +2908,13 @@ static void set_worker_dying(struct worker *worker, struct list_head *list)
pool->nr_workers--;
pool->nr_idle--;
+ /*
+ * Clear last_woken_worker if it points to this worker, so that
+ * show_cpu_pool_busy_workers() cannot dereference a freed worker.
+ */
+ if (pool->last_woken_worker == worker)
+ pool->last_woken_worker = NULL;
+
worker->flags |= WORKER_DIE;
list_move(&worker->entry, list);
@@ -7582,20 +7595,59 @@ module_param_named(panic_on_stall_time, wq_panic_on_stall_time, uint, 0644);
MODULE_PARM_DESC(panic_on_stall_time, "Panic if stall exceeds this many seconds (0=disabled)");
/*
- * Show workers that might prevent the processing of pending work items.
- * A busy worker that is not running on the CPU (e.g. sleeping in
- * wait_event_idle() with PF_WQ_WORKER cleared) can stall the pool just as
- * effectively as a CPU-bound one, so dump every in-flight worker.
+ * Report that a pool has no worker in running state, which is a sign that the
+ * pool may be stuck. Print pool info. Must be called with pool->lock held and
+ * inside a printk_deferred_enter/exit region.
+ */
+static void show_pool_no_running_worker(struct worker_pool *pool)
+{
+ lockdep_assert_held(&pool->lock);
+
+ printk_deferred_enter();
+ pr_info("pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
+ pool->id, pool->cpu,
+ idle_cpu(pool->cpu) ? "idle" : "busy",
+ pool->nr_workers, pool->nr_idle);
+ pr_info("The pool might have trouble waking an idle worker.\n");
+ /*
+ * last_woken_worker and its task are valid here: set_worker_dying()
+ * clears it under pool->lock before setting WORKER_DIE, so if
+ * last_woken_worker is non-NULL the kthread has not yet exited and
+ * worker->task is still alive.
+ */
+ if (pool->last_woken_worker) {
+ pr_info("Last worker woken %lu ms ago:\n",
+ jiffies_to_msecs(jiffies - pool->last_woken_tstamp));
+ sched_show_task(pool->last_woken_worker->task);
+ } else {
+ pr_info("Missing info about the last woken worker.\n");
+ }
+ printk_deferred_exit();
+}
+
+/*
+ * Show running workers that might prevent the processing of pending work items.
+ * If no running worker is found, the pool may be stuck waiting for an idle
+ * worker to be woken, so report the pool state and the last woken worker.
*/
static void show_cpu_pool_busy_workers(struct worker_pool *pool)
{
struct worker *worker;
unsigned long irq_flags;
- int bkt;
+ bool found_running = false;
+ int cpu, bkt;
raw_spin_lock_irqsave(&pool->lock, irq_flags);
+ /* Snapshot cpu inside the lock to safely use it after unlock. */
+ cpu = pool->cpu;
+
hash_for_each(pool->busy_hash, bkt, worker, hentry) {
+ /* Skip workers that are not actively running on the CPU. */
+ if (!task_is_running(worker->task))
+ continue;
+
+ found_running = true;
/*
* Defer printing to avoid deadlocks in console
* drivers that queue work while holding locks
@@ -7609,7 +7661,23 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
printk_deferred_exit();
}
+ /*
+ * If no running worker was found, the pool is likely stuck. Print pool
+ * state and the backtrace of the last woken worker, which is the prime
+ * suspect for the stall.
+ */
+ if (!found_running)
+ show_pool_no_running_worker(pool);
+
raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
+
+ /*
+ * Trigger a backtrace on the stalled CPU to capture what it is
+ * currently executing. Called after releasing the lock to avoid
+ * any potential issues with NMI delivery.
+ */
+ if (!found_running)
+ trigger_single_cpu_backtrace(cpu);
}
static void show_cpu_pools_busy_workers(void)
On Thu, Mar 5, 2026 at 8:16 AM Breno Leitao <leitao@debian.org> wrote:
>
> show_cpu_pool_hog() only prints workers whose task is currently running
> on the CPU (task_is_running()). This misses workers that are busy
> processing a work item but are sleeping or blocked — for example, a
> worker that clears PF_WQ_WORKER and enters wait_event_idle(). Such a
> worker still occupies a pool slot and prevents progress, yet produces
> an empty backtrace section in the watchdog output.
>
> This is happening on real arm64 systems, where
> toggle_allocation_gate() IPIs every single CPU in the machine (which
> lacks NMI), causing workqueue stalls that show empty backtraces because
> toggle_allocation_gate() is sleeping in wait_event_idle().
>
> Remove the task_is_running() filter so every in-flight worker in the
> pool's busy_hash is dumped. The busy_hash is protected by pool->lock,
> which is already held.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>

Acked-by: Song Liu <song@kernel.org>