[PATCH] workqueue: Add pool_workqueue to pending_pwqs list when unplugging multiple inactive works

Matthew Brost posted 1 patch 8 hours ago
There is a newer version of this series
In unplug_oldest_pwq(), the first inactive work on the pool_workqueue is
activated correctly. However, if multiple inactive works exist on the
same pool_workqueue, the subsequent works fail to activate because
wq_node_nr_active.pending_pwqs is empty: pwq_tryinc_nr_active() bails
out early while the pool_workqueue is plugged, so the list insertion is
skipped.

Fix this by checking for additional inactive works in
unplug_oldest_pwq() and updating wq_node_nr_active.pending_pwqs
accordingly.

Cc: Carlos Santa <carlos.santa@intel.com>
Cc: Ryan Neph <ryanneph@google.com>
Cc: stable@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Cc: linux-kernel@vger.kernel.org
Fixes: 4c065dbce1e8 ("workqueue: Enable unbound cpumask update on ordered workqueues")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

---

This bug was first reported by Google, where the Xe driver appeared to
hang because a fence never signaled. We traced the issue to work items
not being scheduled; it can be reproduced trivially on drm-tip with the
following commands:

shell0:
for i in {1..100}; do echo "Run $i"; xe_exec_threads --r \
threads-rebind-bindexecqueue; done

shell1:
for i in {1..1000}; do echo "toggle $i"; echo f > \
/sys/devices/virtual/workqueue/cpumask; echo ff > \
/sys/devices/virtual/workqueue/cpumask; echo fff > \
/sys/devices/virtual/workqueue/cpumask; echo ffff > \
/sys/devices/virtual/workqueue/cpumask; sleep .1; done
---
 kernel/workqueue.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b77119d71641..b2cdb44ccb56 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1849,8 +1849,20 @@ static void unplug_oldest_pwq(struct workqueue_struct *wq)
 	raw_spin_lock_irq(&pwq->pool->lock);
 	if (pwq->plugged) {
 		pwq->plugged = false;
-		if (pwq_activate_first_inactive(pwq, true))
+		if (pwq_activate_first_inactive(pwq, true)) {
+			if (!list_empty(&pwq->inactive_works)) {
+				struct worker_pool *pool = pwq->pool;
+				struct wq_node_nr_active *nna =
+					wq_node_nr_active(wq, pool->node);
+
+				raw_spin_lock(&nna->lock);
+				if (list_empty(&pwq->pending_node))
+					list_add_tail(&pwq->pending_node,
+						      &nna->pending_pwqs);
+				raw_spin_unlock(&nna->lock);
+			}
 			kick_pool(pwq->pool);
+		}
 	}
 	raw_spin_unlock_irq(&pwq->pool->lock);
 }
-- 
2.34.1

Re: [PATCH] workqueue: Add pool_workqueue to pending_pwqs list when unplugging multiple inactive works
Posted by Tejun Heo 6 hours ago
Hello,

On Tue, Mar 31, 2026 at 03:18:39PM -0700, Matthew Brost wrote:
> @@ -1849,8 +1849,20 @@ static void unplug_oldest_pwq(struct workqueue_struct *wq)
>  	raw_spin_lock_irq(&pwq->pool->lock);
>  	if (pwq->plugged) {
>  		pwq->plugged = false;
> -		if (pwq_activate_first_inactive(pwq, true))
> +		if (pwq_activate_first_inactive(pwq, true)) {
> +			if (!list_empty(&pwq->inactive_works)) {
> +				struct worker_pool *pool = pwq->pool;
> +				struct wq_node_nr_active *nna =
> +					wq_node_nr_active(wq, pool->node);
> +
> +				raw_spin_lock(&nna->lock);
> +				if (list_empty(&pwq->pending_node))
> +					list_add_tail(&pwq->pending_node,
> +						      &nna->pending_pwqs);
> +				raw_spin_unlock(&nna->lock);
> +			}

It's a bit gnarly to open code locking and list operation. Would just
calling pwq_activate_first_inactive(pwq, false) one more time work here?
That'd trigger tryinc_node_nr_active() failure in pwq_tryinc_nr_active() and
the addition to the pending list. As this is quite subtle, it'd be nice to
have some comment - it's compensating for the missed pwq_tryinc_nr_active()
call due to plugging, right?

Thanks.

-- 
tejun
Re: [PATCH] workqueue: Add pool_workqueue to pending_pwqs list when unplugging multiple inactive works
Posted by Matthew Brost 6 hours ago
On Tue, Mar 31, 2026 at 02:05:41PM -1000, Tejun Heo wrote:
> Hello,
> 
> On Tue, Mar 31, 2026 at 03:18:39PM -0700, Matthew Brost wrote:
> > @@ -1849,8 +1849,20 @@ static void unplug_oldest_pwq(struct workqueue_struct *wq)
> >  	raw_spin_lock_irq(&pwq->pool->lock);
> >  	if (pwq->plugged) {
> >  		pwq->plugged = false;
> > -		if (pwq_activate_first_inactive(pwq, true))
> > +		if (pwq_activate_first_inactive(pwq, true)) {
> > +			if (!list_empty(&pwq->inactive_works)) {
> > +				struct worker_pool *pool = pwq->pool;
> > +				struct wq_node_nr_active *nna =
> > +					wq_node_nr_active(wq, pool->node);
> > +
> > +				raw_spin_lock(&nna->lock);
> > +				if (list_empty(&pwq->pending_node))
> > +					list_add_tail(&pwq->pending_node,
> > +						      &nna->pending_pwqs);
> > +				raw_spin_unlock(&nna->lock);
> > +			}
> 
> It's a bit gnarly to open code locking and list operation. Would just
> calling pwq_activate_first_inactive(pwq, false) one more time work here?
> That'd trigger tryinc_node_nr_active() failure in pwq_tryinc_nr_active() and
> the addition to the pending list. As this is quite subtle, it'd be nice to
> have some comment - it's compensating for the missed pwq_tryinc_nr_active()
> call due to plugging, right?

Yeah, I think that will work. Let me verify with my reproducer and
adjust the patch accordingly.

+1 on the comment as well—very subtle. Took a few days of
reverse-engineering work queues to track down.

Matt

> 
> Thanks.
> 
> -- 
> tejun
Re: [PATCH] workqueue: Add pool_workqueue to pending_pwqs list when unplugging multiple inactive works
Posted by Matthew Brost 6 hours ago
On Tue, Mar 31, 2026 at 05:22:08PM -0700, Matthew Brost wrote:
> On Tue, Mar 31, 2026 at 02:05:41PM -1000, Tejun Heo wrote:
> > Hello,
> > 
> > On Tue, Mar 31, 2026 at 03:18:39PM -0700, Matthew Brost wrote:
> > > @@ -1849,8 +1849,20 @@ static void unplug_oldest_pwq(struct workqueue_struct *wq)
> > >  	raw_spin_lock_irq(&pwq->pool->lock);
> > >  	if (pwq->plugged) {
> > >  		pwq->plugged = false;
> > > -		if (pwq_activate_first_inactive(pwq, true))
> > > +		if (pwq_activate_first_inactive(pwq, true)) {
> > > +			if (!list_empty(&pwq->inactive_works)) {
> > > +				struct worker_pool *pool = pwq->pool;
> > > +				struct wq_node_nr_active *nna =
> > > +					wq_node_nr_active(wq, pool->node);
> > > +
> > > +				raw_spin_lock(&nna->lock);
> > > +				if (list_empty(&pwq->pending_node))
> > > +					list_add_tail(&pwq->pending_node,
> > > +						      &nna->pending_pwqs);
> > > +				raw_spin_unlock(&nna->lock);
> > > +			}
> > 
> > It's a bit gnarly to open code locking and list operation. Would just
> > calling pwq_activate_first_inactive(pwq, false) one more time work here?
> > That'd trigger tryinc_node_nr_active() failure in pwq_tryinc_nr_active() and
> > the addition to the pending list. As this is quite subtle, it'd be nice to
> > have some comment - it's compensating for the missed pwq_tryinc_nr_active()
> > call due to plugging, right?

Sorry - missed a question.

It is compensating for early bail in pwq_tryinc_nr_active() due to plugging here:

1726 static bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
...
1741         if (unlikely(pwq->plugged))
1742                 return false;
...
	     /* Add to &nna->pending_pwqs */

This is far from my domain, so if there is a different idea on how to
fix this, let me know - it takes 5-10 minutes on my end to test out.

The pwq_activate_first_inactive(pwq, false) suggestion works, so will
post that shortly unless you have another idea.

Matt

> 
> Yeah, I think that will work. Let me verify with my reproducer and
> adjust the patch accordingly.
> 
> +1 on the comment as well—very subtle. Took a few days of
> reverse-engineering work queues to track down.
> 
> Matt
> 
> > 
> > Thanks.
> > 
> > -- 
> > tejun