From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Currently the rescuer keeps looping until all work on a PWQ is done, and
this may hurt fairness among PWQs, as the rescuer could remain stuck on
one PWQ indefinitely.
Introduce RESCUER_BATCH to control the maximum number of work items the
rescuer processes in each turn, and move on to other PWQs when the limit
is reached.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
kernel/workqueue.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 06cd3d6ff7e1..7cec9755b4e1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -117,6 +117,8 @@ enum wq_internal_consts {
MAYDAY_INTERVAL = HZ / 10, /* and then every 100ms */
CREATE_COOLDOWN = HZ, /* time to breath after fail */
+ RESCUER_BATCH = 16, /* process items per turn */
+
/*
* Rescue workers are used only on emergencies and shared by
* all cpus. Give MIN_NICE.
@@ -3450,7 +3452,15 @@ static int worker_thread(void *__worker)
goto woke_up;
}
-static bool assign_rescuer_work(struct pool_workqueue *pwq, struct worker *rescuer)
+/*
+ * Try to assign one work item from @pwq to @rescuer.
+ *
+ * Returns true if a work item was successfully assigned, false otherwise.
+ * If @throttled and other PWQs are in mayday, requeue mayday for this PWQ
+ * and let the rescuer handle other PWQs first.
+ * If this is the only PWQ in mayday, process it regardless of @throttled.
+ */
+static bool assign_rescuer_work(struct pool_workqueue *pwq, struct worker *rescuer, bool throttled)
{
struct worker_pool *pool = pwq->pool;
struct work_struct *cursor = &pwq->mayday_cursor;
@@ -3471,7 +3481,21 @@ static bool assign_rescuer_work(struct pool_workqueue *pwq, struct worker *rescu
/* find the next work item to rescue */
list_for_each_entry_safe_from(work, n, &pool->worklist, entry) {
- if (get_work_pwq(work) == pwq && assign_work(work, rescuer, &n)) {
+ if (get_work_pwq(work) != pwq)
+ continue;
+ /*
+ * If throttled, update the cursor, requeue a mayday for this
+ * PWQ, and move on to other PWQs. If there are no other PWQs
+ * in mayday, continue processing this one.
+ */
+ if (throttled && !list_empty(&pwq->wq->maydays)) {
+ list_add_tail(&cursor->entry, &work->entry);
+ raw_spin_lock(&wq_mayday_lock); /* for wq->maydays */
+ send_mayday(work);
+ raw_spin_unlock(&wq_mayday_lock);
+ return false;
+ }
+ if (assign_work(work, rescuer, &n)) {
pwq->stats[PWQ_STAT_RESCUED]++;
/* put the cursor for next search */
list_add_tail(&cursor->entry, &n->entry);
@@ -3536,6 +3560,7 @@ static int rescuer_thread(void *__rescuer)
struct pool_workqueue *pwq = list_first_entry(&wq->maydays,
struct pool_workqueue, mayday_node);
struct worker_pool *pool = pwq->pool;
+ unsigned int count = 0;
__set_current_state(TASK_RUNNING);
list_del_init(&pwq->mayday_node);
@@ -3548,7 +3573,7 @@ static int rescuer_thread(void *__rescuer)
WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
- while (assign_rescuer_work(pwq, rescuer))
+ while (assign_rescuer_work(pwq, rescuer, ++count > RESCUER_BATCH))
process_scheduled_works(rescuer);
/*
--
2.19.1.6.gb485710b
On Tue, Nov 25, 2025 at 02:36:16PM +0800, Lai Jiangshan wrote:
...
> + while (assign_rescuer_work(pwq, rescuer, ++count > RESCUER_BATCH))
> process_scheduled_works(rescuer);
Can we something like the following instead?
while (assign_rescuer_work(pwq, rescuer, &count))
It just feels odd to for the caller to decide "you should stop" and then
taking actions on the return value of the callee. Alternatively, just
separate out the pwq rotation into a separate function, so that the caller
can do
while (assign_rescuer_work(..)) {
process_scheduled_works(rescuer);
if (++count > RESCUER_BATCH) {
rotate mayday list;
break;
}
}
Thanks.
--
tejun
On Wed, Dec 3, 2025 at 2:16 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Tue, Nov 25, 2025 at 02:36:16PM +0800, Lai Jiangshan wrote:
> ...
> > + while (assign_rescuer_work(pwq, rescuer, ++count > RESCUER_BATCH))
> > process_scheduled_works(rescuer);
>
> Can we something like the following instead?
>
> while (assign_rescuer_work(pwq, rescuer, &count))
>
> It just feels odd to for the caller to decide "you should stop" and then
> taking actions on the return value of the callee.
I think the nanming of "limited" or "throttled" gives the wrong
impression of "you should stop". How about renaming it again to
"rotate" or "prefer_rotate", meaning "prefer to rotate instead of
assigning work from this pwq"?
> Alternatively, just
> separate out the pwq rotation into a separate function, so that the caller
> can do
>
> while (assign_rescuer_work(..)) {
> process_scheduled_works(rescuer);
> if (++count > RESCUER_BATCH) {
> rotate mayday list;
The current implementation of the cursor cannot be processed by workers as a
regular work item, which means it has no pwq reference and cannot be left
behind if the pwq is not put back on the mayday list. As a result, the code
here has to handle the cursor explicitly, which adds extra burden to
understanding the code.
I prefer using send_mayday() as in this patch (which is naturally available
in assign_rescuer_work() but requires extra code here) rather than the
rotate code used before this patchset.
In other words, I prefer the rotate logic - which decides whether to assign work
or not - to live in assign_rescuer_work(). I’m also okay with moving the
counting logic into assign_rescuer_work(), as
assign_rescuer_work(pwq, rescuer, &count);
Thanks
Lai
> break;
> }
> }
>
> Thanks.
>
> --
> tejun
Hello, Lai.
On Wed, Dec 03, 2025 at 11:22:32AM +0800, Lai Jiangshan wrote:
> On Wed, Dec 3, 2025 at 2:16 AM Tejun Heo <tj@kernel.org> wrote:
...
> > Alternatively, just
> > separate out the pwq rotation into a separate function, so that the caller
> > can do
> >
> > while (assign_rescuer_work(..)) {
> > process_scheduled_works(rescuer);
> > if (++count > RESCUER_BATCH) {
> > rotate mayday list;
>
> The current implementation of the cursor cannot be processed by workers as a
> regular work item, which means it has no pwq reference and cannot be left
> behind if the pwq is not put back on the mayday list. As a result, the code
> here has to handle the cursor explicitly, which adds extra burden to
> understanding the code.
I'm probably missing something but what prevents rescuer_thread() from just
not calling assign_rescuer_work() and just call send_mayday() on the pwq?
send_mayday() currently takes @work but all it cares about is pwq, so we can
just pass in pwq.
Thanks.
--
tejun
© 2016 - 2026 Red Hat, Inc.