From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan, ying chen, Tejun Heo, Lai Jiangshan
Subject: [PATCH V2] workqueue: Process rescuer work items one-by-one using a positional marker
Date: Tue, 18 Nov 2025 17:38:31 +0800
Message-Id: <20251118093832.81920-1-jiangshanlai@gmail.com>
X-Mailer: git-send-email 2.19.1.6.gb485710b

From: Lai Jiangshan

Previously, the rescuer scanned for all matching work items at once and
processed them within a single rescuer thread, which could let one
blocking work item stall all the others.

Make the rescuer process work items one-by-one instead of slurping all
matches in a single pass.  Break the rescuer loop after finding and
processing the first matching work item, then restart the search to
pick up the next.  This gives normal worker threads a chance to process
the remaining items instead of leaving them waiting on the rescuer's
queue, and prevents a blocking work item from stalling the rest once
memory pressure is relieved.

Introduce a positional dummy work item to avoid potentially O(N^2)
rescans of the work list.  The marker records the resume position for
the next scan, eliminating redundant traversals.

Processing items one-by-one has additional benefits:

1) If the pwq does not need rescue (normal workers have been created or
become available), the rescuer can move on to other stalled pwqs
immediately.

2) It also naturally handles work items newly created during rescue.
For example, work items dispatched via pwq_activate_first_inactive() or
chained queueing will be picked up in subsequent rescuer iterations
without special handling.

Cc: ying chen
Reported-by: ying chen
Fixes: e22bee782b3b ("workqueue: implement concurrency managed dynamic worker pool")
Signed-off-by: Lai Jiangshan
---
Changed from v1:

Insert the marker and use it purely as an iteration marker, as Tejun
requested.  It is hard to maintain its proper state, especially when
keeping it only on pool->worklist, which also requires an unlikely path
in worker_thread().

Add an unlikely path in the worker_thread() path (in assign_work()).
Add code in other places so that the marker is not treated like a
regular work item.  I'm sure all the paths have been covered now, but I
still feel a bit nervous about it with respect to future changes.

Extract the code that assigns work in rescuer_thread() out into
assign_rescue_work().

V1: https://lore.kernel.org/lkml/20251113163426.2950-1-jiangshanlai@gmail.com/

 kernel/workqueue.c | 106 +++++++++++++++++++++++++++++----------------
 1 file changed, 69 insertions(+), 37 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 19bc7ea931d8..29f7610ee8c5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -286,6 +286,7 @@ struct pool_workqueue {
 	struct list_head	pending_node;	/* LN: node on wq_node_nr_active->pending_pwqs */
 	struct list_head	pwqs_node;	/* WR: node on wq->pwqs */
 	struct list_head	mayday_node;	/* MD: node on wq->maydays */
+	struct work_struct	mayday_pos_work;/* L: position on pool->worklist */
 
 	u64			stats[PWQ_NR_STATS];
 
@@ -1126,6 +1127,12 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
 	return NULL;
 }
 
+static void mayday_pos_func(struct work_struct *work)
+{
+	/* should not be processed, only for marking position */
+	BUG();
+}
+
 /**
  * move_linked_works - move linked works to a list
  * @work: start of series of works to be scheduled
@@ -1188,6 +1195,15 @@ static bool assign_work(struct work_struct *work, struct worker *worker,
 
 	lockdep_assert_held(&pool->lock);
 
+	/* The positional work should not be processed */
+	if (unlikely(work->func == mayday_pos_func)) {
+		/* only worker_thread() can possibly take this branch */
+		if (WARN_ON_ONCE(nextp))
+			*nextp = list_next_entry(work, entry);
+		list_del_init(&work->entry);
+		return false;
+	}
+
 	/*
 	 * A single work shouldn't be executed concurrently by multiple workers.
 	 * __queue_work() ensures that @work doesn't jump to a different pool
@@ -2991,7 +3007,7 @@ static void send_mayday(struct work_struct *work)
 		return;
 
 	/* mayday mayday mayday */
-	if (list_empty(&pwq->mayday_node)) {
+	if (list_empty(&pwq->mayday_node) && list_empty(&pwq->mayday_pos_work.entry)) {
 		/*
 		 * If @pwq is for an unbound wq, its base ref may be put at
 		 * any time due to an attribute change. Pin @pwq until the
@@ -3001,6 +3017,8 @@ static void send_mayday(struct work_struct *work)
 		list_add_tail(&pwq->mayday_node, &wq->maydays);
 		wake_up_process(wq->rescuer->task);
 		pwq->stats[PWQ_STAT_MAYDAY]++;
+
+		list_add_tail(&pwq->mayday_pos_work.entry, &work->entry);
 	}
 }
 
@@ -3439,6 +3457,37 @@ static int worker_thread(void *__worker)
 	goto woke_up;
 }
 
+static bool assign_rescue_work(struct pool_workqueue *pwq, struct worker *rescuer)
+{
+	struct worker_pool *pool = pwq->pool;
+	struct work_struct *work, *n;
+
+	/* from where to search */
+	if (list_empty(&pwq->mayday_pos_work.entry))
+		work = list_first_entry(&pool->worklist, struct work_struct, entry);
+	else {
+		work = list_next_entry(&pwq->mayday_pos_work, entry);
+		/* It might be at a new position or not need position anymore */
+		list_del_init(&pwq->mayday_pos_work.entry);
+	}
+
+	/* need rescue? */
+	if (!need_to_create_worker(pool))
+		return false;
+
+	/* try to assign a work to rescue */
+	list_for_each_entry_safe_from(work, n, &pool->worklist, entry) {
+		if (get_work_pwq(work) == pwq && assign_work(work, rescuer, &n)) {
+			pwq->stats[PWQ_STAT_RESCUED]++;
+			/* mark the position for next search */
+			list_add_tail(&pwq->mayday_pos_work.entry, &n->entry);
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /**
  * rescuer_thread - the rescuer thread function
  * @__rescuer: self
@@ -3493,7 +3542,6 @@ static int rescuer_thread(void *__rescuer)
 		struct pool_workqueue *pwq = list_first_entry(&wq->maydays,
					struct pool_workqueue, mayday_node);
 		struct worker_pool *pool = pwq->pool;
-		struct work_struct *work, *n;
 
 		__set_current_state(TASK_RUNNING);
 		list_del_init(&pwq->mayday_node);
@@ -3504,43 +3552,9 @@ static int rescuer_thread(void *__rescuer)
 
 		raw_spin_lock_irq(&pool->lock);
 
-		/*
-		 * Slurp in all works issued via this workqueue and
-		 * process'em.
-		 */
-		WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
-		list_for_each_entry_safe(work, n, &pool->worklist, entry) {
-			if (get_work_pwq(work) == pwq &&
-			    assign_work(work, rescuer, &n))
-				pwq->stats[PWQ_STAT_RESCUED]++;
-		}
-
-		if (!list_empty(&rescuer->scheduled)) {
+		while (assign_rescue_work(pwq, rescuer))
 			process_scheduled_works(rescuer);
 
-			/*
-			 * The above execution of rescued work items could
-			 * have created more to rescue through
-			 * pwq_activate_first_inactive() or chained
-			 * queueing. Let's put @pwq back on mayday list so
-			 * that such back-to-back work items, which may be
-			 * being used to relieve memory pressure, don't
-			 * incur MAYDAY_INTERVAL delay inbetween.
-			 */
-			if (pwq->nr_active && need_to_create_worker(pool)) {
-				raw_spin_lock(&wq_mayday_lock);
-				/*
-				 * Queue iff we aren't racing destruction
-				 * and somebody else hasn't queued it already.
-				 */
-				if (wq->rescuer && list_empty(&pwq->mayday_node)) {
-					get_pwq(pwq);
-					list_add_tail(&pwq->mayday_node, &wq->maydays);
-				}
-				raw_spin_unlock(&wq_mayday_lock);
-			}
-		}
-
 		/*
 		 * Leave this pool. Notify regular workers; otherwise, we end up
 		 * with 0 concurrency and stalling the execution.
@@ -5153,6 +5167,20 @@ static void init_pwq(struct pool_workqueue *pwq, struct workqueue_struct *wq,
 	INIT_LIST_HEAD(&pwq->pwqs_node);
 	INIT_LIST_HEAD(&pwq->mayday_node);
 	kthread_init_work(&pwq->release_work, pwq_release_workfn);
+
+	/*
+	 * Set the dummy positional work with a valid function and get_work_pwq().
+	 *
+	 * The positional work should only be in the pwq->pool->worklist, and
+	 * should never be queued, processed, flushed, cancelled or even examined
+	 * as a work item.
+	 *
+	 * WORK_STRUCT_PENDING and WORK_STRUCT_INACTIVE just make it less
+	 * surprising for kernel debugging tools and reviewers.
+	 */
+	INIT_WORK(&pwq->mayday_pos_work, mayday_pos_func);
+	atomic_long_set(&pwq->mayday_pos_work.data, (unsigned long)pwq |
+		WORK_STRUCT_PENDING | WORK_STRUCT_PWQ | WORK_STRUCT_INACTIVE);
 }
 
 /* sync @pwq with the current state of its associated wq and link it */
@@ -6300,6 +6328,8 @@ static void show_pwq(struct pool_workqueue *pwq)
 
 	list_for_each_entry(work, &pool->worklist, entry) {
 		if (get_work_pwq(work) == pwq) {
+			if (work->func == mayday_pos_func)
+				continue;
 			has_pending = true;
 			break;
 		}
@@ -6311,6 +6341,8 @@ static void show_pwq(struct pool_workqueue *pwq)
 	list_for_each_entry(work, &pool->worklist, entry) {
 		if (get_work_pwq(work) != pwq)
 			continue;
+		if (work->func == mayday_pos_func)
+			continue;
 
 		pr_cont_work(comma, work, &pcws);
 		comma = !(*work_data_bits(work) & WORK_STRUCT_LINKED);
-- 
2.19.1.6.gb485710b