From nobody Thu Dec 18 06:21:56 2025
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Tejun Heo, ying chen, Lai Jiangshan, Lai Jiangshan
Subject: [PATCH V5 2/3] workqueue: Process rescuer work items one-by-one using a cursor
Date: Mon, 8 Dec 2025 21:25:18 +0800
Message-Id: <20251208132520.1667697-3-jiangshanlai@gmail.com>
X-Mailer: git-send-email 2.19.1.6.gb485710b
In-Reply-To: <20251208132520.1667697-1-jiangshanlai@gmail.com>
References: <20251208132520.1667697-1-jiangshanlai@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Lai Jiangshan

Previously, the rescuer scanned for all matching work items at once and
processed them within a single rescuer thread, which could cause one
blocking work item to stall all others.

Make the rescuer process work items one-by-one instead of slurping all
matches in a single pass. Break the rescuer loop after finding and
processing the first matching work item, then restart the search to pick
up the next. This gives normal worker threads the opportunity to process
the other items instead of leaving them waiting on the rescuer's queue,
and prevents a blocking work item from stalling the rest once memory
pressure is relieved.

Introduce a dummy cursor work item to avoid potentially O(N^2) rescans
of the work list. The marker records the resume position for the next
scan, eliminating redundant traversals.
Also introduce RESCUER_BATCH to control the maximum number of work items
the rescuer processes in each turn, and move on to other PWQs when the
limit is reached.

Cc: ying chen
Reported-by: ying chen
Fixes: e22bee782b3b ("workqueue: implement concurrency managed dynamic worker pool")
Signed-off-by: Lai Jiangshan
---
 kernel/workqueue.c | 75 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 59 insertions(+), 16 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f8371aa54dca..add236e0dac4 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -117,6 +117,8 @@ enum wq_internal_consts {
 	MAYDAY_INTERVAL		= HZ / 10,	/* and then every 100ms */
 	CREATE_COOLDOWN		= HZ,		/* time to breath after fail */
 
+	RESCUER_BATCH		= 16,		/* process items per turn */
+
 	/*
 	 * Rescue workers are used only on emergencies and shared by
 	 * all cpus. Give MIN_NICE.
@@ -286,6 +288,7 @@ struct pool_workqueue {
 	struct list_head	pending_node;	/* LN: node on wq_node_nr_active->pending_pwqs */
 	struct list_head	pwqs_node;	/* WR: node on wq->pwqs */
 	struct list_head	mayday_node;	/* MD: node on wq->maydays */
+	struct work_struct	mayday_cursor;	/* L: cursor on pool->worklist */
 
 	u64			stats[PWQ_NR_STATS];
 
@@ -1120,6 +1123,12 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
 	return NULL;
 }
 
+static void mayday_cursor_func(struct work_struct *work)
+{
+	/* should not be processed, only for marking position */
+	BUG();
+}
+
 /**
  * move_linked_works - move linked works to a list
  * @work: start of series of works to be scheduled
@@ -1182,6 +1191,16 @@ static bool assign_work(struct work_struct *work, struct worker *worker,
 
 	lockdep_assert_held(&pool->lock);
 
+	/* The cursor work should not be processed */
+	if (unlikely(work->func == mayday_cursor_func)) {
+		/* only worker_thread() can possibly take this branch */
+		WARN_ON_ONCE(worker->rescue_wq);
+		if (nextp)
+			*nextp = list_next_entry(work, entry);
+		list_del_init(&work->entry);
+		return false;
+	}
+
 	/*
 	 * A single work shouldn't be executed concurrently by multiple workers.
 	 * __queue_work() ensures that @work doesn't jump to a different pool
@@ -3439,22 +3458,30 @@ static int worker_thread(void *__worker)
 static bool assign_rescuer_work(struct pool_workqueue *pwq, struct worker *rescuer)
 {
 	struct worker_pool *pool = pwq->pool;
+	struct work_struct *cursor = &pwq->mayday_cursor;
 	struct work_struct *work, *n;
 
 	/* need rescue? */
 	if (!pwq->nr_active || !need_to_create_worker(pool))
 		return false;
 
-	/*
-	 * Slurp in all works issued via this workqueue and
-	 * process'em.
-	 */
-	list_for_each_entry_safe(work, n, &pool->worklist, entry) {
-		if (get_work_pwq(work) == pwq && assign_work(work, rescuer, &n))
+	/* search from the start or cursor if available */
+	if (list_empty(&cursor->entry))
+		work = list_first_entry(&pool->worklist, struct work_struct, entry);
+	else
+		work = list_next_entry(cursor, entry);
+
+	/* find the next work item to rescue */
+	list_for_each_entry_safe_from(work, n, &pool->worklist, entry) {
+		if (get_work_pwq(work) == pwq && assign_work(work, rescuer, &n)) {
 			pwq->stats[PWQ_STAT_RESCUED]++;
+			/* put the cursor for next search */
+			list_move_tail(&cursor->entry, &n->entry);
+			return true;
+		}
 	}
 
-	return !list_empty(&rescuer->scheduled);
+	return false;
 }
 
 /**
@@ -3511,6 +3538,7 @@ static int rescuer_thread(void *__rescuer)
 		struct pool_workqueue *pwq = list_first_entry(&wq->maydays,
 					struct pool_workqueue, mayday_node);
 		struct worker_pool *pool = pwq->pool;
+		unsigned int count = 0;
 
 		__set_current_state(TASK_RUNNING);
 		list_del_init(&pwq->mayday_node);
@@ -3523,25 +3551,27 @@ static int rescuer_thread(void *__rescuer)
 
 		WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
 
-		if (assign_rescuer_work(pwq, rescuer)) {
+		while (assign_rescuer_work(pwq, rescuer)) {
 			process_scheduled_works(rescuer);
 
 			/*
-			 * The above execution of rescued work items could
-			 * have created more to rescue through
-			 * pwq_activate_first_inactive() or chained
-			 * queueing. Let's put @pwq back on mayday list so
-			 * that such back-to-back work items, which may be
-			 * being used to relieve memory pressure, don't
-			 * incur MAYDAY_INTERVAL delay inbetween.
+			 * If the per-turn work item limit is reached and other
+			 * PWQs are in mayday, requeue mayday for this PWQ and
+			 * let the rescuer handle the other PWQs first.
			 */
-			if (pwq->nr_active && need_to_create_worker(pool)) {
+			if (++count > RESCUER_BATCH && !list_empty(&pwq->wq->maydays) &&
+			    pwq->nr_active && need_to_create_worker(pool)) {
 				raw_spin_lock(&wq_mayday_lock);
 				send_mayday(pwq);
 				raw_spin_unlock(&wq_mayday_lock);
+				break;
 			}
 		}
 
+		/* The cursor cannot be left behind without the rescuer watching it. */
+		if (!list_empty(&pwq->mayday_cursor.entry) && list_empty(&pwq->mayday_node))
+			list_del_init(&pwq->mayday_cursor.entry);
+
 		/*
 		 * Leave this pool. Notify regular workers; otherwise, we end up
 		 * with 0 concurrency and stalling the execution.
@@ -5160,6 +5190,19 @@ static void init_pwq(struct pool_workqueue *pwq, struct workqueue_struct *wq,
 	INIT_LIST_HEAD(&pwq->pwqs_node);
 	INIT_LIST_HEAD(&pwq->mayday_node);
 	kthread_init_work(&pwq->release_work, pwq_release_workfn);
+
+	/*
+	 * Set the dummy cursor work with valid function and get_work_pwq().
+	 *
+	 * The cursor work should only be in the pwq->pool->worklist, and
+	 * should not be treated as a processable work item.
+	 *
+	 * WORK_STRUCT_PENDING and WORK_STRUCT_INACTIVE just make it less
+	 * surprising for kernel debugging tools and reviewers.
+	 */
+	INIT_WORK(&pwq->mayday_cursor, mayday_cursor_func);
+	atomic_long_set(&pwq->mayday_cursor.data, (unsigned long)pwq |
+			WORK_STRUCT_PENDING | WORK_STRUCT_PWQ | WORK_STRUCT_INACTIVE);
 }
 
 /* sync @pwq with the current state of its associated wq and link it */
-- 
2.19.1.6.gb485710b