From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Tejun Heo, ying chen, Lai Jiangshan, Lai Jiangshan
Subject: [PATCH V4 2/4] workqueue: Process rescuer work items one-by-one using a cursor
Date: Tue, 25 Nov 2025 14:36:15 +0800
Message-Id: <20251125063617.671199-3-jiangshanlai@gmail.com>
In-Reply-To: <20251125063617.671199-1-jiangshanlai@gmail.com>
References: <20251125063617.671199-1-jiangshanlai@gmail.com>

From: Lai Jiangshan

Previously, the rescuer scanned for all matching work items at once and
processed them within a single rescuer thread, which could cause one
blocking work item to stall all others.

Make the rescuer process work items one-by-one instead of slurping all
matches in a single pass: break the rescuer loop after finding and
processing the first matching work item, then restart the search to pick
up the next.  This gives normal worker threads the opportunity to process
the remaining items once memory pressure is relieved, instead of leaving
them queued behind a blocking work item on the rescuer's list.

Introduce a dummy cursor work item to avoid potentially O(N^2) rescans of
the work list.  The cursor records the resume position for the next scan,
eliminating redundant traversals.

Cc: ying chen
Reported-by: ying chen
Fixes: e22bee782b3b ("workqueue: implement concurrency managed dynamic worker pool")
Signed-off-by: Lai Jiangshan
---
 kernel/workqueue.c | 55 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 48 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 02386e6eb409..06cd3d6ff7e1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -286,6 +286,7 @@ struct pool_workqueue {
 	struct list_head	pending_node;	/* LN: node on wq_node_nr_active->pending_pwqs */
 	struct list_head	pwqs_node;	/* WR: node on wq->pwqs */
 	struct list_head	mayday_node;	/* MD: node on wq->maydays */
+	struct work_struct	mayday_cursor;	/* L: cursor on pool->worklist */
 
 	u64			stats[PWQ_NR_STATS];
 
@@ -1120,6 +1121,12 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
 	return NULL;
 }
 
+static void mayday_cursor_func(struct work_struct *work)
+{
+	/* should not be processed, only for marking position */
+	BUG();
+}
+
 /**
  * move_linked_works - move linked works to a list
  * @work: start of series of works to be scheduled
@@ -1182,6 +1189,16 @@ static bool assign_work(struct work_struct *work, struct worker *worker,
 
 	lockdep_assert_held(&pool->lock);
 
+	/* The cursor work should not be processed */
+	if (unlikely(work->func == mayday_cursor_func)) {
+		/* only worker_thread() can possibly take this branch */
+		WARN_ON_ONCE(worker->rescue_wq);
+		if (nextp)
+			*nextp = list_next_entry(work, entry);
+		list_del_init(&work->entry);
+		return false;
+	}
+
 	/*
 	 * A single work shouldn't be executed concurrently by multiple workers.
	 * __queue_work() ensures that @work doesn't jump to a different pool
@@ -3436,22 +3453,33 @@ static int worker_thread(void *__worker)
 static bool assign_rescuer_work(struct pool_workqueue *pwq, struct worker *rescuer)
 {
 	struct worker_pool *pool = pwq->pool;
+	struct work_struct *cursor = &pwq->mayday_cursor;
 	struct work_struct *work, *n;
 
+	/* search from the start, or from the cursor if available */
+	if (list_empty(&cursor->entry)) {
+		work = list_first_entry(&pool->worklist, struct work_struct, entry);
+	} else {
+		work = list_next_entry(cursor, entry);
+		/* the cursor will be re-added at a new position, or is no longer needed */
+		list_del_init(&cursor->entry);
+	}
+
 	/* need rescue? */
 	if (!pwq->nr_active || !need_to_create_worker(pool))
 		return false;
 
-	/*
-	 * Slurp in all works issued via this workqueue and
-	 * process'em.
-	 */
-	list_for_each_entry_safe(work, n, &pool->worklist, entry) {
-		if (get_work_pwq(work) == pwq && assign_work(work, rescuer, &n))
+	/* find the next work item to rescue */
+	list_for_each_entry_safe_from(work, n, &pool->worklist, entry) {
+		if (get_work_pwq(work) == pwq && assign_work(work, rescuer, &n)) {
 			pwq->stats[PWQ_STAT_RESCUED]++;
+			/* put the cursor back for the next search */
+			list_add_tail(&cursor->entry, &n->entry);
+			return true;
+		}
 	}
 
-	return !list_empty(&rescuer->scheduled);
+	return false;
 }
 
 /**
@@ -5135,6 +5163,19 @@ static void init_pwq(struct pool_workqueue *pwq, struct workqueue_struct *wq,
 	INIT_LIST_HEAD(&pwq->pwqs_node);
 	INIT_LIST_HEAD(&pwq->mayday_node);
 	kthread_init_work(&pwq->release_work, pwq_release_workfn);
+
+	/*
+	 * Set up the dummy cursor work with a valid function and get_work_pwq().
+	 *
+	 * The cursor work should only ever sit on pwq->pool->worklist and
+	 * must not be treated as a processable work item.
+	 *
+	 * WORK_STRUCT_PENDING and WORK_STRUCT_INACTIVE just make it less
+	 * surprising for kernel debugging tools and reviewers.
+	 */
+	INIT_WORK(&pwq->mayday_cursor, mayday_cursor_func);
+	atomic_long_set(&pwq->mayday_cursor.data, (unsigned long)pwq |
+			WORK_STRUCT_PENDING | WORK_STRUCT_PWQ | WORK_STRUCT_INACTIVE);
 }
 
 /* sync @pwq with the current state of its associated wq and link it */
-- 
2.19.1.6.gb485710b
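
For readers outside the kernel tree, the cursor idea in this patch can be illustrated with a standalone sketch. This is not kernel code: `struct node`, `rescue_one()`, the `tag` field (standing in for `get_work_pwq()` matching) and the `is_cursor` flag (standing in for `work->func == mayday_cursor_func`) are all hypothetical names invented for illustration. The point it demonstrates is the same as `assign_rescuer_work()`: after handing off one matching item, a dummy cursor node is left in the list so the next scan resumes where the previous one stopped instead of rescanning from the head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal doubly linked list node, mimicking the kernel's list_head usage. */
struct node {
	struct node *prev, *next;
	int tag;		/* stand-in for get_work_pwq() matching */
	bool is_cursor;		/* stand-in for work->func == mayday_cursor_func */
};

static void list_init(struct node *n) { n->prev = n->next = n; }

static bool list_empty(const struct node *n) { return n->next == n; }

static void list_add_before(struct node *n, struct node *pos)
{
	n->prev = pos->prev;
	n->next = pos;
	pos->prev->next = n;
	pos->prev = n;
}

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;	/* self-link, like list_del_init() */
}

/*
 * Remove and return the next item whose tag matches, resuming from the
 * cursor when it is still on the list.  After a match, the cursor is
 * re-inserted just before the successor so the next call continues from
 * there, keeping repeated scans O(N) overall instead of O(N^2).
 */
static struct node *rescue_one(struct node *head, struct node *cursor, int tag)
{
	struct node *start = list_empty(cursor) ? head->next : cursor->next;

	if (!list_empty(cursor))
		list_del(cursor);	/* re-added below, or no longer needed */

	for (struct node *w = start, *n = w->next; w != head; w = n, n = w->next) {
		if (w->is_cursor || w->tag != tag)
			continue;
		list_del(w);			/* hand the item off */
		list_add_before(cursor, n);	/* remember where to resume */
		return w;
	}
	return NULL;
}
```

Calling `rescue_one()` repeatedly with the same cursor walks the list exactly once in total, yielding one matching item per call and `NULL` when none remain, which mirrors how the rescuer now picks up one work item per pass.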