From nobody Mon Apr 6 22:00:02 2026
From: Kuan-Wei Chiu
To: richard@nod.at, akpm@linux-foundation.org
Cc: chengzhihao1@huawei.com, hch@infradead.org, jserv@ccns.ncku.edu.tw, eleanor15x@gmail.com, marscheng@google.com, linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org, Kuan-Wei Chiu
Subject: [PATCH v2] lib/list_sort: introduce list_sort_nonatomic() and clean up scheduling workarounds
Date: Tue, 17 Mar 2026 16:59:05 +0000
Message-ID: <20260317165905.1482256-1-visitorckw@gmail.com>
X-Mailer: git-send-email 2.53.0.851.ga537e3e6e9-goog

Historically, list_sort() implemented a hack in merge_final():

	if (unlikely(!++count))
		cmp(priv, b, b);

This was designed specifically so that callers could periodically
invoke cond_resched() within their comparison functions when merging
highly unbalanced lists.

However, an audit of the kernel tree reveals that only fs/ubifs/ relies
on this mechanism. For the vast majority of list_sort() users (such as
block layer IO schedulers and file systems), this results in completely
wasted function calls. In the worst-case scenario (merging an already
sorted list where 'a' is exhausted quickly), it results in
approximately (N/2)/256 unnecessary cmp() calls.

To clean up this API, eliminate the overhead for generic users, and
consolidate the scheduling logic:
1. Introduce list_sort_nonatomic(), which explicitly calls
   cond_resched() within its inner merge loops.

2. Remove the dummy cmp(priv, b, b) fallback from standard list_sort(),
   saving unnecessary function calls and improving determinism for all
   other subsystems.

3. Convert the sole user (fs/ubifs/) to the new API and remove
   cond_resched() from UBIFS's comparison callbacks entirely, keeping
   the comparison logic free of scheduling concerns.

This change leaves the generic list_sort() completely free of
scheduling hacks, simplifies UBIFS's callbacks, and ensures that legacy
long-list sorting workloads remain safe from soft lockups on
non-preemptible kernels.

Reviewed-by: Zhihao Cheng
Signed-off-by: Kuan-Wei Chiu
---
Changes in v2:
- Drop the 'count' rate-limiter for cond_resched().
- Remove cond_resched() from UBIFS comparison callbacks.

 fs/ubifs/gc.c             |   6 +-
 fs/ubifs/replay.c         |   3 +-
 include/linux/list_sort.h |   3 +
 lib/list_sort.c           | 174 ++++++++++++++++++++++----------------
 4 files changed, 106 insertions(+), 80 deletions(-)

diff --git a/fs/ubifs/gc.c b/fs/ubifs/gc.c
index 0bf08b7755b8..54fac33224fa 100644
--- a/fs/ubifs/gc.c
+++ b/fs/ubifs/gc.c
@@ -109,7 +109,6 @@ static int data_nodes_cmp(void *priv, const struct list_head *a,
 	struct ubifs_info *c = priv;
 	struct ubifs_scan_node *sa, *sb;
 
-	cond_resched();
 	if (a == b)
 		return 0;
 
@@ -153,7 +152,6 @@ static int nondata_nodes_cmp(void *priv, const struct list_head *a,
 	struct ubifs_info *c = priv;
 	struct ubifs_scan_node *sa, *sb;
 
-	cond_resched();
 	if (a == b)
 		return 0;
 
@@ -277,8 +275,8 @@ static int sort_nodes(struct ubifs_info *c, struct ubifs_scan_leb *sleb,
 	}
 
 	/* Sort data and non-data nodes */
-	list_sort(c, &sleb->nodes, &data_nodes_cmp);
-	list_sort(c, nondata, &nondata_nodes_cmp);
+	list_sort_nonatomic(c, &sleb->nodes, &data_nodes_cmp);
+	list_sort_nonatomic(c, nondata, &nondata_nodes_cmp);
 
 	err = dbg_check_data_nodes_order(c, &sleb->nodes);
 	if (err)
diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index a9a568f4a868..c639394e60ac 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -305,7 +305,6 @@ static int replay_entries_cmp(void *priv, const struct list_head *a,
 	struct ubifs_info *c = priv;
 	struct replay_entry *ra, *rb;
 
-	cond_resched();
 	if (a == b)
 		return 0;
 
@@ -329,7 +328,7 @@ static int apply_replay_list(struct ubifs_info *c)
 	struct replay_entry *r;
 	int err;
 
-	list_sort(c, &c->replay_list, &replay_entries_cmp);
+	list_sort_nonatomic(c, &c->replay_list, &replay_entries_cmp);
 
 	list_for_each_entry(r, &c->replay_list, list) {
 		cond_resched();
diff --git a/include/linux/list_sort.h b/include/linux/list_sort.h
index 453105f74e05..f7af29073d48 100644
--- a/include/linux/list_sort.h
+++ b/include/linux/list_sort.h
@@ -11,4 +11,7 @@ typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
 
 __attribute__((nonnull(2,3)))
 void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp);
+
+__attribute__((nonnull(2, 3)))
+void list_sort_nonatomic(void *priv, struct list_head *head, list_cmp_func_t cmp);
 #endif
diff --git a/lib/list_sort.c b/lib/list_sort.c
index a310ecb7ccc0..9f9d12ae23ab 100644
--- a/lib/list_sort.c
+++ b/lib/list_sort.c
@@ -3,6 +3,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * Returns a list organized in an intermediate format suited
@@ -11,11 +12,14 @@
  */
__attribute__((nonnull(2,3,4)))
 static struct list_head *merge(void *priv, list_cmp_func_t cmp,
-				struct list_head *a, struct list_head *b)
+				struct list_head *a, struct list_head *b,
+				bool may_schedule)
 {
 	struct list_head *head, **tail = &head;
 
 	for (;;) {
+		if (may_schedule)
+			cond_resched();
 		/* if equal, take 'a' -- important for sort stability */
 		if (cmp(priv, a, b) <= 0) {
 			*tail = a;
@@ -47,12 +51,13 @@ static struct list_head *merge(void *priv, list_cmp_func_t cmp,
  */
 __attribute__((nonnull(2,3,4,5)))
 static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
-			struct list_head *a, struct list_head *b)
+			struct list_head *a, struct list_head *b, bool may_schedule)
 {
 	struct list_head *tail = head;
-	u8 count = 0;
 
 	for (;;) {
+		if (may_schedule)
+			cond_resched();
 		/* if equal, take 'a' -- important for sort stability */
 		if (cmp(priv, a, b) <= 0) {
 			tail->next = a;
@@ -79,12 +84,11 @@ static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
 		/*
 		 * If the merge is highly unbalanced (e.g. the input is
 		 * already sorted), this loop may run many iterations.
-		 * Continue callbacks to the client even though no
-		 * element comparison is needed, so the client's cmp()
-		 * routine can invoke cond_resched() periodically.
+		 * If may_schedule is true, invoke cond_resched() to
+		 * avoid soft lockups.
 		 */
-		if (unlikely(!++count))
-			cmp(priv, b, b);
+		if (may_schedule)
+			cond_resched();
 		b->prev = tail;
 		tail = b;
 		b = b->next;
@@ -95,6 +99,75 @@ static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
 	head->prev = tail;
 }
 
+static void __list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp, bool may_schedule)
+{
+	struct list_head *list = head->next, *pending = NULL;
+	size_t count = 0;	/* Count of pending */
+
+	if (list == head->prev)	/* Zero or one elements */
+		return;
+
+	/* Convert to a null-terminated singly-linked list. */
+	head->prev->next = NULL;
+
+	/*
+	 * Data structure invariants:
+	 * - All lists are singly linked and null-terminated; prev
+	 *   pointers are not maintained.
+	 * - pending is a prev-linked "list of lists" of sorted
+	 *   sublists awaiting further merging.
+	 * - Each of the sorted sublists is power-of-two in size.
+	 * - Sublists are sorted by size and age, smallest & newest at front.
+	 * - There are zero to two sublists of each size.
+	 * - A pair of pending sublists are merged as soon as the number
+	 *   of following pending elements equals their size (i.e.
+	 *   each time count reaches an odd multiple of that size).
+	 *   That ensures each later final merge will be at worst 2:1.
+	 * - Each round consists of:
+	 *   - Merging the two sublists selected by the highest bit
+	 *     which flips when count is incremented, and
+	 *   - Adding an element from the input as a size-1 sublist.
+	 */
+	do {
+		size_t bits;
+		struct list_head **tail = &pending;
+
+		/* Find the least-significant clear bit in count */
+		for (bits = count; bits & 1; bits >>= 1)
+			tail = &(*tail)->prev;
+		/* Do the indicated merge */
+		if (likely(bits)) {
+			struct list_head *a = *tail, *b = a->prev;
+
+			a = merge(priv, cmp, b, a, may_schedule);
+			/* Install the merged result in place of the inputs */
+			a->prev = b->prev;
+			*tail = a;
+		}
+
+		/* Move one element from input list to pending */
+		list->prev = pending;
+		pending = list;
+		list = list->next;
+		pending->next = NULL;
+		count++;
+	} while (list);
+
+	/* End of input; merge together all the pending lists. */
+	list = pending;
+	pending = pending->prev;
+	for (;;) {
+		struct list_head *next = pending->prev;
+
+		if (!next)
+			break;
+		list = merge(priv, cmp, pending, list, may_schedule);
+		pending = next;
+	}
+	/* The final merge, rebuilding prev links */
+	merge_final(priv, cmp, head, pending, list, may_schedule);
+}
+
 /**
  * list_sort - sort a list
  * @priv: private data, opaque to list_sort(), passed to @cmp
@@ -185,73 +258,26 @@ static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
  * of size 2^k varies from 2^(k-1) (cases 3 and 5 when x == 0) to
  * 2^(k+1) - 1 (second merge of case 5 when x == 2^(k-1) - 1).
  */
-__attribute__((nonnull(2,3)))
+__attribute__((nonnull(2, 3)))
 void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp)
 {
-	struct list_head *list = head->next, *pending = NULL;
-	size_t count = 0;	/* Count of pending */
-
-	if (list == head->prev)	/* Zero or one elements */
-		return;
-
-	/* Convert to a null-terminated singly-linked list. */
-	head->prev->next = NULL;
-
-	/*
-	 * Data structure invariants:
-	 * - All lists are singly linked and null-terminated; prev
-	 *   pointers are not maintained.
-	 * - pending is a prev-linked "list of lists" of sorted
-	 *   sublists awaiting further merging.
-	 * - Each of the sorted sublists is power-of-two in size.
-	 * - Sublists are sorted by size and age, smallest & newest at front.
-	 * - There are zero to two sublists of each size.
-	 * - A pair of pending sublists are merged as soon as the number
-	 *   of following pending elements equals their size (i.e.
-	 *   each time count reaches an odd multiple of that size).
-	 *   That ensures each later final merge will be at worst 2:1.
-	 * - Each round consists of:
-	 *   - Merging the two sublists selected by the highest bit
-	 *     which flips when count is incremented, and
-	 *   - Adding an element from the input as a size-1 sublist.
-	 */
-	do {
-		size_t bits;
-		struct list_head **tail = &pending;
-
-		/* Find the least-significant clear bit in count */
-		for (bits = count; bits & 1; bits >>= 1)
-			tail = &(*tail)->prev;
-		/* Do the indicated merge */
-		if (likely(bits)) {
-			struct list_head *a = *tail, *b = a->prev;
-
-			a = merge(priv, cmp, b, a);
-			/* Install the merged result in place of the inputs */
-			a->prev = b->prev;
-			*tail = a;
-		}
-
-		/* Move one element from input list to pending */
-		list->prev = pending;
-		pending = list;
-		list = list->next;
-		pending->next = NULL;
-		count++;
-	} while (list);
-
-	/* End of input; merge together all the pending lists. */
-	list = pending;
-	pending = pending->prev;
-	for (;;) {
-		struct list_head *next = pending->prev;
-
-		if (!next)
-			break;
-		list = merge(priv, cmp, pending, list);
-		pending = next;
-	}
-	/* The final merge, rebuilding prev links */
-	merge_final(priv, cmp, head, pending, list);
+	__list_sort(priv, head, cmp, false);
 }
 EXPORT_SYMBOL(list_sort);
+
+/**
+ * list_sort_nonatomic - sort a list with conditional rescheduling
+ * @priv: private data, opaque to list_sort(), passed to @cmp
+ * @head: the list to sort
+ * @cmp: the elements comparison function
+ *
+ * This variant of list_sort() periodically invokes cond_resched()
+ * during the sorting process. It should be used for sorting very
+ * long lists where the operation might otherwise cause soft lockups.
+ */
+__attribute__((nonnull(2, 3)))
+void list_sort_nonatomic(void *priv, struct list_head *head, list_cmp_func_t cmp)
+{
+	__list_sort(priv, head, cmp, true);
+}
+EXPORT_SYMBOL(list_sort_nonatomic);
-- 
2.53.0.851.ga537e3e6e9-goog