From: Kuan-Wei Chiu <visitorckw@gmail.com>
To: richard@nod.at, akpm@linux-foundation.org
Cc: chengzhihao1@huawei.com, jserv@ccns.ncku.edu.tw, eleanor15x@gmail.com, marscheng@google.com, linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org, Kuan-Wei Chiu <visitorckw@gmail.com>
Subject: [PATCH] lib/list_sort: introduce list_sort_nonatomic() and remove dummy cmp() calls
Date: Sun, 15 Mar 2026 19:39:00 +0000
Message-ID: <20260315193900.218737-1-visitorckw@gmail.com>

Historically, list_sort() implemented a hack in merge_final():

	if (unlikely(!++count))
		cmp(priv, b, b);

This was designed specifically so that callers could periodically
invoke cond_resched() from within their comparison functions when
merging highly unbalanced lists. However, an audit of the kernel tree
reveals that only fs/ubifs/ relies on this mechanism. For the vast
majority of list_sort() users (such as block layer I/O schedulers and
file systems), these dummy invocations are completely wasted function
calls. In the worst case (merging an already sorted list, where 'a'
is exhausted quickly), the u8 counter wraps once every 256 iterations,
so roughly (N/2)/256 unnecessary cmp() calls are made.

To clean up this API while preserving existing behavior:

1. Introduce list_sort_nonatomic(), which explicitly calls
   cond_resched() internally when the counter overflows.
2. Remove the dummy cmp(priv, b, b) fallback from the standard
   list_sort(), saving unnecessary function calls and improving
   determinism.
3. Convert the sole user (fs/ubifs/) to the new API.

Note that ubifs still keeps cond_resched() inside its own comparison
functions. This patch does not alter the frequency or timing of those
scheduling points, so ubifs sees no behavioral change, while all other
kernel users benefit.
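For illustration only (this sketch is not part of the patch; my_cmp
and my_list are hypothetical names, and the real ubifs callbacks are
more elaborate): under the old scheme, a comparison function that
wanted scheduling points had to tolerate the periodic dummy
self-comparison, whereas with the new API the caller simply picks the
variant that is allowed to sleep:

	/*
	 * Old scheme: cmp() is invoked with a == b once every 256
	 * merge steps purely as a rescheduling hook, so the callback
	 * must detect and absorb the dummy call itself.
	 */
	static int my_cmp(void *priv, const struct list_head *a,
			  const struct list_head *b)
	{
		cond_resched();	/* scheduling point piggybacked on cmp() */
		if (a == b)	/* dummy call from merge_final() */
			return 0;
		/* ... real key comparison via container_of() ... */
		return 0;
	}

	/*
	 * New scheme: the library reschedules internally and cmp()
	 * only ever sees real comparisons.
	 */
	list_sort_nonatomic(NULL, &my_list, my_cmp);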
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/gc.c             |   4 +-
 fs/ubifs/replay.c         |   2 +-
 include/linux/list_sort.h |   3 +
 lib/list_sort.c           | 166 +++++++++++++++++++++-----------------
 4 files changed, 100 insertions(+), 75 deletions(-)

diff --git a/fs/ubifs/gc.c b/fs/ubifs/gc.c
index 0bf08b7755b8..dcc5742210b0 100644
--- a/fs/ubifs/gc.c
+++ b/fs/ubifs/gc.c
@@ -277,8 +277,8 @@ static int sort_nodes(struct ubifs_info *c, struct ubifs_scan_leb *sleb,
 	}
 
 	/* Sort data and non-data nodes */
-	list_sort(c, &sleb->nodes, &data_nodes_cmp);
-	list_sort(c, nondata, &nondata_nodes_cmp);
+	list_sort_nonatomic(c, &sleb->nodes, &data_nodes_cmp);
+	list_sort_nonatomic(c, nondata, &nondata_nodes_cmp);
 
 	err = dbg_check_data_nodes_order(c, &sleb->nodes);
 	if (err)
diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index a9a568f4a868..ad03dd106a54 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -329,7 +329,7 @@ static int apply_replay_list(struct ubifs_info *c)
 	struct replay_entry *r;
 	int err;
 
-	list_sort(c, &c->replay_list, &replay_entries_cmp);
+	list_sort_nonatomic(c, &c->replay_list, &replay_entries_cmp);
 
 	list_for_each_entry(r, &c->replay_list, list) {
 		cond_resched();
diff --git a/include/linux/list_sort.h b/include/linux/list_sort.h
index 453105f74e05..f7af29073d48 100644
--- a/include/linux/list_sort.h
+++ b/include/linux/list_sort.h
@@ -11,4 +11,7 @@ typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
 
 __attribute__((nonnull(2,3)))
 void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp);
+
+__attribute__((nonnull(2, 3)))
+void list_sort_nonatomic(void *priv, struct list_head *head, list_cmp_func_t cmp);
 #endif
diff --git a/lib/list_sort.c b/lib/list_sort.c
index a310ecb7ccc0..788bfc26cf7b 100644
--- a/lib/list_sort.c
+++ b/lib/list_sort.c
@@ -3,6 +3,7 @@
 #include <linux/kernel.h>
 #include <linux/bug.h>
 #include <linux/compiler.h>
+#include <linux/sched.h>
 
 /*
  * Returns a list organized in an intermediate format suited
@@ -47,7 +48,7 @@ static struct list_head *merge(void *priv, list_cmp_func_t cmp,
  */
 __attribute__((nonnull(2,3,4,5)))
 static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
-			struct list_head *a, struct list_head *b)
+			struct list_head *a, struct list_head *b, bool may_schedule)
 {
 	struct list_head *tail = head;
 	u8 count = 0;
@@ -79,12 +80,11 @@ static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
 		/*
 		 * If the merge is highly unbalanced (e.g. the input is
 		 * already sorted), this loop may run many iterations.
-		 * Continue callbacks to the client even though no
-		 * element comparison is needed, so the client's cmp()
-		 * routine can invoke cond_resched() periodically.
+		 * If may_schedule is true, periodically invoke cond_resched()
+		 * to avoid soft lockups.
 		 */
-		if (unlikely(!++count))
-			cmp(priv, b, b);
+		if (may_schedule && unlikely(!++count))
+			cond_resched();
 		b->prev = tail;
 		tail = b;
 		b = b->next;
@@ -95,6 +95,75 @@ static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
 	head->prev = tail;
 }
 
+static void __list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp, bool may_schedule)
+{
+	struct list_head *list = head->next, *pending = NULL;
+	size_t count = 0;	/* Count of pending */
+
+	if (list == head->prev)	/* Zero or one elements */
+		return;
+
+	/* Convert to a null-terminated singly-linked list. */
+	head->prev->next = NULL;
+
+	/*
+	 * Data structure invariants:
+	 * - All lists are singly linked and null-terminated; prev
+	 *   pointers are not maintained.
+	 * - pending is a prev-linked "list of lists" of sorted
+	 *   sublists awaiting further merging.
+	 * - Each of the sorted sublists is power-of-two in size.
+	 * - Sublists are sorted by size and age, smallest & newest at front.
+	 * - There are zero to two sublists of each size.
+	 * - A pair of pending sublists are merged as soon as the number
+	 *   of following pending elements equals their size (i.e.
+	 *   each time count reaches an odd multiple of that size).
+	 *   That ensures each later final merge will be at worst 2:1.
+	 * - Each round consists of:
+	 *   - Merging the two sublists selected by the highest bit
+	 *     which flips when count is incremented, and
+	 *   - Adding an element from the input as a size-1 sublist.
+	 */
+	do {
+		size_t bits;
+		struct list_head **tail = &pending;
+
+		/* Find the least-significant clear bit in count */
+		for (bits = count; bits & 1; bits >>= 1)
+			tail = &(*tail)->prev;
+		/* Do the indicated merge */
+		if (likely(bits)) {
+			struct list_head *a = *tail, *b = a->prev;
+
+			a = merge(priv, cmp, b, a);
+			/* Install the merged result in place of the inputs */
+			a->prev = b->prev;
+			*tail = a;
+		}
+
+		/* Move one element from input list to pending */
+		list->prev = pending;
+		pending = list;
+		list = list->next;
+		pending->next = NULL;
+		count++;
+	} while (list);
+
+	/* End of input; merge together all the pending lists. */
+	list = pending;
+	pending = pending->prev;
+	for (;;) {
+		struct list_head *next = pending->prev;
+
+		if (!next)
+			break;
+		list = merge(priv, cmp, pending, list);
+		pending = next;
+	}
+	/* The final merge, rebuilding prev links */
+	merge_final(priv, cmp, head, pending, list, may_schedule);
+}
+
 /**
  * list_sort - sort a list
  * @priv: private data, opaque to list_sort(), passed to @cmp
@@ -185,73 +254,26 @@ static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
  * of size 2^k varies from 2^(k-1) (cases 3 and 5 when x == 0) to
  * 2^(k+1) - 1 (second merge of case 5 when x == 2^(k-1) - 1).
  */
-__attribute__((nonnull(2,3)))
+__attribute__((nonnull(2, 3)))
 void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp)
 {
-	struct list_head *list = head->next, *pending = NULL;
-	size_t count = 0;	/* Count of pending */
-
-	if (list == head->prev)	/* Zero or one elements */
-		return;
-
-	/* Convert to a null-terminated singly-linked list. */
-	head->prev->next = NULL;
-
-	/*
-	 * Data structure invariants:
-	 * - All lists are singly linked and null-terminated; prev
-	 *   pointers are not maintained.
-	 * - pending is a prev-linked "list of lists" of sorted
-	 *   sublists awaiting further merging.
-	 * - Each of the sorted sublists is power-of-two in size.
-	 * - Sublists are sorted by size and age, smallest & newest at front.
-	 * - There are zero to two sublists of each size.
-	 * - A pair of pending sublists are merged as soon as the number
-	 *   of following pending elements equals their size (i.e.
-	 *   each time count reaches an odd multiple of that size).
-	 *   That ensures each later final merge will be at worst 2:1.
-	 * - Each round consists of:
-	 *   - Merging the two sublists selected by the highest bit
-	 *     which flips when count is incremented, and
-	 *   - Adding an element from the input as a size-1 sublist.
-	 */
-	do {
-		size_t bits;
-		struct list_head **tail = &pending;
-
-		/* Find the least-significant clear bit in count */
-		for (bits = count; bits & 1; bits >>= 1)
-			tail = &(*tail)->prev;
-		/* Do the indicated merge */
-		if (likely(bits)) {
-			struct list_head *a = *tail, *b = a->prev;
-
-			a = merge(priv, cmp, b, a);
-			/* Install the merged result in place of the inputs */
-			a->prev = b->prev;
-			*tail = a;
-		}
-
-		/* Move one element from input list to pending */
-		list->prev = pending;
-		pending = list;
-		list = list->next;
-		pending->next = NULL;
-		count++;
-	} while (list);
-
-	/* End of input; merge together all the pending lists. */
-	list = pending;
-	pending = pending->prev;
-	for (;;) {
-		struct list_head *next = pending->prev;
-
-		if (!next)
-			break;
-		list = merge(priv, cmp, pending, list);
-		pending = next;
-	}
-	/* The final merge, rebuilding prev links */
-	merge_final(priv, cmp, head, pending, list);
+	__list_sort(priv, head, cmp, false);
 }
 EXPORT_SYMBOL(list_sort);
+
+/**
+ * list_sort_nonatomic - sort a list with conditional rescheduling
+ * @priv: private data, opaque to list_sort(), passed to @cmp
+ * @head: the list to sort
+ * @cmp: the elements comparison function
+ *
+ * This variant of list_sort() periodically invokes cond_resched()
+ * during the sorting process. It should be used for sorting very
+ * long lists where the operation might otherwise cause soft lockups.
+ */
+__attribute__((nonnull(2, 3)))
+void list_sort_nonatomic(void *priv, struct list_head *head, list_cmp_func_t cmp)
+{
+	__list_sort(priv, head, cmp, true);
+}
+EXPORT_SYMBOL(list_sort_nonatomic);
-- 
2.53.0.851.ga537e3e6e9-goog
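P.S. For readers new to the list_sort API, here is a minimal
illustrative caller of the new variant (not part of the patch; struct
item, item_cmp and my_items are hypothetical names). Since
list_sort_nonatomic() may call cond_resched(), it must only be used
from process context; like list_sort(), the sort is stable, so
elements that compare equal keep their relative order:

	/* A sortable element: embed a list_head next to the key. */
	struct item {
		struct list_head list;
		int key;
	};

	static int item_cmp(void *priv, const struct list_head *a,
			    const struct list_head *b)
	{
		const struct item *ia = list_entry(a, struct item, list);
		const struct item *ib = list_entry(b, struct item, list);

		/*
		 * Return <0, 0, >0; this idiom avoids the signed
		 * overflow that "ia->key - ib->key" can hit.
		 */
		return (ia->key > ib->key) - (ia->key < ib->key);
	}

	LIST_HEAD(my_items);

	/* After populating my_items, from process context: */
	list_sort_nonatomic(NULL, &my_items, item_cmp);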