From nobody Sun Feb 8 17:37:19 2026 Received: from mail-dy1-f171.google.com (mail-dy1-f171.google.com [74.125.82.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 58EC3303A37 for ; Thu, 22 Jan 2026 17:03:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=74.125.82.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769101447; cv=none; b=IDLpXjwO3g2hkpuRHK0GYTUUhOd9V40o25JuVkMrUsVcbrU45ENX1qpPY1NnVQXrJz0V8TKzRpGgkWJCNIWh11zhV6zGOGk/+fulSpa191y5OIiHBfOUXU2JNGicbKPYzav2+J/UL9dAgUd4/DSNb2yFWkHMiEYlsuqASimq4aA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769101447; c=relaxed/simple; bh=VY2vVw51mePM2zgAbBEm/Vez+SIerJBHj9XE3FAxxKY=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version; b=Fb7hhwmjNKpOZAH2tX+y+yHDyoJxSHq5fPb56DmYvTjwy3eODan95+eLy4YPdq7SgWm7bJohx/TWyrk2UwCYzaRyp3CZQHInRMuQYRb/TUaNLWjkKCvo6Xyi6zDhixzqW064lmIUVUiJ1WYBJ/6fVczuTg8RG7VACwBAtFStVFU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=NSuHtNm6; arc=none smtp.client-ip=74.125.82.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="NSuHtNm6" Received: by mail-dy1-f171.google.com with SMTP id 5a478bee46e88-2b720e4dcb4so1224764eec.0 for ; Thu, 22 Jan 2026 09:03:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1769101431; x=1769706231; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=mwaw7ulJ8eqCQzZAWx+2Cr0wFnlDolIVL12AnNX/htc=; b=NSuHtNm6nJF65NkZwb598bhhVKbonZGshcooJjJgKBbj5c7dx00FrDj3/wBCWL1ENz qeo1mtuDSYchtFmUR5HBRk+Jnyd4AdqL8oSdRCWc7/jRFb62Qc4OBPreLqVbHOFZM0ZP /HAL0lfgcfTnpX0aO/iY+vekROT4bPsfV+Xtmh82RrN+YSeev4jBygUqK6LYHNLJ8S8k QqTd80ddjC5OJN2hjVbOLjZcjZ4hgXZi+9WyNpp8Dk2Jed/f74tYBA7+zJ+yphjEjIeU M/dfMQnPmaYnvBTnSpUOS7DFAC3pq+tNrachUZdt83x5U7r67hft1JdU5x7i4YXC3Zv6 hz2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769101431; x=1769706231; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=mwaw7ulJ8eqCQzZAWx+2Cr0wFnlDolIVL12AnNX/htc=; b=IwlMWj0fz9Uj2PG7z05r35dHP4pD4N4qSXGdxntbxWs9tHTM22tnBGGeBdOU7jNdep QHTTOR8KxM/qXPqszsZXDbGdrhj+zo7j0DaolNHhbA4T7V1MxtYpyHTPEKke4B6Cp27K Pfl2UENBBtqAPne5eXS8Cp8v1M4Ot2NIQJAaU544IlbZDJ0sRV79p+JqvKFG9wtaezH9 T+8hQpdpI/EyXuyNWyqqfzEziMX9R/1iuBRO5Irc/5/2LYYkVQJkZpwg4tSeNSkZfs/L FlSvPZl2rto8psPTAsxVlXAo0OBhR1B6SMqE50CI/TtfUlrZGXMFTnvrXGc92h+PoY+W TDNw== X-Forwarded-Encrypted: i=1; AJvYcCVSlo5psKFs+A0X2C1l+V9viNIF5Q1zD1AnZ6S2pZEK5PqCkGweqObHq8izPJFx3ZK6WSm9vVjiTxZVqrI=@vger.kernel.org X-Gm-Message-State: AOJu0YzQ5Jda4lPxH0cyBj3cInl7G1Z3Du3KzZr5FyLaz1pK3vrtlf6t pLhMoSjkHqkwqbvVLvITCXaYPq5BhUR6IKZO6D8tb039svN95y/Kcs5K X-Gm-Gg: AZuq6aL4dlK/viajG8xdd6bGaTVUEzMRTlNYvH5WOrEjMBPbVZDvBIgOTa2BQPjeOds 2m4M+60EF0WEZxE/0AbNAxk4K//3EBAJfCNCE5VUxAoVNUMOQcXhegPUksJe7fvo3q0+1QSrZzU C92MuMAJxRqGSib6AOjs+qJYWg1i51hkHMu80zPDQlKGIlQfQbf3H7bf+a1jrbetIM7y1F9gvjc xsqjQaTvLOvwyOuVfqIlYhyBh99CRAdatglRR+t0L7D3x2jT/2TMSjYZB++qze/aa5tMQmJ7hxH grG7VHie0dCxQ9ya2ZQE3nzq1PsU9ylsaziV84tCVBl1qYK0eq8616nSuDMHO4TF+mMPYGjiUEM jvNDSHLT3CfdR1O5ZOK0IrSeidJ6NQ+Yf7eECrVP3KCQzWam+2NR9ESXZSQUMy8ET66xjH/SlCe fLPomIckVib2Zl4Q== X-Received: by 2002:a05:693c:2296:b0:2b4:7e6b:9c00 with SMTP id 5a478bee46e88-2b739b68347mr247eec.23.1769101431087; Thu, 22 Jan 2026 09:03:51 -0800 (PST) Received: from debian ([74.48.213.230]) by smtp.gmail.com with ESMTPSA id 5a478bee46e88-2b6b361f5d4sm26392109eec.17.2026.01.22.09.03.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 22 Jan 2026 09:03:50 -0800 (PST) From: Qiliang Yuan To: viro@zeniv.linux.org.uk, brauner@kernel.org Cc: jack@suse.cz, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, yuanql9@chinatelecom.cn, Qiliang Yuan Subject: [PATCH] fs/file: optimize FD allocation complexity with 3-level summary bitmap Date: Thu, 22 Jan 2026 12:03:45 -0500 Message-ID: <20260122170345.157803-1-realwujing@gmail.com> X-Mailer: git-send-email 2.51.0 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Current FD allocation performs a two-level bitmap search (open_fds and full_fds_bits). This results in O(N/64) complexity when searching for a free FD, as the kernel needs to scan the first-level summary bitmap. For processes with very large FD limits (e.g., millions of sockets), scanning even the level 1 summary bitmap can become a bottleneck during high-frequency allocation. This patch introduces a third level of summary bitmap (full_fds_bits_l2), where each bit represents whether a 64-word chunk (4096 FDs) in open_fds is fully allocated. This reduces the search complexity to O(N/4096), making FD allocation significantly more scalable for high-concurrency workloads. Signed-off-by: Qiliang Yuan Signed-off-by: Qiliang Yuan --- fs/file.c | 45 +++++++++++++++++++++++++++++++++-------- include/linux/fdtable.h | 2 ++ 2 files changed, 39 insertions(+), 8 deletions(-) diff --git a/fs/file.c b/fs/file.c index 0a4f3bdb2dec..1163160e81af 100644 --- a/fs/file.c +++ b/fs/file.c @@ -114,6 +114,8 @@ static void free_fdtable_rcu(struct rcu_head *rcu) =20 #define BITBIT_NR(nr) BITS_TO_LONGS(BITS_TO_LONGS(nr)) #define BITBIT_SIZE(nr) (BITBIT_NR(nr) * sizeof(long)) +#define BITBITBIT_NR(nr) BITS_TO_LONGS(BITBIT_NR(nr)) +#define BITBITBIT_SIZE(nr) (BITBITBIT_NR(nr) * sizeof(long)) =20 #define fdt_words(fdt) ((fdt)->max_fds / BITS_PER_LONG) // words in ->open= _fds /* @@ -132,6 +134,8 @@ static inline void copy_fd_bitmaps(struct fdtable *nfdt= , struct fdtable *ofdt, copy_words * BITS_PER_LONG, nwords * BITS_PER_LONG); bitmap_copy_and_extend(nfdt->full_fds_bits, ofdt->full_fds_bits, copy_words, nwords); + bitmap_copy_and_extend(nfdt->full_fds_bits_l2, ofdt->full_fds_bits_l2, + BITS_TO_LONGS(copy_words), BITS_TO_LONGS(nwords)); } =20 /* @@ -222,7 +226,7 @@ static struct fdtable *alloc_fdtable(unsigned int slots= _wanted) fdt->fd =3D data; =20 data =3D kvmalloc(max_t(size_t, - 2 * nr / BITS_PER_BYTE + BITBIT_SIZE(nr), L1_CACHE_BYTES), + 2 * nr / BITS_PER_BYTE + BITBIT_SIZE(nr) + BITBITBIT_SIZE(nr), L1_CAC= HE_BYTES), GFP_KERNEL_ACCOUNT); if (!data) goto out_arr; @@ -231,6 +235,8 @@ static struct fdtable *alloc_fdtable(unsigned int slots= _wanted) fdt->close_on_exec =3D data; data +=3D nr / BITS_PER_BYTE; fdt->full_fds_bits =3D data; + data +=3D BITBIT_SIZE(nr); + fdt->full_fds_bits_l2 =3D data; =20 return fdt; =20 @@ -335,16 +341,24 @@ static inline void __set_open_fd(unsigned int fd, str= uct fdtable *fdt, bool set) __set_bit(fd, fdt->open_fds); __set_close_on_exec(fd, fdt, set); fd /=3D BITS_PER_LONG; - if (!~fdt->open_fds[fd]) + if (!~fdt->open_fds[fd]) { __set_bit(fd, fdt->full_fds_bits); + unsigned int idx =3D fd / BITS_PER_LONG; + if (!~fdt->full_fds_bits[idx]) + __set_bit(idx, fdt->full_fds_bits_l2); + } } =20 static inline void __clear_open_fd(unsigned int fd, struct fdtable *fdt) { __clear_bit(fd, fdt->open_fds); fd /=3D BITS_PER_LONG; - if (test_bit(fd, fdt->full_fds_bits)) + if (test_bit(fd, fdt->full_fds_bits)) { __clear_bit(fd, fdt->full_fds_bits); + unsigned int idx =3D fd / BITS_PER_LONG; + if (test_bit(idx, fdt->full_fds_bits_l2)) + __clear_bit(idx, fdt->full_fds_bits_l2); + } } =20 static inline bool fd_is_open(unsigned int fd, const struct fdtable *fdt) @@ -402,6 +416,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, = struct fd_range *punch_ho new_fdt->close_on_exec =3D newf->close_on_exec_init; new_fdt->open_fds =3D newf->open_fds_init; new_fdt->full_fds_bits =3D newf->full_fds_bits_init; + new_fdt->full_fds_bits_l2 =3D newf->full_fds_bits_init_l2; new_fdt->fd =3D &newf->fd_array[0]; =20 spin_lock(&oldf->file_lock); @@ -536,6 +551,7 @@ struct files_struct init_files =3D { .close_on_exec =3D init_files.close_on_exec_init, .open_fds =3D init_files.open_fds_init, .full_fds_bits =3D init_files.full_fds_bits_init, + .full_fds_bits_l2 =3D init_files.full_fds_bits_init_l2, }, .file_lock =3D __SPIN_LOCK_UNLOCKED(init_files.file_lock), .resize_wait =3D __WAIT_QUEUE_HEAD_INITIALIZER(init_files.resize_wait), @@ -545,22 +561,35 @@ static unsigned int find_next_fd(struct fdtable *fdt,= unsigned int start) { unsigned int maxfd =3D fdt->max_fds; /* always multiple of BITS_PER_LONG = */ unsigned int maxbit =3D maxfd / BITS_PER_LONG; + unsigned int maxbit_l2 =3D BITBIT_NR(maxfd); unsigned int bitbit =3D start / BITS_PER_LONG; + unsigned int bitbit_l2 =3D bitbit / BITS_PER_LONG; unsigned int bit; =20 /* - * Try to avoid looking at the second level bitmap + * Try to avoid looking at the upper level bitmaps */ bit =3D find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG, start & (BITS_PER_LONG - 1)); if (bit < BITS_PER_LONG) return bit + bitbit * BITS_PER_LONG; =20 - bitbit =3D find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) * BITS_= PER_LONG; - if (bitbit >=3D maxfd) + /* Algorithmic Optimization: O(N) -> O(1) via 3rd-level summary bitmap */ + bitbit_l2 =3D find_next_zero_bit(fdt->full_fds_bits_l2, maxbit_l2, bitbit= _l2); + if (bitbit_l2 >=3D maxbit_l2) return maxfd; - if (bitbit > start) - start =3D bitbit; + + if (bitbit_l2 * BITS_PER_LONG > bitbit) + bitbit =3D bitbit_l2 * BITS_PER_LONG; + + bitbit =3D find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit); + if (bitbit >=3D maxbit) + return maxfd; + + bit =3D bitbit * BITS_PER_LONG; + if (bit > start) + start =3D bit; + return find_next_zero_bit(fdt->open_fds, maxfd, start); } =20 diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index c45306a9f007..992b4ed9c1e0 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -29,6 +29,7 @@ struct fdtable { unsigned long *close_on_exec; unsigned long *open_fds; unsigned long *full_fds_bits; + unsigned long *full_fds_bits_l2; struct rcu_head rcu; }; =20 @@ -53,6 +54,7 @@ struct files_struct { unsigned long close_on_exec_init[1]; unsigned long open_fds_init[1]; unsigned long full_fds_bits_init[1]; + unsigned long full_fds_bits_init_l2[1]; struct file __rcu * fd_array[NR_OPEN_DEFAULT]; }; =20 --=20 2.51.0