From nobody Sat Feb 7 21:23:45 2026
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	mjguzik@gmail.com, edumazet@google.com
Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com,
	tim.c.chen@intel.com, tim.c.chen@linux.intel.com
Subject: [PATCH v4 1/3] fs/file.c: remove sanity_check and add likely/unlikely in alloc_fd()
Date: Fri, 12 Jul 2024 22:39:15 -0400
Message-ID: <20240713023917.3967269-2-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240713023917.3967269-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com>
 <20240713023917.3967269-1-yu.ma@intel.com>

alloc_fd() has a sanity check inside to make sure the struct file mapping to the allocated
fd is NULL. Remove this sanity check, since the slot is already guaranteed
to be NULL by the existing zero initialization and by the NULL store done
when an fd is recycled. Meanwhile, add likely/unlikely annotations and skip
the expand_files() call when no expansion is needed, to reduce the work done
under file_lock.

Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
Reviewed-by: Jan Kara
---
 fs/file.c | 33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index a3b72aa64f11..e1b9d6df7941 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -515,7 +515,7 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	if (fd < files->next_fd)
 		fd = files->next_fd;
 
-	if (fd < fdt->max_fds)
+	if (likely(fd < fdt->max_fds))
 		fd = find_next_fd(fdt, fd);
 
 	/*
@@ -523,19 +523,21 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	 * will limit the total number of files that can be opened.
 	 */
 	error = -EMFILE;
-	if (fd >= end)
+	if (unlikely(fd >= end))
 		goto out;
 
-	error = expand_files(files, fd);
-	if (error < 0)
-		goto out;
+	if (unlikely(fd >= fdt->max_fds)) {
+		error = expand_files(files, fd);
+		if (error < 0)
+			goto out;
 
-	/*
-	 * If we needed to expand the fs array we
-	 * might have blocked - try again.
-	 */
-	if (error)
-		goto repeat;
+		/*
+		 * If we needed to expand the fs array we
+		 * might have blocked - try again.
+		 */
+		if (error)
+			goto repeat;
+	}
 
 	if (start <= files->next_fd)
 		files->next_fd = fd + 1;
@@ -546,13 +548,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	else
 		__clear_close_on_exec(fd, fdt);
 	error = fd;
-#if 1
-	/* Sanity check */
-	if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
-		printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
-		rcu_assign_pointer(fdt->fd[fd], NULL);
-	}
-#endif
 
 out:
 	spin_unlock(&files->file_lock);
@@ -618,7 +613,7 @@ void fd_install(unsigned int fd, struct file *file)
 		rcu_read_unlock_sched();
 		spin_lock(&files->file_lock);
 		fdt = files_fdtable(files);
-		BUG_ON(fdt->fd[fd] != NULL);
+		WARN_ON(fdt->fd[fd] != NULL);
 		rcu_assign_pointer(fdt->fd[fd], file);
 		spin_unlock(&files->file_lock);
 		return;
-- 
2.43.0

From nobody Sat Feb 7 21:23:45 2026
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	mjguzik@gmail.com, edumazet@google.com
Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, pan.deng@intel.com,
	tianyou.li@intel.com, tim.c.chen@intel.com, tim.c.chen@linux.intel.com
Subject: [PATCH v4 2/3] fs/file.c: conditionally clear full_fds
Date: Fri, 12 Jul 2024 22:39:16 -0400
Message-ID: <20240713023917.3967269-3-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240713023917.3967269-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com>
 <20240713023917.3967269-1-yu.ma@intel.com>

Every 64 bits in open_fds map to a single bit in full_fds_bits, so it is
very likely that the bit in full_fds_bits has already been cleared by an
earlier __clear_open_fd() call. Test the bit in full_fds_bits before
clearing it, to avoid an unnecessary write and cache-line bouncing. See
commit fc90888d07b8 ("vfs: conditionally clear close-on-exec flag") for a
similar optimization.

Taking the stock kernel with patch 1 as the baseline, this improves
pts/blogbench-1.1.0 read by 13% and write by 5% on an Intel ICX 160-core
configuration with v6.10-rc7.
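
For illustration only, the test-before-clear idea described above can be
sketched in userspace C. This is not the kernel code: struct fd_bitmaps,
set_open_fd(), clear_open_fd(), test_bit_ul() and the full_writes counter
are hypothetical names made up for the demo, and the table size is an
arbitrary assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define MAX_FDS   (4 * WORD_BITS)	/* hypothetical table size for the demo */

/* Hypothetical stand-in for the two bitmaps kept in struct fdtable. */
struct fd_bitmaps {
	unsigned long open_fds[MAX_FDS / WORD_BITS];	/* one bit per fd */
	unsigned long full_fds_bits;	/* one bit per word of open_fds */
	unsigned long full_writes;	/* demo counter: stores to full_fds_bits */
};

/* Return true if bit nr is set in the bitmap starting at map. */
static bool test_bit_ul(unsigned long nr, const unsigned long *map)
{
	return (map[nr / WORD_BITS] >> (nr % WORD_BITS)) & 1UL;
}

/* Mark fd as open; set the summary bit once its whole word is in use. */
static void set_open_fd(struct fd_bitmaps *b, unsigned int fd)
{
	assert(fd < MAX_FDS);
	b->open_fds[fd / WORD_BITS] |= 1UL << (fd % WORD_BITS);
	if (b->open_fds[fd / WORD_BITS] == ~0UL) {
		b->full_fds_bits |= 1UL << (fd / WORD_BITS);
		b->full_writes++;
	}
}

/* Mark fd as free; only store to the summary word if its bit is set. */
static void clear_open_fd(struct fd_bitmaps *b, unsigned int fd)
{
	assert(fd < MAX_FDS);
	b->open_fds[fd / WORD_BITS] &= ~(1UL << (fd % WORD_BITS));
	fd /= WORD_BITS;
	if (test_bit_ul(fd, &b->full_fds_bits)) {
		b->full_fds_bits &= ~(1UL << fd);
		b->full_writes++;
	}
}
```

In this sketch, closing an fd whose word was never full performs no store
to the summary word, which is the case the patch expects to be common.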

Reviewed-by: Jan Kara
Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
---
 fs/file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/file.c b/fs/file.c
index e1b9d6df7941..1be2a5bcc7c4 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -268,7 +268,9 @@ static inline void __set_open_fd(unsigned int fd, struct fdtable *fdt)
 static inline void __clear_open_fd(unsigned int fd, struct fdtable *fdt)
 {
 	__clear_bit(fd, fdt->open_fds);
-	__clear_bit(fd / BITS_PER_LONG, fdt->full_fds_bits);
+	fd /= BITS_PER_LONG;
+	if (test_bit(fd, fdt->full_fds_bits))
+		__clear_bit(fd, fdt->full_fds_bits);
 }
 
 static inline bool fd_is_open(unsigned int fd, const struct fdtable *fdt)
-- 
2.43.0

From nobody Sat Feb 7 21:23:45 2026
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	mjguzik@gmail.com, edumazet@google.com
Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com,
	tim.c.chen@intel.com, tim.c.chen@linux.intel.com
Subject: [PATCH v4
 3/3] fs/file.c: add fast path in find_next_fd()
Date: Fri, 12 Jul 2024 22:39:17 -0400
Message-ID: <20240713023917.3967269-4-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240713023917.3967269-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com>
 <20240713023917.3967269-1-yu.ma@intel.com>

Skip the two-level search and call find_next_zero_bit() on just the word
that contains next_fd when that word has a free slot, because:
(1) next_fd is a lower bound for the first free fd.
(2) find_next_zero_bit() has a fast path for size <= 64 that speeds up the
    search.
(3) Once the fdt has been expanded (the bitmap size doubles on each
    expansion), it is never shrunk. The search size increases, but there
    are few open fds available here.

This fast path was proposed by Mateusz Guzik and agreed to by Jan Kara; it
is more generic and scalable than the approaches in previous versions. On
top of patches 1 and 2, it improves pts/blogbench-1.1.0 read by 8% and
write by 4% on an Intel ICX 160-core configuration with v6.10-rc7.
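
As a rough userspace sketch of the search order described above (not the
kernel implementation): first look only at the word containing the start
position, and fall back to the summary bitmap only if that word has no free
bit at or after it. The names struct fd_bitmaps, next_zero_in_word() and
find_next_fd_sketch(), and the fixed table size NWORDS, are assumptions made
for the example.

```c
#include <assert.h>
#include <limits.h>

#define WORD_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define NWORDS    4u	/* hypothetical table size: 4 words of fds */

/* Hypothetical stand-in for the two bitmaps kept in struct fdtable. */
struct fd_bitmaps {
	unsigned long open_fds[NWORDS];	/* one bit per fd */
	unsigned long full_fds_bits;	/* bit w set: open_fds[w] has no free bit */
};

/* First zero bit in word at or after from, or WORD_BITS if there is none. */
static unsigned int next_zero_in_word(unsigned long word, unsigned int from)
{
	for (unsigned int i = from; i < WORD_BITS; i++)
		if (!((word >> i) & 1UL))
			return i;
	return WORD_BITS;
}

/* Lowest free fd >= start, or NWORDS * WORD_BITS if the table is full. */
static unsigned int find_next_fd_sketch(const struct fd_bitmaps *b,
					unsigned int start)
{
	unsigned int word = start / WORD_BITS;
	unsigned int bit;

	assert(start < NWORDS * WORD_BITS);

	/* Fast path: a single-word search, no look at the summary bitmap. */
	bit = next_zero_in_word(b->open_fds[word], start % WORD_BITS);
	if (bit < WORD_BITS)
		return word * WORD_BITS + bit;

	/* Slow path: use the summary bitmap to skip words known to be full. */
	for (word++; word < NWORDS; word++) {
		if ((b->full_fds_bits >> word) & 1UL)
			continue;
		bit = next_zero_in_word(b->open_fds[word], 0);
		if (bit < WORD_BITS)
			return word * WORD_BITS + bit;
	}
	return NWORDS * WORD_BITS;	/* nothing free: caller would expand */
}
```

When the start position sits in a word that still has a free slot, the
sketch returns after the single-word search, mirroring reasons (1) and (2)
in the list above.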

Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
Reviewed-by: Jan Kara
---
 fs/file.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/file.c b/fs/file.c
index 1be2a5bcc7c4..a3ce6ba30c8c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -488,9 +488,20 @@ struct files_struct init_files = {
 
 static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
 {
+	unsigned int bitbit = start / BITS_PER_LONG;
+	unsigned int bit;
+
+	/*
+	 * Try to avoid looking at the second level bitmap
+	 */
+	bit = find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG,
+				 start & (BITS_PER_LONG -1));
+	if (bit < BITS_PER_LONG) {
+		return bit + bitbit * BITS_PER_LONG;
+	}
+
 	unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */
 	unsigned int maxbit = maxfd / BITS_PER_LONG;
-	unsigned int bitbit = start / BITS_PER_LONG;
 
 	bitbit = find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) * BITS_PER_LONG;
 	if (bitbit >= maxfd)
-- 
2.43.0