From nobody Sun Feb 8 04:11:11 2026
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, mjguzik@gmail.com, edumazet@google.com
Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com, tim.c.chen@intel.com, tim.c.chen@linux.intel.com
Subject: [PATCH v3 1/3] fs/file.c: remove sanity_check and add likely/unlikely in alloc_fd()
Date: Wed, 3 Jul 2024 10:33:09 -0400
Message-ID: <20240703143311.2184454-2-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240703143311.2184454-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com> <20240703143311.2184454-1-yu.ma@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

alloc_fd() has a sanity check inside to make sure the struct file mapping to the allocated
fd is NULL. Remove this sanity check, since it is already guaranteed by the
existing zero initialization and by the NULL assignment when an fd is recycled.
Meanwhile, add likely/unlikely hints and avoid the expand_files() call where
possible, to reduce the work done under file_lock.

Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
---
 fs/file.c | 38 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index a3b72aa64f11..5178b246e54b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -515,28 +515,29 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	if (fd < files->next_fd)
 		fd = files->next_fd;
 
-	if (fd < fdt->max_fds)
+	if (likely(fd < fdt->max_fds))
 		fd = find_next_fd(fdt, fd);
 
+	error = -EMFILE;
+	if (unlikely(fd >= fdt->max_fds)) {
+		error = expand_files(files, fd);
+		if (error < 0)
+			goto out;
+		/*
+		 * If we needed to expand the fs array we
+		 * might have blocked - try again.
+		 */
+		if (error)
+			goto repeat;
+	}
+
 	/*
 	 * N.B. For clone tasks sharing a files structure, this test
 	 * will limit the total number of files that can be opened.
 	 */
-	error = -EMFILE;
-	if (fd >= end)
-		goto out;
-
-	error = expand_files(files, fd);
-	if (error < 0)
+	if (unlikely(fd >= end))
 		goto out;
 
-	/*
-	 * If we needed to expand the fs array we
-	 * might have blocked - try again.
-	 */
-	if (error)
-		goto repeat;
-
 	if (start <= files->next_fd)
 		files->next_fd = fd + 1;
 
@@ -546,13 +547,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	else
 		__clear_close_on_exec(fd, fdt);
 	error = fd;
-#if 1
-	/* Sanity check */
-	if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
-		printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
-		rcu_assign_pointer(fdt->fd[fd], NULL);
-	}
-#endif
 
 out:
 	spin_unlock(&files->file_lock);
@@ -618,7 +612,7 @@ void fd_install(unsigned int fd, struct file *file)
 		rcu_read_unlock_sched();
 		spin_lock(&files->file_lock);
 		fdt = files_fdtable(files);
-		BUG_ON(fdt->fd[fd] != NULL);
+		WARN_ON(fdt->fd[fd] != NULL);
 		rcu_assign_pointer(fdt->fd[fd], file);
 		spin_unlock(&files->file_lock);
 		return;
-- 
2.43.0

From nobody Sun Feb 8 04:11:11 2026
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, mjguzik@gmail.com, edumazet@google.com
Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com, tim.c.chen@intel.com, tim.c.chen@linux.intel.com
Subject: [PATCH v3 2/3] fs/file.c: conditionally clear full_fds
Date: Wed, 3 Jul 2024 10:33:10 -0400
Message-ID: <20240703143311.2184454-3-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240703143311.2184454-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com> <20240703143311.2184454-1-yu.ma@intel.com>

64 bits in open_fds are mapped to a common bit in full_fds_bits. It is very
likely that the bit in full_fds_bits has already been cleared by an earlier
__clear_open_fd() call. Check whether the bit is set in full_fds_bits before
clearing it, to avoid an unnecessary write and cache bouncing. See commit
fc90888d07b8 ("vfs: conditionally clear close-on-exec flag") for a similar
optimization.

With the stock kernel plus patch 1 as the baseline, it improves
pts/blogbench-1.1.0 read by 13% and write by 5% on an Intel ICX 160-core
configuration with v6.10-rc6.

Reviewed-by: Jan Kara
Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
---
 fs/file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/file.c b/fs/file.c
index 5178b246e54b..a15317db3119 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -268,7 +268,9 @@ static inline void __set_open_fd(unsigned int fd, struct fdtable *fdt)
 static inline void __clear_open_fd(unsigned int fd, struct fdtable *fdt)
 {
 	__clear_bit(fd, fdt->open_fds);
-	__clear_bit(fd / BITS_PER_LONG, fdt->full_fds_bits);
+	fd /= BITS_PER_LONG;
+	if (test_bit(fd, fdt->full_fds_bits))
+		__clear_bit(fd, fdt->full_fds_bits);
 }
 
 static inline bool fd_is_open(unsigned int fd, const struct fdtable *fdt)
-- 
2.43.0

From nobody Sun Feb 8 04:11:11 2026
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, mjguzik@gmail.com, edumazet@google.com
Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com, tim.c.chen@intel.com, tim.c.chen@linux.intel.com
Subject: [PATCH v3 3/3] fs/file.c: add fast path in find_next_fd()
Date: Wed, 3 Jul 2024 10:33:11 -0400
Message-ID: <20240703143311.2184454-4-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240703143311.2184454-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com> <20240703143311.2184454-1-yu.ma@intel.com>

In most cases there is an available fd in the lower 64 bits of the open_fds
bitmap when we look for a free fd slot. Skip the two-level search via
find_next_zero_bit() for this common fast path.

Look directly for a free bit in the lower 64 bits of the open_fds bitmap when
a free slot is available there, because:
(1) The fd allocation algorithm always allocates fds from small to large, so
lower bits in the open_fds bitmap are used much more frequently than higher
bits.
(2) After the fdt is expanded (the bitmap size doubles on each expansion), it
is never shrunk. The search size increases, but there are few open fds
available there.
(3) find_next_zero_bit() has a fast path for size<=64 that speeds up
searching.

As suggested by Mateusz Guzik and Jan Kara, move the fast path from
alloc_fd() to find_next_fd(). With this change, on top of patches 1 and 2,
pts/blogbench-1.1.0 read improves by 13% and write by 7% on an Intel ICX
160-core configuration with v6.10-rc6.

Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
Reviewed-by: Jan Kara
---
 fs/file.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/file.c b/fs/file.c
index a15317db3119..f25eca311f51 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -488,6 +488,11 @@ struct files_struct init_files = {
 
 static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
 {
+	unsigned int bit;
+	bit = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, start);
+	if (bit < BITS_PER_LONG)
+		return bit;
+
 	unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */
 	unsigned int maxbit = maxfd / BITS_PER_LONG;
 	unsigned int bitbit = start / BITS_PER_LONG;
-- 
2.43.0
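
For readers who want to experiment outside the kernel, the ideas in patches
2 and 3 can be modelled in user space. The sketch below is not the kernel
code: struct toy_fdtable, the toy_* helpers, and the fixed 256-slot table are
hypothetical names and sizes chosen for illustration, and the bit scan is a
plain loop rather than the kernel's find_next_zero_bit(). It only mirrors the
two-level open_fds/full_fds_bits layout, the first-word fast path, and the
test-before-clear of the summary bit.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64u
#define MAX_FDS   256u                  /* hypothetical size, multiple of 64 */
#define NWORDS    (MAX_FDS / WORD_BITS)

/* Toy stand-in for the kernel's fdtable bitmaps (assumption, not the real struct). */
struct toy_fdtable {
	uint64_t open_fds[NWORDS];      /* bit set => fd is in use */
	uint64_t full_fds_bits;         /* bit w set => open_fds[w] is all ones */
};

/* First zero bit in [start, limit), or limit if there is none. */
static unsigned int toy_next_zero(const uint64_t *map, unsigned int limit,
				  unsigned int start)
{
	for (unsigned int i = start; i < limit; i++)
		if (!(map[i / WORD_BITS] & (UINT64_C(1) << (i % WORD_BITS))))
			return i;
	return limit;
}

static unsigned int toy_find_next_fd(const struct toy_fdtable *t,
				     unsigned int start)
{
	/* Fast path: look only at the first 64 bits of open_fds. */
	unsigned int bit = toy_next_zero(t->open_fds, WORD_BITS, start);
	if (bit < WORD_BITS)
		return bit;

	/* Slow path: use the summary bitmap to skip words that are full. */
	unsigned int word = toy_next_zero(&t->full_fds_bits, NWORDS,
					  start / WORD_BITS);
	unsigned int first = word * WORD_BITS;
	if (first >= MAX_FDS)
		return MAX_FDS;         /* table full; a caller would expand */
	if (first > start)
		start = first;
	return toy_next_zero(t->open_fds, MAX_FDS, start);
}

static void toy_set_open_fd(struct toy_fdtable *t, unsigned int fd)
{
	assert(fd < MAX_FDS);
	unsigned int word = fd / WORD_BITS;

	t->open_fds[word] |= UINT64_C(1) << (fd % WORD_BITS);
	if (t->open_fds[word] == UINT64_MAX)
		t->full_fds_bits |= UINT64_C(1) << word;
}

static void toy_clear_open_fd(struct toy_fdtable *t, unsigned int fd)
{
	assert(fd < MAX_FDS);
	unsigned int word = fd / WORD_BITS;

	t->open_fds[word] &= ~(UINT64_C(1) << (fd % WORD_BITS));
	/* Test first, write only if the summary bit is actually set. */
	if (t->full_fds_bits & (UINT64_C(1) << word))
		t->full_fds_bits &= ~(UINT64_C(1) << word);
}
```

In a single-threaded model like this the conditional clear has no observable
effect on the result. The benefit described in patch 2 comes from not writing
to a cache line that other CPUs are reading, which only shows up under
contention on real hardware.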