From nobody Wed Dec 17 12:14:35 2025
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, mjguzik@gmail.com, edumazet@google.com
Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com, tim.c.chen@intel.com, tim.c.chen@linux.intel.com
Subject: [PATCH v2 1/3] fs/file.c: add fast path in alloc_fd()
Date: Sat, 22 Jun 2024 11:49:02 -0400
Message-ID: <20240622154904.3774273-2-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240622154904.3774273-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com> <20240622154904.3774273-1-yu.ma@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

In most cases there is a free fd in the lower 64 bits of the open_fds
bitmap when we look for an available fd slot.
Skip the two-level find_next_zero_bit() search in find_next_fd() for this
common fast path. When a free slot is available in the lower 64 bits of the
open_fds bitmap, look for it there directly, because:

(1) The fd allocation algorithm always allocates fds from small to large,
    so lower bits in the open_fds bitmap are used much more frequently than
    higher bits.
(2) After the fdt is expanded (the bitmap size doubles on each expansion),
    it is never shrunk. The search size increases, but few open fds are
    available there.
(3) find_next_zero_bit() itself has a fast path that speeds up searching
    when size <= 64.

In addition, "!start" is added to the fast path condition to ensure that the
allocated fd is not smaller than start (i.e. >= 0), given that alloc_fd() is
only called in two scenarios:

(1) Allocating a new fd (the most common usage) via get_unused_fd_flags(),
    which searches for an fd starting from bit 0 in the fdt
    (i.e. start == 0).
(2) Duplicating an fd (less common usage) via dup_fd(), which searches for
    an fd starting from old_fd's index in the fdt and is only called by the
    fcntl syscall.

With the fast path added in alloc_fd(), pts/blogbench-1.1.0 read improves by
17% and write by 9% on an Intel ICX 160-core configuration with v6.10-rc4.
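For illustration only (this is not part of the patch), the fast-path
condition described above can be modelled in userspace C. The names
model_find_next_zero_bit(), model_pick_fd() and WORD_BITS are hypothetical
stand-ins, the slow path is a flat bit scan rather than the kernel's
find_next_fd(), and the next_fd hint is assumed never to skip a free slot:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical stand-in for the kernel's find_next_zero_bit(): returns the
 * index of the first zero bit in [start, nbits), or nbits if there is none.
 */
static size_t model_find_next_zero_bit(const unsigned long *map,
				       size_t nbits, size_t start)
{
	for (size_t i = start; i < nbits; i++)
		if (!(map[i / WORD_BITS] & (1UL << (i % WORD_BITS))))
			return i;
	return nbits;
}

/*
 * Simplified model of the lookup: when start == 0 and the first bitmap
 * word still has a zero bit, search only that word. Otherwise scan the
 * whole bitmap. *used_fast reports which path was taken. nbits is assumed
 * to be a non-zero multiple of WORD_BITS.
 */
static size_t model_pick_fd(const unsigned long *open_fds, size_t nbits,
			    size_t start, size_t next_fd, int *used_fast)
{
	size_t fd = start < next_fd ? next_fd : start;

	*used_fast = 0;
	if (fd < nbits && ~open_fds[0] && !start) {
		*used_fast = 1;
		return model_find_next_zero_bit(open_fds, WORD_BITS, fd);
	}
	return model_find_next_zero_bit(open_fds, nbits, fd);
}
```

With fds 0..2 in use and start == 0 the model takes the fast path; with a
full first word, or with a non-zero start, it falls back to the full scan.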
Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
---
 fs/file.c | 35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index a3b72aa64f11..50e900a47107 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -515,28 +515,35 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	if (fd < files->next_fd)
 		fd = files->next_fd;
 
-	if (fd < fdt->max_fds)
+	error = -EMFILE;
+	if (likely(fd < fdt->max_fds)) {
+		if (~fdt->open_fds[0] && !start) {
+			fd = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, fd);
+			goto fastreturn;
+		}
 		fd = find_next_fd(fdt, fd);
+	}
+
+	if (unlikely(fd >= fdt->max_fds)) {
+		error = expand_files(files, fd);
+		if (error < 0)
+			goto out;
+		/*
+		 * If we needed to expand the fs array we
+		 * might have blocked - try again.
+		 */
+		if (error)
+			goto repeat;
+	}
 
+fastreturn:
 	/*
 	 * N.B. For clone tasks sharing a files structure, this test
 	 * will limit the total number of files that can be opened.
 	 */
-	error = -EMFILE;
-	if (fd >= end)
+	if (unlikely(fd >= end))
 		goto out;
 
-	error = expand_files(files, fd);
-	if (error < 0)
-		goto out;
-
-	/*
-	 * If we needed to expand the fs array we
-	 * might have blocked - try again.
-	 */
-	if (error)
-		goto repeat;
-
 	if (start <= files->next_fd)
 		files->next_fd = fd + 1;
 
-- 
2.43.0

From nobody Wed Dec 17 12:14:35 2025
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, mjguzik@gmail.com, edumazet@google.com
Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com, tim.c.chen@intel.com, tim.c.chen@linux.intel.com
Subject: [PATCH v2 2/3] fs/file.c: conditionally clear full_fds
Date: Sat, 22 Jun 2024 11:49:03 -0400
Message-ID: <20240622154904.3774273-3-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240622154904.3774273-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com> <20240622154904.3774273-1-yu.ma@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

64 bits in open_fds
are mapped to a common bit in full_fds_bits. It is very likely that the bit
in full_fds_bits has already been cleared by an earlier __clear_open_fd()
call. Check whether the bit in full_fds_bits is set before clearing it, to
avoid an unnecessary write and cache bouncing. See commit fc90888d07b8
("vfs: conditionally clear close-on-exec flag") for a similar optimization.

Together with patch 1, this improves pts/blogbench-1.1.0 read by 27% and
write by 14% on an Intel ICX 160-core configuration with v6.10-rc4.

Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
Reviewed-by: Jan Kara
---
 fs/file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/file.c b/fs/file.c
index 50e900a47107..b4d25f6d4c19 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -268,7 +268,9 @@ static inline void __set_open_fd(unsigned int fd, struct fdtable *fdt)
 static inline void __clear_open_fd(unsigned int fd, struct fdtable *fdt)
 {
 	__clear_bit(fd, fdt->open_fds);
-	__clear_bit(fd / BITS_PER_LONG, fdt->full_fds_bits);
+	fd /= BITS_PER_LONG;
+	if (test_bit(fd, fdt->full_fds_bits))
+		__clear_bit(fd, fdt->full_fds_bits);
 }
 
 static inline bool fd_is_open(unsigned int fd, const struct fdtable *fdt)
-- 
2.43.0

From nobody Wed Dec 17 12:14:35 2025
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, mjguzik@gmail.com, edumazet@google.com
Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com, tim.c.chen@intel.com, tim.c.chen@linux.intel.com
Subject: [PATCH v2 3/3] fs/file.c: remove sanity_check from alloc_fd()
Date: Sat, 22 Jun 2024 11:49:04 -0400
Message-ID: <20240622154904.3774273-4-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240622154904.3774273-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com> <20240622154904.3774273-1-yu.ma@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

alloc_fd() has a sanity check to make sure that the struct file pointer
mapped to the allocated fd is NULL. Remove this sanity check, since the
property is already guaranteed by the existing zero initialization and by
the slot being set to NULL when an fd is recycled.

Combined with patches 1 and 2 in this series, pts/blogbench-1.1.0 read
improves by 32% and write by 17% on an Intel ICX 160-core configuration
with v6.10-rc4.
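As a rough userspace sketch of why the removed check is redundant (an
illustration under simplifying assumptions, not kernel code): if slots start
out zeroed and every close resets the slot to NULL before the bit is freed,
then a freshly allocated fd always maps to a NULL slot. The struct
model_fdtable and the model_alloc(), model_install() and model_close()
helpers are hypothetical, and locking and RCU are ignored entirely:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_FDS 8

/* Hypothetical miniature fd table: one bit and one slot per fd. */
struct model_fdtable {
	unsigned long open_fds;		/* bit i set means fd i is allocated */
	void *fd[MODEL_MAX_FDS];	/* NULL while no file is installed */
};

/* Allocate the lowest free fd, or return -1 when the table is full. */
static int model_alloc(struct model_fdtable *t)
{
	for (int i = 0; i < MODEL_MAX_FDS; i++) {
		if (!(t->open_fds & (1UL << i))) {
			t->open_fds |= 1UL << i;
			return i;
		}
	}
	return -1;
}

/* Publish a file pointer in an already allocated slot. */
static void model_install(struct model_fdtable *t, int fd, void *file)
{
	t->fd[fd] = file;
}

/* Recycle an fd: reset the slot to NULL, then free its bit. */
static void model_close(struct model_fdtable *t, int fd)
{
	t->fd[fd] = NULL;
	t->open_fds &= ~(1UL << fd);
}
```

In this model the only ways a slot becomes non-NULL or free again are
model_install() and model_close(), so no allocation can observe a stale
pointer.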
Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
---
 fs/file.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index b4d25f6d4c19..1153b0b7ba3d 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -555,13 +555,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	else
 		__clear_close_on_exec(fd, fdt);
 	error = fd;
-#if 1
-	/* Sanity check */
-	if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
-		printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
-		rcu_assign_pointer(fdt->fd[fd], NULL);
-	}
-#endif
 
 out:
 	spin_unlock(&files->file_lock);
-- 
2.43.0
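
The conditional clear from patch 2/3 of this series can likewise be sketched
in userspace C. This is a simplified model, not the kernel implementation:
struct model_bitmaps, model_clear_open_fd(), WORD_BITS and the summary_writes
counter are hypothetical and exist only to make the skipped store visible.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical two-level bitmap: one summary bit per open_fds word. */
struct model_bitmaps {
	unsigned long open_fds[2];	/* one bit per fd */
	unsigned long full_fds_bits;	/* bit w set: open_fds[w] is full */
	unsigned int summary_writes;	/* instrumentation only */
};

/*
 * Clear an fd's bit, and write to the summary word only when its bit is
 * actually set. Skipping the store when the bit is already clear avoids
 * dirtying a cache line that other CPUs may be reading.
 */
static void model_clear_open_fd(struct model_bitmaps *m, size_t fd)
{
	size_t word = fd / WORD_BITS;

	m->open_fds[word] &= ~(1UL << (fd % WORD_BITS));
	if (m->full_fds_bits & (1UL << word)) {
		m->full_fds_bits &= ~(1UL << word);
		m->summary_writes++;
	}
}
```

Starting from a full first word, only the first close in that word touches
the summary bitmap; later closes find the bit already clear and skip the
write.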