From nobody Thu Dec 18 11:22:50 2025
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	tim.c.chen@linux.intel.com, tim.c.chen@intel.com, pan.deng@intel.com,
	tianyou.li@intel.com, yu.ma@intel.com
Subject: [PATCH 1/3] fs/file.c: add fast path in alloc_fd()
Date: Fri, 14 Jun 2024 12:34:14 -0400
Message-ID: <20240614163416.728752-2-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240614163416.728752-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

In most cases there is an available fd in the lower 64 bits of the
open_fds bitmap when we look for a free fd slot. Skip the two-level
search by calling find_next_zero_bit() directly for this common fast
path.

Look directly for a free (zero) bit in the lower 64 bits of the open_fds
bitmap when a free slot is available there, because:
(1) The fd allocation algorithm always allocates fds from small to large,
so lower bits in the open_fds bitmap are used much more frequently than
higher bits.
(2) After the fdt is expanded (the bitmap size doubles on each expansion),
it is never shrunk. The search size increases, but there are few open fds
available there.
(3) find_next_zero_bit() has a fast path for size <= 64 that speeds up
the search.

With the fast path added to alloc_fd() as a single bitmap search,
pts/blogbench-1.1.0 read improves by 20% and write by 10% on an Intel ICX
160-core configuration with v6.8-rc6.

Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
---
 fs/file.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 3b683b9101d8..e8d2f9ef7fd1 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -510,8 +510,13 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	if (fd < files->next_fd)
 		fd = files->next_fd;
 
-	if (fd < fdt->max_fds)
+	if (fd < fdt->max_fds) {
+		if (~fdt->open_fds[0]) {
+			fd = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, fd);
+			goto success;
+		}
 		fd = find_next_fd(fdt, fd);
+	}
 
 	/*
 	 * N.B. For clone tasks sharing a files structure, this test
@@ -531,7 +536,7 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	 */
 	if (error)
 		goto repeat;
-
+success:
 	if (start <= files->next_fd)
 		files->next_fd = fd + 1;
 
-- 
2.43.0

From nobody Thu Dec 18 11:22:50 2025
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	tim.c.chen@linux.intel.com, tim.c.chen@intel.com, pan.deng@intel.com,
	tianyou.li@intel.com, yu.ma@intel.com
Subject: [PATCH 2/3] fs/file.c: conditionally clear full_fds
Date: Fri, 14 Jun 2024 12:34:15 -0400
Message-ID: <20240614163416.728752-3-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240614163416.728752-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

64 bits in
open_fds are mapped to a common bit in full_fds_bits. It is very likely
that the bit in full_fds_bits is already clear when __clear_open_fd()
runs. Test the bit in full_fds_bits before clearing it to avoid an
unnecessary write and cache-line bouncing. See commit fc90888d07b8
("vfs: conditionally clear close-on-exec flag") for a similar
optimization.

Together with patch 1, this improves pts/blogbench-1.1.0 read by 28% and
write by 14% on an Intel ICX 160-core configuration with v6.8-rc6.

Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
---
 fs/file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/file.c b/fs/file.c
index e8d2f9ef7fd1..a0e94a178c0b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -268,7 +268,9 @@ static inline void __set_open_fd(unsigned int fd, struct fdtable *fdt)
 static inline void __clear_open_fd(unsigned int fd, struct fdtable *fdt)
 {
 	__clear_bit(fd, fdt->open_fds);
-	__clear_bit(fd / BITS_PER_LONG, fdt->full_fds_bits);
+	fd /= BITS_PER_LONG;
+	if (test_bit(fd, fdt->full_fds_bits))
+		__clear_bit(fd, fdt->full_fds_bits);
 }
 
 static unsigned int count_open_files(struct fdtable *fdt)
-- 
2.43.0

From nobody Thu Dec 18 11:22:50 2025
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	tim.c.chen@linux.intel.com, tim.c.chen@intel.com, pan.deng@intel.com,
	tianyou.li@intel.com, yu.ma@intel.com
Subject: [PATCH 3/3] fs/file.c: move sanity_check from alloc_fd() to put_unused_fd()
Date: Fri, 14 Jun 2024 12:34:16 -0400
Message-ID: <20240614163416.728752-4-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240614163416.728752-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

alloc_fd() contains a sanity check that makes sure the file object mapped
to the allocated fd is NULL. Move the sanity check from the
performance-critical alloc_fd() path to the non-performance-critical
put_unused_fd() path.

As the initial NULL file object condition is assured by zero
initialization in init_file, we only need to make sure that the slot is
NULL when an fd is recycled. Three functions call __put_unused_fd() to
return an fd: file_close_fd_locked(), do_close_on_exec() and
put_unused_fd(). file_close_fd_locked() and do_close_on_exec() already
implement the NULL check. Add a NULL check to put_unused_fd() to cover
all release paths.

Combined with patches 1 and 2 in this series, pts/blogbench-1.1.0 read
improves by 32% and write by 15% on an Intel ICX 160-core configuration
with v6.8-rc6.

Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
---
 fs/file.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index a0e94a178c0b..59d62909e2e3 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -548,13 +548,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	else
 		__clear_close_on_exec(fd, fdt);
 	error = fd;
-#if 1
-	/* Sanity check */
-	if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
-		printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
-		rcu_assign_pointer(fdt->fd[fd], NULL);
-	}
-#endif
 
 out:
 	spin_unlock(&files->file_lock);
@@ -572,7 +565,7 @@ int get_unused_fd_flags(unsigned flags)
 }
 EXPORT_SYMBOL(get_unused_fd_flags);
 
-static void __put_unused_fd(struct files_struct *files, unsigned int fd)
+static inline void __put_unused_fd(struct files_struct *files, unsigned int fd)
 {
 	struct fdtable *fdt = files_fdtable(files);
 	__clear_open_fd(fd, fdt);
@@ -583,7 +576,12 @@ static void __put_unused_fd(struct files_struct *files, unsigned int fd)
 void put_unused_fd(unsigned int fd)
 {
 	struct files_struct *files = current->files;
+	struct fdtable *fdt = files_fdtable(files);
 	spin_lock(&files->file_lock);
+	if (unlikely(rcu_access_pointer(fdt->fd[fd]))) {
+		printk(KERN_WARNING "put_unused_fd: slot %d not NULL!\n", fd);
+		rcu_assign_pointer(fdt->fd[fd], NULL);
+	}
 	__put_unused_fd(files, fd);
 	spin_unlock(&files->file_lock);
 }
-- 
2.43.0
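
For readers who want to experiment with the ideas in patches 1 and 2
outside the kernel, here is a minimal userspace sketch. It is not kernel
code and not the implementation from this series: struct toy_fdt, the
toy_*() helpers, NWORDS and the full_writes counter are all hypothetical
names made up for illustration, and plain C bit operations stand in for
find_next_zero_bit(), test_bit() and __clear_bit(). The sketch also
falls back to the slow path when word 0 has no free bit at or above the
requested start, which is a choice made here and not something taken
from the patch.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_WORD ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define NWORDS 4u

/* Toy stand-in for the two fdtable bitmaps (hypothetical layout). */
struct toy_fdt {
	unsigned long open_fds[NWORDS]; /* bit n set: fd n is in use */
	unsigned long full_fds_bits;    /* bit w set: open_fds[w] has no free bit */
	unsigned int full_writes;       /* stores to full_fds_bits done on clear */
};

static void toy_set_open_fd(struct toy_fdt *t, unsigned int fd)
{
	unsigned int w = fd / BITS_PER_WORD;

	t->open_fds[w] |= 1UL << (fd % BITS_PER_WORD);
	if (!~t->open_fds[w])
		t->full_fds_bits |= 1UL << w;
}

/* Patch 2 idea: write the summary bitmap only if its bit is actually set. */
static void toy_clear_open_fd(struct toy_fdt *t, unsigned int fd)
{
	unsigned int w = fd / BITS_PER_WORD;

	assert(fd < NWORDS * BITS_PER_WORD);
	t->open_fds[w] &= ~(1UL << (fd % BITS_PER_WORD));
	if (t->full_fds_bits & (1UL << w)) {
		t->full_fds_bits &= ~(1UL << w);
		t->full_writes++;
	}
}

/* Returns the lowest free fd >= start and marks it used, or -1 if none. */
static int toy_alloc_fd(struct toy_fdt *t, unsigned int start)
{
	unsigned int fd, w, lo;

	/* Patch 1 idea: a single scan of word 0 when it still has a free bit. */
	if (start < BITS_PER_WORD && ~t->open_fds[0]) {
		for (fd = start; fd < BITS_PER_WORD; fd++)
			if (!(t->open_fds[0] & (1UL << fd)))
				goto found;
	}

	/* Slow path: use the summary bitmap to skip words that are full. */
	for (w = start / BITS_PER_WORD; w < NWORDS; w++) {
		if (t->full_fds_bits & (1UL << w))
			continue;
		lo = w * BITS_PER_WORD;
		for (fd = start > lo ? start : lo; fd < lo + BITS_PER_WORD; fd++)
			if (!(t->open_fds[w] & (1UL << (fd % BITS_PER_WORD))))
				goto found;
	}
	return -1;

found:
	toy_set_open_fd(t, fd);
	return (int)fd;
}
```

Starting from a zeroed struct toy_fdt, repeated toy_alloc_fd(&t, 0) calls
hand out fds in increasing order, and full_writes only increases when an
fd is released from a word that was completely full, which is the store
that patch 2 tries to avoid in the common case.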