alloc_fd() has a sanity check inside to make sure the struct file mapping to the
allocated fd is NULL. Remove this sanity check since it can be assured by
exisitng zero initilization and NULL set when recycling fd.
Combined with patch 1 and 2 in series, pts/blogbench-1.1.0 read improved by
32%, write improved by 17% on Intel ICX 160 cores configuration with v6.10-rc4.
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Yu Ma <yu.ma@intel.com>
---
fs/file.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index b4d25f6d4c19..1153b0b7ba3d 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -555,13 +555,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
else
__clear_close_on_exec(fd, fdt);
error = fd;
-#if 1
- /* Sanity check */
- if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
- printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
- rcu_assign_pointer(fdt->fd[fd], NULL);
- }
-#endif
out:
spin_unlock(&files->file_lock);
--
2.43.0
On Sat 22-06-24 11:49:04, Yu Ma wrote:
> alloc_fd() has a sanity check inside to make sure the struct file mapping to the
> allocated fd is NULL. Remove this sanity check since it can be assured by
> exisitng zero initilization and NULL set when recycling fd.
^^^ existing ^^^ initialization
Well, since this is a sanity check, it is expected it never hits. Yet
searching the web shows it has hit a few times in the past :). So would
wrapping this with unlikely() give a similar performance gain while keeping
debugability? If unlikely() does not help, I agree we can remove this since
fd_install() actually has the same check:
BUG_ON(fdt->fd[fd] != NULL);
and there we need the cacheline anyway so performance impact is minimal.
Now, this condition in alloc_fd() is nice that it does not take the kernel
down so perhaps we could change the BUG_ON to WARN() dumping similar kind
of info as alloc_fd()?
Honza
> Combined with patch 1 and 2 in series, pts/blogbench-1.1.0 read improved by
> 32%, write improved by 17% on Intel ICX 160 cores configuration with v6.10-rc4.
>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Yu Ma <yu.ma@intel.com>
> ---
> fs/file.c | 7 -------
> 1 file changed, 7 deletions(-)
>
> diff --git a/fs/file.c b/fs/file.c
> index b4d25f6d4c19..1153b0b7ba3d 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -555,13 +555,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
> else
> __clear_close_on_exec(fd, fdt);
> error = fd;
> -#if 1
> - /* Sanity check */
> - if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> - printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> - rcu_assign_pointer(fdt->fd[fd], NULL);
> - }
> -#endif
>
> out:
> spin_unlock(&files->file_lock);
> --
> 2.43.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On Tue, Jun 25, 2024 at 2:08 PM Jan Kara <jack@suse.cz> wrote:
>
> On Sat 22-06-24 11:49:04, Yu Ma wrote:
> > alloc_fd() has a sanity check inside to make sure the struct file mapping to the
> > allocated fd is NULL. Remove this sanity check since it can be assured by
> > exisitng zero initilization and NULL set when recycling fd.
> ^^^ existing ^^^ initialization
>
> Well, since this is a sanity check, it is expected it never hits. Yet
> searching the web shows it has hit a few times in the past :). So would
> wrapping this with unlikely() give a similar performance gain while keeping
> debugability? If unlikely() does not help, I agree we can remove this since
> fd_install() actually has the same check:
>
> BUG_ON(fdt->fd[fd] != NULL);
>
> and there we need the cacheline anyway so performance impact is minimal.
> Now, this condition in alloc_fd() is nice that it does not take the kernel
> down so perhaps we could change the BUG_ON to WARN() dumping similar kind
> of info as alloc_fd()?
>
Christian suggested just removing it.
To my understanding the problem is not the branch per se, but the the
cacheline bounce of the fd array induced by reading the status.
Note the thing also nullifies the pointer, kind of defeating the
BUG_ON in fd_install.
I'm guessing it's not going to hurt to branch on it after releasing
the lock and forego nullifying, more or less:
diff --git a/fs/file.c b/fs/file.c
index a3b72aa64f11..d22b867db246 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -524,11 +524,11 @@ static int alloc_fd(unsigned start, unsigned
end, unsigned flags)
*/
error = -EMFILE;
if (fd >= end)
- goto out;
+ goto out_locked;
error = expand_files(files, fd);
if (error < 0)
- goto out;
+ goto out_locked;
/*
* If we needed to expand the fs array we
@@ -546,15 +546,15 @@ static int alloc_fd(unsigned start, unsigned
end, unsigned flags)
else
__clear_close_on_exec(fd, fdt);
error = fd;
-#if 1
- /* Sanity check */
- if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
+ spin_unlock(&files->file_lock);
+
+ if (unlikely(rcu_access_pointer(fdt->fd[fd]) != NULL)) {
printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
- rcu_assign_pointer(fdt->fd[fd], NULL);
}
-#endif
-out:
+ return error;
+
+out_locked:
spin_unlock(&files->file_lock);
return error;
}
> Honza
>
> > Combined with patch 1 and 2 in series, pts/blogbench-1.1.0 read improved by
> > 32%, write improved by 17% on Intel ICX 160 cores configuration with v6.10-rc4.
> >
> > Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Signed-off-by: Yu Ma <yu.ma@intel.com>
> > ---
> > fs/file.c | 7 -------
> > 1 file changed, 7 deletions(-)
> >
> > diff --git a/fs/file.c b/fs/file.c
> > index b4d25f6d4c19..1153b0b7ba3d 100644
> > --- a/fs/file.c
> > +++ b/fs/file.c
> > @@ -555,13 +555,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
> > else
> > __clear_close_on_exec(fd, fdt);
> > error = fd;
> > -#if 1
> > - /* Sanity check */
> > - if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> > - printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> > - rcu_assign_pointer(fdt->fd[fd], NULL);
> > - }
> > -#endif
> >
> > out:
> > spin_unlock(&files->file_lock);
> > --
> > 2.43.0
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
--
Mateusz Guzik <mjguzik gmail.com>
On Tue, Jun 25, 2024 at 3:09 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Tue, Jun 25, 2024 at 2:08 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Sat 22-06-24 11:49:04, Yu Ma wrote:
> > > alloc_fd() has a sanity check inside to make sure the struct file mapping to the
> > > allocated fd is NULL. Remove this sanity check since it can be assured by
> > > exisitng zero initilization and NULL set when recycling fd.
> > ^^^ existing ^^^ initialization
> >
> > Well, since this is a sanity check, it is expected it never hits. Yet
> > searching the web shows it has hit a few times in the past :). So would
> > wrapping this with unlikely() give a similar performance gain while keeping
> > debugability? If unlikely() does not help, I agree we can remove this since
> > fd_install() actually has the same check:
> >
> > BUG_ON(fdt->fd[fd] != NULL);
> >
> > and there we need the cacheline anyway so performance impact is minimal.
> > Now, this condition in alloc_fd() is nice that it does not take the kernel
> > down so perhaps we could change the BUG_ON to WARN() dumping similar kind
> > of info as alloc_fd()?
> >
>
> Christian suggested just removing it.
>
> To my understanding the problem is not the branch per se, but the the
> cacheline bounce of the fd array induced by reading the status.
>
> Note the thing also nullifies the pointer, kind of defeating the
> BUG_ON in fd_install.
>
> I'm guessing it's not going to hurt to branch on it after releasing
> the lock and forego nullifying, more or less:
> diff --git a/fs/file.c b/fs/file.c
> index a3b72aa64f11..d22b867db246 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -524,11 +524,11 @@ static int alloc_fd(unsigned start, unsigned
> end, unsigned flags)
> */
> error = -EMFILE;
> if (fd >= end)
> - goto out;
> + goto out_locked;
>
> error = expand_files(files, fd);
> if (error < 0)
> - goto out;
> + goto out_locked;
>
> /*
> * If we needed to expand the fs array we
> @@ -546,15 +546,15 @@ static int alloc_fd(unsigned start, unsigned
> end, unsigned flags)
> else
> __clear_close_on_exec(fd, fdt);
> error = fd;
> -#if 1
> - /* Sanity check */
> - if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> + spin_unlock(&files->file_lock);
> +
> + if (unlikely(rcu_access_pointer(fdt->fd[fd]) != NULL)) {
> printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> - rcu_assign_pointer(fdt->fd[fd], NULL);
> }
> -#endif
Now that I sent it it is of course not safe to deref without
protection from either rcu or the lock, so this would have to be
wrapped with rcu_read_lock, which makes it even less appealing.
Whacking the thing as in the submitted patch seems like the best way
forward here. :)
>
> -out:
> + return error;
> +
> +out_locked:
> spin_unlock(&files->file_lock);
> return error;
> }
>
>
>
> > Honza
> >
> > > Combined with patch 1 and 2 in series, pts/blogbench-1.1.0 read improved by
> > > 32%, write improved by 17% on Intel ICX 160 cores configuration with v6.10-rc4.
> > >
> > > Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > Signed-off-by: Yu Ma <yu.ma@intel.com>
> > > ---
> > > fs/file.c | 7 -------
> > > 1 file changed, 7 deletions(-)
> > >
> > > diff --git a/fs/file.c b/fs/file.c
> > > index b4d25f6d4c19..1153b0b7ba3d 100644
> > > --- a/fs/file.c
> > > +++ b/fs/file.c
> > > @@ -555,13 +555,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
> > > else
> > > __clear_close_on_exec(fd, fdt);
> > > error = fd;
> > > -#if 1
> > > - /* Sanity check */
> > > - if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> > > - printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> > > - rcu_assign_pointer(fdt->fd[fd], NULL);
> > > - }
> > > -#endif
> > >
> > > out:
> > > spin_unlock(&files->file_lock);
> > > --
> > > 2.43.0
> > >
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
>
>
>
> --
> Mateusz Guzik <mjguzik gmail.com>
--
Mateusz Guzik <mjguzik gmail.com>
On Tue 25-06-24 15:11:23, Mateusz Guzik wrote:
> On Tue, Jun 25, 2024 at 3:09 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > On Tue, Jun 25, 2024 at 2:08 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Sat 22-06-24 11:49:04, Yu Ma wrote:
> > > > alloc_fd() has a sanity check inside to make sure the struct file mapping to the
> > > > allocated fd is NULL. Remove this sanity check since it can be assured by
> > > > exisitng zero initilization and NULL set when recycling fd.
> > > ^^^ existing ^^^ initialization
> > >
> > > Well, since this is a sanity check, it is expected it never hits. Yet
> > > searching the web shows it has hit a few times in the past :). So would
> > > wrapping this with unlikely() give a similar performance gain while keeping
> > > debugability? If unlikely() does not help, I agree we can remove this since
> > > fd_install() actually has the same check:
> > >
> > > BUG_ON(fdt->fd[fd] != NULL);
> > >
> > > and there we need the cacheline anyway so performance impact is minimal.
> > > Now, this condition in alloc_fd() is nice that it does not take the kernel
> > > down so perhaps we could change the BUG_ON to WARN() dumping similar kind
> > > of info as alloc_fd()?
> > >
> >
> > Christian suggested just removing it.
> >
> > To my understanding the problem is not the branch per se, but the the
> > cacheline bounce of the fd array induced by reading the status.
> >
> > Note the thing also nullifies the pointer, kind of defeating the
> > BUG_ON in fd_install.
> >
> > I'm guessing it's not going to hurt to branch on it after releasing
> > the lock and forego nullifying, more or less:
> > diff --git a/fs/file.c b/fs/file.c
> > index a3b72aa64f11..d22b867db246 100644
> > --- a/fs/file.c
> > +++ b/fs/file.c
> > @@ -524,11 +524,11 @@ static int alloc_fd(unsigned start, unsigned
> > end, unsigned flags)
> > */
> > error = -EMFILE;
> > if (fd >= end)
> > - goto out;
> > + goto out_locked;
> >
> > error = expand_files(files, fd);
> > if (error < 0)
> > - goto out;
> > + goto out_locked;
> >
> > /*
> > * If we needed to expand the fs array we
> > @@ -546,15 +546,15 @@ static int alloc_fd(unsigned start, unsigned
> > end, unsigned flags)
> > else
> > __clear_close_on_exec(fd, fdt);
> > error = fd;
> > -#if 1
> > - /* Sanity check */
> > - if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> > + spin_unlock(&files->file_lock);
> > +
> > + if (unlikely(rcu_access_pointer(fdt->fd[fd]) != NULL)) {
> > printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> > - rcu_assign_pointer(fdt->fd[fd], NULL);
> > }
> > -#endif
>
> Now that I sent it it is of course not safe to deref without
> protection from either rcu or the lock, so this would have to be
> wrapped with rcu_read_lock, which makes it even less appealing.
>
> Whacking the thing as in the submitted patch seems like the best way
> forward here. :)
Yeah, as I wrote, I'm fine removing it, in particular if Christian is of
the same opinion. I was more musing about whether we should make the check
in fd_install() less aggressive since it is now more likely to trigger...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On Tue, Jun 25, 2024 at 03:30:31PM GMT, Jan Kara wrote:
> On Tue 25-06-24 15:11:23, Mateusz Guzik wrote:
> > On Tue, Jun 25, 2024 at 3:09 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> > >
> > > On Tue, Jun 25, 2024 at 2:08 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Sat 22-06-24 11:49:04, Yu Ma wrote:
> > > > > alloc_fd() has a sanity check inside to make sure the struct file mapping to the
> > > > > allocated fd is NULL. Remove this sanity check since it can be assured by
> > > > > exisitng zero initilization and NULL set when recycling fd.
> > > > ^^^ existing ^^^ initialization
> > > >
> > > > Well, since this is a sanity check, it is expected it never hits. Yet
> > > > searching the web shows it has hit a few times in the past :). So would
> > > > wrapping this with unlikely() give a similar performance gain while keeping
> > > > debugability? If unlikely() does not help, I agree we can remove this since
> > > > fd_install() actually has the same check:
> > > >
> > > > BUG_ON(fdt->fd[fd] != NULL);
> > > >
> > > > and there we need the cacheline anyway so performance impact is minimal.
> > > > Now, this condition in alloc_fd() is nice that it does not take the kernel
> > > > down so perhaps we could change the BUG_ON to WARN() dumping similar kind
> > > > of info as alloc_fd()?
> > > >
> > >
> > > Christian suggested just removing it.
> > >
> > > To my understanding the problem is not the branch per se, but the the
> > > cacheline bounce of the fd array induced by reading the status.
> > >
> > > Note the thing also nullifies the pointer, kind of defeating the
> > > BUG_ON in fd_install.
> > >
> > > I'm guessing it's not going to hurt to branch on it after releasing
> > > the lock and forego nullifying, more or less:
> > > diff --git a/fs/file.c b/fs/file.c
> > > index a3b72aa64f11..d22b867db246 100644
> > > --- a/fs/file.c
> > > +++ b/fs/file.c
> > > @@ -524,11 +524,11 @@ static int alloc_fd(unsigned start, unsigned
> > > end, unsigned flags)
> > > */
> > > error = -EMFILE;
> > > if (fd >= end)
> > > - goto out;
> > > + goto out_locked;
> > >
> > > error = expand_files(files, fd);
> > > if (error < 0)
> > > - goto out;
> > > + goto out_locked;
> > >
> > > /*
> > > * If we needed to expand the fs array we
> > > @@ -546,15 +546,15 @@ static int alloc_fd(unsigned start, unsigned
> > > end, unsigned flags)
> > > else
> > > __clear_close_on_exec(fd, fdt);
> > > error = fd;
> > > -#if 1
> > > - /* Sanity check */
> > > - if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> > > + spin_unlock(&files->file_lock);
> > > +
> > > + if (unlikely(rcu_access_pointer(fdt->fd[fd]) != NULL)) {
> > > printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> > > - rcu_assign_pointer(fdt->fd[fd], NULL);
> > > }
> > > -#endif
> >
> > Now that I sent it it is of course not safe to deref without
> > protection from either rcu or the lock, so this would have to be
> > wrapped with rcu_read_lock, which makes it even less appealing.
> >
> > Whacking the thing as in the submitted patch seems like the best way
> > forward here. :)
>
> Yeah, as I wrote, I'm fine removing it, in particular if Christian is of
> the same opinion. I was more musing about whether we should make the check
> in fd_install() less aggressive since it is now more likely to trigger...
We could change it to WARN_ON() and then people can get BUG_ON()
behavior when they turn WARN into BUG which apparently is a thing that
we support.
© 2016 - 2025 Red Hat, Inc.