[v2] fs/file.c: optimize the critical section of file_lock in

[PATCH v2 3/3] fs/file.c: remove sanity_check from alloc_fd()

Posted by Yu Ma 1 year, 7 months ago

alloc_fd() has a sanity check inside to make sure the struct file mapping to the
allocated fd is NULL. Remove this sanity check since it can be assured by
exisitng zero initilization and NULL set when recycling fd.

Combined with patch 1 and 2 in series, pts/blogbench-1.1.0 read improved by
32%, write improved by 17% on Intel ICX 160 cores configuration with v6.10-rc4.

Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Yu Ma <yu.ma@intel.com>
---
 fs/file.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index b4d25f6d4c19..1153b0b7ba3d 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -555,13 +555,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	else
 		__clear_close_on_exec(fd, fdt);
 	error = fd;
-#if 1
-	/* Sanity check */
-	if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
-		printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
-		rcu_assign_pointer(fdt->fd[fd], NULL);
-	}
-#endif
 
 out:
 	spin_unlock(&files->file_lock);
-- 
2.43.0

Re: [PATCH v2 3/3] fs/file.c: remove sanity_check from alloc_fd()

Posted by Jan Kara 1 year, 7 months ago

On Sat 22-06-24 11:49:04, Yu Ma wrote:
> alloc_fd() has a sanity check inside to make sure the struct file mapping to the
> allocated fd is NULL. Remove this sanity check since it can be assured by
> exisitng zero initilization and NULL set when recycling fd.
  ^^^ existing  ^^^ initialization

Well, since this is a sanity check, it is expected it never hits. Yet
searching the web shows it has hit a few times in the past :). So would
wrapping this with unlikely() give a similar performance gain while keeping
debugability? If unlikely() does not help, I agree we can remove this since
fd_install() actually has the same check:

BUG_ON(fdt->fd[fd] != NULL);

and there we need the cacheline anyway so performance impact is minimal.
Now, this condition in alloc_fd() is nice that it does not take the kernel
down so perhaps we could change the BUG_ON to WARN() dumping similar kind
of info as alloc_fd()?

								Honza

> Combined with patch 1 and 2 in series, pts/blogbench-1.1.0 read improved by
> 32%, write improved by 17% on Intel ICX 160 cores configuration with v6.10-rc4.
> 
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Yu Ma <yu.ma@intel.com>
> ---
>  fs/file.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/fs/file.c b/fs/file.c
> index b4d25f6d4c19..1153b0b7ba3d 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -555,13 +555,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
>  	else
>  		__clear_close_on_exec(fd, fdt);
>  	error = fd;
> -#if 1
> -	/* Sanity check */
> -	if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> -		printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> -		rcu_assign_pointer(fdt->fd[fd], NULL);
> -	}
> -#endif
>  
>  out:
>  	spin_unlock(&files->file_lock);
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Re: [PATCH v2 3/3] fs/file.c: remove sanity_check from alloc_fd()

Posted by Mateusz Guzik 1 year, 7 months ago

On Tue, Jun 25, 2024 at 2:08 PM Jan Kara <jack@suse.cz> wrote:
>
> On Sat 22-06-24 11:49:04, Yu Ma wrote:
> > alloc_fd() has a sanity check inside to make sure the struct file mapping to the
> > allocated fd is NULL. Remove this sanity check since it can be assured by
> > exisitng zero initilization and NULL set when recycling fd.
>   ^^^ existing  ^^^ initialization
>
> Well, since this is a sanity check, it is expected it never hits. Yet
> searching the web shows it has hit a few times in the past :). So would
> wrapping this with unlikely() give a similar performance gain while keeping
> debugability? If unlikely() does not help, I agree we can remove this since
> fd_install() actually has the same check:
>
> BUG_ON(fdt->fd[fd] != NULL);
>
> and there we need the cacheline anyway so performance impact is minimal.
> Now, this condition in alloc_fd() is nice that it does not take the kernel
> down so perhaps we could change the BUG_ON to WARN() dumping similar kind
> of info as alloc_fd()?
>

Christian suggested just removing it.

To my understanding the problem is not the branch per se, but the the
cacheline bounce of the fd array induced by reading the status.

Note the thing also nullifies the pointer, kind of defeating the
BUG_ON in fd_install.

I'm guessing it's not going to hurt to branch on it after releasing
the lock and forego nullifying, more or less:
diff --git a/fs/file.c b/fs/file.c
index a3b72aa64f11..d22b867db246 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -524,11 +524,11 @@ static int alloc_fd(unsigned start, unsigned
end, unsigned flags)
         */
        error = -EMFILE;
        if (fd >= end)
-               goto out;
+               goto out_locked;

        error = expand_files(files, fd);
        if (error < 0)
-               goto out;
+               goto out_locked;

        /*
         * If we needed to expand the fs array we
@@ -546,15 +546,15 @@ static int alloc_fd(unsigned start, unsigned
end, unsigned flags)
        else
                __clear_close_on_exec(fd, fdt);
        error = fd;
-#if 1
-       /* Sanity check */
-       if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
+       spin_unlock(&files->file_lock);
+
+       if (unlikely(rcu_access_pointer(fdt->fd[fd]) != NULL)) {
                printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
-               rcu_assign_pointer(fdt->fd[fd], NULL);
        }
-#endif

-out:
+       return error;
+
+out_locked:
        spin_unlock(&files->file_lock);
        return error;
 }



>                                                                 Honza
>
> > Combined with patch 1 and 2 in series, pts/blogbench-1.1.0 read improved by
> > 32%, write improved by 17% on Intel ICX 160 cores configuration with v6.10-rc4.
> >
> > Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Signed-off-by: Yu Ma <yu.ma@intel.com>
> > ---
> >  fs/file.c | 7 -------
> >  1 file changed, 7 deletions(-)
> >
> > diff --git a/fs/file.c b/fs/file.c
> > index b4d25f6d4c19..1153b0b7ba3d 100644
> > --- a/fs/file.c
> > +++ b/fs/file.c
> > @@ -555,13 +555,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
> >       else
> >               __clear_close_on_exec(fd, fdt);
> >       error = fd;
> > -#if 1
> > -     /* Sanity check */
> > -     if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> > -             printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> > -             rcu_assign_pointer(fdt->fd[fd], NULL);
> > -     }
> > -#endif
> >
> >  out:
> >       spin_unlock(&files->file_lock);
> > --
> > 2.43.0
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR



-- 
Mateusz Guzik <mjguzik gmail.com>

Re: [PATCH v2 3/3] fs/file.c: remove sanity_check from alloc_fd()

Posted by Mateusz Guzik 1 year, 7 months ago

On Tue, Jun 25, 2024 at 3:09 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Tue, Jun 25, 2024 at 2:08 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Sat 22-06-24 11:49:04, Yu Ma wrote:
> > > alloc_fd() has a sanity check inside to make sure the struct file mapping to the
> > > allocated fd is NULL. Remove this sanity check since it can be assured by
> > > exisitng zero initilization and NULL set when recycling fd.
> >   ^^^ existing  ^^^ initialization
> >
> > Well, since this is a sanity check, it is expected it never hits. Yet
> > searching the web shows it has hit a few times in the past :). So would
> > wrapping this with unlikely() give a similar performance gain while keeping
> > debugability? If unlikely() does not help, I agree we can remove this since
> > fd_install() actually has the same check:
> >
> > BUG_ON(fdt->fd[fd] != NULL);
> >
> > and there we need the cacheline anyway so performance impact is minimal.
> > Now, this condition in alloc_fd() is nice that it does not take the kernel
> > down so perhaps we could change the BUG_ON to WARN() dumping similar kind
> > of info as alloc_fd()?
> >
>
> Christian suggested just removing it.
>
> To my understanding the problem is not the branch per se, but the the
> cacheline bounce of the fd array induced by reading the status.
>
> Note the thing also nullifies the pointer, kind of defeating the
> BUG_ON in fd_install.
>
> I'm guessing it's not going to hurt to branch on it after releasing
> the lock and forego nullifying, more or less:
> diff --git a/fs/file.c b/fs/file.c
> index a3b72aa64f11..d22b867db246 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -524,11 +524,11 @@ static int alloc_fd(unsigned start, unsigned
> end, unsigned flags)
>          */
>         error = -EMFILE;
>         if (fd >= end)
> -               goto out;
> +               goto out_locked;
>
>         error = expand_files(files, fd);
>         if (error < 0)
> -               goto out;
> +               goto out_locked;
>
>         /*
>          * If we needed to expand the fs array we
> @@ -546,15 +546,15 @@ static int alloc_fd(unsigned start, unsigned
> end, unsigned flags)
>         else
>                 __clear_close_on_exec(fd, fdt);
>         error = fd;
> -#if 1
> -       /* Sanity check */
> -       if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> +       spin_unlock(&files->file_lock);
> +
> +       if (unlikely(rcu_access_pointer(fdt->fd[fd]) != NULL)) {
>                 printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> -               rcu_assign_pointer(fdt->fd[fd], NULL);
>         }
> -#endif

Now that I sent it it is of course not safe to deref without
protection from either rcu or the lock, so this would have to be
wrapped with rcu_read_lock, which makes it even less appealing.

Whacking the thing as in the submitted patch seems like the best way
forward here. :)

>
> -out:
> +       return error;
> +
> +out_locked:
>         spin_unlock(&files->file_lock);
>         return error;
>  }
>
>
>
> >                                                                 Honza
> >
> > > Combined with patch 1 and 2 in series, pts/blogbench-1.1.0 read improved by
> > > 32%, write improved by 17% on Intel ICX 160 cores configuration with v6.10-rc4.
> > >
> > > Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > Signed-off-by: Yu Ma <yu.ma@intel.com>
> > > ---
> > >  fs/file.c | 7 -------
> > >  1 file changed, 7 deletions(-)
> > >
> > > diff --git a/fs/file.c b/fs/file.c
> > > index b4d25f6d4c19..1153b0b7ba3d 100644
> > > --- a/fs/file.c
> > > +++ b/fs/file.c
> > > @@ -555,13 +555,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
> > >       else
> > >               __clear_close_on_exec(fd, fdt);
> > >       error = fd;
> > > -#if 1
> > > -     /* Sanity check */
> > > -     if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> > > -             printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> > > -             rcu_assign_pointer(fdt->fd[fd], NULL);
> > > -     }
> > > -#endif
> > >
> > >  out:
> > >       spin_unlock(&files->file_lock);
> > > --
> > > 2.43.0
> > >
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
>
>
>
> --
> Mateusz Guzik <mjguzik gmail.com>



-- 
Mateusz Guzik <mjguzik gmail.com>

Re: [PATCH v2 3/3] fs/file.c: remove sanity_check from alloc_fd()

Posted by Jan Kara 1 year, 7 months ago

On Tue 25-06-24 15:11:23, Mateusz Guzik wrote:
> On Tue, Jun 25, 2024 at 3:09 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > On Tue, Jun 25, 2024 at 2:08 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Sat 22-06-24 11:49:04, Yu Ma wrote:
> > > > alloc_fd() has a sanity check inside to make sure the struct file mapping to the
> > > > allocated fd is NULL. Remove this sanity check since it can be assured by
> > > > exisitng zero initilization and NULL set when recycling fd.
> > >   ^^^ existing  ^^^ initialization
> > >
> > > Well, since this is a sanity check, it is expected it never hits. Yet
> > > searching the web shows it has hit a few times in the past :). So would
> > > wrapping this with unlikely() give a similar performance gain while keeping
> > > debugability? If unlikely() does not help, I agree we can remove this since
> > > fd_install() actually has the same check:
> > >
> > > BUG_ON(fdt->fd[fd] != NULL);
> > >
> > > and there we need the cacheline anyway so performance impact is minimal.
> > > Now, this condition in alloc_fd() is nice that it does not take the kernel
> > > down so perhaps we could change the BUG_ON to WARN() dumping similar kind
> > > of info as alloc_fd()?
> > >
> >
> > Christian suggested just removing it.
> >
> > To my understanding the problem is not the branch per se, but the the
> > cacheline bounce of the fd array induced by reading the status.
> >
> > Note the thing also nullifies the pointer, kind of defeating the
> > BUG_ON in fd_install.
> >
> > I'm guessing it's not going to hurt to branch on it after releasing
> > the lock and forego nullifying, more or less:
> > diff --git a/fs/file.c b/fs/file.c
> > index a3b72aa64f11..d22b867db246 100644
> > --- a/fs/file.c
> > +++ b/fs/file.c
> > @@ -524,11 +524,11 @@ static int alloc_fd(unsigned start, unsigned
> > end, unsigned flags)
> >          */
> >         error = -EMFILE;
> >         if (fd >= end)
> > -               goto out;
> > +               goto out_locked;
> >
> >         error = expand_files(files, fd);
> >         if (error < 0)
> > -               goto out;
> > +               goto out_locked;
> >
> >         /*
> >          * If we needed to expand the fs array we
> > @@ -546,15 +546,15 @@ static int alloc_fd(unsigned start, unsigned
> > end, unsigned flags)
> >         else
> >                 __clear_close_on_exec(fd, fdt);
> >         error = fd;
> > -#if 1
> > -       /* Sanity check */
> > -       if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> > +       spin_unlock(&files->file_lock);
> > +
> > +       if (unlikely(rcu_access_pointer(fdt->fd[fd]) != NULL)) {
> >                 printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> > -               rcu_assign_pointer(fdt->fd[fd], NULL);
> >         }
> > -#endif
> 
> Now that I sent it it is of course not safe to deref without
> protection from either rcu or the lock, so this would have to be
> wrapped with rcu_read_lock, which makes it even less appealing.
> 
> Whacking the thing as in the submitted patch seems like the best way
> forward here. :)

Yeah, as I wrote, I'm fine removing it, in particular if Christian is of
the same opinion. I was more musing about whether we should make the check
in fd_install() less aggressive since it is now more likely to trigger...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Re: [PATCH v2 3/3] fs/file.c: remove sanity_check from alloc_fd()

Posted by Christian Brauner 1 year, 7 months ago

On Tue, Jun 25, 2024 at 03:30:31PM GMT, Jan Kara wrote:
> On Tue 25-06-24 15:11:23, Mateusz Guzik wrote:
> > On Tue, Jun 25, 2024 at 3:09 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> > >
> > > On Tue, Jun 25, 2024 at 2:08 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Sat 22-06-24 11:49:04, Yu Ma wrote:
> > > > > alloc_fd() has a sanity check inside to make sure the struct file mapping to the
> > > > > allocated fd is NULL. Remove this sanity check since it can be assured by
> > > > > exisitng zero initilization and NULL set when recycling fd.
> > > >   ^^^ existing  ^^^ initialization
> > > >
> > > > Well, since this is a sanity check, it is expected it never hits. Yet
> > > > searching the web shows it has hit a few times in the past :). So would
> > > > wrapping this with unlikely() give a similar performance gain while keeping
> > > > debugability? If unlikely() does not help, I agree we can remove this since
> > > > fd_install() actually has the same check:
> > > >
> > > > BUG_ON(fdt->fd[fd] != NULL);
> > > >
> > > > and there we need the cacheline anyway so performance impact is minimal.
> > > > Now, this condition in alloc_fd() is nice that it does not take the kernel
> > > > down so perhaps we could change the BUG_ON to WARN() dumping similar kind
> > > > of info as alloc_fd()?
> > > >
> > >
> > > Christian suggested just removing it.
> > >
> > > To my understanding the problem is not the branch per se, but the the
> > > cacheline bounce of the fd array induced by reading the status.
> > >
> > > Note the thing also nullifies the pointer, kind of defeating the
> > > BUG_ON in fd_install.
> > >
> > > I'm guessing it's not going to hurt to branch on it after releasing
> > > the lock and forego nullifying, more or less:
> > > diff --git a/fs/file.c b/fs/file.c
> > > index a3b72aa64f11..d22b867db246 100644
> > > --- a/fs/file.c
> > > +++ b/fs/file.c
> > > @@ -524,11 +524,11 @@ static int alloc_fd(unsigned start, unsigned
> > > end, unsigned flags)
> > >          */
> > >         error = -EMFILE;
> > >         if (fd >= end)
> > > -               goto out;
> > > +               goto out_locked;
> > >
> > >         error = expand_files(files, fd);
> > >         if (error < 0)
> > > -               goto out;
> > > +               goto out_locked;
> > >
> > >         /*
> > >          * If we needed to expand the fs array we
> > > @@ -546,15 +546,15 @@ static int alloc_fd(unsigned start, unsigned
> > > end, unsigned flags)
> > >         else
> > >                 __clear_close_on_exec(fd, fdt);
> > >         error = fd;
> > > -#if 1
> > > -       /* Sanity check */
> > > -       if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
> > > +       spin_unlock(&files->file_lock);
> > > +
> > > +       if (unlikely(rcu_access_pointer(fdt->fd[fd]) != NULL)) {
> > >                 printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
> > > -               rcu_assign_pointer(fdt->fd[fd], NULL);
> > >         }
> > > -#endif
> > 
> > Now that I sent it it is of course not safe to deref without
> > protection from either rcu or the lock, so this would have to be
> > wrapped with rcu_read_lock, which makes it even less appealing.
> > 
> > Whacking the thing as in the submitted patch seems like the best way
> > forward here. :)
> 
> Yeah, as I wrote, I'm fine removing it, in particular if Christian is of
> the same opinion. I was more musing about whether we should make the check
> in fd_install() less aggressive since it is now more likely to trigger...

We could change it to WARN_ON() and then people can get BUG_ON()
behavior when they turn WARN into BUG which apparently is a thing that
we support.