nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire

[PATCH] nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire

Posted by Jeff Layton 1 year, 7 months ago

Given that we do the search and insertion while holding the i_lock, I
don't think it's possible for us to get EEXIST here. Remove this case.

Cc: Youzhong Yang <youzhong@gmail.com>
Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
This is replacement for PATCH 1/3 in the series I sent yesterday. I
think it makes sense to just eliminate this case.
---
 fs/nfsd/filecache.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index f84913691b78..b9dc7c22242c 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (likely(ret == 0))
 		goto open_file;
 
-	if (ret == -EEXIST)
-		goto retry;
 	trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
 	status = nfserr_jukebox;
 	goto construction_err;

---
base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
change-id: 20240711-nfsd-next-c9d17f66e2bd

Best regards,
-- 
Jeff Layton <jlayton@kernel.org>

Re: [PATCH] nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire

Posted by NeilBrown 1 year, 6 months ago

On Fri, 12 Jul 2024, Jeff Layton wrote:
> Given that we do the search and insertion while holding the i_lock, I
> don't think it's possible for us to get EEXIST here. Remove this case.

I was going to comment that as rhltable_insert() cannot return -EEXIST
that is an extra reason to discard the check.  But then I looked at the
code an I cannot convince myself that it cannot.
If __rhashtable_insert_fast() finds that tbl->future_tbl is not NULL it
calls rhashtable_insert_slow(), and that seems to fail if the key
already exists.  But it shouldn't for an rhltable, it should just add
the new item to the linked list for that key.

It looks like this has always been broken: adding to an rhltable during
a resize event can cause EEXIST....

Would anyone like to check my work?  I'm surprise that hasn't been
noticed if it is really the case.

NeilBrown

> 
> Cc: Youzhong Yang <youzhong@gmail.com>
> Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> This is replacement for PATCH 1/3 in the series I sent yesterday. I
> think it makes sense to just eliminate this case.
> ---
>  fs/nfsd/filecache.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index f84913691b78..b9dc7c22242c 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (likely(ret == 0))
>  		goto open_file;
>  
> -	if (ret == -EEXIST)
> -		goto retry;
>  	trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
>  	status = nfserr_jukebox;
>  	goto construction_err;
> 
> ---
> base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> change-id: 20240711-nfsd-next-c9d17f66e2bd
> 
> Best regards,
> -- 
> Jeff Layton <jlayton@kernel.org>
> 
>

Re: [PATCH] nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire

Posted by Jeff Layton 1 year, 6 months ago

On Mon, 2024-07-15 at 10:27 +1000, NeilBrown wrote:
> On Fri, 12 Jul 2024, Jeff Layton wrote:
> > Given that we do the search and insertion while holding the i_lock, I
> > don't think it's possible for us to get EEXIST here. Remove this case.
> 
> I was going to comment that as rhltable_insert() cannot return -EEXIST
> that is an extra reason to discard the check.  But then I looked at the
> code an I cannot convince myself that it cannot.
> If __rhashtable_insert_fast() finds that tbl->future_tbl is not NULL it
> calls rhashtable_insert_slow(), and that seems to fail if the key
> already exists.  But it shouldn't for an rhltable, it should just add
> the new item to the linked list for that key.
> 
> It looks like this has always been broken: adding to an rhltable during
> a resize event can cause EEXIST....
> 
> Would anyone like to check my work?  I'm surprise that hasn't been
> noticed if it is really the case.
> 
> 

I don't know this code well at all, but it looks correct to me:

static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
                                   struct rhash_head *obj)
{
        struct bucket_table *new_tbl;
        struct bucket_table *tbl;
        struct rhash_lock_head __rcu **bkt;
        unsigned long flags;
        unsigned int hash;
        void *data;

        new_tbl = rcu_dereference(ht->tbl);

        do {
                tbl = new_tbl;
                hash = rht_head_hashfn(ht, tbl, obj, ht->p);
                if (rcu_access_pointer(tbl->future_tbl))
                        /* Failure is OK */
                        bkt = rht_bucket_var(tbl, hash);
                else
                        bkt = rht_bucket_insert(ht, tbl, hash);
                if (bkt == NULL) {
                        new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
                        data = ERR_PTR(-EAGAIN);
                } else {
                        flags = rht_lock(tbl, bkt);
                        data = rhashtable_lookup_one(ht, bkt, tbl,
                                                     hash, key, obj);
                        new_tbl = rhashtable_insert_one(ht, bkt, tbl,
                                                        hash, obj, data);
                        if (PTR_ERR(new_tbl) != -EEXIST)
                                data = ERR_CAST(new_tbl);

                        rht_unlock(tbl, bkt, flags);
                }
        } while (!IS_ERR_OR_NULL(new_tbl));

        if (PTR_ERR(data) == -EAGAIN)
                data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
                               -EAGAIN);

        return data;
}

I'm assuming the part we need to worry about is where
rhashtable_insert_one returns -EEXIST.

It holds the rht_lock across the lookup and insert though. So if
rhashtable_insert_one returns -EEXIST, then "data" must be something
valid. In that case, "data" won't be overwritten and it will fall
through and return the pointer to the entry already there.

That said, this logic is really convoluted, so I may have missed
something too.

> > 
> > Cc: Youzhong Yang <youzhong@gmail.com>
> > Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > This is replacement for PATCH 1/3 in the series I sent yesterday. I
> > think it makes sense to just eliminate this case.
> > ---
> >  fs/nfsd/filecache.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index f84913691b78..b9dc7c22242c 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (likely(ret == 0))
> >  		goto open_file;
> >  
> > -	if (ret == -EEXIST)
> > -		goto retry;
> >  	trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
> >  	status = nfserr_jukebox;
> >  	goto construction_err;
> > 
> > ---
> > base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> > change-id: 20240711-nfsd-next-c9d17f66e2bd
> > 
> > Best regards,
> > -- 
> > Jeff Layton <jlayton@kernel.org>
> > 
> > 
> 

-- 
Jeff Layton <jlayton@kernel.org>

Re: [PATCH] nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire

Posted by Chuck Lever 1 year, 6 months ago

On Mon, Jul 15, 2024 at 08:25:53AM -0400, Jeff Layton wrote:
> On Mon, 2024-07-15 at 10:27 +1000, NeilBrown wrote:
> > On Fri, 12 Jul 2024, Jeff Layton wrote:
> > > Given that we do the search and insertion while holding the i_lock, I
> > > don't think it's possible for us to get EEXIST here. Remove this case.
> > 
> > I was going to comment that as rhltable_insert() cannot return -EEXIST
> > that is an extra reason to discard the check.  But then I looked at the
> > code an I cannot convince myself that it cannot.
> > If __rhashtable_insert_fast() finds that tbl->future_tbl is not NULL it
> > calls rhashtable_insert_slow(), and that seems to fail if the key
> > already exists.  But it shouldn't for an rhltable, it should just add
> > the new item to the linked list for that key.
> > 
> > It looks like this has always been broken: adding to an rhltable during
> > a resize event can cause EEXIST....
> > 
> > Would anyone like to check my work?  I'm surprise that hasn't been
> > noticed if it is really the case.
> > 
> > 
> 
> I don't know this code well at all, but it looks correct to me:
> 
> static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>                                    struct rhash_head *obj)
> {
>         struct bucket_table *new_tbl;
>         struct bucket_table *tbl;
>         struct rhash_lock_head __rcu **bkt;
>         unsigned long flags;
>         unsigned int hash;
>         void *data;
> 
>         new_tbl = rcu_dereference(ht->tbl);
> 
>         do {
>                 tbl = new_tbl;
>                 hash = rht_head_hashfn(ht, tbl, obj, ht->p);
>                 if (rcu_access_pointer(tbl->future_tbl))
>                         /* Failure is OK */
>                         bkt = rht_bucket_var(tbl, hash);
>                 else
>                         bkt = rht_bucket_insert(ht, tbl, hash);
>                 if (bkt == NULL) {
>                         new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
>                         data = ERR_PTR(-EAGAIN);
>                 } else {
>                         flags = rht_lock(tbl, bkt);
>                         data = rhashtable_lookup_one(ht, bkt, tbl,
>                                                      hash, key, obj);
>                         new_tbl = rhashtable_insert_one(ht, bkt, tbl,
>                                                         hash, obj, data);
>                         if (PTR_ERR(new_tbl) != -EEXIST)
>                                 data = ERR_CAST(new_tbl);
> 
>                         rht_unlock(tbl, bkt, flags);
>                 }
>         } while (!IS_ERR_OR_NULL(new_tbl));
> 
>         if (PTR_ERR(data) == -EAGAIN)
>                 data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
>                                -EAGAIN);
> 
>         return data;
> }
> 
> I'm assuming the part we need to worry about is where
> rhashtable_insert_one returns -EEXIST.
> 
> It holds the rht_lock across the lookup and insert though. So if
> rhashtable_insert_one returns -EEXIST, then "data" must be something
> valid. In that case, "data" won't be overwritten and it will fall
> through and return the pointer to the entry already there.
> 
> That said, this logic is really convoluted, so I may have missed
> something too.

This is the issue I was concerned about after my review: it's
obvious that the rhtable API can return -EEXIST, but it's just
really hard to tell whether the rh/l/table API will ever return
-EEXIST.

As Neil says, the rhtable "hash table full" case should not happen
with rhltable. But can we prove that?

If we are not yet confident, then maybe PATCH 1/3 should replace
the "if (ret == -EEXIST)" with "WARN_ON(ret == -EEXIST)"...? It's
also possible to ask the human(s) who constructed the rhltable
code. :-)


> > > Cc: Youzhong Yang <youzhong@gmail.com>
> > > Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > > This is replacement for PATCH 1/3 in the series I sent yesterday. I
> > > think it makes sense to just eliminate this case.
> > > ---
> > >  fs/nfsd/filecache.c | 2 --
> > >  1 file changed, 2 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index f84913691b78..b9dc7c22242c 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > >  	if (likely(ret == 0))
> > >  		goto open_file;
> > >  
> > > -	if (ret == -EEXIST)
> > > -		goto retry;
> > >  	trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
> > >  	status = nfserr_jukebox;
> > >  	goto construction_err;
> > > 
> > > ---
> > > base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> > > change-id: 20240711-nfsd-next-c9d17f66e2bd
> > > 
> > > Best regards,
> > > -- 
> > > Jeff Layton <jlayton@kernel.org>
> > > 
> > > 
> > 
> 
> -- 
> Jeff Layton <jlayton@kernel.org>

-- 
Chuck Lever

Re: [PATCH] nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire

Posted by Youzhong Yang 1 year, 6 months ago

How is this going? any chance to move forward and deal with the EEXIST
case in a future patch? I see no harm in keeping the EEXIST check.

On Mon, Jul 15, 2024 at 10:06 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On Mon, Jul 15, 2024 at 08:25:53AM -0400, Jeff Layton wrote:
> > On Mon, 2024-07-15 at 10:27 +1000, NeilBrown wrote:
> > > On Fri, 12 Jul 2024, Jeff Layton wrote:
> > > > Given that we do the search and insertion while holding the i_lock, I
> > > > don't think it's possible for us to get EEXIST here. Remove this case.
> > >
> > > I was going to comment that as rhltable_insert() cannot return -EEXIST
> > > that is an extra reason to discard the check.  But then I looked at the
> > > code an I cannot convince myself that it cannot.
> > > If __rhashtable_insert_fast() finds that tbl->future_tbl is not NULL it
> > > calls rhashtable_insert_slow(), and that seems to fail if the key
> > > already exists.  But it shouldn't for an rhltable, it should just add
> > > the new item to the linked list for that key.
> > >
> > > It looks like this has always been broken: adding to an rhltable during
> > > a resize event can cause EEXIST....
> > >
> > > Would anyone like to check my work?  I'm surprise that hasn't been
> > > noticed if it is really the case.
> > >
> > >
> >
> > I don't know this code well at all, but it looks correct to me:
> >
> > static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
> >                                    struct rhash_head *obj)
> > {
> >         struct bucket_table *new_tbl;
> >         struct bucket_table *tbl;
> >         struct rhash_lock_head __rcu **bkt;
> >         unsigned long flags;
> >         unsigned int hash;
> >         void *data;
> >
> >         new_tbl = rcu_dereference(ht->tbl);
> >
> >         do {
> >                 tbl = new_tbl;
> >                 hash = rht_head_hashfn(ht, tbl, obj, ht->p);
> >                 if (rcu_access_pointer(tbl->future_tbl))
> >                         /* Failure is OK */
> >                         bkt = rht_bucket_var(tbl, hash);
> >                 else
> >                         bkt = rht_bucket_insert(ht, tbl, hash);
> >                 if (bkt == NULL) {
> >                         new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
> >                         data = ERR_PTR(-EAGAIN);
> >                 } else {
> >                         flags = rht_lock(tbl, bkt);
> >                         data = rhashtable_lookup_one(ht, bkt, tbl,
> >                                                      hash, key, obj);
> >                         new_tbl = rhashtable_insert_one(ht, bkt, tbl,
> >                                                         hash, obj, data);
> >                         if (PTR_ERR(new_tbl) != -EEXIST)
> >                                 data = ERR_CAST(new_tbl);
> >
> >                         rht_unlock(tbl, bkt, flags);
> >                 }
> >         } while (!IS_ERR_OR_NULL(new_tbl));
> >
> >         if (PTR_ERR(data) == -EAGAIN)
> >                 data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
> >                                -EAGAIN);
> >
> >         return data;
> > }
> >
> > I'm assuming the part we need to worry about is where
> > rhashtable_insert_one returns -EEXIST.
> >
> > It holds the rht_lock across the lookup and insert though. So if
> > rhashtable_insert_one returns -EEXIST, then "data" must be something
> > valid. In that case, "data" won't be overwritten and it will fall
> > through and return the pointer to the entry already there.
> >
> > That said, this logic is really convoluted, so I may have missed
> > something too.
>
> This is the issue I was concerned about after my review: it's
> obvious that the rhtable API can return -EEXIST, but it's just
> really hard to tell whether the rh/l/table API will ever return
> -EEXIST.
>
> As Neil says, the rhtable "hash table full" case should not happen
> with rhltable. But can we prove that?
>
> If we are not yet confident, then maybe PATCH 1/3 should replace
> the "if (ret == -EEXIST)" with "WARN_ON(ret == -EEXIST)"...? It's
> also possible to ask the human(s) who constructed the rhltable
> code. :-)
>
>
> > > > Cc: Youzhong Yang <youzhong@gmail.com>
> > > > Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > > This is replacement for PATCH 1/3 in the series I sent yesterday. I
> > > > think it makes sense to just eliminate this case.
> > > > ---
> > > >  fs/nfsd/filecache.c | 2 --
> > > >  1 file changed, 2 deletions(-)
> > > >
> > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > index f84913691b78..b9dc7c22242c 100644
> > > > --- a/fs/nfsd/filecache.c
> > > > +++ b/fs/nfsd/filecache.c
> > > > @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > >   if (likely(ret == 0))
> > > >           goto open_file;
> > > >
> > > > - if (ret == -EEXIST)
> > > > -         goto retry;
> > > >   trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
> > > >   status = nfserr_jukebox;
> > > >   goto construction_err;
> > > >
> > > > ---
> > > > base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> > > > change-id: 20240711-nfsd-next-c9d17f66e2bd
> > > >
> > > > Best regards,
> > > > --
> > > > Jeff Layton <jlayton@kernel.org>
> > > >
> > > >
> > >
> >
> > --
> > Jeff Layton <jlayton@kernel.org>
>
> --
> Chuck Lever

Re: [PATCH] nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire

Posted by Chuck Lever 1 year, 7 months ago

On Thu, Jul 11, 2024 at 03:11:13PM -0400, Jeff Layton wrote:
> Given that we do the search and insertion while holding the i_lock, I
> don't think it's possible for us to get EEXIST here. Remove this case.
> 
> Cc: Youzhong Yang <youzhong@gmail.com>
> Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> This is replacement for PATCH 1/3 in the series I sent yesterday. I
> think it makes sense to just eliminate this case.
> ---
>  fs/nfsd/filecache.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index f84913691b78..b9dc7c22242c 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (likely(ret == 0))
>  		goto open_file;
>  
> -	if (ret == -EEXIST)
> -		goto retry;
>  	trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
>  	status = nfserr_jukebox;
>  	goto construction_err;
> 
> ---
> base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> change-id: 20240711-nfsd-next-c9d17f66e2bd
> 
> Best regards,
> -- 
> Jeff Layton <jlayton@kernel.org>

Youzhong, can you replace 1/3 in Jeff's file cache series and
test again please?

-- 
Chuck Lever

Re: [PATCH] nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire

Posted by Youzhong Yang 1 year, 7 months ago

It's in progress, I will report back once it's done, most likely late
this afternoon. This time it will have much more nfs load on the
server.

On Fri, Jul 12, 2024 at 9:32 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On Thu, Jul 11, 2024 at 03:11:13PM -0400, Jeff Layton wrote:
> > Given that we do the search and insertion while holding the i_lock, I
> > don't think it's possible for us to get EEXIST here. Remove this case.
> >
> > Cc: Youzhong Yang <youzhong@gmail.com>
> > Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > This is replacement for PATCH 1/3 in the series I sent yesterday. I
> > think it makes sense to just eliminate this case.
> > ---
> >  fs/nfsd/filecache.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index f84913691b78..b9dc7c22242c 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >       if (likely(ret == 0))
> >               goto open_file;
> >
> > -     if (ret == -EEXIST)
> > -             goto retry;
> >       trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
> >       status = nfserr_jukebox;
> >       goto construction_err;
> >
> > ---
> > base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> > change-id: 20240711-nfsd-next-c9d17f66e2bd
> >
> > Best regards,
> > --
> > Jeff Layton <jlayton@kernel.org>
>
> Youzhong, can you replace 1/3 in Jeff's file cache series and
> test again please?
>
> --
> Chuck Lever