nfs: propagate fileid changed errors back to syscall

[PATCH] nfs: propagate fileid changed errors back to syscall

Posted by Nikhil Jha 1 year, 2 months ago

Hello! This is the first kernel patch I have tried to upstream. I'm
following along with the kernel newbies guide but apologies if I got
anything wrong.

Currently, if there is a mismatch in the request and response fileids in
an NFS request, the kernel logs an error and attempts to return ESTALE.
However, this error is currently dropped before it makes it all the way
to userspace. This appears to be a mistake, since as far as I can tell
that ESTALE value is never consumed from anywhere.

Callstack for async NFS write, at time of error:

        nfs_update_inode <- returns -ESTALE
        nfs_refresh_inode_locked
        nfs_writeback_update_inode <- error is dropped here
        nfs3_write_done
        nfs_writeback_done
        nfs_pgio_result <- other errors are collected here
        rpc_exit_task
        __rpc_execute
        rpc_async_schedule
        process_one_work
        worker_thread
        kthread
        ret_from_fork

We ran into this issue ourselves, and seeing the -ESTALE in the kernel
source code but not from userspace was surprising.

I tested a rebased version of this patch on an el8 kernel (v6.1.114),
and it seems to correctly propagate this error.

>8------------------------------------------------------8<

If an NFS server returns a response with a different file id to the
response, the kernel currently prints out an error and attempts to
return -ESTALE. However, this -ESTALE value is never surfaced anywhere.

This patch modifies nfs_writeback_update_inode() to propagate these
errors up the call stack, and modifies nfs_pgio_result() to report
the resulting error.

Signed-off-by: Nikhil Jha <njha@janestreet.com>
---
 fs/nfs/filelayout/filelayout.c         | 2 +-
 fs/nfs/flexfilelayout/flexfilelayout.c | 2 +-
 fs/nfs/internal.h                      | 2 +-
 fs/nfs/nfs3proc.c                      | 2 +-
 fs/nfs/nfs4proc.c                      | 2 +-
 fs/nfs/pagelist.c                      | 5 ++++-
 fs/nfs/proc.c                          | 2 +-
 fs/nfs/write.c                         | 9 ++++++---
 8 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index d39a1f58e18d..4e80a13e9639 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -335,7 +335,7 @@ static int filelayout_write_done_cb(struct rpc_task *task,
 	/* zero out the fattr */
 	hdr->fattr.valid = 0;
 	if (task->tk_status >= 0)
-		nfs_writeback_update_inode(hdr);
+		return nfs_writeback_update_inode(hdr);
 
 	return 0;
 }
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index f78115c6c2c1..d15e3799a351 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1503,7 +1503,7 @@ static int ff_layout_write_done_cb(struct rpc_task *task,
 	/* zero out fattr since we don't care DS attr at all */
 	hdr->fattr.valid = 0;
 	if (task->tk_status >= 0)
-		nfs_writeback_update_inode(hdr);
+		return nfs_writeback_update_inode(hdr);
 
 	return 0;
 }
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index e564bd11ba60..5c4e2fa88324 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -592,7 +592,7 @@ void nfs_mark_request_commit(struct nfs_page *req,
 			     struct nfs_commit_info *cinfo,
 			     u32 ds_commit_idx);
 int nfs_write_need_commit(struct nfs_pgio_header *);
-void nfs_writeback_update_inode(struct nfs_pgio_header *hdr);
+int nfs_writeback_update_inode(struct nfs_pgio_header *hdr);
 int nfs_generic_commit_list(struct inode *inode, struct list_head *head,
 			    int how, struct nfs_commit_info *cinfo);
 void nfs_retry_commit(struct list_head *page_list,
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 1566163c6d85..42ddbc21fb05 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -887,7 +887,7 @@ static int nfs3_write_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
 	if (nfs3_async_handle_jukebox(task, inode))
 		return -EAGAIN;
 	if (task->tk_status >= 0)
-		nfs_writeback_update_inode(hdr);
+		return nfs_writeback_update_inode(hdr);
 	return 0;
 }
 
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 405f17e6e0b4..7ec372a1eb98 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5661,7 +5661,7 @@ static int nfs4_write_done_cb(struct rpc_task *task,
 	}
 	if (task->tk_status >= 0) {
 		renew_lease(NFS_SERVER(inode), hdr->timestamp);
-		nfs_writeback_update_inode(hdr);
+		return nfs_writeback_update_inode(hdr);
 	}
 	return 0;
 }
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index e27c07bd8929..19cb080653e3 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -854,9 +854,12 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata)
 {
 	struct nfs_pgio_header *hdr = calldata;
 	struct inode *inode = hdr->inode;
+	int status = hdr->rw_ops->rw_done(task, hdr, inode);
 
-	if (hdr->rw_ops->rw_done(task, hdr, inode) != 0)
+	if (status != 0) {
+		nfs_set_pgio_error(hdr, status, hdr->args.offset);
 		return;
+	}
 	if (task->tk_status < 0)
 		nfs_set_pgio_error(hdr, task->tk_status, hdr->args.offset);
 	else
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 6c09cd090c34..72ffbdfc7ae6 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -629,7 +629,7 @@ static int nfs_write_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
 {
 	if (task->tk_status >= 0) {
 		hdr->res.count = hdr->args.count;
-		nfs_writeback_update_inode(hdr);
+		return nfs_writeback_update_inode(hdr);
 	}
 	return 0;
 }
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 50fa539611f5..151da29175fd 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1507,22 +1507,25 @@ static void nfs_writeback_check_extend(struct nfs_pgio_header *hdr,
 	fattr->valid |= NFS_ATTR_FATTR_SIZE;
 }
 
-void nfs_writeback_update_inode(struct nfs_pgio_header *hdr)
+int nfs_writeback_update_inode(struct nfs_pgio_header *hdr)
 {
 	struct nfs_fattr *fattr = &hdr->fattr;
 	struct inode *inode = hdr->inode;
+	int ret = 0;
 
 	if (nfs_have_delegated_mtime(inode)) {
 		spin_lock(&inode->i_lock);
 		nfs_set_cache_invalid(inode, NFS_INO_INVALID_BLOCKS);
 		spin_unlock(&inode->i_lock);
-		return;
+		return 0;
 	}
 
 	spin_lock(&inode->i_lock);
 	nfs_writeback_check_extend(hdr, fattr);
-	nfs_post_op_update_inode_force_wcc_locked(inode, fattr);
+	ret = nfs_post_op_update_inode_force_wcc_locked(inode, fattr);
 	spin_unlock(&inode->i_lock);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(nfs_writeback_update_inode);
 
-- 
2.39.3

Re: [PATCH] nfs: propagate fileid changed errors back to syscall

Posted by Benjamin Coddington 1 year, 2 months ago

On 9 Dec 2024, at 12:39, Nikhil Jha wrote:

> Hello! This is the first kernel patch I have tried to upstream. I'm
> following along with the kernel newbies guide but apologies if I got
> anything wrong.
>
> Currently, if there is a mismatch in the request and response fileids in
> an NFS request, the kernel logs an error and attempts to return ESTALE.
> However, this error is currently dropped before it makes it all the way
> to userspace. This appears to be a mistake, since as far as I can tell
> that ESTALE value is never consumed from anywhere.
>
> Callstack for async NFS write, at time of error:
>
>         nfs_update_inode <- returns -ESTALE
>         nfs_refresh_inode_locked
>         nfs_writeback_update_inode <- error is dropped here
>         nfs3_write_done
>         nfs_writeback_done
>         nfs_pgio_result <- other errors are collected here
>         rpc_exit_task
>         __rpc_execute
>         rpc_async_schedule
>         process_one_work
>         worker_thread
>         kthread
>         ret_from_fork
>
> We ran into this issue ourselves, and seeing the -ESTALE in the kernel
> source code but not from userspace was surprising.
>
> I tested a rebased version of this patch on an el8 kernel (v6.1.114),
> and it seems to correctly propagate this error.
>
>> 8------------------------------------------------------8<
>
> If an NFS server returns a response with a different file id to the
> response, the kernel currently prints out an error and attempts to
> return -ESTALE. However, this -ESTALE value is never surfaced anywhere.

Hi Nikhil Jha,

Will this cause us to return -ESTALE to the application even if the WRITE
was successful?

Ben

Re: [PATCH] nfs: propagate fileid changed errors back to syscall

Posted by Nikhil Jha 1 year, 1 month ago

On Tue, Dec 10, 2024 at 07:11:43AM -0500, Benjamin Coddington wrote:
> On 9 Dec 2024, at 12:39, Nikhil Jha wrote:
> 
> > Hello! This is the first kernel patch I have tried to upstream. I'm
> > following along with the kernel newbies guide but apologies if I got
> > anything wrong.
> >
> > Currently, if there is a mismatch in the request and response fileids in
> > an NFS request, the kernel logs an error and attempts to return ESTALE.
> > However, this error is currently dropped before it makes it all the way
> > to userspace. This appears to be a mistake, since as far as I can tell
> > that ESTALE value is never consumed from anywhere.
> >
> > Callstack for async NFS write, at time of error:
> >
> >         nfs_update_inode <- returns -ESTALE
> >         nfs_refresh_inode_locked
> >         nfs_writeback_update_inode <- error is dropped here
> >         nfs3_write_done
> >         nfs_writeback_done
> >         nfs_pgio_result <- other errors are collected here
> >         rpc_exit_task
> >         __rpc_execute
> >         rpc_async_schedule
> >         process_one_work
> >         worker_thread
> >         kthread
> >         ret_from_fork
> >
> > We ran into this issue ourselves, and seeing the -ESTALE in the kernel
> > source code but not from userspace was surprising.
> >
> > I tested a rebased version of this patch on an el8 kernel (v6.1.114),
> > and it seems to correctly propagate this error.
> >
> >> 8------------------------------------------------------8<
> >
> > If an NFS server returns a response with a different file id to the
> > response, the kernel currently prints out an error and attempts to
> > return -ESTALE. However, this -ESTALE value is never surfaced anywhere.
> 
> Hi Nikhil Jha,
> 
> Will this cause us to return -ESTALE to the application even if the WRITE
> was successful?
> 
> Ben
> 

Hi Ben,

Hmm.. I'm not sure how to answer that question exactly. A fileid is only
mismatched between a request RPC and response RPC when something really
weird is going on (i.e. a bug in the NFS server), so it's hard to reason
about if a WRITE was successful or not.

The `return -ESTALE` was already in the kernel code, but this particular
codepath seems to have accidentally dropped that error.

Nikhil

Re: [PATCH] nfs: propagate fileid changed errors back to syscall

Posted by Trond Myklebust 1 year, 1 month ago

On Tue, 2024-12-10 at 12:12 -0500, Nikhil Jha wrote:
> On Tue, Dec 10, 2024 at 07:11:43AM -0500, Benjamin Coddington wrote:
> > On 9 Dec 2024, at 12:39, Nikhil Jha wrote:
> > 
> > > Hello! This is the first kernel patch I have tried to upstream.
> > > I'm
> > > following along with the kernel newbies guide but apologies if I
> > > got
> > > anything wrong.
> > > 
> > > Currently, if there is a mismatch in the request and response
> > > fileids in
> > > an NFS request, the kernel logs an error and attempts to return
> > > ESTALE.
> > > However, this error is currently dropped before it makes it all
> > > the way
> > > to userspace. This appears to be a mistake, since as far as I can
> > > tell
> > > that ESTALE value is never consumed from anywhere.
> > > 
> > > Callstack for async NFS write, at time of error:
> > > 
> > >         nfs_update_inode <- returns -ESTALE
> > >         nfs_refresh_inode_locked
> > >         nfs_writeback_update_inode <- error is dropped here
> > >         nfs3_write_done
> > >         nfs_writeback_done
> > >         nfs_pgio_result <- other errors are collected here
> > >         rpc_exit_task
> > >         __rpc_execute
> > >         rpc_async_schedule
> > >         process_one_work
> > >         worker_thread
> > >         kthread
> > >         ret_from_fork
> > > 
> > > We ran into this issue ourselves, and seeing the -ESTALE in the
> > > kernel
> > > source code but not from userspace was surprising.
> > > 
> > > I tested a rebased version of this patch on an el8 kernel
> > > (v6.1.114),
> > > and it seems to correctly propagate this error.
> > > 
> > > > 8------------------------------------------------------8<
> > > 
> > > If an NFS server returns a response with a different file id to
> > > the
> > > response, the kernel currently prints out an error and attempts
> > > to
> > > return -ESTALE. However, this -ESTALE value is never surfaced
> > > anywhere.
> > 
> > Hi Nikhil Jha,
> > 
> > Will this cause us to return -ESTALE to the application even if the
> > WRITE
> > was successful?
> > 
> > Ben
> > 
> 
> Hi Ben,
> 
> Hmm.. I'm not sure how to answer that question exactly. A fileid is
> only
> mismatched between a request RPC and response RPC when something
> really
> weird is going on (i.e. a bug in the NFS server), so it's hard to
> reason
> about if a WRITE was successful or not.
> 
> The `return -ESTALE` was already in the kernel code, but this
> particular
> codepath seems to have accidentally dropped that error.
> 
> Nikhil
> 

I'm not keen on taking any client changes to paper over what appears to
be a major server bug. We simply can't support broken servers that
reuse filehandles across files.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

Re: [PATCH] nfs: propagate fileid changed errors back to syscall

Posted by Nikhil Jha 1 year, 1 month ago

On Tue, Dec 10, 2024 at 05:33:14PM +0000, Trond Myklebust wrote:
> On Tue, 2024-12-10 at 12:12 -0500, Nikhil Jha wrote:
> > On Tue, Dec 10, 2024 at 07:11:43AM -0500, Benjamin Coddington wrote:
> > > On 9 Dec 2024, at 12:39, Nikhil Jha wrote:
> > > 
> > > > Hello! This is the first kernel patch I have tried to upstream.
> > > > I'm
> > > > following along with the kernel newbies guide but apologies if I
> > > > got
> > > > anything wrong.
> > > > 
> > > > Currently, if there is a mismatch in the request and response
> > > > fileids in
> > > > an NFS request, the kernel logs an error and attempts to return
> > > > ESTALE.
> > > > However, this error is currently dropped before it makes it all
> > > > the way
> > > > to userspace. This appears to be a mistake, since as far as I can
> > > > tell
> > > > that ESTALE value is never consumed from anywhere.
> > > > 
> > > > Callstack for async NFS write, at time of error:
> > > > 
> > > >         nfs_update_inode <- returns -ESTALE
> > > >         nfs_refresh_inode_locked
> > > >         nfs_writeback_update_inode <- error is dropped here
> > > >         nfs3_write_done
> > > >         nfs_writeback_done
> > > >         nfs_pgio_result <- other errors are collected here
> > > >         rpc_exit_task
> > > >         __rpc_execute
> > > >         rpc_async_schedule
> > > >         process_one_work
> > > >         worker_thread
> > > >         kthread
> > > >         ret_from_fork
> > > > 
> > > > We ran into this issue ourselves, and seeing the -ESTALE in the
> > > > kernel
> > > > source code but not from userspace was surprising.
> > > > 
> > > > I tested a rebased version of this patch on an el8 kernel
> > > > (v6.1.114),
> > > > and it seems to correctly propagate this error.
> > > > 
> > > > > 8------------------------------------------------------8<
> > > > 
> > > > If an NFS server returns a response with a different file id to
> > > > the
> > > > response, the kernel currently prints out an error and attempts
> > > > to
> > > > return -ESTALE. However, this -ESTALE value is never surfaced
> > > > anywhere.
> > > 
> > > Hi Nikhil Jha,
> > > 
> > > Will this cause us to return -ESTALE to the application even if the
> > > WRITE
> > > was successful?
> > > 
> > > Ben
> > > 
> > 
> > Hi Ben,
> > 
> > Hmm.. I'm not sure how to answer that question exactly. A fileid is
> > only
> > mismatched between a request RPC and response RPC when something
> > really
> > weird is going on (i.e. a bug in the NFS server), so it's hard to
> > reason
> > about if a WRITE was successful or not.
> > 
> > The `return -ESTALE` was already in the kernel code, but this
> > particular
> > codepath seems to have accidentally dropped that error.
> > 
> > Nikhil
> > 
> 
> I'm not keen on taking any client changes to paper over what appears to
> be a major server bug. We simply can't support broken servers that
> reuse filehandles across files.
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 

Right, this patch *doesn't* alleviate any server bug from the client. I
don't think it would make sense to try to fix a buggy server from the
Linux NFS client. What I'm thinking about is...

1. We're already detecting this problem, printing an error message, and 
   trying to return ESTALE.
2. That ESTALE is not propagated anywhere. It just gets dropped.

If we don't want this ESTALE to be visible, should it just get removed
entirely? The value does not appear to be consumed from anywhere in the
kernel code.