From: Lei Yin <yinlei2@lenovo.com>
Handle -NFS4ERR_OLD_STATEID in nfs4_layoutcommit_done().
This issue was reproduced on NFSv4.2.
Without refreshing data->args.stateid, LAYOUTCOMMIT can keep retrying
with the same stale stateid after OLD_STATEID, resulting in an
unbounded retry loop.
Refresh the layout stateid with nfs4_layout_refresh_old_stateid()
and restart the RPC only after a successful refresh.
Changes since v1: update refreshed stateid in inode layout header.
Signed-off-by: Lei Yin <yinlei2@lenovo.com>
---
fs/nfs/nfs4proc.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 7225b4cfa6c2..575bf45a9209 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -9989,6 +9989,38 @@ nfs4_layoutcommit_done(struct rpc_task *task, void *calldata)
 	case -NFS4ERR_GRACE: /* loca_recalim always false */
 		task->tk_status = 0;
 		break;
+	case -NFS4ERR_OLD_STATEID: {
+		u32 old_seqid = be32_to_cpu(data->args.stateid.seqid);
+		struct pnfs_layout_range range = {
+			.iomode = IOMODE_ANY,
+			.offset = 0,
+			.length = NFS4_MAX_UINT64,
+		};
+
+		if (nfs4_layout_refresh_old_stateid(&data->args.stateid,
+						    &range,
+						    data->args.inode)) {
+			struct pnfs_layout_hdr *lo;
+
+			spin_lock(&data->args.inode->i_lock);
+			lo = NFS_I(data->args.inode)->layout;
+			if (lo && pnfs_layout_is_valid(lo) &&
+			    nfs4_stateid_match_other(&data->args.stateid,
+						     &lo->plh_stateid))
+				pnfs_set_layout_stateid(lo, &data->args.stateid,
+							NULL, false);
+			spin_unlock(&data->args.inode->i_lock);
+
+			dprintk("%s: refreshed OLD_STATEID inode %lu seq %u->%u\n",
+				__func__, data->args.inode->i_ino,
+				old_seqid,
+				be32_to_cpu(data->args.stateid.seqid));
+
+			rpc_restart_call_prepare(task);
+			return;
+		}
+		fallthrough;
+	}
 	case 0:
 		break;
 	default:
--
2.43.0
Hi,

Sorry for the confusion in the previous submissions. Due to an editing
mistake, the first two versions of this patch were not sent as one proper
series.

My patch "[PATCH v2] NFSv4.1/pNFS: fix LAYOUTCOMMIT retry loop on
OLD_STATEID" was marked as Not Applicable. I would like to ask for
clarification on the reason.

This patch is intended to handle the case where LAYOUTCOMMIT gets
NFS4ERR_OLD_STATEID in nfs4_layoutcommit_done(). The change refreshes
data->args.stateid via nfs4_layout_refresh_old_stateid(), updates the
layout stateid in the inode layout header when appropriate, and restarts
the RPC only after the refresh succeeds. The purpose is to avoid retrying
LAYOUTCOMMIT indefinitely with the same stale stateid after OLD_STATEID.

The issue was reproduced on NFSv4.2. The most reliable way I found to
reproduce it is:

1. Run a workload with relatively high concurrent I/O on the client.
2. Kill the client-side I/O process with kill -9 while those I/Os are
   still in flight.
3. In that situation, there is roughly a 50% chance that a subsequent
   LAYOUTCOMMIT is sent with an old stateid.
4. Since LAYOUTCOMMIT does not handle NFS4ERR_OLD_STATEID in this path,
   the same stale stateid may continue to be retried.
5. This can lead to an infinite retry loop, and the affected file then
   appears to become unresponsive.

Using kill without -9 makes this problem much harder to reproduce.
However, even without kill -9, the same issue can still occasionally be
observed under sufficient concurrency and stress testing.

So my understanding of the bug is:

- kill -9 makes the stale stateid window much easier to hit;
- ordinary concurrency/stress testing can still trigger it occasionally;
- because LAYOUTCOMMIT does not recover from OLD_STATEID here, the RPC
  can loop indefinitely with the stale stateid;
- once this happens, operations on the corresponding file may stop
  making progress.

Could you please let me know whether the Not Applicable status means:

1. an equivalent fix is already present in the target tree,
2. the patch was sent against the wrong tree or branch, or
3. there is some issue with the problem analysis or the proposed fix?

If needed, I can resend the patch against the appropriate branch or adjust
the description accordingly.

Thanks,
Lei Yin
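
The retry behaviour described in the message above can be modelled in user space. The following is only a minimal sketch in plain C, not kernel code: every name in it (model_server, model_client, model_layoutcommit, model_refresh, model_run) is hypothetical, the status codes are toy values, and model_refresh is a deliberately simplified stand-in that does not mirror the internals of nfs4_layout_refresh_old_stateid(). It assumes the server rejects any seqid older than its current one, and it caps the number of attempts so that the "unbounded" case terminates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy status codes for this model only; not kernel or protocol values. */
enum model_status { MODEL_OK = 0, MODEL_OLD_STATEID = 1 };

struct model_server {
	uint32_t cur_seqid;	/* newest layout seqid the server has issued */
};

struct model_client {
	uint32_t layout_seqid;	/* seqid held in the client's layout header */
	uint32_t args_seqid;	/* seqid captured in the pending call's args */
};

/* Assumption: the server rejects any seqid older than its current one. */
static enum model_status model_layoutcommit(const struct model_server *srv,
					    uint32_t seqid)
{
	return seqid < srv->cur_seqid ? MODEL_OLD_STATEID : MODEL_OK;
}

/*
 * Simplified refresh: copy a newer seqid from the layout header into the
 * call arguments. Returns true only if the arguments actually changed.
 */
static bool model_refresh(struct model_client *clp)
{
	if (clp->layout_seqid <= clp->args_seqid)
		return false;
	clp->args_seqid = clp->layout_seqid;
	return true;
}

/*
 * Retry the modelled LAYOUTCOMMIT up to max_attempts times.
 * Returns the number of attempts used on success, or -1 if the call never
 * succeeded (attempt cap reached, or the refresh found nothing newer).
 */
static int model_run(const struct model_server *srv, struct model_client *clp,
		     bool refresh_on_old, int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (model_layoutcommit(srv, clp->args_seqid) == MODEL_OK)
			return attempt;
		/* With the fix: restart only after a successful refresh. */
		if (refresh_on_old && !model_refresh(clp))
			return -1;
		/* Without the fix: loop again with the same stale seqid. */
	}
	return -1;
}
```

With a server at seqid 3 and a client whose layout header holds 3 while the call arguments hold 2, model_run() with refresh_on_old set to false uses up every attempt and returns -1, while the same call with refresh_on_old set to true succeeds on the second attempt. If the layout header has nothing newer than the arguments, the refreshing variant gives up after one attempt instead of restarting, which corresponds to the "restart the RPC only after a successful refresh" rule in the patch description.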
On Mon, Apr 27, 2026, at 12:52 AM, Lei Yin wrote:
> Could you please let me know whether the Not Applicable status means:
>
> 1. an equivalent fix is already present in the target tree,
> 2. the patch was sent against the wrong tree or branch, or
> 3. there is some issue with the problem analysis or the proposed fix?
>
> If needed, I can resend the patch against the appropriate branch or adjust
> the description accordingly.

The linux-nfs patchworks is for NFSD only. "Not Applicable" means the
patch is not an NFSD patch.

--
Chuck Lever