From: Li hongliang <1468888505@139.com>
To: gregkh@linuxfoundation.org, stable@vger.kernel.org, trond.myklebust@hammerspace.com
Cc: patches@lists.linux.dev, linux-kernel@vger.kernel.org, bcodding@hammerspace.com, anna@kernel.org, linux-nfs@vger.kernel.org, wangzhaolong@huaweicloud.com
Subject: [PATCH 6.6.y] pNFS: Fix a deadlock when returning a delegation during open()
Date: Tue, 24 Feb 2026 15:22:02 +0800
Message-Id: <20260224072202.2940831-1-1468888505@139.com>
X-Mailer: git-send-email 2.34.1
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Trond Myklebust

[ Upstream commit 857bf9056291a16785ae3be1d291026b2437fc48 ]

Ben Coddington reports seeing a hang in the following stack trace:
  0 [ffffd0b50e1774e0] __schedule at ffffffff9ca05415
  1 [ffffd0b50e177548] schedule at ffffffff9ca05717
  2 [ffffd0b50e177558] bit_wait at ffffffff9ca061e1
  3 [ffffd0b50e177568] __wait_on_bit at ffffffff9ca05cfb
  4 [ffffd0b50e1775c8] out_of_line_wait_on_bit at ffffffff9ca05ea5
  5 [ffffd0b50e177618] pnfs_roc at ffffffffc154207b [nfsv4]
  6 [ffffd0b50e1776b8] _nfs4_proc_delegreturn at ffffffffc1506586 [nfsv4]
  7 [ffffd0b50e177788] nfs4_proc_delegreturn at ffffffffc1507480 [nfsv4]
  8 [ffffd0b50e1777f8] nfs_do_return_delegation at ffffffffc1523e41 [nfsv4]
  9 [ffffd0b50e177838] nfs_inode_set_delegation at ffffffffc1524a75 [nfsv4]
 10 [ffffd0b50e177888] nfs4_process_delegation at ffffffffc14f41dd [nfsv4]
 11 [ffffd0b50e1778a0] _nfs4_opendata_to_nfs4_state at ffffffffc1503edf [nfsv4]
 12 [ffffd0b50e1778c0] _nfs4_open_and_get_state at ffffffffc1504e56 [nfsv4]
 13 [ffffd0b50e177978] _nfs4_do_open at ffffffffc15051b8 [nfsv4]
 14 [ffffd0b50e1779f8] nfs4_do_open at ffffffffc150559c [nfsv4]
 15 [ffffd0b50e177a80] nfs4_atomic_open at ffffffffc15057fb [nfsv4]
 16 [ffffd0b50e177ad0] nfs4_file_open at ffffffffc15219be [nfsv4]
 17 [ffffd0b50e177b78] do_dentry_open at ffffffff9c09e6ea
 18 [ffffd0b50e177ba8] vfs_open at ffffffff9c0a082e
 19 [ffffd0b50e177bd0] dentry_open at ffffffff9c0a0935

The issue is that the delegreturn is being asked to wait for a layout
return that cannot complete because a state recovery was initiated. The
state recovery cannot complete until the open() finishes processing the
delegations it was given.

The solution is to propagate the existing flags that indicate a
non-blocking call to the function pnfs_roc(), so that it knows not to wait
in this situation.

Reported-by: Benjamin Coddington
Fixes: 29ade5db1293 ("pNFS: Wait on outstanding layoutreturns to complete in pnfs_roc()")
Signed-off-by: Trond Myklebust
[ Minor conflict resolved. ]
Signed-off-by: Li hongliang <1468888505@139.com>
---
 fs/nfs/nfs4proc.c |  6 ++---
 fs/nfs/pnfs.c     | 58 +++++++++++++++++++++++++++++++++--------------
 fs/nfs/pnfs.h     | 17 ++++++--------
 3 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index fe6986939bc9..42fa7c915e29 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3792,8 +3792,8 @@ int nfs4_do_close(struct nfs4_state *state, gfp_t gfp_mask, int wait)
 	calldata->res.seqid = calldata->arg.seqid;
 	calldata->res.server = server;
 	calldata->res.lr_ret = -NFS4ERR_NOMATCHING_LAYOUT;
-	calldata->lr.roc = pnfs_roc(state->inode,
-			&calldata->lr.arg, &calldata->lr.res, msg.rpc_cred);
+	calldata->lr.roc = pnfs_roc(state->inode, &calldata->lr.arg,
+				    &calldata->lr.res, msg.rpc_cred, wait);
 	if (calldata->lr.roc) {
 		calldata->arg.lr_args = &calldata->lr.arg;
 		calldata->res.lr_res = &calldata->lr.res;
@@ -6742,7 +6742,7 @@ static int _nfs4_proc_delegreturn(struct inode *inode, const struct cred *cred,
 	data->inode = nfs_igrab_and_active(inode);
 	if (data->inode || issync) {
 		data->lr.roc = pnfs_roc(inode, &data->lr.arg, &data->lr.res,
-					cred);
+					cred, issync);
 		if (data->lr.roc) {
 			data->args.lr_args = &data->lr.arg;
 			data->res.lr_res = &data->lr.res;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0737d9a15d86..6dbfae295f76 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1426,10 +1426,9 @@ pnfs_commit_and_return_layout(struct inode *inode)
 	return ret;
 }
 
-bool pnfs_roc(struct inode *ino,
-		struct nfs4_layoutreturn_args *args,
-		struct nfs4_layoutreturn_res *res,
-		const struct cred *cred)
+bool pnfs_roc(struct inode *ino, struct nfs4_layoutreturn_args *args,
+	      struct nfs4_layoutreturn_res *res, const struct cred *cred,
+	      bool sync)
 {
 	struct nfs_inode *nfsi = NFS_I(ino);
 	struct nfs_open_context *ctx;
@@ -1440,7 +1439,7 @@ bool pnfs_roc(struct inode *ino,
 	nfs4_stateid stateid;
 	enum pnfs_iomode iomode = 0;
 	bool layoutreturn = false, roc = false;
-	bool skip_read = false;
+	bool skip_read;
 
 	if (!nfs_have_layout(ino))
 		return false;
@@ -1453,20 +1452,14 @@ bool pnfs_roc(struct inode *ino,
 		lo = NULL;
 		goto out_noroc;
 	}
-	pnfs_get_layout_hdr(lo);
-	if (test_bit(NFS_LAYOUT_RETURN_LOCK, &lo->plh_flags)) {
-		spin_unlock(&ino->i_lock);
-		rcu_read_unlock();
-		wait_on_bit(&lo->plh_flags, NFS_LAYOUT_RETURN,
-				TASK_UNINTERRUPTIBLE);
-		pnfs_put_layout_hdr(lo);
-		goto retry;
-	}
 
 	/* no roc if we hold a delegation */
+	skip_read = false;
 	if (nfs4_check_delegation(ino, FMODE_READ)) {
-		if (nfs4_check_delegation(ino, FMODE_WRITE))
+		if (nfs4_check_delegation(ino, FMODE_WRITE)) {
+			lo = NULL;
 			goto out_noroc;
+		}
 		skip_read = true;
 	}
 
@@ -1475,12 +1468,43 @@ bool pnfs_roc(struct inode *ino,
 		if (state == NULL)
 			continue;
 		/* Don't return layout if there is open file state */
-		if (state->state & FMODE_WRITE)
+		if (state->state & FMODE_WRITE) {
+			lo = NULL;
 			goto out_noroc;
+		}
 		if (state->state & FMODE_READ)
 			skip_read = true;
 	}
 
+	if (skip_read) {
+		bool writes = false;
+
+		list_for_each_entry(lseg, &lo->plh_segs, pls_list) {
+			if (lseg->pls_range.iomode != IOMODE_READ) {
+				writes = true;
+				break;
+			}
+		}
+		if (!writes) {
+			lo = NULL;
+			goto out_noroc;
+		}
+	}
+
+	pnfs_get_layout_hdr(lo);
+	if (test_bit(NFS_LAYOUT_RETURN_LOCK, &lo->plh_flags)) {
+		if (!sync) {
+			pnfs_set_plh_return_info(
+				lo, skip_read ? IOMODE_RW : IOMODE_ANY, 0);
+			goto out_noroc;
+		}
+		spin_unlock(&ino->i_lock);
+		rcu_read_unlock();
+		wait_on_bit(&lo->plh_flags, NFS_LAYOUT_RETURN,
+				TASK_UNINTERRUPTIBLE);
+		pnfs_put_layout_hdr(lo);
+		goto retry;
+	}
 
 	list_for_each_entry_safe(lseg, next, &lo->plh_segs, pls_list) {
 		if (skip_read && lseg->pls_range.iomode == IOMODE_READ)
@@ -1520,7 +1544,7 @@ bool pnfs_roc(struct inode *ino,
 out_noroc:
 	spin_unlock(&ino->i_lock);
 	rcu_read_unlock();
-	pnfs_layoutcommit_inode(ino, true);
+	pnfs_layoutcommit_inode(ino, sync);
 	if (roc) {
 		struct pnfs_layoutdriver_type *ld = NFS_SERVER(ino)->pnfs_curr_ld;
 		if (ld->prepare_layoutreturn)
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 79996d7dad0f..a613466f6f22 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -295,10 +295,9 @@ int pnfs_mark_matching_lsegs_return(struct pnfs_layout_hdr *lo,
 				u32 seq);
 int pnfs_mark_layout_stateid_invalid(struct pnfs_layout_hdr *lo,
 		struct list_head *lseg_list);
-bool pnfs_roc(struct inode *ino,
-		struct nfs4_layoutreturn_args *args,
-		struct nfs4_layoutreturn_res *res,
-		const struct cred *cred);
+bool pnfs_roc(struct inode *ino, struct nfs4_layoutreturn_args *args,
+	      struct nfs4_layoutreturn_res *res, const struct cred *cred,
+	      bool sync);
 int pnfs_roc_done(struct rpc_task *task, struct nfs4_layoutreturn_args **argpp,
 		  struct nfs4_layoutreturn_res **respp, int *ret);
 void pnfs_roc_release(struct nfs4_layoutreturn_args *args,
@@ -769,12 +768,10 @@ pnfs_layoutcommit_outstanding(struct inode *inode)
 	return false;
 }
 
-
-static inline bool
-pnfs_roc(struct inode *ino,
-	 struct nfs4_layoutreturn_args *args,
-	 struct nfs4_layoutreturn_res *res,
-	 const struct cred *cred)
+static inline bool pnfs_roc(struct inode *ino,
+			    struct nfs4_layoutreturn_args *args,
+			    struct nfs4_layoutreturn_res *res,
+			    const struct cred *cred, bool sync)
 {
 	return false;
 }
-- 
2.34.1
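
Illustrative note, not part of the patch: the core idea of the fix, passing a sync flag down so that a non-blocking caller never waits on a layout return that is already in flight, can be modelled outside the kernel. The sketch below is a simplified userspace model under that assumption. struct layout_model, enum roc_result and model_roc() are hypothetical names invented for illustration and do not exist in fs/nfs; the model only reports that a synchronous caller would have to wait, rather than actually blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a layout header; not kernel code. */
struct layout_model {
	bool return_in_progress;	/* stands in for NFS_LAYOUT_RETURN_LOCK */
	bool return_requested;		/* stands in for recorded return info */
};

enum roc_result {
	ROC_PROCEED,	/* no return in flight, caller may go ahead */
	ROC_DEFERRED,	/* non-blocking caller: request recorded, no wait */
	ROC_WOULD_WAIT,	/* blocking caller: would sleep until return is done */
};

/*
 * Decide what a return-on-close style helper should do when another
 * layout return may already be running. Only a caller that passed
 * sync == true is ever told to wait.
 */
static enum roc_result model_roc(struct layout_model *lo, bool sync)
{
	assert(lo != NULL);

	if (!lo->return_in_progress)
		return ROC_PROCEED;

	if (!sync) {
		/* Remember that a return is wanted and bail out at once. */
		lo->return_requested = true;
		return ROC_DEFERRED;
	}

	/* A real implementation would block here; the model just says so. */
	return ROC_WOULD_WAIT;
}
```

In this model, a caller on the open() path would pass sync == false and get ROC_DEFERRED back immediately, which is the behaviour the commit message describes as "knows not to wait in this situation".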