From: Li Lingfeng
Date: Wed, 11 Mar 2026 16:23:13 +0800
Subject: [BUG] Server returns nfserr_grace causing client infinite loop
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Jeff Layton, NeilBrown,
    Olga Kornievskaia, Dai Ngo, Tom Talpey
CC: "linux-nfs@vger.kernel.org", "linux-kernel@vger.kernel.org", yangerkun,
    "zhangyi (F)", Hou Tao, "chengzhihao1@huawei.com", "zhangjian (CG)",
    Li Lingfeng
Message-ID: <55da00d4-a656-4ed2-ae57-7f881297a1b2@huawei.com>

We recently encountered an issue where the NFS client's state manager gets
stuck in an infinite loop, making the client unresponsive to user
operations. The problem occurs when the server returns nfserr_grace during
open reclaim.

Stack trace from the client:
// client
nfs4_state_manager
 nfs4_do_reclaim // NFS4CLNT_RECLAIM_NOGRACE
  nfs4_reclaim_open_state
   __nfs4_reclaim_open_state
    nfs41_open_expired // ops->recover_open
     nfs4_open_expired
      nfs4_do_open_expired
       _nfs4_open_expired // gets NFS4ERR_GRACE and retries

On the server side, nfsd4_open returns nfserr_grace because:
1. The session exists
2. The NFSD4_CLIENT_RECLAIM_COMPLETE flag is not set
3. The op_claim_type is not NFS4_OPEN_CLAIM_PREVIOUS

Steps to reproduce:
1. Normal mount on client
   On server:
   mkfs.ext4 -F /dev/sdb
   mount /dev/sdb /mnt/sdb
   echo "/mnt *(rw,no_root_squash,fsid=0)" > /etc/exports
   echo "/mnt/sdb *(rw,no_root_squash,fsid=1)" >> /etc/exports
   systemctl restart nfs-server
   echo 123 > /mnt/sdb/testfile
   On client:
   mount -t nfs -o rw 192.168.122.251:/sdb /mnt/sdbb
2. Client opens a file and prepares a delay before entering the
   NFS4CLNT_RECLAIM_NOGRACE branch in the state manager
   exec 100>/mnt/sdbb/testfile
   rpcdebug -m nfs -s proc
3. Change hostname on server
   hostname server-nfs
4. Restart NFS service on server
   systemctl restart nfs-server
5. Wait for client to set NFS4CLNT_RECLAIM_NOGRACE and enter the delay
   before the NFS4CLNT_RECLAIM_NOGRACE branch in the state manager
6. Enable delay for force_expire_client on server
   rpcdebug -m nfsd -s proc
7. Trigger client expiration on server (stop at the delay point)
   echo "expire" > /proc/fs/nfsd/clients/4/ctl &
8. Enable delay for the NFS4CLNT_LEASE_EXPIRED branch on client, and
   disable the delay for the NFS4CLNT_RECLAIM_NOGRACE branch
   rpcdebug -m nfs -s xdr
   rpcdebug -m nfs -c proc
9. Client state now has flags NFS4CLNT_LEASE_EXPIRED,
   NFS4CLNT_RECLAIM_NOGRACE, and NFS4CLNT_MANAGER_RUNNING, and is stopped
   at the delay point in the NFS4CLNT_LEASE_EXPIRED branch
10. Disable delay on server
    rpcdebug -m nfsd -c proc
11. Disable delay on client
    rpcdebug -m nfs -c xdr
12. Client state manager enters an infinite loop in the
    NFS4CLNT_RECLAIM_NOGRACE branch

[root@nfs-client1 ~]# cat /proc/779/stack
[<0>] nfs4_handle_exception+0x245/0x600
[<0>] nfs4_do_open_expired+0x2c8/0x4e0
[<0>] nfs4_open_expired+0x31/0x90
[<0>] nfs41_open_expired+0x18b/0x290
[<0>] __nfs4_reclaim_open_state+0x4f/0x330
[<0>] nfs4_reclaim_open_state+0x1e9/0x530
[<0>] nfs4_do_reclaim+0x2a9/0x470
[<0>] nfs4_state_manager+0x1644/0x17f0
[<0>] nfs4_run_state_manager+0x1cc/0x490
[<0>] kthread+0x327/0x410
[<0>] ret_from_fork+0x360/0x6c0
[<0>] ret_from_fork_asm+0x1a/0x30
[root@nfs-client1 ~]#

base: Linux 7.0-rc3, master 1f318b96cc84d7c2ab792fcc0bfd42a7ca890681

diff:
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 305a772e5497..5d0b1eef5d9b 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1315,6 +1315,7 @@ int nfs4_state_mark_reclaim_nograce(struct nfs_client *clp, struct nfs4_state *s
 	clear_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
 	set_bit(NFS_OWNER_RECLAIM_NOGRACE, &state->owner->so_flags);
 	set_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state);
+	printk("%s set NFS4CLNT_RECLAIM_NOGRACE for clp %px\n", __func__, clp);
 	return 1;
 }
@@ -1814,6 +1815,7 @@ static int nfs4_recovery_handle_error(struct nfs_client *clp, int error)
 		break;
 	case -NFS4ERR_EXPIRED:
 		set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
+		printk("%s set NFS4CLNT_LEASE_EXPIRED for clp %px\n", __func__, clp);
 		nfs4_state_start_reclaim_nograce(clp);
 		break;
 	case -NFS4ERR_BADSESSION:
@@ -2540,6 +2542,14 @@ static void nfs4_state_manager(struct nfs_client *clp)
 		if (test_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state)) {
 			section = "lease expired";
+			while (1) {
+				ifdebug(XDR) {
+					printk("%s sleep before lease expired\n", __func__);
+					msleep(5 * 1000);
+					continue;
+				}
+				break;
+			}
 			/* We're going to have to re-establish a clientid */
 			status = nfs4_reclaim_lease(clp);
 			if (status < 0)
@@ -2616,9 +2626,18 @@ static void nfs4_state_manager(struct nfs_client *clp)
 		/* Now recover expired state... */
 		if (test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state)) {
+			while (1) {
+				ifdebug(PROC) {
+					printk("%s sleep before deal NFS4CLNT_RECLAIM_NOGRACE\n", __func__);
+					msleep(5 * 1000);
+					continue;
+				}
+				break;
+			}
 			section = "reclaim nograce";
 			status = nfs4_do_reclaim(clp, clp->cl_mvops->nograce_recovery_ops);
+			printk("%s nograce reclaim status %d clp->cl_state 0x%lx\n", __func__, status, clp->cl_state);
 			if (status == -EAGAIN)
 				continue;
 			if (status < 0)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6b9c399b89df..203f1d7c6c5f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3146,6 +3146,14 @@ static void force_expire_client(struct nfs4_client *clp)
 	clp->cl_time = 0;
 	spin_unlock(&nn->client_lock);
+	while (1) {
+		ifdebug(PROC) {
+			printk("%s sleep before destroy session\n", __func__);
+			msleep(5 * 1000);
+			continue;
+		}
+		break;
+	}
 	wait_event(expiry_wq, atomic_read(&clp->cl_rpc_users) == 0);
 	spin_lock(&nn->client_lock);
 	already_expired = list_empty(&clp->cl_lru);

From the server's perspective, returning nfserr_grace is reasonable when no
RECLAIM_COMPLETE request has set the NFSD4_CLIENT_RECLAIM_COMPLETE flag.
However, I suspect the loss of the NFSD4_CLIENT_RECLAIM_COMPLETE flag is
related to the server-side "expire" write. Therefore, I'm unsure whether
this issue should be attributed to the server or the client.

Please let me know if you need any further information or testing.

Thanks,
Lingfeng.
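
For anyone who wants to reason about the interaction without a test setup, the three server-side conditions and the resulting client retry can be modeled in user space. This is only a rough sketch under assumptions: the names model_client, model_claim, open_gets_grace and retry_open are hypothetical and are not taken from the nfsd or NFS client code, and the retry cap exists only so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of this is kernel code. */
enum model_claim {
	MODEL_CLAIM_NULL,	/* stands in for any non-reclaim OPEN */
	MODEL_CLAIM_PREVIOUS,	/* stands in for NFS4_OPEN_CLAIM_PREVIOUS */
};

struct model_client {
	bool has_session;	/* condition 1: the session exists */
	bool reclaim_complete;	/* models NFSD4_CLIENT_RECLAIM_COMPLETE */
};

/* True when all three conditions listed in the report hold. */
bool open_gets_grace(const struct model_client *c, enum model_claim claim)
{
	return c->has_session && !c->reclaim_complete &&
	       claim != MODEL_CLAIM_PREVIOUS;
}

/*
 * Retry the OPEN until it no longer gets the grace error or max_tries
 * is reached. Returns the number of attempts that got the grace error.
 * The cap is an assumption of this model only.
 */
int retry_open(const struct model_client *c, enum model_claim claim,
	       int max_tries)
{
	int graced = 0;

	assert(max_tries >= 0);
	while (graced < max_tries && open_gets_grace(c, claim))
		graced++;
	return graced;
}
```

With has_session set and reclaim_complete clear, retry_open(&c, MODEL_CLAIM_NULL, 5) returns 5: every attempt gets the grace error, which is the shape of the loop seen in step 12 as long as nothing sets the flag on the server.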