Message-ID: <64fecc88-6b11-4fdf-a26f-271c4445be1a@huawei.com>
Date: Tue, 17 Mar 2026 14:31:53 +0800
To: Trond Myklebust, Anna Schumaker
CC: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org, yangerkun, "zhangyi (F)", Hou Tao, chengzhihao1@huawei.com, "zhangjian (CG)", Li Lingfeng
From: Li Lingfeng
Subject: [BUG] nfs_writepages may loop forever with -EBADF after state recovery failure

We have encountered an issue where the NFS client gets stuck in an infinite loop in nfs_writepages after a server restart followed by a state recovery failure. This causes mount operations to hang, because the superblock lock is held by the looping writeback process.

Problem Description:

When the NFS server is restarted, the client's state manager attempts to reclaim open files. If the server returns errors such as EROFS, EIO, or ESTALE during reclamation, the affected file's open context is marked as bad (via nfs4_state_mark_open_context_bad).
Subsequently, when the writeback worker (wb_workfn) tries to flush dirty pages for that inode, nfs_writepages enters a loop: nfs_page_create returns -EBADF, and nfs_writepages does not treat -EBADF as a fatal error, so it retries indefinitely.

The call chain is:

nfs4_do_reclaim
 nfs4_reclaim_open_state
  __nfs4_reclaim_open_state // get -ESTALE
   nfs4_open_reclaim // ops->recover_open
    nfs4_do_open_reclaim
     _nfs4_do_open_reclaim
      nfs4_open_recover
       nfs4_open_recover_helper // return -ESTALE
        nfs4_opendata_to_nfs4_state
         _nfs4_opendata_reclaim_to_nfs4_state
          nfs_refresh_inode
  nfs4_state_mark_recovery_failed
   nfs4_state_mark_open_context_bad
    set_bit // NFS_CONTEXT_BAD

wb_workfn
 wb_do_writeback
  wb_writeback
   writeback_sb_inodes
    __writeback_single_inode
     do_writepages
      nfs_writepages // loop here
       write_cache_pages
        nfs_writepages_callback
         nfs_do_writepage
          nfs_page_async_flush
           nfs_pageio_add_request
            nfs_pageio_add_request_mirror
             __nfs_pageio_add_request
              nfs_create_subreq
               nfs_page_create // return -EBADF

nfs_writepages retries the loop as long as the error is not fatal according to nfs_error_is_fatal(). Since -EBADF is not considered fatal, it keeps retrying forever. This prevents the superblock lock from being released, causing any concurrent mount operation to hang.
Steps to Reproduce:

We have a reliable reproducer on a recent kernel (Linux 7.0-rc4, commit 2d1373e4246da3b58e1df058374ed6b101804e07).

1) Prepare a server with an export:

mkfs.ext4 -F /dev/sdb
mount /dev/sdb /mnt/sdb
echo "/mnt *(rw,no_root_squash,fsid=0)" > /etc/exports
echo "/mnt/sdb *(rw,no_root_squash,fsid=1)" >> /etc/exports
systemctl restart nfs-server
dd if=/dev/random of=/mnt/sdb/testfile bs=1k count=4 oflag=direct

2) On the client, mount the export and start a writer that holds a file open and creates dirty pages:

mount -t nfs -o rw,vers=4.1,rsize=1024,wsize=1024 127.0.0.1:/sdb /mnt/sdbb

Run the following Python script in one terminal:

import os, time
fd = os.open("/mnt/sdbb/testfile", os.O_CREAT|os.O_WRONLY|os.O_TRUNC, 0o644)
buf = b'A' * 4096
for i in range(1024):  # ~4 MiB of dirty data
    os.write(fd, buf)
print("dirty pages created, fd kept open, sleeping...")
time.sleep(10**9)

3) In another terminal, restart the server and wipe the underlying filesystem to force ESTALE:

systemctl stop nfs-server
umount /dev/sdb
mkfs.ext4 -F /dev/sdb
mount /dev/sdb /mnt/sdb
echo "/mnt *(rw,no_root_squash,fsid=0)" > /etc/exports
echo "/mnt/sdb *(rw,no_root_squash,fsid=1)" >> /etc/exports
systemctl restart nfs-server

Temporary Workaround:

We have applied the following patch to break the loop by treating -EBADF as fatal in nfs_writepages:

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index dc57e67cefcd..0147f7a7a1a3 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -781,7 +781,7 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
 					&pgio);
 		pgio.pg_error = 0;
 		nfs_pageio_complete(&pgio);
-	} while (err < 0 && !nfs_error_is_fatal(err));
+	} while (err < 0 && !nfs_error_is_fatal(err) && (err != -EBADF));
 	nfs_io_completion_put(ioc);
 	if (err < 0)

While the patch above avoids the hang, we wonder if a more comprehensive fix is needed. For instance, perhaps nfs_error_is_fatal() should include -EBADF in its fatal list, or the state manager should actively abort pending I/O for contexts marked bad. We are not sure whether -EBADF should always be considered fatal in writeback paths. We would appreciate your insights and any suggestions for a proper fix.

Thanks,
Lingfeng