[PATCH] sunrpc: start cache request seqno at 1 to fix netlink GET_REQS

Jeff Layton posted 1 patch 2 months ago
Posted by Jeff Layton 2 months ago
sunrpc_cache_requests_snapshot() filters requests with
crq->seqno <= min_seqno. The min_seqno for the first netlink
dump call is cb->args[0] which is 0. Since next_seqno was
initialized to 0, the very first cache request got seqno=0
and was silently skipped by the snapshot (0 <= 0 is true).
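Below is a minimal C model of the off-by-one. It assumes only what is described above: requests take their seqno from a counter that starts at next_seqno, and the snapshot skips any request with seqno <= min_seqno. The names model_req, model_enqueue() and model_snapshot_count() are hypothetical. They do not exist in net/sunrpc/cache.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a queued cache request; only seqno matters. */
struct model_req {
	uint64_t seqno;
};

/* Assign seqnos the way a post-incremented counter would. */
static void model_enqueue(struct model_req *reqs, size_t n, uint64_t *next_seqno)
{
	assert(next_seqno != NULL);
	for (size_t i = 0; i < n; i++)
		reqs[i].seqno = (*next_seqno)++;
}

/*
 * Count how many requests a snapshot would report for min_seqno, using
 * the filter described above: requests with seqno <= min_seqno are skipped.
 */
static size_t model_snapshot_count(const struct model_req *reqs, size_t n,
				   uint64_t min_seqno)
{
	size_t visible = 0;

	for (size_t i = 0; i < n; i++) {
		if (reqs[i].seqno <= min_seqno)
			continue;	/* filtered out of the snapshot */
		visible++;
	}
	return visible;
}
```

With the counter starting at 0, a single queued request gets seqno 0, and model_snapshot_count(reqs, 1, 0) returns 0. Starting the counter at 1 makes the same call return 1.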

This caused netlink-based GET_REQS to return 0 pending requests
even when a request was queued, preventing mountd from resolving
cache entries (particularly expkey/nfsd.fh). The unresolved
CACHE_PENDING state blocked all further notifications for the
entry, leading to permanent NFS4ERR_DELAY hangs.

Start next_seqno at 1 so all requests have seqno >= 1 and pass
the snapshot filter when min_seqno is 0.
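The fix relies on 0 acting as a "nothing reported yet" cursor. The sketch below goes beyond what is stated above by assuming that each dump call resumes from the last seqno it reported. Under that assumption, every request is reported exactly once when seqnos start at 1, while a request with seqno 0 is never reported. model_dump_call() and model_dump_all() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of one dump call. It reports at most `budget`
 * requests whose seqno is greater than *cursor, then leaves *cursor at
 * the last seqno it reported. ASSUMPTION: this mirrors how the real
 * dump resumes across calls (it starts from cb->args[0]).
 */
static size_t model_dump_call(const uint64_t *seqnos, size_t n,
			      uint64_t *cursor, size_t budget)
{
	size_t reported = 0;

	assert(cursor != NULL);
	for (size_t i = 0; i < n && reported < budget; i++) {
		if (seqnos[i] <= *cursor)
			continue;	/* already reported, or never visible */
		*cursor = seqnos[i];	/* seqnos are assumed ascending */
		reported++;
	}
	return reported;
}

/* Drive dump calls until one reports nothing; return the total reported. */
static size_t model_dump_all(const uint64_t *seqnos, size_t n, size_t budget)
{
	uint64_t cursor = 0;	/* the first call starts at 0 */
	size_t total = 0, got;

	while ((got = model_dump_call(seqnos, n, &cursor, budget)) > 0)
		total += got;
	return total;
}
```

Take seqnos {1, 2, 3, 4, 5} with a budget of 2 per call: model_dump_all() reports all 5 requests over three calls. For {0, 1, 2, 3, 4} it reports only 4.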

Fixes: facc4e3c8042 ("sunrpc: split cache_detail queue into request and reader lists")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
I started hitting a persistent hang in mountd upcalls just after reboot,
and this turns out to be the cause. It's probably best to fold this into
the patch in the Fixes: line.
---
 net/sunrpc/cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index d477b19dbfa1..305c6e67f052 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -405,7 +405,7 @@ void sunrpc_init_cache_detail(struct cache_detail *cd)
 	INIT_LIST_HEAD(&cd->readers);
 	spin_lock_init(&cd->queue_lock);
 	init_waitqueue_head(&cd->queue_wait);
-	cd->next_seqno = 0;
+	cd->next_seqno = 1;
 	spin_lock(&cache_list_lock);
 	cd->nextcheck = 0;
 	cd->entries = 0;

---
base-commit: 68f3218e45ab644ed37d5020a4a25e523fc0e30e
change-id: 20260411-exportd-nl-2cd2e9d451bc

Best regards,
-- 
Jeff Layton <jlayton@kernel.org>
Re: [PATCH] sunrpc: start cache request seqno at 1 to fix netlink GET_REQS
Posted by Chuck Lever 2 months ago
From: Chuck Lever <chuck.lever@oracle.com>

On Sat, 11 Apr 2026 17:12:16 -0400, Jeff Layton wrote:
> sunrpc_cache_requests_snapshot() filters requests with
> crq->seqno <= min_seqno. The min_seqno for the first netlink
> dump call is cb->args[0] which is 0. Since next_seqno was
> initialized to 0, the very first cache request got seqno=0
> and was silently skipped by the snapshot (0 <= 0 is true).
> 
> This caused netlink-based GET_REQS to return 0 pending requests
> even when a request was queued, preventing mountd from resolving
> cache entries (particularly expkey/nfsd.fh). The unresolved
> CACHE_PENDING state blocked all further notifications for the
> entry, leading to permanent NFS4ERR_DELAY hangs.
> 
> [...]

Applied to nfsd-testing, thanks!

[1/1] sunrpc: start cache request seqno at 1 to fix netlink GET_REQS
      commit: 2dd2484661eb089e5eb2ba9da4b87decb0f3be36

--
Chuck Lever <chuck.lever@oracle.com>