[PATCH v1 07/13] ceph: add timeout to caps wait in __ceph_get_caps()

Ionut Nechita (Wind River) posted 13 patches 3 weeks, 5 days ago
From: Ionut Nechita <ionut.nechita@windriver.com>

When waiting for caps in __ceph_get_caps(), the code uses
wait_woken() with MAX_SCHEDULE_TIMEOUT, which can block
indefinitely if the MDS is unavailable or slow to grant caps
during reconnection.

This causes hung task warnings when the MDS fails over:

  INFO: task dd:12345 blocked for more than 122 seconds.
  Call Trace:
    __ceph_get_caps+0x...
    ceph_write_iter+0x...

During MDS failover, caps may be revoked or delayed while the
client reconnects. Processes waiting for caps then block
indefinitely while holding i_rwsem, which blocks other I/O
operations on the same inode and causes a cascade of blocked
processes.

Fix this by using wait_woken() with mount_timeout instead of
MAX_SCHEDULE_TIMEOUT. On timeout, return -ETIMEDOUT to allow
the caller to handle the situation appropriately.
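As a rough userspace analogy (not kernel code), the sketch below models the patched wait loop with a POSIX condition variable. The names `struct model_caps`, `model_wait_for_caps()` and `caps_granted` are hypothetical stand-ins for i_cap_wq, wait_woken() and a successful cap-grant check, and the signal-pending path (-ERESTARTSYS) is left out. As in the patch, each sleep gets the full timeout, and a sleep that expires without a grant ends the wait with -ETIMEDOUT instead of looping forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical userspace model of the patched loop in __ceph_get_caps().
 * The mutex/condvar pair stands in for i_cap_wq + wait_woken(), and
 * caps_granted stands in for the cap-grant check succeeding.
 */
struct model_caps {
	pthread_mutex_t lock;
	pthread_cond_t granted_cv;
	bool caps_granted;
};

/* Sleeps up to timeout_ms per iteration; returns 0 or -ETIMEDOUT. */
static int model_wait_for_caps(struct model_caps *mc, long timeout_ms)
{
	int ret = 0;

	pthread_mutex_lock(&mc->lock);
	while (!mc->caps_granted) {
		struct timespec deadline;
		int rc;

		/* Like the patch, every sleep gets the full timeout. */
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += timeout_ms / 1000;
		deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}

		rc = pthread_cond_timedwait(&mc->granted_cv, &mc->lock,
					    &deadline);
		if (rc == ETIMEDOUT && !mc->caps_granted) {
			/* Mirrors: if (!wait_woken(...)) { ret = -ETIMEDOUT; break; } */
			ret = -ETIMEDOUT;
			break;
		}
	}
	pthread_mutex_unlock(&mc->lock);
	return ret;
}
```

With `caps_granted` never set, `model_wait_for_caps(&mc, 50)` returns -ETIMEDOUT after roughly 50 ms; once the flag is set it returns 0 without sleeping.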

Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
---
 fs/ceph/caps.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index bed34fc11c919..c88e10a634e5c 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -3055,7 +3055,10 @@ int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, int need,
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_fs_client(inode);
+	struct ceph_client *cl = fsc->client;
+	unsigned long timeout = ceph_timeout_jiffies(cl->options->mount_timeout);
 	int ret, _got, flags;
+	bool warned = false;
 
 	ret = ceph_pool_perm_check(inode, need);
 	if (ret < 0)
@@ -3104,7 +3107,18 @@ int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, int need,
 					ret = -ERESTARTSYS;
 					break;
 				}
-				wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
+				if (!wait_woken(&wait, TASK_INTERRUPTIBLE, timeout)) {
+					if (!warned) {
+						pr_warn_ratelimited_client(cl,
+							"%p %llx.%llx caps wait timed out (need %s want %s)\n",
+							inode, ceph_vinop(inode),
+							ceph_cap_string(need),
+							ceph_cap_string(want));
+						warned = true;
+					}
+					ret = -ETIMEDOUT;
+					break;
+				}
 			}
 
 			remove_wait_queue(&ci->i_cap_wq, &wait);
-- 
2.53.0
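The timeout in the patch comes from ceph_timeout_jiffies(cl->options->mount_timeout). Assuming that helper keeps its mainline behaviour of mapping a zero timeout to MAX_SCHEDULE_TIMEOUT, a mount with mount_timeout=0 would still wait indefinitely here. The sketch below models only that mapping in userspace; `model_timeout_jiffies()`, `model_wait_is_unbounded()` and `MODEL_MAX_SCHEDULE_TIMEOUT` are hypothetical names, and LONG_MAX is used because that is how the kernel defines MAX_SCHEDULE_TIMEOUT.

```c
#include <limits.h>

/* Hypothetical stand-in for the kernel's MAX_SCHEDULE_TIMEOUT (LONG_MAX). */
#define MODEL_MAX_SCHEDULE_TIMEOUT LONG_MAX

/*
 * Userspace model of ceph_timeout_jiffies(), assumed to return the
 * timeout unchanged unless it is 0, which means "wait forever".
 */
static unsigned long model_timeout_jiffies(unsigned long timeout)
{
	return timeout ? timeout : (unsigned long)MODEL_MAX_SCHEDULE_TIMEOUT;
}

/* Returns 1 when the caps wait would still be unbounded for this option. */
static int model_wait_is_unbounded(unsigned long mount_timeout)
{
	return model_timeout_jiffies(mount_timeout) ==
	       (unsigned long)MODEL_MAX_SCHEDULE_TIMEOUT;
}
```

In other words, the new -ETIMEDOUT path is only reachable when mount_timeout is non-zero.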
Re: [PATCH v1 07/13] ceph: add timeout to caps wait in __ceph_get_caps()
Posted by Viacheslav Dubeyko 3 weeks, 4 days ago
On Thu, 2026-03-12 at 10:16 +0200, Ionut Nechita (Wind River) wrote:
> From: Ionut Nechita <ionut.nechita@windriver.com>
> 
> When waiting for caps in __ceph_get_caps(), the code uses
> wait_woken() with MAX_SCHEDULE_TIMEOUT, which can block
> indefinitely if the MDS is unavailable or slow to grant caps
> during reconnection.
> 
> This causes hung task warnings when the MDS fails over:
> 
>   INFO: task dd:12345 blocked for more than 122 seconds.
>   Call Trace:
>     __ceph_get_caps+0x...
>     ceph_write_iter+0x...
> 
> During MDS failover, caps may be revoked or delayed while the
> client reconnects. Processes waiting for caps then block
> indefinitely while holding i_rwsem, which blocks other I/O
> operations on the same inode and causes a cascade of blocked
> processes.
> 
> Fix this by using wait_woken() with mount_timeout instead of
> MAX_SCHEDULE_TIMEOUT. On timeout, return -ETIMEDOUT to allow
> the caller to handle the situation appropriately.
> 
> Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
> ---
>  fs/ceph/caps.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index bed34fc11c919..c88e10a634e5c 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -3055,7 +3055,10 @@ int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, int need,
>  {
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_fs_client *fsc = ceph_inode_to_fs_client(inode);
> +	struct ceph_client *cl = fsc->client;
> +	unsigned long timeout = ceph_timeout_jiffies(cl->options->mount_timeout);

The same concern about the timeout value applies here. :)

>  	int ret, _got, flags;
> +	bool warned = false;

Technically speaking, you are trying to reinvent pr_warn_once_client() here,
so we should probably prefer it over pr_warn_ratelimited_client().
However, maybe we need debug output here instead. The warned variable
is completely unnecessary here.
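To make the distinction concrete, here is a userspace sketch (hypothetical names, not the kernel macros) contrasting a function-local `warned` flag, as in the patch, with a once-per-call-site static flag of the kind printk_once() relies on. Because `warned` is reset on every call and the patch breaks out right after setting it, the local flag never suppresses anything; only the static flag limits output to one message. The model ignores the rate limiting that pr_warn_ratelimited_client() adds.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical once-per-call-site warning, modelled on printk_once(). */
#define MODEL_WARN_ONCE(counter, ...)			\
	do {						\
		static bool warned_once;		\
		if (!warned_once) {			\
			warned_once = true;		\
			fprintf(stderr, __VA_ARGS__);	\
			(*(counter))++;			\
		}					\
	} while (0)

/* Patch style: the flag is local, so it is false on every call. */
static int patch_style_timeout(int *printed)
{
	bool warned = false;

	if (!warned) {			/* always true here */
		fprintf(stderr, "caps wait timed out (local flag)\n");
		(*printed)++;
		warned = true;
	}
	return -ETIMEDOUT;		/* the patch breaks out right after */
}

/* Once style: the static flag survives across calls. */
static int once_style_timeout(int *printed)
{
	MODEL_WARN_ONCE(printed, "caps wait timed out (once)\n");
	return -ETIMEDOUT;
}
```

Calling each function three times prints three "local flag" messages but only one "once" message.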

Thanks,
Slava.

>  
>  	ret = ceph_pool_perm_check(inode, need);
>  	if (ret < 0)
> @@ -3104,7 +3107,18 @@ int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, int need,
>  					ret = -ERESTARTSYS;
>  					break;
>  				}
> -				wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
> +				if (!wait_woken(&wait, TASK_INTERRUPTIBLE, timeout)) {
> +					if (!warned) {
> +						pr_warn_ratelimited_client(cl,
> +							"%p %llx.%llx caps wait timed out (need %s want %s)\n",
> +							inode, ceph_vinop(inode),
> +							ceph_cap_string(need),
> +							ceph_cap_string(want));
> +						warned = true;
> +					}
> +					ret = -ETIMEDOUT;
> +					break;
> +				}
>  			}
>  
>  			remove_wait_queue(&ci->i_cap_wq, &wait);