ceph/libceph: fix hung tasks and connection recovery during network disruptions

[PATCH v1 05/13] ceph: add timeout protection to ceph_lock_wait_for_completion()

Posted by Ionut Nechita (Wind River) 3 weeks, 5 days ago

From: Ionut Nechita <ionut.nechita@windriver.com>

When a file lock operation is interrupted and an unlock request is
sent to cancel it, ceph_lock_wait_for_completion() waits indefinitely
for r_safe_completion using wait_for_completion_killable().
If the MDS becomes unreachable after the unlock request is sent,
this wait will block indefinitely, causing hung task warnings:
  INFO: task flock:12345 blocked for more than 122 seconds.
  Call Trace:
    wait_for_completion_killable+0x...
    ceph_lock_wait_for_completion+0x...
    ceph_flock+0x...
This is similar to the issue fixed in ceph_mdsc_sync() where
indefinite waits on r_safe_completion can hang when MDS is
unavailable.
Fix this by using wait_for_completion_killable_timeout() with
mount_timeout instead of the indefinite wait. On timeout, return
-ETIMEDOUT to the caller. The lock state remains consistent because:
1. If the unlock succeeded on MDS, the lock is released
2. If the unlock didn't reach MDS, the original lock request
   was already aborted (CEPH_MDS_R_ABORTED set), so MDS will
   clean it up on reconnect
This follows the same timeout pattern used throughout the ceph
client for MDS operations.
Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
---
 fs/ceph/locks.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index ebf4ac0055ddc..55dd99460b81a 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -160,6 +160,8 @@ static int ceph_lock_wait_for_completion(struct ceph_mds_client *mdsc,
                                          struct ceph_mds_request *req)
 {
 	struct ceph_client *cl = mdsc->fsc->client;
+	struct ceph_options *opts = mdsc->fsc->client->options;
+	unsigned long timeout = ceph_timeout_jiffies(opts->mount_timeout);
 	struct ceph_mds_request *intr_req;
 	struct inode *inode = req->r_inode;
 	int err, lock_type;
@@ -221,7 +223,17 @@ static int ceph_lock_wait_for_completion(struct ceph_mds_client *mdsc,
 	if (err && err != -ERESTARTSYS)
 		return err;
 
-	wait_for_completion_killable(&req->r_safe_completion);
+	err = wait_for_completion_killable_timeout(&req->r_safe_completion,
+						   timeout);
+	if (err == -ERESTARTSYS) {
+		/* Interrupted again, just return the error */
+		return err;
+	}
+	if (err == 0) {
+		pr_warn_client(cl, "lock request tid %llu safe completion timed out\n",
+			       req->r_tid);
+		return -ETIMEDOUT;
+	}
 	return 0;
 }
 
-- 
2.53.0

Re: [PATCH v1 05/13] ceph: add timeout protection to ceph_lock_wait_for_completion()

Posted by Viacheslav Dubeyko 3 weeks, 4 days ago

On Thu, 2026-03-12 at 10:16 +0200, Ionut Nechita (Wind River) wrote:
> From: Ionut Nechita <ionut.nechita@windriver.com>
> 
> When a file lock operation is interrupted and an unlock request is
> sent to cancel it, ceph_lock_wait_for_completion() waits indefinitely
> for r_safe_completion using wait_for_completion_killable().
> If the MDS becomes unreachable after the unlock request is sent,
> this wait will block indefinitely, causing hung task warnings:
>   INFO: task flock:12345 blocked for more than 122 seconds.
>   Call Trace:
>     wait_for_completion_killable+0x...
>     ceph_lock_wait_for_completion+0x...
>     ceph_flock+0x...
> This is similar to the issue fixed in ceph_mdsc_sync() where
> indefinite waits on r_safe_completion can hang when MDS is
> unavailable.
> Fix this by using wait_for_completion_killable_timeout() with
> mount_timeout instead of the indefinite wait. On timeout, return
> -ETIMEDOUT to the caller. The lock state remains consistent because:
> 1. If the unlock succeeded on MDS, the lock is released
> 2. If the unlock didn't reach MDS, the original lock request
>    was already aborted (CEPH_MDS_R_ABORTED set), so MDS will
>    clean it up on reconnect
> This follows the same timeout pattern used throughout the ceph
> client for MDS operations.
> Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
> ---
>  fs/ceph/locks.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
> index ebf4ac0055ddc..55dd99460b81a 100644
> --- a/fs/ceph/locks.c
> +++ b/fs/ceph/locks.c
> @@ -160,6 +160,8 @@ static int ceph_lock_wait_for_completion(struct ceph_mds_client *mdsc,
>                                           struct ceph_mds_request *req)
>  {
>  	struct ceph_client *cl = mdsc->fsc->client;
> +	struct ceph_options *opts = mdsc->fsc->client->options;
> +	unsigned long timeout = ceph_timeout_jiffies(opts->mount_timeout);

The opts->mount_timeout could be configured unreasonably. Should we do something
about it?

>  	struct ceph_mds_request *intr_req;
>  	struct inode *inode = req->r_inode;
>  	int err, lock_type;
> @@ -221,7 +223,17 @@ static int ceph_lock_wait_for_completion(struct ceph_mds_client *mdsc,
>  	if (err && err != -ERESTARTSYS)
>  		return err;
>  
> -	wait_for_completion_killable(&req->r_safe_completion);
> +	err = wait_for_completion_killable_timeout(&req->r_safe_completion,
> +						   timeout);
> +	if (err == -ERESTARTSYS) {

Interesting... You didn't check this in other patches. Why? :)

> +		/* Interrupted again, just return the error */
> +		return err;
> +	}
> +	if (err == 0) {
> +		pr_warn_client(cl, "lock request tid %llu safe completion timed out\n",
> +			       req->r_tid);

The same concern about sending warning into system log here.

> +		return -ETIMEDOUT;
> +	}

Maybe, some style cleanup here:

if (err == -ERESTARTSYS) {
  <logic_1>
} else if (err == 0) {
  <logic_2>
}

Thanks,
Slava.

>  	return 0;
>  }
>