[PATCH v1 06/13] ceph: set default timeout for MDS requests

Ionut Nechita (Wind River) posted 13 patches 3 weeks, 5 days ago
Posted by Ionut Nechita (Wind River) 3 weeks, 5 days ago
From: Ionut Nechita <ionut.nechita@windriver.com>

MDS requests created via ceph_mdsc_create_request() have r_timeout
initialized to 0 (from kmem_cache_zalloc). When r_timeout is 0,
ceph_timeout_jiffies() returns MAX_SCHEDULE_TIMEOUT, causing
ceph_mdsc_wait_request() to wait indefinitely.
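The mapping described above can be modelled in user space. This is only a sketch: the names model_timeout_jiffies and MODEL_MAX_SCHEDULE_TIMEOUT are stand-ins made up for illustration, and the constant is assumed to equal LONG_MAX, as MAX_SCHEDULE_TIMEOUT does in the kernel.

```c
#include <limits.h>

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT (assumed to be LONG_MAX). */
#define MODEL_MAX_SCHEDULE_TIMEOUT ((unsigned long)LONG_MAX)

/*
 * User-space model of the behaviour the commit message attributes to
 * ceph_timeout_jiffies(): a timeout of 0 is treated as "no limit".
 */
static unsigned long model_timeout_jiffies(unsigned long timeout)
{
	return timeout ? timeout : MODEL_MAX_SCHEDULE_TIMEOUT;
}
```

With a zero-initialized r_timeout, the waiter is therefore handed the largest possible timeout, which in practice never expires.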

This causes hung task warnings when the MDS becomes unavailable
during operations like setattr or truncate:

  INFO: task dd:12345 blocked for more than 122 seconds.
  Call Trace:
    ceph_mdsc_wait_request+0x...
    ceph_mdsc_do_request+0x...
    __ceph_setattr+0x...

Only the mount path in super.c explicitly sets r_timeout to
mount_timeout. All other MDS requests (setattr, lookup, mkdir,
etc.) use the default 0 value, making them wait forever.

Fix this by initializing r_timeout to mount_timeout in
ceph_mdsc_create_request(). This ensures all MDS requests have
a reasonable timeout and will fail with -ETIMEDOUT rather than
hanging indefinitely.

Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
---
 fs/ceph/mds_client.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 45abddd7f317e..ac86225595b5f 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2613,6 +2613,7 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
 	mutex_init(&req->r_fill_mutex);
 	req->r_mdsc = mdsc;
 	req->r_started = jiffies;
+	req->r_timeout = mdsc->fsc->client->options->mount_timeout;
 	req->r_start_latency = ktime_get();
 	req->r_resend_mds = -1;
 	INIT_LIST_HEAD(&req->r_unsafe_dir_item);
-- 
2.53.0

Re: [PATCH v1 06/13] ceph: set default timeout for MDS requests
Posted by Viacheslav Dubeyko 3 weeks, 4 days ago
On Thu, 2026-03-12 at 10:16 +0200, Ionut Nechita (Wind River) wrote:
> From: Ionut Nechita <ionut.nechita@windriver.com>
> 
> MDS requests created via ceph_mdsc_create_request() have r_timeout
> initialized to 0 (from kmem_cache_zalloc). When r_timeout is 0,
> ceph_timeout_jiffies() returns MAX_SCHEDULE_TIMEOUT, causing
> ceph_mdsc_wait_request() to wait indefinitely.
> 
> This causes hung task warnings when the MDS becomes unavailable
> during operations like setattr or truncate:
> 
>   INFO: task dd:12345 blocked for more than 122 seconds.
>   Call Trace:
>     ceph_mdsc_wait_request+0x...
>     ceph_mdsc_do_request+0x...
>     __ceph_setattr+0x...
> 
> Only the mount path in super.c explicitly sets r_timeout to
> mount_timeout. All other MDS requests (setattr, lookup, mkdir,
> etc.) use the default 0 value, making them wait forever.
> 
> Fix this by initializing r_timeout to mount_timeout in
> ceph_mdsc_create_request(). This ensures all MDS requests have
> a reasonable timeout and will fail with -ETIMEDOUT rather than
> hanging indefinitely.
> 
> Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
> ---
>  fs/ceph/mds_client.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 45abddd7f317e..ac86225595b5f 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2613,6 +2613,7 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
>  	mutex_init(&req->r_fill_mutex);
>  	req->r_mdsc = mdsc;
>  	req->r_started = jiffies;
> +	req->r_timeout = mdsc->fsc->client->options->mount_timeout;
>  	req->r_start_latency = ktime_get();
>  	req->r_resend_mds = -1;
>  	INIT_LIST_HEAD(&req->r_unsafe_dir_item);

I like this fix. Really nice. But maybe we should check the
mount_timeout value.

Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

Thanks,
Slava.
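
The review does not say what kind of check on mount_timeout is meant. One possible reading, offered only as an assumption: if mount_timeout is itself 0 (which libceph is understood to treat as "wait forever"), copying it into r_timeout would leave the request with no effective limit. The user-space sketch below models that case. Both helpers and the fallback parameter are hypothetical and are not part of the posted patch or of the kernel.

```c
#include <limits.h>

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT (assumed to be LONG_MAX). */
#define SKETCH_MAX_SCHEDULE_TIMEOUT ((unsigned long)LONG_MAX)

/* Model of the wait limit: an r_timeout of 0 means an unbounded wait. */
static unsigned long sketch_wait_limit(unsigned long r_timeout)
{
	return r_timeout ? r_timeout : SKETCH_MAX_SCHEDULE_TIMEOUT;
}

/*
 * Hypothetical check: pick the value to store in r_timeout.
 * If mount_timeout is 0, use a caller-supplied fallback instead.
 * A fallback of 0 keeps the unbounded behaviour.
 */
static unsigned long sketch_request_timeout(unsigned long mount_timeout,
					    unsigned long fallback)
{
	return mount_timeout ? mount_timeout : fallback;
}
```

Whether a zero mount_timeout should keep meaning "wait forever" for ordinary MDS requests is a policy question this sketch does not settle. It only shows where such a check could sit.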