When a GSI MAD packet is sent on the QP, it will potentially be
retried CMA_MAX_CM_RETRIES times with a timeout value of:

4.096usec * 2 ^ CMA_CM_RESPONSE_TIMEOUT

Across all retries, this amounts to ~64 seconds using the default
CMA values.

The cm_id_priv's refcount stays elevated for this period. Therefore,
the timeout used when waiting for a cm_id to be destroyed must be
based on the effective timeout of the MAD packets. To provide
additional leeway, we add 25% to this timeout and use that instead of
the constant 10-second timeout, which can expire while MAD retries
are still legitimately outstanding and thus report spurious timeouts.

Fixes: 96d9cbe2f2ff ("RDMA/cm: add timeout to cm_destroy_id wait")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
drivers/infiniband/core/cm.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
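As a cross-check of the numbers in the commit message, the user-space sketch below reproduces the arithmetic. It assumes the default CMA values are CMA_MAX_CM_RETRIES = 15 and CMA_CM_RESPONSE_TIMEOUT = 20 (these reproduce the ~64 second figure); the SKETCH_* macros and helper names are hypothetical and not part of cm.c. The last helper mirrors the integer expression the patch adds to cm_destroy_id(), where the 5/4 factor supplies the 25% leeway. How cm_id_priv->timeout_ms itself is derived in cm.c is not modelled here.

```c
#include <stdint.h>

/*
 * User-space sketch of the timeout arithmetic. The defaults below are
 * assumptions chosen to reproduce the ~64 second figure; they are not
 * read from the kernel.
 */
#define SKETCH_MAX_CM_RETRIES      15
#define SKETCH_CM_RESPONSE_TIMEOUT 20

/* IBA encoded time 4.096 us * 2^exp, converted to milliseconds. */
static double iba_time_to_ms(unsigned int exp)
{
	return 4.096e-3 * (double)(1ULL << exp);
}

/* Time MADs may stay outstanding: retries * per-attempt timeout. */
static double mad_retry_window_ms(unsigned int retries, unsigned int exp)
{
	return (double)retries * iba_time_to_ms(exp);
}

/*
 * Destroy-wait timeout with 25% leeway, using the same integer
 * expression as the patch: (retries * timeout_ms * 5) / 4.
 */
static uint64_t destroy_wait_ms(uint64_t retries, uint64_t timeout_ms)
{
	return (retries * timeout_ms * 5) / 4;
}
```

Under these assumptions, iba_time_to_ms(20) is about 4295 ms, mad_retry_window_ms(15, 20) is about 64425 ms, and destroy_wait_ms(15, 4295) yields 80531 ms, roughly 80.5 seconds compared with the old 10-second constant. As a sanity check of the helper, iba_time_to_ms(24) gives about 68719 ms, matching the ~68.7 second comment on CM_MRA_SETTING.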
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 01bede8ba1055..2a36a93459592 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -34,7 +34,6 @@ MODULE_AUTHOR("Sean Hefty");
MODULE_DESCRIPTION("InfiniBand CM");
MODULE_LICENSE("Dual BSD/GPL");
-#define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs */
#define CM_DIRECT_RETRY_CTX ((void *) 1UL)
#define CM_MRA_SETTING 24 /* 4.096us * 2^24 = ~68.7 seconds */
@@ -1057,6 +1056,7 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
{
struct cm_id_private *cm_id_priv;
enum ib_cm_state old_state;
+ unsigned long timeout;
struct cm_work *work;
int ret;
@@ -1167,10 +1167,9 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
xa_erase(&cm.local_id_table, cm_local_id(cm_id->local_id));
cm_deref_id(cm_id_priv);
+ timeout = msecs_to_jiffies((cm_id_priv->max_cm_retries * cm_id_priv->timeout_ms * 5) / 4);
do {
- ret = wait_for_completion_timeout(&cm_id_priv->comp,
- msecs_to_jiffies(
- CM_DESTROY_ID_WAIT_TIMEOUT));
+ ret = wait_for_completion_timeout(&cm_id_priv->comp, timeout);
if (!ret) /* timeout happened */
cm_destroy_id_wait_timeout(cm_id, old_state);
} while (!ret);
--
2.43.5
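The do/while loop in cm_destroy_id() never abandons the wait: each time wait_for_completion_timeout() returns 0, the expiry is handed to cm_destroy_id_wait_timeout() and the loop waits again, so the timeout value governs how soon a timeout is flagged, not whether destruction eventually completes. The sketch below is a user-space analog built on POSIX threads; struct sketch_completion and all sketch_* helpers are hypothetical stand-ins, not kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical user-space stand-in for the kernel's struct completion.
 * This is NOT the kernel API; it only mimics complete() and
 * wait_for_completion_timeout() closely enough for illustration.
 */
struct sketch_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void sketch_complete(struct sketch_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Returns true once completed, false if timeout_ms elapsed first. */
static bool sketch_wait_timeout(struct sketch_completion *c, long timeout_ms)
{
	struct timespec deadline;
	bool done;
	int rc = 0;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop to absorb spurious wakeups until done or the deadline passes. */
	while (!c->done && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}

/*
 * Same shape as the do/while loop in cm_destroy_id(): an expiry is only
 * counted here (the kernel calls cm_destroy_id_wait_timeout() instead)
 * and the wait is then restarted until completion.
 */
static int sketch_wait_reporting(struct sketch_completion *c, long timeout_ms)
{
	int reports = 0;

	while (!sketch_wait_timeout(c, timeout_ms))
		reports++;
	return reports;
}

/* Helper: completes after delay_ms, standing in for the last reference drop. */
struct sketch_delayed {
	struct sketch_completion *c;
	long delay_ms;
};

static void *sketch_complete_after(void *arg)
{
	struct sketch_delayed *d = arg;
	struct timespec ts = { d->delay_ms / 1000, (d->delay_ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
	sketch_complete(d->c);
	return NULL;
}
```

For example, with a completer thread running sketch_complete_after() at a 100 ms delay, sketch_wait_reporting(&c, 10) counts several expiries before returning. A per-wait timeout shorter than the real release time therefore produces reports without changing the outcome, which is the situation the patch addresses by sizing the timeout from the MAD retry window.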
On Tue, Oct 21, 2025 at 03:27:33PM +0200, Håkon Bugge wrote:
> [...]
>
> Fixes: 96d9cbe2f2ff ("RDMA/cm: add timeout to cm_destroy_id wait")
I applied it and removed this Fixes line. Most likely someone will complain
that this patch breaks his flow.
Thanks
On Tue, 21 Oct 2025 15:27:33 +0200, Håkon Bugge wrote:
> When a GSI MAD packet is sent on the QP, it will potentially be
> retried CMA_MAX_CM_RETRIES times with a timeout value of:
>
> 4.096usec * 2 ^ CMA_CM_RESPONSE_TIMEOUT
>
> The above equates to ~64 seconds using the default CMA values.
>
> [...]
Applied, thanks!
[1/1] RDMA/cm: Base cm_id destruction timeout on CMA values
https://git.kernel.org/rdma/rdma/c/58aca1f3de059c
Best regards,
--
Leon Romanovsky <leon@kernel.org>
> On 27 Oct 2025, at 12:36, Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, 21 Oct 2025 15:27:33 +0200, Håkon Bugge wrote:
>> When a GSI MAD packet is sent on the QP, it will potentially be
>> retried CMA_MAX_CM_RETRIES times with a timeout value of:
>>
>> 4.096usec * 2 ^ CMA_CM_RESPONSE_TIMEOUT
>>
>> The above equates to ~64 seconds using the default CMA values.
>>
>> [...]
>
> Applied, thanks!
>

Thanks, Leon!

Håkon