RDMA: Ensure CQ UMEMs are managed by ib_core

[PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step

Posted by Leon Romanovsky 1 month, 2 weeks ago

From: Leon Romanovsky <leonro@nvidia.com>

There is no need to defer the CQ resize operation, as it is intended to
be completed in one pass. The current bnxt_re_resize_cq() implementation
does not handle concurrent CQ resize requests, and this will be addressed
in the following patches.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++++++-----------------------
 1 file changed, 9 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index d652018c19b3..2aecfbbb7eaf 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3309,20 +3309,6 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	return rc;
 }
 
-static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
-{
-	struct bnxt_re_dev *rdev = cq->rdev;
-
-	bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
-
-	cq->qplib_cq.max_wqe = cq->resize_cqe;
-	if (cq->resize_umem) {
-		ib_umem_release(cq->ib_cq.umem);
-		cq->ib_cq.umem = cq->resize_umem;
-		cq->resize_umem = NULL;
-		cq->resize_cqe = 0;
-	}
-}
 
 int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
 		      struct ib_udata *udata)
@@ -3387,7 +3373,15 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
 		goto fail;
 	}
 
-	cq->ib_cq.cqe = cq->resize_cqe;
+	bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
+
+	cq->qplib_cq.max_wqe = cq->resize_cqe;
+	ib_umem_release(cq->ib_cq.umem);
+	cq->ib_cq.umem = cq->resize_umem;
+	cq->resize_umem = NULL;
+	cq->resize_cqe = 0;
+
+	cq->ib_cq.cqe = entries;
 	atomic_inc(&rdev->stats.res.resize_count);
 
 	return 0;
@@ -3907,15 +3901,6 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
 	struct bnxt_re_sqp_entries *sqp_entry = NULL;
 	unsigned long flags;
 
-	/* User CQ; the only processing we do is to
-	 * complete any pending CQ resize operation.
-	 */
-	if (cq->ib_cq.umem) {
-		if (cq->resize_umem)
-			bnxt_re_resize_cq_complete(cq);
-		return 0;
-	}
-
 	spin_lock_irqsave(&cq->cq_lock, flags);
 	budget = min_t(u32, num_entries, cq->max_cql);
 	num_entries = budget;

-- 
2.52.0

Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step

Posted by Selvin Xavier 1 month, 1 week ago

On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> From: Leon Romanovsky <leonro@nvidia.com>
>
> There is no need to defer the CQ resize operation, as it is intended to
> be completed in one pass. The current bnxt_re_resize_cq() implementation
> does not handle concurrent CQ resize requests, and this will be addressed
> in the following patches.
>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++++++-----------------------
>  1 file changed, 9 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> index d652018c19b3..2aecfbbb7eaf 100644
> --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> @@ -3309,20 +3309,6 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
>         return rc;
>  }
>
> -static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
> -{
> -       struct bnxt_re_dev *rdev = cq->rdev;
> -
> -       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> -
> -       cq->qplib_cq.max_wqe = cq->resize_cqe;
> -       if (cq->resize_umem) {
> -               ib_umem_release(cq->ib_cq.umem);
> -               cq->ib_cq.umem = cq->resize_umem;
> -               cq->resize_umem = NULL;
> -               cq->resize_cqe = 0;
> -       }
> -}
>
>  int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
>                       struct ib_udata *udata)
> @@ -3387,7 +3373,15 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
>                 goto fail;
>         }
>
> -       cq->ib_cq.cqe = cq->resize_cqe;
> +       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> +
> +       cq->qplib_cq.max_wqe = cq->resize_cqe;
> +       ib_umem_release(cq->ib_cq.umem);
> +       cq->ib_cq.umem = cq->resize_umem;
> +       cq->resize_umem = NULL;
> +       cq->resize_cqe = 0;
> +
> +       cq->ib_cq.cqe = entries;
>         atomic_inc(&rdev->stats.res.resize_count);
>
>         return 0;
> @@ -3907,15 +3901,6 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
>         struct bnxt_re_sqp_entries *sqp_entry = NULL;
>         unsigned long flags;
>
> -       /* User CQ; the only processing we do is to
> -        * complete any pending CQ resize operation.
> -        */
> -       if (cq->ib_cq.umem) {
> -               if (cq->resize_umem)
> -                       bnxt_re_resize_cq_complete(cq);
> -               return 0;
> -       }
> -
Since this code is removed,  we need to remove  ibv_cmd_poll_cq call
from the user library.
For older libraries which still calls ibv_cmd_poll_cq, i think we
should we keep a check.  Else it will throw a print "POLL CQ : no CQL
to use". Either we should add the following code or remove this print.
       if (cq->ib_cq.umem)
                  return 0;
Otherwise, it looks good to me.

Thanks,
Selvin




>         spin_lock_irqsave(&cq->cq_lock, flags);
>         budget = min_t(u32, num_entries, cq->max_cql);
>         num_entries = budget;
>
> --
> 2.52.0
>

Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step

Posted by Leon Romanovsky 1 month, 1 week ago

On Tue, Feb 24, 2026 at 01:45:42PM +0530, Selvin Xavier wrote:
> On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > There is no need to defer the CQ resize operation, as it is intended to
> > be completed in one pass. The current bnxt_re_resize_cq() implementation
> > does not handle concurrent CQ resize requests, and this will be addressed
> > in the following patches.
> >
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++++++-----------------------
> >  1 file changed, 9 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > index d652018c19b3..2aecfbbb7eaf 100644
> > --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > @@ -3309,20 +3309,6 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> >         return rc;
> >  }
> >
> > -static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
> > -{
> > -       struct bnxt_re_dev *rdev = cq->rdev;
> > -
> > -       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > -
> > -       cq->qplib_cq.max_wqe = cq->resize_cqe;
> > -       if (cq->resize_umem) {
> > -               ib_umem_release(cq->ib_cq.umem);
> > -               cq->ib_cq.umem = cq->resize_umem;
> > -               cq->resize_umem = NULL;
> > -               cq->resize_cqe = 0;
> > -       }
> > -}
> >
> >  int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
> >                       struct ib_udata *udata)
> > @@ -3387,7 +3373,15 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
> >                 goto fail;
> >         }
> >
> > -       cq->ib_cq.cqe = cq->resize_cqe;
> > +       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > +
> > +       cq->qplib_cq.max_wqe = cq->resize_cqe;
> > +       ib_umem_release(cq->ib_cq.umem);
> > +       cq->ib_cq.umem = cq->resize_umem;
> > +       cq->resize_umem = NULL;
> > +       cq->resize_cqe = 0;
> > +
> > +       cq->ib_cq.cqe = entries;
> >         atomic_inc(&rdev->stats.res.resize_count);
> >
> >         return 0;
> > @@ -3907,15 +3901,6 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
> >         struct bnxt_re_sqp_entries *sqp_entry = NULL;
> >         unsigned long flags;
> >
> > -       /* User CQ; the only processing we do is to
> > -        * complete any pending CQ resize operation.
> > -        */
> > -       if (cq->ib_cq.umem) {
> > -               if (cq->resize_umem)
> > -                       bnxt_re_resize_cq_complete(cq);
> > -               return 0;
> > -       }
> > -
> Since this code is removed,  we need to remove  ibv_cmd_poll_cq call
> from the user library.
> For older libraries which still calls ibv_cmd_poll_cq, i think we
> should we keep a check.  Else it will throw a print "POLL CQ : no CQL
> to use". Either we should add the following code or remove this print.
>        if (cq->ib_cq.umem)
>                   return 0;

I'll add the check with extra comment.

> Otherwise, it looks good to me.

Thanks

> 
> Thanks,
> Selvin
> 
> 
> 
> 
> >         spin_lock_irqsave(&cq->cq_lock, flags);
> >         budget = min_t(u32, num_entries, cq->max_cql);
> >         num_entries = budget;
> >
> > --
> > 2.52.0
> >

Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step

Posted by Selvin Xavier 1 month, 2 weeks ago

On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> From: Leon Romanovsky <leonro@nvidia.com>
>
> There is no need to defer the CQ resize operation, as it is intended to
> be completed in one pass. The current bnxt_re_resize_cq() implementation
> does not handle concurrent CQ resize requests, and this will be addressed
> in the following patches.
bnxt HW requires that the previous CQ memory be available with the HW until
HW generates a cut off cqe on the CQ that is being destroyed. This is
the reason for
polling the completions in the user library after returning the
resize_cq call. Once the polling
thread sees the expected CQE, it will invoke the driver to free CQ
memory.  So ib_umem_release
should wait. This patch doesn't guarantee that.  Do you think if there
is a better way to handle this requirement?

>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++++++-----------------------
>  1 file changed, 9 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> index d652018c19b3..2aecfbbb7eaf 100644
> --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> @@ -3309,20 +3309,6 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
>         return rc;
>  }
>
> -static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
> -{
> -       struct bnxt_re_dev *rdev = cq->rdev;
> -
> -       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> -
> -       cq->qplib_cq.max_wqe = cq->resize_cqe;
> -       if (cq->resize_umem) {
> -               ib_umem_release(cq->ib_cq.umem);
> -               cq->ib_cq.umem = cq->resize_umem;
> -               cq->resize_umem = NULL;
> -               cq->resize_cqe = 0;
> -       }
> -}
>
>  int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
>                       struct ib_udata *udata)
> @@ -3387,7 +3373,15 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
>                 goto fail;
>         }
>
> -       cq->ib_cq.cqe = cq->resize_cqe;
> +       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> +
> +       cq->qplib_cq.max_wqe = cq->resize_cqe;
> +       ib_umem_release(cq->ib_cq.umem);
> +       cq->ib_cq.umem = cq->resize_umem;
> +       cq->resize_umem = NULL;
> +       cq->resize_cqe = 0;
> +
> +       cq->ib_cq.cqe = entries;
>         atomic_inc(&rdev->stats.res.resize_count);
>
>         return 0;
> @@ -3907,15 +3901,6 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
>         struct bnxt_re_sqp_entries *sqp_entry = NULL;
>         unsigned long flags;
>
> -       /* User CQ; the only processing we do is to
> -        * complete any pending CQ resize operation.
> -        */
> -       if (cq->ib_cq.umem) {
> -               if (cq->resize_umem)
> -                       bnxt_re_resize_cq_complete(cq);
> -               return 0;
> -       }
> -
>         spin_lock_irqsave(&cq->cq_lock, flags);
>         budget = min_t(u32, num_entries, cq->max_cql);
>         num_entries = budget;
>
> --
> 2.52.0
>

Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step

Posted by Leon Romanovsky 1 month, 2 weeks ago

On Mon, Feb 16, 2026 at 09:29:29AM +0530, Selvin Xavier wrote:
> On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > There is no need to defer the CQ resize operation, as it is intended to
> > be completed in one pass. The current bnxt_re_resize_cq() implementation
> > does not handle concurrent CQ resize requests, and this will be addressed
> > in the following patches.
> bnxt HW requires that the previous CQ memory be available with the HW until
> HW generates a cut off cqe on the CQ that is being destroyed. This is
> the reason for
> polling the completions in the user library after returning the
> resize_cq call. Once the polling
> thread sees the expected CQE, it will invoke the driver to free CQ
> memory.

This flow is problematic. It requires the kernel to trust a user‑space
application, which is not acceptable. There is no guarantee that the
rdma-core implementation is correct or will invoke the interface properly.
Users can bypass rdma-core entirely and issue ioctls directly (syzkaller,
custom rdma-core variants, etc.), leading to umem leaks, races that overwrite
kernel memory, and access to fields that are now being modified. All of this
can occur silently and without any protections.

> So ib_umem_release should wait. This patch doesn't guarantee that.

The issue is that it was never guaranteed in the first place. It only appeared
to work under very controlled conditions.

> Do you think if there is a better way to handle this requirement?

You should wait for BNXT_RE_WC_TYPE_COFF in the kernel before returning
from resize_cq.

Thanks

> 
> >
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++++++-----------------------
> >  1 file changed, 9 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > index d652018c19b3..2aecfbbb7eaf 100644
> > --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > @@ -3309,20 +3309,6 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> >         return rc;
> >  }
> >
> > -static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
> > -{
> > -       struct bnxt_re_dev *rdev = cq->rdev;
> > -
> > -       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > -
> > -       cq->qplib_cq.max_wqe = cq->resize_cqe;
> > -       if (cq->resize_umem) {
> > -               ib_umem_release(cq->ib_cq.umem);
> > -               cq->ib_cq.umem = cq->resize_umem;
> > -               cq->resize_umem = NULL;
> > -               cq->resize_cqe = 0;
> > -       }
> > -}
> >
> >  int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
> >                       struct ib_udata *udata)
> > @@ -3387,7 +3373,15 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
> >                 goto fail;
> >         }
> >
> > -       cq->ib_cq.cqe = cq->resize_cqe;
> > +       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > +
> > +       cq->qplib_cq.max_wqe = cq->resize_cqe;
> > +       ib_umem_release(cq->ib_cq.umem);
> > +       cq->ib_cq.umem = cq->resize_umem;
> > +       cq->resize_umem = NULL;
> > +       cq->resize_cqe = 0;
> > +
> > +       cq->ib_cq.cqe = entries;
> >         atomic_inc(&rdev->stats.res.resize_count);
> >
> >         return 0;
> > @@ -3907,15 +3901,6 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
> >         struct bnxt_re_sqp_entries *sqp_entry = NULL;
> >         unsigned long flags;
> >
> > -       /* User CQ; the only processing we do is to
> > -        * complete any pending CQ resize operation.
> > -        */
> > -       if (cq->ib_cq.umem) {
> > -               if (cq->resize_umem)
> > -                       bnxt_re_resize_cq_complete(cq);
> > -               return 0;
> > -       }
> > -
> >         spin_lock_irqsave(&cq->cq_lock, flags);
> >         budget = min_t(u32, num_entries, cq->max_cql);
> >         num_entries = budget;
> >
> > --
> > 2.52.0
> >

Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step

Posted by Selvin Xavier 1 month, 2 weeks ago

On Mon, Feb 16, 2026 at 1:37 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Feb 16, 2026 at 09:29:29AM +0530, Selvin Xavier wrote:
> > On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > >
> > > There is no need to defer the CQ resize operation, as it is intended to
> > > be completed in one pass. The current bnxt_re_resize_cq() implementation
> > > does not handle concurrent CQ resize requests, and this will be addressed
> > > in the following patches.
> > bnxt HW requires that the previous CQ memory be available with the HW until
> > HW generates a cut off cqe on the CQ that is being destroyed. This is
> > the reason for
> > polling the completions in the user library after returning the
> > resize_cq call. Once the polling
> > thread sees the expected CQE, it will invoke the driver to free CQ
> > memory.
>
> This flow is problematic. It requires the kernel to trust a user‑space
> application, which is not acceptable. There is no guarantee that the
> rdma-core implementation is correct or will invoke the interface properly.
> Users can bypass rdma-core entirely and issue ioctls directly (syzkaller,
> custom rdma-core variants, etc.), leading to umem leaks, races that overwrite
> kernel memory, and access to fields that are now being modified. All of this
> can occur silently and without any protections.
>
> > So ib_umem_release should wait. This patch doesn't guarantee that.
>
> The issue is that it was never guaranteed in the first place. It only appeared
> to work under very controlled conditions.
>
> > Do you think if there is a better way to handle this requirement?
>
> You should wait for BNXT_RE_WC_TYPE_COFF in the kernel before returning
> from resize_cq.
The difficulty is that libbnxt_re  in rdma-core has the  queue  the
consumer index used for completion lookup. The driver therefore has to
use copy_from_user to read the queue memory and then check for
BNXT_RE_WC_TYPE_COFF, along with the queue consumer index and the
relevant validity flags. I’ll explore if we have a way to handle this
and get back.
>
> Thanks
>
> >
> > >
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > ---
> > >  drivers/infiniband/hw/bnxt_re/ib_verbs.c | 33 +++++++++-----------------------
> > >  1 file changed, 9 insertions(+), 24 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > > index d652018c19b3..2aecfbbb7eaf 100644
> > > --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > > +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > > @@ -3309,20 +3309,6 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> > >         return rc;
> > >  }
> > >
> > > -static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
> > > -{
> > > -       struct bnxt_re_dev *rdev = cq->rdev;
> > > -
> > > -       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > > -
> > > -       cq->qplib_cq.max_wqe = cq->resize_cqe;
> > > -       if (cq->resize_umem) {
> > > -               ib_umem_release(cq->ib_cq.umem);
> > > -               cq->ib_cq.umem = cq->resize_umem;
> > > -               cq->resize_umem = NULL;
> > > -               cq->resize_cqe = 0;
> > > -       }
> > > -}
> > >
> > >  int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
> > >                       struct ib_udata *udata)
> > > @@ -3387,7 +3373,15 @@ int bnxt_re_resize_cq(struct ib_cq *ibcq, unsigned int cqe,
> > >                 goto fail;
> > >         }
> > >
> > > -       cq->ib_cq.cqe = cq->resize_cqe;
> > > +       bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > > +
> > > +       cq->qplib_cq.max_wqe = cq->resize_cqe;
> > > +       ib_umem_release(cq->ib_cq.umem);
> > > +       cq->ib_cq.umem = cq->resize_umem;
> > > +       cq->resize_umem = NULL;
> > > +       cq->resize_cqe = 0;
> > > +
> > > +       cq->ib_cq.cqe = entries;
> > >         atomic_inc(&rdev->stats.res.resize_count);
> > >
> > >         return 0;
> > > @@ -3907,15 +3901,6 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
> > >         struct bnxt_re_sqp_entries *sqp_entry = NULL;
> > >         unsigned long flags;
> > >
> > > -       /* User CQ; the only processing we do is to
> > > -        * complete any pending CQ resize operation.
> > > -        */
> > > -       if (cq->ib_cq.umem) {
> > > -               if (cq->resize_umem)
> > > -                       bnxt_re_resize_cq_complete(cq);
> > > -               return 0;
> > > -       }
> > > -
> > >         spin_lock_irqsave(&cq->cq_lock, flags);
> > >         budget = min_t(u32, num_entries, cq->max_cql);
> > >         num_entries = budget;
> > >
> > > --
> > > 2.52.0
> > >
>
>

Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step

Posted by Leon Romanovsky 1 month, 2 weeks ago

On Tue, Feb 17, 2026 at 10:32:25AM +0530, Selvin Xavier wrote:
> On Mon, Feb 16, 2026 at 1:37 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Mon, Feb 16, 2026 at 09:29:29AM +0530, Selvin Xavier wrote:
> > > On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > >
> > > > There is no need to defer the CQ resize operation, as it is intended to
> > > > be completed in one pass. The current bnxt_re_resize_cq() implementation
> > > > does not handle concurrent CQ resize requests, and this will be addressed
> > > > in the following patches.
> > > bnxt HW requires that the previous CQ memory be available with the HW until
> > > HW generates a cut off cqe on the CQ that is being destroyed. This is
> > > the reason for
> > > polling the completions in the user library after returning the
> > > resize_cq call. Once the polling
> > > thread sees the expected CQE, it will invoke the driver to free CQ
> > > memory.
> >
> > This flow is problematic. It requires the kernel to trust a user‑space
> > application, which is not acceptable. There is no guarantee that the
> > rdma-core implementation is correct or will invoke the interface properly.
> > Users can bypass rdma-core entirely and issue ioctls directly (syzkaller,
> > custom rdma-core variants, etc.), leading to umem leaks, races that overwrite
> > kernel memory, and access to fields that are now being modified. All of this
> > can occur silently and without any protections.
> >
> > > So ib_umem_release should wait. This patch doesn't guarantee that.
> >
> > The issue is that it was never guaranteed in the first place. It only appeared
> > to work under very controlled conditions.
> >
> > > Do you think if there is a better way to handle this requirement?
> >
> > You should wait for BNXT_RE_WC_TYPE_COFF in the kernel before returning
> > from resize_cq.
> The difficulty is that libbnxt_re  in rdma-core has the  queue  the
> consumer index used for completion lookup. The driver therefore has to
> use copy_from_user to read the queue memory and then check for
> BNXT_RE_WC_TYPE_COFF, along with the queue consumer index and the
> relevant validity flags. I’ll explore if we have a way to handle this
> and get back.

The thing is that you need to ensure that after libbnxt_re issued resize_cq command,
kernel won't require anything from user-space.

Can you cause to your HW to stop generate CQEs before resize_cq?

Thanks

Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step

Posted by Selvin Xavier 1 month, 2 weeks ago

On Tue, Feb 17, 2026 at 1:27 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Feb 17, 2026 at 10:32:25AM +0530, Selvin Xavier wrote:
> > On Mon, Feb 16, 2026 at 1:37 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Mon, Feb 16, 2026 at 09:29:29AM +0530, Selvin Xavier wrote:
> > > > On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > >
> > > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > >
> > > > > There is no need to defer the CQ resize operation, as it is intended to
> > > > > be completed in one pass. The current bnxt_re_resize_cq() implementation
> > > > > does not handle concurrent CQ resize requests, and this will be addressed
> > > > > in the following patches.
> > > > bnxt HW requires that the previous CQ memory be available with the HW until
> > > > HW generates a cut off cqe on the CQ that is being destroyed. This is
> > > > the reason for
> > > > polling the completions in the user library after returning the
> > > > resize_cq call. Once the polling
> > > > thread sees the expected CQE, it will invoke the driver to free CQ
> > > > memory.
> > >
> > > This flow is problematic. It requires the kernel to trust a user‑space
> > > application, which is not acceptable. There is no guarantee that the
> > > rdma-core implementation is correct or will invoke the interface properly.
> > > Users can bypass rdma-core entirely and issue ioctls directly (syzkaller,
> > > custom rdma-core variants, etc.), leading to umem leaks, races that overwrite
> > > kernel memory, and access to fields that are now being modified. All of this
> > > can occur silently and without any protections.
> > >
> > > > So ib_umem_release should wait. This patch doesn't guarantee that.
> > >
> > > The issue is that it was never guaranteed in the first place. It only appeared
> > > to work under very controlled conditions.
> > >
> > > > Do you think if there is a better way to handle this requirement?
> > >
> > > You should wait for BNXT_RE_WC_TYPE_COFF in the kernel before returning
> > > from resize_cq.
> > The difficulty is that libbnxt_re  in rdma-core has the  queue  the
> > consumer index used for completion lookup. The driver therefore has to
> > use copy_from_user to read the queue memory and then check for
> > BNXT_RE_WC_TYPE_COFF, along with the queue consumer index and the
> > relevant validity flags. I’ll explore if we have a way to handle this
> > and get back.
>
> The thing is that you need to ensure that after libbnxt_re issued resize_cq command,
> kernel won't require anything from user-space.
>
> Can you cause to your HW to stop generate CQEs before resize_cq?
we dont have this control (especially on the Receive CQ side).  For
the Tx side, maybe we can prevent
posting to the Tx queue.
>
> Thanks

Re: [PATCH rdma-next 42/50] RDMA/bnxt_re: Complete CQ resize in a single step

Posted by Selvin Xavier 1 month, 1 week ago

On Tue, Feb 17, 2026 at 4:22 PM Selvin Xavier
<selvin.xavier@broadcom.com> wrote:
>
> On Tue, Feb 17, 2026 at 1:27 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Feb 17, 2026 at 10:32:25AM +0530, Selvin Xavier wrote:
> > > On Mon, Feb 16, 2026 at 1:37 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Mon, Feb 16, 2026 at 09:29:29AM +0530, Selvin Xavier wrote:
> > > > > On Fri, Feb 13, 2026 at 4:31 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > >
> > > > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > > >
> > > > > > There is no need to defer the CQ resize operation, as it is intended to
> > > > > > be completed in one pass. The current bnxt_re_resize_cq() implementation
> > > > > > does not handle concurrent CQ resize requests, and this will be addressed
> > > > > > in the following patches.
> > > > > bnxt HW requires that the previous CQ memory be available with the HW until
> > > > > HW generates a cut off cqe on the CQ that is being destroyed. This is
> > > > > the reason for
> > > > > polling the completions in the user library after returning the
> > > > > resize_cq call. Once the polling
> > > > > thread sees the expected CQE, it will invoke the driver to free CQ
> > > > > memory.
> > > >
> > > > This flow is problematic. It requires the kernel to trust a user‑space
> > > > application, which is not acceptable. There is no guarantee that the
> > > > rdma-core implementation is correct or will invoke the interface properly.
> > > > Users can bypass rdma-core entirely and issue ioctls directly (syzkaller,
> > > > custom rdma-core variants, etc.), leading to umem leaks, races that overwrite
> > > > kernel memory, and access to fields that are now being modified. All of this
> > > > can occur silently and without any protections.
> > > >
> > > > > So ib_umem_release should wait. This patch doesn't guarantee that.
> > > >
> > > > The issue is that it was never guaranteed in the first place. It only appeared
> > > > to work under very controlled conditions.
> > > >
> > > > > Do you think if there is a better way to handle this requirement?
> > > >
> > > > You should wait for BNXT_RE_WC_TYPE_COFF in the kernel before returning
> > > > from resize_cq.
> > > The difficulty is that libbnxt_re  in rdma-core has the  queue  the
> > > consumer index used for completion lookup. The driver therefore has to
> > > use copy_from_user to read the queue memory and then check for
> > > BNXT_RE_WC_TYPE_COFF, along with the queue consumer index and the
> > > relevant validity flags. I’ll explore if we have a way to handle this
> > > and get back.
> >
> > The thing is that you need to ensure that after libbnxt_re issued resize_cq command,
> > kernel won't require anything from user-space.
> >
> > Can you cause to your HW to stop generate CQEs before resize_cq?
> we dont have this control (especially on the Receive CQ side).  For
> the Tx side, maybe we can prevent
> posting to the Tx queue.
After discussing with other teams internally, we feel that the
sequence given by you
 should work fine.  As per the sequence, BNXT_RE_WC_TYPE_COFF should
be available when resize request is returned from FW.
We will test your series and confirm above behavior.
> >
> > Thanks