From nobody Sun May 24 19:33:19 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B90751F0E25; Sat, 23 May 2026 02:03:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779501796; cv=none; b=rwX7nqZ0bEVdBq2RDUliplG3mSQkWDPd1CKAEOCy/CD+VEagDeyv6ZYQ97E73P9T6XUgGnKBzHlp0hsq/BScn/1ehyJ1lgPznu1GIstzuX8gMD45ofAWzkuc7EEQm90g5TZ1B9WHf3DaJepeCA3Qx/VJtiQqTRIrQDDRVbrKLPw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779501796; c=relaxed/simple; bh=ALksr8KP5GZd7zBkgmNO017moVCoqJgEJPKwz3cfibM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=K0TFxz+o4u5YY86mQADrJXyssBQsohNTshWf33wTidPnYYaFI5qO0TbRhp6/WKY7ti7xgQ5ZIdaqYj+rOQl/wLEIIzWY5XrSf8vTqKB6KkkjrhnjtIO5CLDpPXxgTQNdaZpXqeFvTKSU6ltiw/WDNK8gYdOYutK152SWvppRe+A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Received: by linux.microsoft.com (Postfix, from userid 1202) id A006120B7168; Fri, 22 May 2026 19:03:06 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com A006120B7168 From: Long Li To: Long Li , Konstantin Taranov , Jakub Kicinski , "David S . Miller" , Paolo Abeni , Eric Dumazet , Andrew Lunn , Jason Gunthorpe , Leon Romanovsky , Haiyang Zhang , "K . Y . 
Srinivasan" , Wei Liu , Dexuan Cui , shradhagupta@linux.microsoft.com Cc: Simon Horman , netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v11 1/6] net: mana: Create separate EQs for each vPort Date: Fri, 22 May 2026 19:02:51 -0700 Message-ID: <20260523020258.1107742-2-longli@microsoft.com> X-Mailer: git-send-email 2.43.7 In-Reply-To: <20260523020258.1107742-1-longli@microsoft.com> References: <20260523020258.1107742-1-longli@microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable To prepare for assigning vPorts to dedicated MSI-X vectors, remove EQ sharing among the vPorts and create dedicated EQs for each vPort. Move the EQ definition from struct mana_context to struct mana_port_context and update related support functions. Export mana_create_eq() and mana_destroy_eq() for use by the MANA RDMA driver. RSS QPs now take a vport reference via pd->vport_use_count to ensure EQs outlive all QP consumers. The vport must already be configured by a raw QP before an RSS QP can be created. EQs are only destroyed when the last QP (raw or RSS) on the PD releases its reference. Reject cross-port PD sharing for both raw and RSS QPs. Since EQs and vport configuration are per-port, a PD is bound to the port used by its first raw QP. Subsequent QPs on the same PD must use the same port or the creation fails with -EINVAL. Serialize mana_set_channels() against RDMA vport configuration to prevent num_queues from changing while RDMA holds EQs sized to the current value. A channel_changing flag is set under apc->vport_mutex before detach and checked by mana_cfg_vport() when called from the RDMA path, blocking RDMA from grabbing the vport during the entire detach/attach window. 
When the port is down and RDMA already holds the vport, the channel change is rejected with -EBUSY. Signed-off-by: Long Li --- drivers/infiniband/hw/mana/main.c | 40 ++++-- drivers/infiniband/hw/mana/mana_ib.h | 7 ++ drivers/infiniband/hw/mana/qp.c | 40 ++++-- drivers/net/ethernet/microsoft/mana/mana_en.c | 117 +++++++++++------- .../ethernet/microsoft/mana/mana_ethtool.c | 23 +++- include/net/mana/mana.h | 15 ++- 6 files changed, 175 insertions(+), 67 deletions(-) diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana= /main.c index ac5e75dd3494..f8a9013f0ca3 100644 --- a/drivers/infiniband/hw/mana/main.c +++ b/drivers/infiniband/hw/mana/main.c @@ -20,8 +20,10 @@ void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct= mana_ib_pd *pd, pd->vport_use_count--; WARN_ON(pd->vport_use_count < 0); =20 - if (!pd->vport_use_count) + if (!pd->vport_use_count) { + mana_destroy_eq(mpc); mana_uncfg_vport(mpc); + } =20 mutex_unlock(&pd->vport_mutex); } @@ -40,13 +42,27 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port= , struct mana_ib_pd *pd, =20 pd->vport_use_count++; if (pd->vport_use_count > 1) { + /* Reject cross-port PD sharing. EQs and vport config + * are per-port, so the PD must stay bound to the port + * that was configured on the first raw QP creation. 
+ */ + if (pd->vport_port !=3D port) { + pd->vport_use_count--; + mutex_unlock(&pd->vport_mutex); + ibdev_dbg(&dev->ib_dev, + "PD already bound to port %u\n", + pd->vport_port); + return -EINVAL; + } ibdev_dbg(&dev->ib_dev, "Skip as this PD is already configured vport\n"); mutex_unlock(&pd->vport_mutex); return 0; } =20 - err =3D mana_cfg_vport(mpc, pd->pdn, doorbell_id); + pd->vport_port =3D port; + + err =3D mana_cfg_vport(mpc, pd->pdn, doorbell_id, true); if (err) { pd->vport_use_count--; mutex_unlock(&pd->vport_mutex); @@ -55,15 +71,23 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port= , struct mana_ib_pd *pd, return err; } =20 - mutex_unlock(&pd->vport_mutex); =20 - pd->tx_shortform_allowed =3D mpc->tx_shortform_allowed; - pd->tx_vp_offset =3D mpc->tx_vp_offset; + err =3D mana_create_eq(mpc); + if (err) { + mana_uncfg_vport(mpc); + pd->vport_use_count--; + } else { + pd->tx_shortform_allowed =3D mpc->tx_shortform_allowed; + pd->tx_vp_offset =3D mpc->tx_vp_offset; + } + + mutex_unlock(&pd->vport_mutex); =20 - ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n", - mpc->port_handle, pd->pdn, doorbell_id); + if (!err) + ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n", + mpc->port_handle, pd->pdn, doorbell_id); =20 - return 0; + return err; } =20 int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata) diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/m= ana/mana_ib.h index c9c94e86a72b..05115b154eb4 100644 --- a/drivers/infiniband/hw/mana/mana_ib.h +++ b/drivers/infiniband/hw/mana/mana_ib.h @@ -102,6 +102,13 @@ struct mana_ib_pd { struct mutex vport_mutex; int vport_use_count; =20 + /* Port bound to this PD for raw QP usage. Only valid when + * vport_use_count > 0. A PD can only be associated with a + * single physical port because per-port EQs and vport + * configuration are tied to the PD's refcount. 
+ */ + u32 vport_port; + bool tx_shortform_allowed; u32 tx_vp_offset; }; diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/q= p.c index 0fbcf449c134..d9a0bf8b2bc9 100644 --- a/drivers/infiniband/hw/mana/qp.c +++ b/drivers/infiniband/hw/mana/qp.c @@ -79,6 +79,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, stru= ct ib_pd *pd, struct ib_qp_init_attr *attr, struct ib_udata *udata) { + struct mana_ib_pd *mana_pd =3D container_of(pd, struct mana_ib_pd, ibpd); struct mana_ib_qp *qp =3D container_of(ibqp, struct mana_ib_qp, ibqp); struct mana_ib_dev *mdev =3D container_of(pd->device, struct mana_ib_dev, ib_dev); @@ -155,6 +156,19 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, s= truct ib_pd *pd, =20 qp->port =3D port; =20 + /* Take a reference on the vport to ensure EQs outlive this QP. + * The vport must already be configured by a raw QP on the + * same port =E2=80=94 cross-port PD sharing is not supported. + */ + mutex_lock(&mana_pd->vport_mutex); + if (!mana_pd->vport_use_count || mana_pd->vport_port !=3D port) { + mutex_unlock(&mana_pd->vport_mutex); + ret =3D -EINVAL; + goto fail; + } + mana_pd->vport_use_count++; + mutex_unlock(&mana_pd->vport_mutex); + for (i =3D 0; i < ind_tbl_size; i++) { struct mana_obj_spec wq_spec =3D {}; struct mana_obj_spec cq_spec =3D {}; @@ -171,13 +185,13 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, = struct ib_pd *pd, cq_spec.gdma_region =3D cq->queue.gdma_region; cq_spec.queue_size =3D cq->cqe * COMP_ENTRY_SIZE; cq_spec.modr_ctx_id =3D 0; - eq =3D &mpc->ac->eqs[cq->comp_vector]; + eq =3D &mpc->eqs[cq->comp_vector % mpc->num_queues]; cq_spec.attached_eq =3D eq->eq->id; =20 ret =3D mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ, &wq_spec, &cq_spec, &wq->rx_object); if (ret) - goto fail; + goto free_vport; =20 /* The GDMA regions are now owned by the WQ object */ wq->queue.gdma_region =3D GDMA_INVALID_DMA_REGION; @@ -199,7 +213,7 @@ static int mana_ib_create_qp_rss(struct ib_qp 
*ibqp, st= ruct ib_pd *pd, ret =3D mana_ib_install_cq_cb(mdev, cq); if (ret) { mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object); - goto fail; + goto free_vport; } } resp.num_entries =3D i; @@ -210,7 +224,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, st= ruct ib_pd *pd, ucmd.rx_hash_key_len, ucmd.rx_hash_key); if (ret) - goto fail; + goto free_vport; =20 ret =3D ib_copy_to_udata(udata, &resp, sizeof(resp)); if (ret) { @@ -226,7 +240,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, st= ruct ib_pd *pd, =20 err_disable_vport_rx: mana_disable_vport_rx(mpc); -fail: +free_vport: while (i-- > 0) { ibwq =3D ind_tbl->ind_tbl[i]; ibcq =3D ibwq->cq; @@ -237,6 +251,9 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, st= ruct ib_pd *pd, mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object); } =20 + mana_ib_uncfg_vport(mdev, mana_pd, port); + +fail: kfree(mana_ind_table); =20 return ret; @@ -299,7 +316,7 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, st= ruct ib_pd *ibpd, =20 err =3D mana_ib_cfg_vport(mdev, port, pd, mana_ucontext->doorbell); if (err) - return -ENODEV; + return err; =20 qp->port =3D port; =20 @@ -321,7 +338,11 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, s= truct ib_pd *ibpd, cq_spec.queue_size =3D send_cq->cqe * COMP_ENTRY_SIZE; cq_spec.modr_ctx_id =3D 0; eq_vec =3D send_cq->comp_vector; - eq =3D &mpc->ac->eqs[eq_vec]; + if (!mpc->eqs) { + err =3D -EINVAL; + goto err_destroy_queue; + } + eq =3D &mpc->eqs[eq_vec % mpc->num_queues]; cq_spec.attached_eq =3D eq->eq->id; =20 err =3D mana_create_wq_obj(mpc, mpc->port_handle, GDMA_SQ, &wq_spec, @@ -785,14 +806,17 @@ static int mana_ib_destroy_qp_rss(struct mana_ib_qp *= qp, { struct mana_ib_dev *mdev =3D container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev); + struct ib_pd *ibpd =3D qp->ibqp.pd; struct mana_port_context *mpc; struct net_device *ndev; + struct mana_ib_pd *pd; struct mana_ib_wq *wq; struct ib_wq *ibwq; int i; =20 ndev =3D mana_ib_get_netdev(qp->ibqp.device, 
qp->port); mpc =3D netdev_priv(ndev); + pd =3D container_of(ibpd, struct mana_ib_pd, ibpd); =20 /* Disable vPort RX steering before destroying RX WQ objects. * Otherwise firmware still routes traffic to the destroyed queues, @@ -817,6 +841,8 @@ static int mana_ib_destroy_qp_rss(struct mana_ib_qp *qp, mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object); } =20 + mana_ib_uncfg_vport(mdev, pd, qp->port); + return 0; } =20 diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/et= hernet/microsoft/mana/mana_en.c index 82f1461a48e9..7c776f115f5a 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -1298,7 +1298,7 @@ void mana_uncfg_vport(struct mana_port_context *apc) EXPORT_SYMBOL_NS(mana_uncfg_vport, "NET_MANA"); =20 int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id, - u32 doorbell_pg_id) + u32 doorbell_pg_id, bool check_channel_changing) { struct mana_config_vport_resp resp =3D {}; struct mana_config_vport_req req =3D {}; @@ -1323,7 +1323,8 @@ int mana_cfg_vport(struct mana_port_context *apc, u32= protection_dom_id, * Ethernet usage on the same port. 
*/ mutex_lock(&apc->vport_mutex); - if (apc->vport_use_count > 0) { + if (apc->vport_use_count > 0 || + (check_channel_changing && apc->channel_changing)) { mutex_unlock(&apc->vport_mutex); return -EBUSY; } @@ -1623,78 +1624,84 @@ void mana_destroy_wq_obj(struct mana_port_context *= apc, u32 wq_type, } EXPORT_SYMBOL_NS(mana_destroy_wq_obj, "NET_MANA"); =20 -static void mana_destroy_eq(struct mana_context *ac) +void mana_destroy_eq(struct mana_port_context *apc) { + struct mana_context *ac =3D apc->ac; struct gdma_context *gc =3D ac->gdma_dev->gdma_context; struct gdma_queue *eq; int i; =20 - if (!ac->eqs) + if (!apc->eqs) return; =20 - debugfs_remove_recursive(ac->mana_eqs_debugfs); - ac->mana_eqs_debugfs =3D NULL; + debugfs_remove_recursive(apc->mana_eqs_debugfs); + apc->mana_eqs_debugfs =3D NULL; =20 - for (i =3D 0; i < gc->max_num_queues; i++) { - eq =3D ac->eqs[i].eq; + for (i =3D 0; i < apc->num_queues; i++) { + eq =3D apc->eqs[i].eq; if (!eq) continue; =20 mana_gd_destroy_queue(gc, eq); } =20 - kfree(ac->eqs); - ac->eqs =3D NULL; + kfree(apc->eqs); + apc->eqs =3D NULL; } +EXPORT_SYMBOL_NS(mana_destroy_eq, "NET_MANA"); =20 -static void mana_create_eq_debugfs(struct mana_context *ac, int i) +static void mana_create_eq_debugfs(struct mana_port_context *apc, int i) { - struct mana_eq eq =3D ac->eqs[i]; + struct mana_eq eq =3D apc->eqs[i]; char eqnum[32]; =20 sprintf(eqnum, "eq%d", i); - eq.mana_eq_debugfs =3D debugfs_create_dir(eqnum, ac->mana_eqs_debugfs); + eq.mana_eq_debugfs =3D debugfs_create_dir(eqnum, apc->mana_eqs_debugfs); debugfs_create_u32("head", 0400, eq.mana_eq_debugfs, &eq.eq->head); debugfs_create_u32("tail", 0400, eq.mana_eq_debugfs, &eq.eq->tail); debugfs_create_file("eq_dump", 0400, eq.mana_eq_debugfs, eq.eq, &mana_dbg= _q_fops); } =20 -static int mana_create_eq(struct mana_context *ac) +int mana_create_eq(struct mana_port_context *apc) { - struct gdma_dev *gd =3D ac->gdma_dev; + struct gdma_dev *gd =3D apc->ac->gdma_dev; struct gdma_context *gc 
=3D gd->gdma_context; struct gdma_queue_spec spec =3D {}; int err; int i; =20 - ac->eqs =3D kzalloc_objs(struct mana_eq, gc->max_num_queues); - if (!ac->eqs) + if (WARN_ON(apc->eqs)) + return -EEXIST; + apc->eqs =3D kzalloc_objs(struct mana_eq, apc->num_queues); + if (!apc->eqs) return -ENOMEM; =20 spec.type =3D GDMA_EQ; spec.monitor_avl_buf =3D false; spec.queue_size =3D EQ_SIZE; spec.eq.callback =3D NULL; - spec.eq.context =3D ac->eqs; + spec.eq.context =3D apc->eqs; spec.eq.log2_throttle_limit =3D LOG2_EQ_THROTTLE; =20 - ac->mana_eqs_debugfs =3D debugfs_create_dir("EQs", gc->mana_pci_debugfs); + apc->mana_eqs_debugfs =3D + debugfs_create_dir("EQs", apc->mana_port_debugfs); =20 - for (i =3D 0; i < gc->max_num_queues; i++) { + for (i =3D 0; i < apc->num_queues; i++) { spec.eq.msix_index =3D (i + 1) % gc->num_msix_usable; - err =3D mana_gd_create_mana_eq(gd, &spec, &ac->eqs[i].eq); + err =3D mana_gd_create_mana_eq(gd, &spec, &apc->eqs[i].eq); if (err) { dev_err(gc->dev, "Failed to create EQ %d : %d\n", i, err); goto out; } - mana_create_eq_debugfs(ac, i); + mana_create_eq_debugfs(apc, i); } =20 return 0; out: - mana_destroy_eq(ac); + mana_destroy_eq(apc); return err; } +EXPORT_SYMBOL_NS(mana_create_eq, "NET_MANA"); =20 static int mana_fence_rq(struct mana_port_context *apc, struct mana_rxq *r= xq) { @@ -2459,7 +2466,7 @@ static int mana_create_txq(struct mana_port_context *= apc, spec.monitor_avl_buf =3D false; spec.queue_size =3D cq_size; spec.cq.callback =3D mana_schedule_napi; - spec.cq.parent_eq =3D ac->eqs[i].eq; + spec.cq.parent_eq =3D apc->eqs[i].eq; spec.cq.context =3D cq; err =3D mana_gd_create_mana_wq_cq(gd, &spec, &cq->gdma_cq); if (err) @@ -2852,13 +2859,12 @@ static void mana_create_rxq_debugfs(struct mana_por= t_context *apc, int idx) static int mana_add_rx_queues(struct mana_port_context *apc, struct net_device *ndev) { - struct mana_context *ac =3D apc->ac; struct mana_rxq *rxq; int err =3D 0; int i; =20 for (i =3D 0; i < apc->num_queues; i++) { - 
rxq =3D mana_create_rxq(apc, i, &ac->eqs[i], ndev); + rxq =3D mana_create_rxq(apc, i, &apc->eqs[i], ndev); if (!rxq) { err =3D -ENOMEM; netdev_err(ndev, "Failed to create rxq %d : %d\n", i, err); @@ -2877,9 +2883,8 @@ static int mana_add_rx_queues(struct mana_port_contex= t *apc, return err; } =20 -static void mana_destroy_vport(struct mana_port_context *apc) +static void mana_destroy_rxqs(struct mana_port_context *apc) { - struct gdma_dev *gd =3D apc->ac->gdma_dev; struct mana_rxq *rxq; u32 rxq_idx; =20 @@ -2891,8 +2896,12 @@ static void mana_destroy_vport(struct mana_port_cont= ext *apc) mana_destroy_rxq(apc, rxq, true); apc->rxqs[rxq_idx] =3D NULL; } +} + +static void mana_destroy_vport(struct mana_port_context *apc) +{ + struct gdma_dev *gd =3D apc->ac->gdma_dev; =20 - mana_destroy_txq(apc); mana_uncfg_vport(apc); =20 if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode) @@ -2913,11 +2922,7 @@ static int mana_create_vport(struct mana_port_contex= t *apc, return err; } =20 - err =3D mana_cfg_vport(apc, gd->pdid, gd->doorbell); - if (err) - return err; - - return mana_create_txq(apc, net); + return mana_cfg_vport(apc, gd->pdid, gd->doorbell, false); } =20 static int mana_rss_table_alloc(struct mana_port_context *apc) @@ -3220,21 +3225,36 @@ int mana_alloc_queues(struct net_device *ndev) =20 err =3D mana_create_vport(apc, ndev); if (err) { - netdev_err(ndev, "Failed to create vPort %u : %d\n", apc->port_idx, err); + netdev_err(ndev, "Failed to create vPort %u : %d\n", + apc->port_idx, err); return err; } =20 + err =3D mana_create_eq(apc); + if (err) { + netdev_err(ndev, "Failed to create EQ on vPort %u: %d\n", + apc->port_idx, err); + goto destroy_vport; + } + + err =3D mana_create_txq(apc, ndev); + if (err) { + netdev_err(ndev, "Failed to create TXQ on vPort %u: %d\n", + apc->port_idx, err); + goto destroy_eq; + } + err =3D netif_set_real_num_tx_queues(ndev, apc->num_queues); if (err) { netdev_err(ndev, "netif_set_real_num_tx_queues () failed for ndev with 
num_queues %u = : %d\n", apc->num_queues, err); - goto destroy_vport; + goto destroy_txq; } =20 err =3D mana_add_rx_queues(apc, ndev); if (err) - goto destroy_vport; + goto destroy_rxq; =20 apc->rss_state =3D apc->num_queues > 1 ? TRI_STATE_TRUE : TRI_STATE_FALSE; =20 @@ -3243,7 +3263,7 @@ int mana_alloc_queues(struct net_device *ndev) netdev_err(ndev, "netif_set_real_num_rx_queues () failed for ndev with num_queues %u = : %d\n", apc->num_queues, err); - goto destroy_vport; + goto destroy_rxq; } =20 mana_rss_table_init(apc); @@ -3251,19 +3271,25 @@ int mana_alloc_queues(struct net_device *ndev) err =3D mana_config_rss(apc, TRI_STATE_TRUE, true, true); if (err) { netdev_err(ndev, "Failed to configure RSS table: %d\n", err); - goto destroy_vport; + goto destroy_rxq; } =20 if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode) { err =3D mana_pf_register_filter(apc); if (err) - goto destroy_vport; + goto destroy_rxq; } =20 mana_chn_setxdp(apc, mana_xdp_get(apc)); =20 return 0; =20 +destroy_rxq: + mana_destroy_rxqs(apc); +destroy_txq: + mana_destroy_txq(apc); +destroy_eq: + mana_destroy_eq(apc); destroy_vport: mana_destroy_vport(apc); return err; @@ -3368,6 +3394,9 @@ static int mana_dealloc_queues(struct net_device *nde= v) mana_fence_rqs(apc); =20 /* Even in err case, still need to cleanup the vPort */ + mana_destroy_rxqs(apc); + mana_destroy_txq(apc); + mana_destroy_eq(apc); mana_destroy_vport(apc); =20 return 0; @@ -3688,12 +3717,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming) =20 INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler); =20 - err =3D mana_create_eq(ac); - if (err) { - dev_err(dev, "Failed to create EQs: %d\n", err); - goto out; - } - err =3D mana_query_device_cfg(ac, MANA_MAJOR_VERSION, MANA_MINOR_VERSION, MANA_MICRO_VERSION, &num_ports, &bm_hostmode); if (err) @@ -3838,8 +3861,6 @@ void mana_remove(struct gdma_dev *gd, bool suspending) free_netdev(ndev); } =20 - mana_destroy_eq(ac); - if (ac->per_port_queue_reset_wq) { 
destroy_workqueue(ac->per_port_queue_reset_wq); ac->per_port_queue_reset_wq =3D NULL; diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/n= et/ethernet/microsoft/mana/mana_ethtool.c index 04350973e19e..4633acc976f0 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c @@ -454,6 +454,11 @@ static int mana_set_coalesce(struct net_device *ndev, return err; } =20 +/* mana_set_channels - change the number of queues on a port + * + * Returns -EBUSY if RDMA holds the vport with EQs sized to the + * current num_queues. + */ static int mana_set_channels(struct net_device *ndev, struct ethtool_channels *channels) { @@ -462,10 +467,22 @@ static int mana_set_channels(struct net_device *ndev, unsigned int old_count =3D apc->num_queues; int err; =20 + /* Set channel_changing to block RDMA from grabbing the vport + * during the detach/attach window. mana_cfg_vport() checks + * this flag under vport_mutex and returns -EBUSY if set. 
+ */ + mutex_lock(&apc->vport_mutex); + if (!apc->port_is_up && apc->vport_use_count) { + mutex_unlock(&apc->vport_mutex); + return -EBUSY; + } + apc->channel_changing =3D true; + mutex_unlock(&apc->vport_mutex); + err =3D mana_pre_alloc_rxbufs(apc, ndev->mtu, new_count); if (err) { netdev_err(ndev, "Insufficient memory for new allocations"); - return err; + goto clear_flag; } =20 err =3D mana_detach(ndev, false); @@ -483,6 +500,10 @@ static int mana_set_channels(struct net_device *ndev, =20 out: mana_pre_dealloc_rxbufs(apc); +clear_flag: + mutex_lock(&apc->vport_mutex); + apc->channel_changing =3D false; + mutex_unlock(&apc->vport_mutex); return err; } =20 diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h index d9c27310fd04..5a9b94e0ef34 100644 --- a/include/net/mana/mana.h +++ b/include/net/mana/mana.h @@ -480,8 +480,6 @@ struct mana_context { u8 bm_hostmode; =20 struct mana_ethtool_hc_stats hc_stats; - struct mana_eq *eqs; - struct dentry *mana_eqs_debugfs; struct workqueue_struct *per_port_queue_reset_wq; /* Workqueue for querying hardware stats */ struct delayed_work gf_stats_work; @@ -501,6 +499,9 @@ struct mana_port_context { =20 u8 mac_addr[ETH_ALEN]; =20 + struct mana_eq *eqs; + struct dentry *mana_eqs_debugfs; + enum TRI_STATE rss_state; =20 mana_handle_t default_rxobj; @@ -547,6 +548,12 @@ struct mana_port_context { struct mutex vport_mutex; int vport_use_count; =20 + /* Set by mana_set_channels() under vport_mutex to block RDMA + * from grabbing the vport during the detach/attach window. + * Checked by mana_cfg_vport() when called from the RDMA path. 
+ */ + bool channel_changing; + /* Net shaper handle*/ struct net_shaper_handle handle; =20 @@ -1040,8 +1047,10 @@ void mana_destroy_wq_obj(struct mana_port_context *a= pc, u32 wq_type, mana_handle_t wq_obj); =20 int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id, - u32 doorbell_pg_id); + u32 doorbell_pg_id, bool check_channel_changing); void mana_uncfg_vport(struct mana_port_context *apc); +int mana_create_eq(struct mana_port_context *apc); +void mana_destroy_eq(struct mana_port_context *apc); =20 struct net_device *mana_get_primary_netdev(struct mana_context *ac, u32 port_index, --=20 2.43.0 From nobody Sun May 24 19:33:19 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B90E1230BEC; Sat, 23 May 2026 02:03:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779501796; cv=none; b=YRTBBQPOyrSE8+lpxFbloK2/fmpH15D/SJPgD9r/bWEo7bjnVcnnzQT92MC6pE+7VoJ6i/j7HiA139ZGNueQrqp+DJR5mi9711d+UjBTlNlb/87J8vxCvu+S8MsIcny7EVCBpk0F6vvYe1XD8sybHSus1+dyPeCdesMkcaahhCc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779501796; c=relaxed/simple; bh=GH3t7Poja0Yvnu9tzPHLKtImggbUlduMiW3pBEJzcPs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TgAjX2L20fxZKnnoQjtsznyuJY/cOePM5Fk3WArPT6GsUe2MAu3YMlycf1xaxSdIfjKLklevJpAzMFCb5izn5U+iaoPWG8C3kLyaAqPvBkoVRKgzV57Y6KUrgr5E9n+BI5tdSKy3niE/rorKa0TlV2auA7d3YAcZTyaV5mJVDPM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com 
Received: by linux.microsoft.com (Postfix, from userid 1202) id ADF2E20B7169; Fri, 22 May 2026 19:03:06 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com ADF2E20B7169 From: Long Li To: Long Li , Konstantin Taranov , Jakub Kicinski , "David S . Miller" , Paolo Abeni , Eric Dumazet , Andrew Lunn , Jason Gunthorpe , Leon Romanovsky , Haiyang Zhang , "K . Y . Srinivasan" , Wei Liu , Dexuan Cui , shradhagupta@linux.microsoft.com Cc: Simon Horman , netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v11 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs Date: Fri, 22 May 2026 19:02:52 -0700 Message-ID: <20260523020258.1107742-3-longli@microsoft.com> X-Mailer: git-send-email 2.43.7 In-Reply-To: <20260523020258.1107742-1-longli@microsoft.com> References: <20260523020258.1107742-1-longli@microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When querying the device, adjust the max number of queues to allow dedicated MSI-X vectors for each vPort. The per-vPort queue count is clamped towards MANA_DEF_NUM_QUEUES but will not exceed the hardware maximum reported by the device. MSI-X sharing among vPorts is enabled when there are not enough MSI-X vectors for dedicated allocation, or when the platform does not support dynamic MSI-X allocation (in which case all vectors are pre-allocated at probe time and sharing is always used). The msi_sharing flag is reset at the top of mana_gd_query_max_resources() so it is recomputed from current hardware state on each probe or resume cycle. Clamp apc->max_queues to gc->max_num_queues_vport in mana_init_port() so that on resume, if max_num_queues_vport has decreased due to fewer MSI-X vectors, num_queues is reduced accordingly before EQ allocation. 
A device reporting zero ports now results in a fatal probe error since the per-vPort MSI-X math requires at least one port. Rename mana_query_device_cfg() to mana_gd_query_device_cfg() as it is used at GDMA device probe time for querying device capabilities. Signed-off-by: Long Li --- .../net/ethernet/microsoft/mana/gdma_main.c | 71 ++++++++++++++++++- drivers/net/ethernet/microsoft/mana/mana_en.c | 45 +++++++----- include/net/mana/gdma.h | 13 +++- 3 files changed, 107 insertions(+), 22 deletions(-) diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/= ethernet/microsoft/mana/gdma_main.c index 712a0881d720..e31eeca3563d 100644 --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c @@ -179,8 +179,21 @@ static int mana_gd_query_max_resources(struct pci_dev = *pdev) struct gdma_context *gc =3D pci_get_drvdata(pdev); struct gdma_query_max_resources_resp resp =3D {}; struct gdma_general_req req =3D {}; + unsigned int max_num_queues; + u8 bm_hostmode; + u16 num_ports; int err; =20 + /* Reset msi_sharing so it is recomputed from current hardware + * state. On resume, num_online_cpus() or num_msix_usable may + * have changed, making dedicated MSI-X feasible where it was + * not before. Only reset on platforms that support dynamic + * MSI-X allocation; on non-dyn platforms msi_sharing is + * unconditionally true (set in mana_gd_setup_hwc_irqs). 
+ */ + if (pci_msix_can_alloc_dyn(to_pci_dev(gc->dev))) + gc->msi_sharing =3D false; + mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_MAX_RESOURCES, sizeof(req), sizeof(resp)); =20 @@ -232,6 +245,45 @@ static int mana_gd_query_max_resources(struct pci_dev = *pdev) debugfs_create_u32("max_num_queues", 0400, gc->mana_pci_debugfs, &gc->max_num_queues); =20 + err =3D mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, + MANA_MINOR_VERSION, + MANA_MICRO_VERSION, + &num_ports, &bm_hostmode); + if (err) + return err; + + if (!num_ports) { + dev_err(gc->dev, "Failed to detect any vPort\n"); + return -EINVAL; + } + + /* + * Adjust the per-vPort max queue count to allow dedicated + * MSIx for each vPort. Clamp to no less than MANA_DEF_NUM_QUEUES. + */ + max_num_queues =3D (gc->num_msix_usable - 1) / num_ports; + max_num_queues =3D rounddown_pow_of_two(max(max_num_queues, 1U)); + if (max_num_queues < MANA_DEF_NUM_QUEUES) + max_num_queues =3D MANA_DEF_NUM_QUEUES; + + /* + * Use dedicated MSIx for EQs whenever possible, use MSIx sharing for + * Ethernet EQs when (max_num_queues * num_ports > num_msix_usable - 1). 
+ */ + max_num_queues =3D min(gc->max_num_queues, max_num_queues); + if (max_num_queues * num_ports > gc->num_msix_usable - 1) + gc->msi_sharing =3D true; + + /* If MSI is shared, use max allowed value */ + if (gc->msi_sharing) + gc->max_num_queues_vport =3D min(gc->num_msix_usable - 1, + gc->max_num_queues); + else + gc->max_num_queues_vport =3D max_num_queues; + + dev_info(gc->dev, "MSI sharing mode %d max queues %d\n", + gc->msi_sharing, gc->max_num_queues); + return 0; } =20 @@ -1901,6 +1953,7 @@ static int mana_gd_setup_hwc_irqs(struct pci_dev *pde= v) /* Need 1 interrupt for HWC */ max_irqs =3D min(num_online_cpus(), MANA_MAX_NUM_QUEUES) + 1; min_irqs =3D 2; + gc->msi_sharing =3D true; } =20 nvec =3D pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX); @@ -1979,6 +2032,8 @@ static void mana_gd_remove_irqs(struct pci_dev *pdev) =20 pci_free_irq_vectors(pdev); =20 + bitmap_free(gc->msi_bitmap); + gc->msi_bitmap =3D NULL; gc->max_num_msix =3D 0; gc->num_msix_usable =3D 0; } @@ -2018,6 +2073,10 @@ static int mana_gd_setup(struct pci_dev *pdev) if (err) goto destroy_hwc; =20 + err =3D mana_gd_detect_devices(pdev); + if (err) + goto destroy_hwc; + err =3D mana_gd_query_max_resources(pdev); if (err) goto destroy_hwc; @@ -2028,9 +2087,15 @@ static int mana_gd_setup(struct pci_dev *pdev) goto destroy_hwc; } =20 - err =3D mana_gd_detect_devices(pdev); - if (err) - goto destroy_hwc; + if (!gc->msi_sharing) { + gc->msi_bitmap =3D bitmap_zalloc(gc->num_msix_usable, GFP_KERNEL); + if (!gc->msi_bitmap) { + err =3D -ENOMEM; + goto destroy_hwc; + } + /* Set bit for HWC */ + set_bit(0, gc->msi_bitmap); + } =20 dev_dbg(&pdev->dev, "mana gdma setup successful\n"); return 0; diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/et= hernet/microsoft/mana/mana_en.c index 7c776f115f5a..571648007378 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -1007,10 +1007,9 @@ static int 
mana_init_port_context(struct mana_port_c= ontext *apc) return !apc->rxqs ? -ENOMEM : 0; } =20 -static int mana_send_request(struct mana_context *ac, void *in_buf, - u32 in_len, void *out_buf, u32 out_len) +static int gdma_mana_send_request(struct gdma_context *gc, void *in_buf, + u32 in_len, void *out_buf, u32 out_len) { - struct gdma_context *gc =3D ac->gdma_dev->gdma_context; struct gdma_resp_hdr *resp =3D out_buf; struct gdma_req_hdr *req =3D in_buf; struct device *dev =3D gc->dev; @@ -1044,6 +1043,14 @@ static int mana_send_request(struct mana_context *ac= , void *in_buf, return 0; } =20 +static int mana_send_request(struct mana_context *ac, void *in_buf, + u32 in_len, void *out_buf, u32 out_len) +{ + struct gdma_context *gc =3D ac->gdma_dev->gdma_context; + + return gdma_mana_send_request(gc, in_buf, in_len, out_buf, out_len); +} + static int mana_verify_resp_hdr(const struct gdma_resp_hdr *resp_hdr, const enum mana_command_code expected_code, const u32 min_size) @@ -1177,11 +1184,10 @@ static void mana_pf_deregister_filter(struct mana_p= ort_context *apc) err, resp.hdr.status); } =20 -static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_= ver, - u32 proto_minor_ver, u32 proto_micro_ver, - u16 *max_num_vports, u8 *bm_hostmode) +int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver, + u32 proto_minor_ver, u32 proto_micro_ver, + u16 *max_num_vports, u8 *bm_hostmode) { - struct gdma_context *gc =3D ac->gdma_dev->gdma_context; struct mana_query_device_cfg_resp resp =3D {}; struct mana_query_device_cfg_req req =3D {}; struct device *dev =3D gc->dev; @@ -1196,7 +1202,8 @@ static int mana_query_device_cfg(struct mana_context = *ac, u32 proto_major_ver, req.proto_minor_ver =3D proto_minor_ver; req.proto_micro_ver =3D proto_micro_ver; =20 - err =3D mana_send_request(ac, &req, sizeof(req), &resp, sizeof(resp)); + err =3D gdma_mana_send_request(gc, &req, sizeof(req), + &resp, sizeof(resp)); if (err) { dev_err(dev, "Failed to 
query config: %d", err); return err; @@ -1230,8 +1237,6 @@ static int mana_query_device_cfg(struct mana_context = *ac, u32 proto_major_ver, else *bm_hostmode =3D 0; =20 - debugfs_create_u16("adapter-MTU", 0400, gc->mana_pci_debugfs, &gc->adapte= r_mtu); - return 0; } =20 @@ -3184,6 +3189,8 @@ static int mana_init_port(struct net_device *ndev) max_queues =3D min_t(u32, max_txq, max_rxq); if (apc->max_queues > max_queues) apc->max_queues =3D max_queues; + if (apc->max_queues > gc->max_num_queues_vport) + apc->max_queues =3D gc->max_num_queues_vport; =20 if (apc->num_queues > apc->max_queues) apc->num_queues =3D apc->max_queues; @@ -3442,7 +3449,7 @@ static int mana_probe_port(struct mana_context *ac, i= nt port_idx, int err; =20 ndev =3D alloc_etherdev_mq(sizeof(struct mana_port_context), - gc->max_num_queues); + gc->max_num_queues_vport); if (!ndev) return -ENOMEM; =20 @@ -3451,9 +3458,9 @@ static int mana_probe_port(struct mana_context *ac, i= nt port_idx, apc =3D netdev_priv(ndev); apc->ac =3D ac; apc->ndev =3D ndev; - apc->max_queues =3D gc->max_num_queues; + apc->max_queues =3D gc->max_num_queues_vport; /* Use MANA_DEF_NUM_QUEUES as default, still honoring the HW limit */ - apc->num_queues =3D min(gc->max_num_queues, MANA_DEF_NUM_QUEUES); + apc->num_queues =3D min(gc->max_num_queues_vport, MANA_DEF_NUM_QUEUES); apc->tx_queue_size =3D DEF_TX_BUFFERS_PER_QUEUE; apc->rx_queue_size =3D DEF_RX_BUFFERS_PER_QUEUE; apc->port_handle =3D INVALID_MANA_HANDLE; @@ -3717,13 +3724,18 @@ int mana_probe(struct gdma_dev *gd, bool resuming) =20 INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler); =20 - err =3D mana_query_device_cfg(ac, MANA_MAJOR_VERSION, MANA_MINOR_VERSION, - MANA_MICRO_VERSION, &num_ports, &bm_hostmode); + err =3D mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, + MANA_MINOR_VERSION, + MANA_MICRO_VERSION, + &num_ports, &bm_hostmode); if (err) goto out; =20 ac->bm_hostmode =3D bm_hostmode; =20 + debugfs_create_u16("adapter-MTU", 0400, + 
gc->mana_pci_debugfs, &gc->adapter_mtu); + if (!resuming) { ac->num_ports =3D num_ports; } else { @@ -3737,9 +3749,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming) enable_work(&ac->link_change_work); } =20 - if (ac->num_ports =3D=3D 0) - dev_err(dev, "Failed to detect any vPort\n"); - if (ac->num_ports > MAX_PORTS_IN_MANA_DEV) ac->num_ports =3D MAX_PORTS_IN_MANA_DEV; =20 diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h index 70d62bc32837..145cc59dfc19 100644 --- a/include/net/mana/gdma.h +++ b/include/net/mana/gdma.h @@ -399,8 +399,10 @@ struct gdma_context { struct device *dev; struct dentry *mana_pci_debugfs; =20 - /* Per-vPort max number of queues */ + /* Hardware max number of queues */ unsigned int max_num_queues; + /* Per-vPort max number of queues */ + unsigned int max_num_queues_vport; unsigned int max_num_msix; unsigned int num_msix_usable; struct xarray irq_contexts; @@ -447,6 +449,12 @@ struct gdma_context { struct workqueue_struct *service_wq; =20 unsigned long flags; + + /* Indicate if this device is sharing MSI for EQs on MANA */ + bool msi_sharing; + + /* Bitmap tracks where MSI is allocated when it is not shared for EQs */ + unsigned long *msi_bitmap; }; =20 static inline bool mana_gd_is_mana(struct gdma_dev *gd) @@ -1019,4 +1027,7 @@ int mana_gd_resume(struct pci_dev *pdev); =20 bool mana_need_log(struct gdma_context *gc, int err); =20 +int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver, + u32 proto_minor_ver, u32 proto_micro_ver, + u16 *max_num_vports, u8 *bm_hostmode); #endif /* _GDMA_H */ --=20 2.43.0

From nobody Sun May 24 19:33:19 2026
From: Long Li
To: Long Li, Konstantin Taranov, Jakub Kicinski, "David S . Miller", Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe, Leon Romanovsky, Haiyang Zhang, "K . Y . Srinivasan", Wei Liu, Dexuan Cui, shradhagupta@linux.microsoft.com
Cc: Simon Horman, netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v11 3/6] net: mana: Introduce GIC context with refcounting for interrupt management
Date: Fri, 22 May 2026 19:02:53 -0700
Message-ID: <20260523020258.1107742-4-longli@microsoft.com>
In-Reply-To: <20260523020258.1107742-1-longli@microsoft.com>
References: <20260523020258.1107742-1-longli@microsoft.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

To allow Ethernet EQs to use dedicated or shared MSI-X vectors and RDMA EQs to share the same MSI-X, introduce a GIC (GDMA IRQ Context) with reference counting. This allows the driver to create an interrupt context on an assigned or unassigned MSI-X vector and share it across multiple EQ consumers.

Signed-off-by: Long Li --- .../net/ethernet/microsoft/mana/gdma_main.c | 169 ++++++++++++++++++ include/net/mana/gdma.h | 12 ++ 2 files changed, 181 insertions(+) diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/= ethernet/microsoft/mana/gdma_main.c index e31eeca3563d..0541d914f27d 100644 --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c @@ -864,6 +864,10 @@ static int mana_gd_register_irq(struct gdma_queue *que= ue, } =20 queue->eq.msix_index =3D msi_index; + /* The caller acquired a GIC reference via mana_gd_get_gic(). + * That refcount prevents mana_gd_put_gic() from erasing this + * irq_contexts entry concurrently.
+ */ gic =3D xa_load(&gc->irq_contexts, msi_index); if (WARN_ON(!gic)) return -EINVAL; @@ -891,6 +895,10 @@ static void mana_gd_deregister_irq(struct gdma_queue *= queue) if (WARN_ON(msix_index >=3D gc->num_msix_usable)) return; =20 + /* The caller releases the GIC reference via mana_gd_put_gic() + * after this function returns. The refcount guarantees this + * irq_contexts entry is still valid. + */ gic =3D xa_load(&gc->irq_contexts, msix_index); if (WARN_ON(!gic)) return; @@ -1672,6 +1680,166 @@ static irqreturn_t mana_gd_intr(int irq, void *arg) return IRQ_HANDLED; } =20 +void mana_gd_put_gic(struct gdma_context *gc, bool use_msi_bitmap, int msi) +{ + struct pci_dev *dev =3D to_pci_dev(gc->dev); + struct gdma_irq_context *gic; + struct msi_map irq_map; + int irq; + + mutex_lock(&gc->gic_mutex); + + gic =3D xa_load(&gc->irq_contexts, msi); + if (WARN_ON(!gic)) { + mutex_unlock(&gc->gic_mutex); + return; + } + + if (use_msi_bitmap) + gic->bitmap_refs--; + + if (use_msi_bitmap && gic->bitmap_refs =3D=3D 0) + clear_bit(msi, gc->msi_bitmap); + + if (!refcount_dec_and_test(&gic->refcount)) + goto out; + + irq =3D gic->irq; + + irq_update_affinity_hint(irq, NULL); + free_irq(irq, gic); + + if (gic->dyn_msix) { + irq_map.virq =3D irq; + irq_map.index =3D msi; + pci_msix_free_irq(dev, irq_map); + } + + xa_erase(&gc->irq_contexts, msi); + kfree(gic); + +out: + mutex_unlock(&gc->gic_mutex); +} +EXPORT_SYMBOL_NS(mana_gd_put_gic, "NET_MANA"); + +/* + * Get a GIC (GDMA IRQ Context) on a MSI vector + * a MSI can be shared between different EQs, this function supports setti= ng + * up separate MSIs using a bitmap, or directly using the MSI index + * + * @use_msi_bitmap: + * True if MSI is assigned by this function on available slots from bitmap. 
+ * False if MSI is passed from *msi_requested + */ +struct gdma_irq_context *mana_gd_get_gic(struct gdma_context *gc, + bool use_msi_bitmap, + int *msi_requested) +{ + struct pci_dev *dev =3D to_pci_dev(gc->dev); + struct gdma_irq_context *gic; + struct msi_map irq_map =3D { }; + int irq; + int msi; + int err; + + mutex_lock(&gc->gic_mutex); + + if (use_msi_bitmap) { + msi =3D find_first_zero_bit(gc->msi_bitmap, gc->num_msix_usable); + if (msi >=3D gc->num_msix_usable) { + dev_err(gc->dev, "No free MSI vectors available\n"); + gic =3D ERR_PTR(-ENOSPC); + goto out; + } + *msi_requested =3D msi; + } else { + msi =3D *msi_requested; + } + + gic =3D xa_load(&gc->irq_contexts, msi); + if (gic) { + refcount_inc(&gic->refcount); + if (use_msi_bitmap) { + gic->bitmap_refs++; + set_bit(msi, gc->msi_bitmap); + } + goto out; + } + + irq =3D pci_irq_vector(dev, msi); + if (irq =3D=3D -EINVAL) { + irq_map =3D pci_msix_alloc_irq_at(dev, msi, NULL); + if (!irq_map.virq) { + err =3D irq_map.index; + dev_err(gc->dev, + "Failed to alloc irq_map msi %d err %d\n", + msi, err); + gic =3D ERR_PTR(err); + goto out; + } + irq =3D irq_map.virq; + msi =3D irq_map.index; + *msi_requested =3D msi; + } + + gic =3D kzalloc(sizeof(*gic), GFP_KERNEL); + if (!gic) { + gic =3D ERR_PTR(-ENOMEM); + if (irq_map.virq) + pci_msix_free_irq(dev, irq_map); + goto out; + } + + gic->handler =3D mana_gd_process_eq_events; + gic->msi =3D msi; + gic->irq =3D irq; + INIT_LIST_HEAD(&gic->eq_list); + spin_lock_init(&gic->lock); + + if (!gic->msi) + snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_hwc@pci:%s", + pci_name(dev)); + else + snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_msi%d@pci:%s", + gic->msi, pci_name(dev)); + + err =3D request_irq(irq, mana_gd_intr, 0, gic->name, gic); + if (err) { + dev_err(gc->dev, "Failed to request irq %d %s\n", + irq, gic->name); + kfree(gic); + gic =3D ERR_PTR(err); + if (irq_map.virq) + pci_msix_free_irq(dev, irq_map); + goto out; + } + + gic->dyn_msix =3D !!irq_map.virq; + 
refcount_set(&gic->refcount, 1); + gic->bitmap_refs =3D use_msi_bitmap ? 1 : 0; + + err =3D xa_err(xa_store(&gc->irq_contexts, msi, gic, GFP_KERNEL)); + if (err) { + dev_err(gc->dev, "Failed to store irq context for msi %d: %d\n", + msi, err); + free_irq(irq, gic); + kfree(gic); + gic =3D ERR_PTR(err); + if (irq_map.virq) + pci_msix_free_irq(dev, irq_map); + goto out; + } + + if (use_msi_bitmap) + set_bit(msi, gc->msi_bitmap); + +out: + mutex_unlock(&gc->gic_mutex); + return gic; +} +EXPORT_SYMBOL_NS(mana_gd_get_gic, "NET_MANA"); + int mana_gd_alloc_res_map(u32 res_avail, struct gdma_resource *r) { r->map =3D bitmap_zalloc(res_avail, GFP_KERNEL); @@ -2173,6 +2341,7 @@ static int mana_gd_probe(struct pci_dev *pdev, const = struct pci_device_id *ent) goto release_region; =20 mutex_init(&gc->eq_test_event_mutex); + mutex_init(&gc->gic_mutex); pci_set_drvdata(pdev, gc); gc->bar0_pa =3D pci_resource_start(pdev, 0); gc->bar0_size =3D pci_resource_len(pdev, 0); diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h index 145cc59dfc19..e3ee85c614ec 100644 --- a/include/net/mana/gdma.h +++ b/include/net/mana/gdma.h @@ -388,6 +388,11 @@ struct gdma_irq_context { spinlock_t lock; struct list_head eq_list; char name[MANA_IRQ_NAME_SZ]; + unsigned int msi; + unsigned int irq; + refcount_t refcount; + unsigned int bitmap_refs; + bool dyn_msix; }; =20 enum gdma_context_flags { @@ -450,6 +455,9 @@ struct gdma_context { =20 unsigned long flags; =20 + /* Protect access to GIC context */ + struct mutex gic_mutex; + /* Indicate if this device is sharing MSI for EQs on MANA */ bool msi_sharing; =20 @@ -1027,6 +1035,10 @@ int mana_gd_resume(struct pci_dev *pdev); =20 bool mana_need_log(struct gdma_context *gc, int err); =20 +struct gdma_irq_context *mana_gd_get_gic(struct gdma_context *gc, + bool use_msi_bitmap, + int *msi_requested); +void mana_gd_put_gic(struct gdma_context *gc, bool use_msi_bitmap, int msi= ); int mana_gd_query_device_cfg(struct gdma_context *gc, u32 
proto_major_ver, u32 proto_minor_ver, u32 proto_micro_ver, u16 *max_num_vports, u8 *bm_hostmode); --=20 2.43.0

From nobody Sun May 24 19:33:19 2026
From: Long Li
To: Long Li, Konstantin Taranov, Jakub Kicinski, "David S . Miller", Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe, Leon Romanovsky, Haiyang Zhang, "K . Y . Srinivasan", Wei Liu, Dexuan Cui, shradhagupta@linux.microsoft.com
Cc: Simon Horman, netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v11 4/6] net: mana: Use GIC functions to allocate global EQs
Date: Fri, 22 May 2026 19:02:54 -0700
Message-ID: <20260523020258.1107742-5-longli@microsoft.com>
In-Reply-To: <20260523020258.1107742-1-longli@microsoft.com>
References: <20260523020258.1107742-1-longli@microsoft.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Replace the GDMA global interrupt setup code with the new GIC allocation and release functions for managing interrupt contexts.

This changes the per-queue interrupt names in /proc/interrupts from mana_q0, mana_q1, ... to mana_msi1, mana_msi2, ... to reflect the MSI-X index rather than a zero-based queue number. The HWC interrupt name (mana_hwc) is unchanged.
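
A minimal user-space sketch of the naming rule described above, mirroring the snprintf() calls in mana_gd_get_gic() from patch 3/6. The helper irq_name_for_msi() and the IRQ_NAME_SZ value are hypothetical and exist only for illustration; the driver itself uses MANA_IRQ_NAME_SZ and pci_name(dev).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer size; the driver uses MANA_IRQ_NAME_SZ instead. */
#define IRQ_NAME_SZ 64

/*
 * Build the /proc/interrupts name for an MSI-X index.
 * Index 0 is the HWC vector; every other vector is named after its
 * MSI-X index rather than a zero-based queue number.
 * Returns the snprintf() result (length that was or would be written).
 */
static int irq_name_for_msi(char *buf, size_t len, unsigned int msi,
			    const char *pci_name)
{
	assert(buf && pci_name);

	if (msi == 0)
		return snprintf(buf, len, "mana_hwc@pci:%s", pci_name);

	return snprintf(buf, len, "mana_msi%u@pci:%s", msi, pci_name);
}
```

With an assumed PCI name of 0000:00:02.0, index 0 yields mana_hwc@pci:0000:00:02.0 and index 1 yields mana_msi1@pci:0000:00:02.0, where the old scheme would have named the latter mana_q0.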
Signed-off-by: Long Li --- .../net/ethernet/microsoft/mana/gdma_main.c | 104 +++--------------- 1 file changed, 17 insertions(+), 87 deletions(-) diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/= ethernet/microsoft/mana/gdma_main.c index 0541d914f27d..fc21c7f57e23 100644 --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c @@ -1942,7 +1942,7 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pde= v, int nvec) struct gdma_context *gc =3D pci_get_drvdata(pdev); struct gdma_irq_context *gic; bool skip_first_cpu =3D false; - int *irqs, irq, err, i; + int *irqs, err, i; =20 irqs =3D kmalloc_objs(int, nvec); if (!irqs) @@ -1955,30 +1955,13 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *p= dev, int nvec) * further used in irq_setup() */ for (i =3D 1; i <=3D nvec; i++) { - gic =3D kzalloc_obj(*gic); - if (!gic) { - err =3D -ENOMEM; + gic =3D mana_gd_get_gic(gc, false, &i); + if (IS_ERR(gic)) { + err =3D PTR_ERR(gic); goto free_irq; } - gic->handler =3D mana_gd_process_eq_events; - INIT_LIST_HEAD(&gic->eq_list); - spin_lock_init(&gic->lock); - - snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s", - i - 1, pci_name(pdev)); - - /* one pci vector is already allocated for HWC */ - irqs[i - 1] =3D pci_irq_vector(pdev, i); - if (irqs[i - 1] < 0) { - err =3D irqs[i - 1]; - goto free_current_gic; - } - - err =3D request_irq(irqs[i - 1], mana_gd_intr, 0, gic->name, gic); - if (err) - goto free_current_gic; =20 - xa_store(&gc->irq_contexts, i, gic, GFP_KERNEL); + irqs[i - 1] =3D gic->irq; } =20 /* @@ -2000,20 +1983,9 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pd= ev, int nvec) kfree(irqs); return 0; =20 -free_current_gic: - kfree(gic); free_irq: - for (i -=3D 1; i > 0; i--) { - irq =3D pci_irq_vector(pdev, i); - gic =3D xa_load(&gc->irq_contexts, i); - if (WARN_ON(!gic)) - continue; - - irq_update_affinity_hint(irq, NULL); - free_irq(irq, gic); - xa_erase(&gc->irq_contexts, i); - 
kfree(gic); - } + for (i -=3D 1; i > 0; i--) + mana_gd_put_gic(gc, false, i); kfree(irqs); return err; } @@ -2022,7 +1994,7 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev, i= nt nvec) { struct gdma_context *gc =3D pci_get_drvdata(pdev); struct gdma_irq_context *gic; - int *irqs, *start_irqs, irq; + int *irqs, *start_irqs; unsigned int cpu; int err, i; =20 @@ -2033,34 +2005,13 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev,= int nvec) start_irqs =3D irqs; =20 for (i =3D 0; i < nvec; i++) { - gic =3D kzalloc_obj(*gic); - if (!gic) { - err =3D -ENOMEM; + gic =3D mana_gd_get_gic(gc, false, &i); + if (IS_ERR(gic)) { + err =3D PTR_ERR(gic); goto free_irq; } =20 - gic->handler =3D mana_gd_process_eq_events; - INIT_LIST_HEAD(&gic->eq_list); - spin_lock_init(&gic->lock); - - if (!i) - snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_hwc@pci:%s", - pci_name(pdev)); - else - snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s", - i - 1, pci_name(pdev)); - - irqs[i] =3D pci_irq_vector(pdev, i); - if (irqs[i] < 0) { - err =3D irqs[i]; - goto free_current_gic; - } - - err =3D request_irq(irqs[i], mana_gd_intr, 0, gic->name, gic); - if (err) - goto free_current_gic; - - xa_store(&gc->irq_contexts, i, gic, GFP_KERNEL); + irqs[i] =3D gic->irq; } =20 /* If number of IRQ is one extra than number of online CPUs, @@ -2089,20 +2040,9 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev, = int nvec) kfree(start_irqs); return 0; =20 -free_current_gic: - kfree(gic); free_irq: - for (i -=3D 1; i >=3D 0; i--) { - irq =3D pci_irq_vector(pdev, i); - gic =3D xa_load(&gc->irq_contexts, i); - if (WARN_ON(!gic)) - continue; - - irq_update_affinity_hint(irq, NULL); - free_irq(irq, gic); - xa_erase(&gc->irq_contexts, i); - kfree(gic); - } + for (i -=3D 1; i >=3D 0; i--) + mana_gd_put_gic(gc, false, i); =20 kfree(start_irqs); return err; @@ -2176,26 +2116,16 @@ static int mana_gd_setup_remaining_irqs(struct pci_= dev *pdev) static void mana_gd_remove_irqs(struct pci_dev *pdev) { struct 
gdma_context *gc =3D pci_get_drvdata(pdev); - struct gdma_irq_context *gic; - int irq, i; + int i; =20 if (gc->max_num_msix < 1) return; =20 for (i =3D 0; i < gc->max_num_msix; i++) { - irq =3D pci_irq_vector(pdev, i); - if (irq < 0) - continue; - - gic =3D xa_load(&gc->irq_contexts, i); - if (WARN_ON(!gic)) + if (!xa_load(&gc->irq_contexts, i)) continue; =20 - /* Need to clear the hint before free_irq */ - irq_update_affinity_hint(irq, NULL); - free_irq(irq, gic); - xa_erase(&gc->irq_contexts, i); - kfree(gic); + mana_gd_put_gic(gc, false, i); } =20 pci_free_irq_vectors(pdev); --=20 2.43.0

From nobody Sun May 24 19:33:19 2026
From: Long Li
To: Long Li, Konstantin Taranov, Jakub Kicinski, "David S . Miller", Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe, Leon Romanovsky, Haiyang Zhang, "K . Y . Srinivasan", Wei Liu, Dexuan Cui, shradhagupta@linux.microsoft.com
Cc: Simon Horman, netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v11 5/6] net: mana: Allocate interrupt context for each EQ when creating vPort
Date: Fri, 22 May 2026 19:02:55 -0700
Message-ID: <20260523020258.1107742-6-longli@microsoft.com>
In-Reply-To: <20260523020258.1107742-1-longli@microsoft.com>
References: <20260523020258.1107742-1-longli@microsoft.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Use GIC functions to create a dedicated interrupt context or acquire a shared interrupt context for each EQ when setting up a vPort.

The caller now owns the GIC reference across the EQ create/destroy lifecycle: mana_create_eq() calls mana_gd_get_gic() before creating each EQ and mana_destroy_eq() calls mana_gd_put_gic() after destroying it.

The msix_index invalidation is moved from mana_gd_deregister_irq() to the mana_gd_create_eq() error path so that mana_destroy_eq() can read the index before teardown.
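
The ownership rule above can be sketched as a small user-space model of the get/put pairing, following the counting logic of mana_gd_get_gic() and mana_gd_put_gic() from patch 3/6. All names here (gc_model, gic_model, model_get_gic, model_put_gic, NUM_MSI) are hypothetical. The model leaves out IRQ request/free, gic_mutex locking and dynamic MSI-X allocation, so it only illustrates how refcount and bitmap_refs interact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NUM_MSI 8 /* hypothetical vector count for the model */

struct gic_model {
	unsigned int refcount;    /* every user of this vector */
	unsigned int bitmap_refs; /* users that were assigned via the bitmap */
};

struct gc_model {
	struct gic_model *ctx[NUM_MSI]; /* stands in for gc->irq_contexts */
	bool bitmap[NUM_MSI];           /* stands in for gc->msi_bitmap */
};

/* Returns the vector index in use, or -1 if no vector or memory is left. */
static int model_get_gic(struct gc_model *gc, bool use_bitmap, int msi)
{
	if (use_bitmap) {
		/* Dedicated mode: pick the first vector not yet claimed. */
		for (msi = 0; msi < NUM_MSI && gc->bitmap[msi]; msi++)
			;
		if (msi == NUM_MSI)
			return -1;
	}
	assert(msi >= 0 && msi < NUM_MSI);

	if (!gc->ctx[msi]) {
		gc->ctx[msi] = calloc(1, sizeof(*gc->ctx[msi]));
		if (!gc->ctx[msi])
			return -1;
	}

	gc->ctx[msi]->refcount++;
	if (use_bitmap) {
		gc->ctx[msi]->bitmap_refs++;
		gc->bitmap[msi] = true;
	}
	return msi;
}

/* Drops one reference; the context goes away with the last user. */
static void model_put_gic(struct gc_model *gc, bool use_bitmap, int msi)
{
	struct gic_model *gic;

	assert(msi >= 0 && msi < NUM_MSI);
	gic = gc->ctx[msi];
	assert(gic && gic->refcount > 0);

	if (use_bitmap && --gic->bitmap_refs == 0)
		gc->bitmap[msi] = false;

	if (--gic->refcount == 0) {
		free(gic);
		gc->ctx[msi] = NULL;
	}
}
```

In this model, an Ethernet EQ that takes a vector through the bitmap and an RDMA EQ that asks for the same index directly end up sharing one context; the bitmap slot is released when the bitmap user leaves, and the context itself is freed only when the last user calls put.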
Signed-off-by: Long Li --- .../net/ethernet/microsoft/mana/gdma_main.c | 2 +- drivers/net/ethernet/microsoft/mana/mana_en.c | 18 +++++++++++++++++- include/net/mana/gdma.h | 1 + 3 files changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/= ethernet/microsoft/mana/gdma_main.c index fc21c7f57e23..10d394dd9653 100644 --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c @@ -912,7 +912,6 @@ static void mana_gd_deregister_irq(struct gdma_queue *q= ueue) } spin_unlock_irqrestore(&gic->lock, flags); =20 - queue->eq.msix_index =3D INVALID_PCI_MSIX_INDEX; synchronize_rcu(); } =20 @@ -1027,6 +1026,7 @@ static int mana_gd_create_eq(struct gdma_dev *gd, out: dev_err(dev, "Failed to create EQ: %d\n", err); mana_gd_destroy_eq(gc, false, queue); + queue->eq.msix_index =3D INVALID_PCI_MSIX_INDEX; return err; } =20 diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/et= hernet/microsoft/mana/mana_en.c index 571648007378..bca381f8bc7b 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -1634,6 +1634,7 @@ void mana_destroy_eq(struct mana_port_context *apc) struct mana_context *ac =3D apc->ac; struct gdma_context *gc =3D ac->gdma_dev->gdma_context; struct gdma_queue *eq; + unsigned int msi; int i; =20 if (!apc->eqs) @@ -1647,7 +1648,9 @@ void mana_destroy_eq(struct mana_port_context *apc) if (!eq) continue; =20 + msi =3D eq->eq.msix_index; mana_gd_destroy_queue(gc, eq); + mana_gd_put_gic(gc, !gc->msi_sharing, msi); } =20 kfree(apc->eqs); @@ -1664,6 +1667,7 @@ static void mana_create_eq_debugfs(struct mana_port_c= ontext *apc, int i) eq.mana_eq_debugfs =3D debugfs_create_dir(eqnum, apc->mana_eqs_debugfs); debugfs_create_u32("head", 0400, eq.mana_eq_debugfs, &eq.eq->head); debugfs_create_u32("tail", 0400, eq.mana_eq_debugfs, &eq.eq->tail); + debugfs_create_u32("irq", 0400, 
eq.mana_eq_debugfs, &eq.eq->eq.irq); debugfs_create_file("eq_dump", 0400, eq.mana_eq_debugfs, eq.eq, &mana_dbg= _q_fops); } =20 @@ -1672,7 +1676,9 @@ int mana_create_eq(struct mana_port_context *apc) struct gdma_dev *gd =3D apc->ac->gdma_dev; struct gdma_context *gc =3D gd->gdma_context; struct gdma_queue_spec spec =3D {}; + struct gdma_irq_context *gic; int err; + int msi; int i; =20 if (WARN_ON(apc->eqs)) @@ -1692,12 +1698,22 @@ int mana_create_eq(struct mana_port_context *apc) debugfs_create_dir("EQs", apc->mana_port_debugfs); =20 for (i =3D 0; i < apc->num_queues; i++) { - spec.eq.msix_index =3D (i + 1) % gc->num_msix_usable; + msi =3D (i + 1) % gc->num_msix_usable; + + gic =3D mana_gd_get_gic(gc, !gc->msi_sharing, &msi); + if (IS_ERR(gic)) { + err =3D PTR_ERR(gic); + goto out; + } + spec.eq.msix_index =3D msi; + err =3D mana_gd_create_mana_eq(gd, &spec, &apc->eqs[i].eq); if (err) { dev_err(gc->dev, "Failed to create EQ %d : %d\n", i, err); + mana_gd_put_gic(gc, !gc->msi_sharing, msi); goto out; } + apc->eqs[i].eq->eq.irq =3D gic->irq; mana_create_eq_debugfs(apc, i); } =20 diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h index e3ee85c614ec..6a65fedae38f 100644 --- a/include/net/mana/gdma.h +++ b/include/net/mana/gdma.h @@ -342,6 +342,7 @@ struct gdma_queue { void *context; =20 unsigned int msix_index; + unsigned int irq; =20 u32 log2_throttle_limit; } eq; --=20 2.43.0

From nobody Sun May 24 19:33:19 2026
From: Long Li
To: Long Li, Konstantin Taranov, Jakub Kicinski, "David S . Miller", Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe, Leon Romanovsky, Haiyang Zhang, "K . Y . Srinivasan", Wei Liu, Dexuan Cui, shradhagupta@linux.microsoft.com
Cc: Simon Horman, netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v11 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs
Date: Fri, 22 May 2026 19:02:56 -0700
Message-ID: <20260523020258.1107742-7-longli@microsoft.com>
In-Reply-To: <20260523020258.1107742-1-longli@microsoft.com>
References: <20260523020258.1107742-1-longli@microsoft.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Use the GIC functions to allocate interrupt contexts for RDMA EQs.
These interrupt contexts may be shared with Ethernet EQs when MSI-X vectors are limited. The driver now supports allocating dedicated MSI-X for each EQ. Indicate this capability through driver capability bits. The RDMA EQs pass use_msi_bitmap=3Dfalse to share MSI-X vectors with Ethernet, while the capability flag advertises that the driver supports per-vPort EQ separation when hardware has sufficient vectors. Populate eq.irq on all RDMA EQs for consistency with the Ethernet path. Also relocate the GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE define to its numeric BIT(6) position among the other capability flags. Signed-off-by: Long Li --- drivers/infiniband/hw/mana/main.c | 43 +++++++++++++++++++++++++------ include/net/mana/gdma.h | 7 +++-- 2 files changed, 40 insertions(+), 10 deletions(-) diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana= /main.c index f8a9013f0ca3..cefab12e2659 100644 --- a/drivers/infiniband/hw/mana/main.c +++ b/drivers/infiniband/hw/mana/main.c @@ -764,7 +764,8 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev) { struct gdma_context *gc =3D mdev_to_gc(mdev); struct gdma_queue_spec spec =3D {}; - int err, i; + struct gdma_irq_context *gic; + int err, i, msi; =20 spec.type =3D GDMA_EQ; spec.monitor_avl_buf =3D false; @@ -772,11 +773,19 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev) spec.eq.callback =3D mana_ib_event_handler; spec.eq.context =3D mdev; spec.eq.log2_throttle_limit =3D LOG2_EQ_THROTTLE; - spec.eq.msix_index =3D 0; + + msi =3D 0; + gic =3D mana_gd_get_gic(gc, false, &msi); + if (IS_ERR(gic)) + return PTR_ERR(gic); + spec.eq.msix_index =3D msi; =20 err =3D mana_gd_create_mana_eq(mdev->gdma_dev, &spec, &mdev->fatal_err_eq= ); - if (err) + if (err) { + mana_gd_put_gic(gc, false, 0); return err; + } + mdev->fatal_err_eq->eq.irq =3D gic->irq; =20 mdev->eqs =3D kzalloc_objs(struct gdma_queue *, mdev->ib_dev.num_comp_vectors); @@ -786,32 +795,50 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev) } 
spec.eq.callback =3D NULL; for (i =3D 0; i < mdev->ib_dev.num_comp_vectors; i++) { - spec.eq.msix_index =3D (i + 1) % gc->num_msix_usable; + msi =3D (i + 1) % gc->num_msix_usable; + + gic =3D mana_gd_get_gic(gc, false, &msi); + if (IS_ERR(gic)) { + err =3D PTR_ERR(gic); + goto destroy_eqs; + } + spec.eq.msix_index =3D msi; + err =3D mana_gd_create_mana_eq(mdev->gdma_dev, &spec, &mdev->eqs[i]); - if (err) + if (err) { + mana_gd_put_gic(gc, false, msi); goto destroy_eqs; + } + mdev->eqs[i]->eq.irq =3D gic->irq; } =20 return 0; =20 destroy_eqs: - while (i-- > 0) + while (i-- > 0) { mana_gd_destroy_queue(gc, mdev->eqs[i]); + mana_gd_put_gic(gc, false, (i + 1) % gc->num_msix_usable); + } kfree(mdev->eqs); destroy_fatal_eq: mana_gd_destroy_queue(gc, mdev->fatal_err_eq); + mana_gd_put_gic(gc, false, 0); return err; } =20 void mana_ib_destroy_eqs(struct mana_ib_dev *mdev) { struct gdma_context *gc =3D mdev_to_gc(mdev); - int i; + int i, msi; =20 mana_gd_destroy_queue(gc, mdev->fatal_err_eq); + mana_gd_put_gic(gc, false, 0); =20 - for (i =3D 0; i < mdev->ib_dev.num_comp_vectors; i++) + for (i =3D 0; i < mdev->ib_dev.num_comp_vectors; i++) { mana_gd_destroy_queue(gc, mdev->eqs[i]); + msi =3D (i + 1) % gc->num_msix_usable; + mana_gd_put_gic(gc, false, msi); + } =20 kfree(mdev->eqs); } diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h index 6a65fedae38f..78afd696b08b 100644 --- a/include/net/mana/gdma.h +++ b/include/net/mana/gdma.h @@ -616,6 +616,7 @@ enum { #define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3) #define GDMA_DRV_CAP_FLAG_1_GDMA_PAGES_4MB_1GB_2GB BIT(4) #define GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT BIT(5) +#define GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE BIT(6) =20 /* Driver can handle holes (zeros) in the device list */ #define GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP BIT(11) @@ -632,7 +633,8 @@ enum { /* Driver detects stalled send queues and recovers them */ #define GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY BIT(18) =20 
-#define GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE BIT(6) +/* Driver supports separate EQ/MSIs for each vPort */ +#define GDMA_DRV_CAP_FLAG_1_EQ_MSI_UNSHARE_MULTI_VPORT BIT(19) =20 /* Driver supports linearizing the skb when num_sge exceeds hardware limit= */ #define GDMA_DRV_CAP_FLAG_1_SKB_LINEARIZE BIT(20) @@ -660,7 +662,8 @@ enum { GDMA_DRV_CAP_FLAG_1_SKB_LINEARIZE | \ GDMA_DRV_CAP_FLAG_1_PROBE_RECOVERY | \ GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY | \ - GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY) + GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY | \ + GDMA_DRV_CAP_FLAG_1_EQ_MSI_UNSHARE_MULTI_VPORT) =20 #define GDMA_DRV_CAP_FLAGS2 0 =20 --=20 2.43.0
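
For reference, the vector selection used for the RDMA completion EQs in mana_ib_create_eqs() is the expression (i + 1) % gc->num_msix_usable, with index 0 taken by the fatal-error EQ. A minimal sketch of that mapping follows; the helper name eq_to_msix_index() is hypothetical and the function is a plain user-space restatement of the expression, not driver code.

```c
#include <assert.h>

/*
 * Map a zero-based EQ number to an MSI-X index the way the loop in
 * mana_ib_create_eqs() does: EQ 0 starts at vector 1, and the index
 * wraps around (back through vector 0) once the usable vectors run out.
 */
static unsigned int eq_to_msix_index(unsigned int eq_idx,
				     unsigned int num_msix_usable)
{
	assert(num_msix_usable > 0);

	return (eq_idx + 1) % num_msix_usable;
}
```

With four usable vectors, EQs 0, 1 and 2 land on vectors 1, 2 and 3, EQ 3 wraps to vector 0, and EQ 4 shares vector 1 with EQ 0. Because the RDMA path passes use_msi_bitmap=false, such sharing only bumps the refcount on an existing interrupt context.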