From: Surabhi Gogte
To: kbusch@kernel.org, axboe@kernel.dk, hch@lst.de, sagi@grimberg.me
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, mkhalfella@purestorage.com, randyj@purestorage.com, Surabhi Gogte
Subject: [PATCH v2] nvme-rdma: parallelize I/O queue allocation and startup
Date: Thu, 4 Jun 2026 13:53:21 -0600
Message-ID: <20260604195321.2232838-1-sgogte@purestorage.com>

Refactor nvme-rdma I/O queue setup to use the async API, combining
allocation and startup into a single parallel operation per queue. This
reduces connection and reconnection setup time when establishing
connections is slow, which matters most on high-core-count hosts.

Key changes:
- Use the async API to set up I/O queues in parallel.
- Add a qsetup_err atomic to nvme_rdma_ctrl that records the first
  error reported by the parallel setup jobs.
- Refactor nvme_rdma_alloc_queue() to accept a pre-initialized queue
  pointer instead of (ctrl, idx, queue_size), updating all call sites,
  including the admin queue path.
- Remove nvme_rdma_alloc_io_queues() and nvme_rdma_start_io_queues();
  their logic is folded into nvme_rdma_setup_io_queues() and
  nvme_rdma_configure_io_queues().
- Move queue count negotiation (nvme_set_queue_count(),
  nvmf_set_io_queues()) from the removed nvme_rdma_alloc_io_queues()
  into nvme_rdma_configure_io_queues().

Testing on a 64-core host with 64 I/O queues reduced nvme-rdma
connection time from ~1.4s to 416ms.

Signed-off-by: Surabhi Gogte
---
Changes since v1:
- Replace the dedicated nvme_setup_wq workqueue with an async API
  implementation.
---
 drivers/nvme/host/rdma.c | 122 ++++++++++++++++++++++-----------------
 1 file changed, 68 insertions(+), 54 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f77c960f7632..3b33f7be563b 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -125,6 +126,7 @@ struct nvme_rdma_ctrl {
 	struct nvme_ctrl ctrl;
 	bool use_inline_data;
 	u32 io_queues[HCTX_MAX_TYPES];
+	atomic_t qsetup_err;
 };
 
 static inline struct nvme_rdma_ctrl *to_rdma_ctrl(struct nvme_ctrl *ctrl)
@@ -566,16 +568,14 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
 	return ret;
 }
 
-static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
-		int idx, size_t queue_size)
+static int nvme_rdma_alloc_queue(struct nvme_rdma_queue *queue)
 {
-	struct nvme_rdma_queue *queue;
+	struct nvme_rdma_ctrl *ctrl = queue->ctrl;
+	int idx = nvme_rdma_queue_idx(queue);
 	struct sockaddr *src_addr = NULL;
 	int ret;
 
-	queue = &ctrl->queues[idx];
 	mutex_init(&queue->queue_lock);
-	queue->ctrl = ctrl;
 	if (idx && ctrl->ctrl.max_integrity_segments)
 		queue->pi_support = true;
 	else
@@ -587,8 +587,6 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
 	else
 		queue->cmnd_capsule_len = sizeof(struct nvme_command);
 
-	queue->queue_size = queue_size;
-
 	queue->cm_id = rdma_create_id(&init_net, nvme_rdma_cm_handler, queue,
 			RDMA_PS_TCP, IB_QPT_RC);
 	if (IS_ERR(queue->cm_id)) {
@@ -694,59 +692,57 @@ static int nvme_rdma_start_queue(struct nvme_rdma_ctrl *ctrl, int idx)
 	return ret;
 }
 
-static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl,
-		int first, int last)
+static void nvme_rdma_setup_queue_async(void *setup_queue, async_cookie_t cookie)
 {
-	int i, ret = 0;
+	struct nvme_rdma_queue *queue = setup_queue;
+	int ret;
 
-	for (i = first; i < last; i++) {
-		ret = nvme_rdma_start_queue(ctrl, i);
-		if (ret)
-			goto out_stop_queues;
-	}
+	ret = nvme_rdma_alloc_queue(queue);
+	if (ret)
+		goto out_err;
 
-	return 0;
+	ret = nvme_rdma_start_queue(queue->ctrl, nvme_rdma_queue_idx(queue));
+	if (ret)
+		goto out_err;
 
-out_stop_queues:
-	for (i--; i >= first; i--)
-		nvme_rdma_stop_queue(&ctrl->queues[i]);
-	return ret;
+	return;
+
+out_err:
+	atomic_cmpxchg(&queue->ctrl->qsetup_err, 0, ret);
 }
 
-static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
+static int nvme_rdma_setup_io_queues(struct nvme_rdma_ctrl *ctrl, int first,
+		int last, size_t queue_size)
 {
-	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
-	unsigned int nr_io_queues;
+	int nr_queues = last - first;
 	int i, ret;
+	ASYNC_DOMAIN_EXCLUSIVE(setup_queue_domain);
 
-	nr_io_queues = nvmf_nr_io_queues(opts);
-	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
-	if (ret)
-		return ret;
-
-	if (nr_io_queues == 0) {
-		dev_err(ctrl->ctrl.device,
-			"unable to set any I/O queues\n");
-		return -ENOMEM;
-	}
+	atomic_set(&ctrl->qsetup_err, 0);
 
-	ctrl->ctrl.queue_count = nr_io_queues + 1;
-	dev_info(ctrl->ctrl.device,
-		"creating %d I/O queues.\n", nr_io_queues);
+	for (i = 0; i < nr_queues; i++) {
+		struct nvme_rdma_queue *queue = &ctrl->queues[first + i];
 
-	nvmf_set_io_queues(opts, nr_io_queues, ctrl->io_queues);
-	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
-		ret = nvme_rdma_alloc_queue(ctrl, i,
-				ctrl->ctrl.sqsize + 1);
-		if (ret)
-			goto out_free_queues;
+		queue->ctrl = ctrl;
+		queue->queue_size = queue_size;
+		async_schedule_domain(nvme_rdma_setup_queue_async, queue,
+				&setup_queue_domain);
 	}
 
-	return 0;
+	async_synchronize_full_domain(&setup_queue_domain);
 
-out_free_queues:
-	for (i--; i >= 1; i--)
-		nvme_rdma_free_queue(&ctrl->queues[i]);
+	ret = atomic_read(&ctrl->qsetup_err);
+	if (ret) {
+		for (i = 0; i < nr_queues; i++) {
+			struct nvme_rdma_queue *queue =
+				&ctrl->queues[first + i];
+
+			if (test_bit(NVME_RDMA_Q_LIVE, &queue->flags))
+				nvme_rdma_stop_queue(queue);
+			if (test_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
+				nvme_rdma_free_queue(queue);
+		}
+	}
 
 	return ret;
 }
@@ -783,7 +779,9 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	bool pi_capable = false;
 	int error;
 
-	error = nvme_rdma_alloc_queue(ctrl, 0, NVME_AQ_DEPTH);
+	ctrl->queues[0].ctrl = ctrl;
+	ctrl->queues[0].queue_size = NVME_AQ_DEPTH;
+	error = nvme_rdma_alloc_queue(&ctrl->queues[0]);
 	if (error)
 		return error;
 
@@ -864,11 +862,22 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 {
 	int ret, nr_queues;
+	unsigned int nr_io_queues;
 
-	ret = nvme_rdma_alloc_io_queues(ctrl);
+	nr_io_queues = nvmf_nr_io_queues(ctrl->ctrl.opts);
+	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
 	if (ret)
 		return ret;
 
+	if (nr_io_queues == 0) {
+		dev_err(ctrl->ctrl.device, "unable to set any I/O queues\n");
+		return -ENOMEM;
+	}
+
+	ctrl->ctrl.queue_count = nr_io_queues + 1;
+	dev_info(ctrl->ctrl.device, "creating %d I/O queues.\n", nr_io_queues);
+	nvmf_set_io_queues(ctrl->ctrl.opts, nr_io_queues, ctrl->io_queues);
+
 	if (new) {
 		ret = nvme_rdma_alloc_tag_set(&ctrl->ctrl);
 		if (ret)
@@ -881,7 +890,9 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 	 * queue number might have changed.
 	 */
 	nr_queues = min(ctrl->tag_set.nr_hw_queues + 1, ctrl->ctrl.queue_count);
-	ret = nvme_rdma_start_io_queues(ctrl, 1, nr_queues);
+	ret = nvme_rdma_setup_io_queues(ctrl, 1, nr_queues,
+			ctrl->ctrl.sqsize + 1);
+
 	if (ret)
 		goto out_cleanup_tagset;
 
@@ -905,12 +916,15 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 
 	/*
 	 * If the number of queues has increased (reconnect case)
-	 * start all new queues now.
+	 * set up all new queues now.
 	 */
-	ret = nvme_rdma_start_io_queues(ctrl, nr_queues,
-			ctrl->tag_set.nr_hw_queues + 1);
-	if (ret)
-		goto out_wait_freeze_timed_out;
+	if (ctrl->tag_set.nr_hw_queues + 1 > nr_queues) {
+		ret = nvme_rdma_setup_io_queues(ctrl, nr_queues,
+				ctrl->tag_set.nr_hw_queues + 1,
+				ctrl->ctrl.sqsize + 1);
+		if (ret)
+			goto out_wait_freeze_timed_out;
+	}
 
 	return 0;
 
-- 
2.54.0
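The control flow of nvme_rdma_setup_io_queues() can be modeled in userspace. It schedules one allocate-then-start job per queue, lets the first failure win through a compare-and-swap on qsetup_err, waits once for every job, and then rolls back the whole range. The sketch below is a hypothetical illustration, not driver code. POSIX threads stand in for async_schedule_domain() and async_synchronize_full_domain(), and fake_queue, fake_ctrl, fail_idx, fake_alloc(), fake_start() and setup_queues() are invented names. If a thread cannot be created, the job runs inline, which loosely mirrors how the kernel's async code falls back to synchronous execution.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUES 64 /* hypothetical upper bound for this sketch */

struct fake_ctrl;

/* Hypothetical stand-in for struct nvme_rdma_queue. */
struct fake_queue {
	struct fake_ctrl *ctrl;
	int idx;
	bool allocated; /* plays the role of NVME_RDMA_Q_ALLOCATED */
	bool live;      /* plays the role of NVME_RDMA_Q_LIVE */
};

/* Hypothetical stand-in for struct nvme_rdma_ctrl. */
struct fake_ctrl {
	struct fake_queue queues[MAX_QUEUES];
	atomic_int qsetup_err; /* 0 until some worker records an error */
	int fail_idx;          /* test hook: queue whose start step fails, or -1 */
};

/* Hypothetical allocation step; always succeeds here. */
static int fake_alloc(struct fake_queue *q)
{
	q->allocated = true;
	return 0;
}

/* Hypothetical connect step; fails with -5 (-EIO) for ctrl->fail_idx. */
static int fake_start(struct fake_queue *q)
{
	if (q->idx == q->ctrl->fail_idx)
		return -5;
	q->live = true;
	return 0;
}

/* Per-queue job: allocate, then start; publish only the first error. */
static void *setup_queue_worker(void *arg)
{
	struct fake_queue *q = arg;
	int ret = fake_alloc(q);

	if (!ret)
		ret = fake_start(q);
	if (ret) {
		int expected = 0;

		/* Stores ret only while qsetup_err is still 0. */
		atomic_compare_exchange_strong(&q->ctrl->qsetup_err, &expected, ret);
	}
	return NULL;
}

/* Set up queues [first, last) in parallel; assumes 0 <= first <= last <= MAX_QUEUES. */
static int setup_queues(struct fake_ctrl *ctrl, int first, int last)
{
	pthread_t tids[MAX_QUEUES];
	bool spawned[MAX_QUEUES] = { false };
	int i, ret;

	atomic_store(&ctrl->qsetup_err, 0);

	for (i = first; i < last; i++) {
		struct fake_queue *q = &ctrl->queues[i];

		q->ctrl = ctrl;
		q->idx = i;
		if (pthread_create(&tids[i], NULL, setup_queue_worker, q) == 0)
			spawned[i] = true;
		else
			setup_queue_worker(q); /* no thread available: run inline */
	}

	/* Single join point, the role async_synchronize_full_domain() plays. */
	for (i = first; i < last; i++)
		if (spawned[i])
			pthread_join(tids[i], NULL);

	ret = atomic_load(&ctrl->qsetup_err);
	if (ret) {
		/* Roll back every queue in the range, whichever step it reached. */
		for (i = first; i < last; i++) {
			ctrl->queues[i].live = false;
			ctrl->queues[i].allocated = false;
		}
	}
	return ret;
}
```

With fail_idx set to -1, setup_queues(&ctrl, 1, 9) returns 0 and leaves queues 1 through 8 allocated and live. With fail_idx set to 4, it returns -5 and tears down the whole range. A compare-and-swap from 0 keeps whichever error is stored first rather than the last one written. This is the same choice the patch makes with atomic_cmpxchg(&queue->ctrl->qsetup_err, 0, ret).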