From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6D5003375A3 for ; Fri, 29 Aug 2025 15:37:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756481855; cv=none; b=R4s5v/NzXEWHaP2r/nZLW3DCR+CHNFJk8hC2Zoo0KdA1lKHD+AKNwEnXfVcdpDZ872myBA6OC9wpyc8VctIkl05r+EWqCe8IxycnKycPHi4KXhiyGdyvk48s1F0ZBiIDvDbI0C2lu4laR+fAtQV+eGPjA1ppdGoRxCY7lThZTRQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756481855; c=relaxed/simple; bh=vw+qLjiyh+G0bkQwHR3d6Hwy+8ZKwQcQXIo7MovUDik=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=qwwK6f4pmNAX+bA52xdpap4m7usmeAmbJ6M8jnaveZ/GVwXsy55bWY3zjDgDI25N5lMpJKGnObnPCEGKzHKx0LQ1KjOHC+96CJRML++/BlVS7FvOkrn5n7kd7zn0vPeIFrmPf3h6jAUbot3lWFWIsfC8iiWXb7JdbRAVv7tz/ow= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=F+UpTaAG; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="F+UpTaAG" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6D3B8C4CEFC; Fri, 29 Aug 2025 15:37:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756481854; bh=vw+qLjiyh+G0bkQwHR3d6Hwy+8ZKwQcQXIo7MovUDik=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=F+UpTaAGtIvtTZDVCVBcZ2Z2+U5j1HXxRHhgm8uqq7c7LQgbGhdbKIzJGuSCAClfQ ToEIL30cXyjSFad0JeHN1coDVYAAHuRuKZKuOuNIxj8aQN90Yyn0ANGojJ2DZz9gP8 vvQ2UmHI/sOMUnJhfLMrNBVKTvjIUakEENhIYrDBGDxXBv75MxkImOq0UdOHDYsn0z PDbPfnpoEQPWk6334vWBWKRlFwJmTiEN9zOtdikEtkpyle9Q4SY+ddL+iq4zOJKNmz IKKLvuC5tJq2QxKPGHYZChApIYWW8G67kOXc4VkB2idobpwAOy+unQjEzlQT3Cg66S QbDc7ELGmOHmQ== From: Daniel Wagner Date: Fri, 29 Aug 2025 17:37:25 +0200 Subject: [PATCH v3 1/4] nvme-fabrics: introduce ref-counting for nvmf_ctrl_options Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20250829-nvme-fc-sync-v3-1-d69c87e63aee@kernel.org> References: <20250829-nvme-fc-sync-v3-0-d69c87e63aee@kernel.org> In-Reply-To: <20250829-nvme-fc-sync-v3-0-d69c87e63aee@kernel.org> To: Keith Busch , Christoph Hellwig , Sagi Grimberg , James Smart Cc: Shinichiro Kawasaki , Hannes Reinecke , linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, Daniel Wagner X-Mailer: b4 0.14.2 nvmf_create_ctrl allocates the options object and frees it when the create_ctrl function fails. TCP and RDMA are doing the initial connect synchronous, while the FC transport offloads the connect attempt to a workqueue. Thus, nvme_fc_init_ctrl has to take owner ship in the error case to avoid double free. But there are several places in the nvme core code which want to access the options pointer in the cleanup path and this is racing with the FC reconnect or delete state machine. E.g nvme_ctrl_free nvme_auth_free ctrl_max_dhchaps ctrl->opts->dhchaps nvme_do_delete_ctrl nvmf_ctrl_subsysnqn ctrl->opts->subsysnqn There are also a few workaround in the existing code to avoid de-referencing a NULL pointer: nvme_class_uevent if (opts) { add_uevent_var(env, "NVME_TRADDR"); add_uevent_var(env, "NVME_TRSVID"); add_uevent_var(env, "NVME_HOST_TRADDR"); add_uevent_var(env, "NVME_HOST_IFACE"); } Furthermore, user space is able to trigger a crash because nvmf_ctrl_options are exposed to sysfs and expects ctrl->opts to be valid always: nvme_sysfs_show_hostnqn ctrl->opts->host->nqn nvme_sysfs_show_hostid ctrl->opts->host->id By introducing ref-counting for the options, it decouples the options lifetime from the controller objects, which avoids all the races mentioned above. Reviewed-by: Hannes Reinecke Signed-off-by: Daniel Wagner --- drivers/nvme/host/fabrics.c | 22 +++++++++++++++++++--- drivers/nvme/host/fabrics.h | 6 +++++- drivers/nvme/host/fc.c | 14 +++++++++----- drivers/nvme/host/rdma.c | 18 +++++++++++++----- drivers/nvme/host/tcp.c | 21 ++++++++++++++------- drivers/nvme/target/loop.c | 19 +++++++++++++------ 6 files changed, 73 insertions(+), 27 deletions(-) diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c index 2e58a7ce109052ecdda433f13162911459a81fb3..cb9311c399856de54a2e4d41d41= d295dd1aef31e 100644 --- a/drivers/nvme/host/fabrics.c +++ b/drivers/nvme/host/fabrics.c @@ -1279,8 +1279,11 @@ static int nvmf_check_allowed_opts(struct nvmf_ctrl_= options *opts, return 0; } =20 -void nvmf_free_options(struct nvmf_ctrl_options *opts) +static void nvmf_free_options(struct kref *ref) { + struct nvmf_ctrl_options *opts =3D + container_of(ref, struct nvmf_ctrl_options, ref); + nvmf_host_put(opts->host); key_put(opts->keyring); key_put(opts->tls_key); @@ -1294,7 +1297,18 @@ void nvmf_free_options(struct nvmf_ctrl_options *opt= s) kfree(opts->dhchap_ctrl_secret); kfree(opts); } -EXPORT_SYMBOL_GPL(nvmf_free_options); + +int nvmf_ctrl_options_get(struct nvmf_ctrl_options *opts) +{ + return kref_get_unless_zero(&opts->ref); +} +EXPORT_SYMBOL_GPL(nvmf_ctrl_options_get); + +void nvmf_ctrl_options_put(struct nvmf_ctrl_options *opts) +{ + kref_put(&opts->ref, nvmf_free_options); +} +EXPORT_SYMBOL_GPL(nvmf_ctrl_options_put); =20 #define NVMF_REQUIRED_OPTS (NVMF_OPT_TRANSPORT | NVMF_OPT_NQN) #define NVMF_ALLOWED_OPTS (NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \ @@ -1316,6 +1330,8 @@ nvmf_create_ctrl(struct device *dev, const char *buf) if (!opts) return ERR_PTR(-ENOMEM); =20 + kref_init(&opts->ref); + ret =3D nvmf_parse_options(opts, buf); if (ret) goto out_free_opts; @@ -1371,7 +1387,7 @@ nvmf_create_ctrl(struct device *dev, const char *buf) out_unlock: up_read(&nvmf_transports_rwsem); out_free_opts: - nvmf_free_options(opts); + nvmf_ctrl_options_put(opts); return ERR_PTR(ret); } =20 diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h index 1b58ee7d0dcee420a1cf7d1a4b8e7e558b69ecae..bcc1d5f97ee5a03852830bf3ba0= a15f82c7c2c03 100644 --- a/drivers/nvme/host/fabrics.h +++ b/drivers/nvme/host/fabrics.h @@ -72,6 +72,7 @@ enum { /** * struct nvmf_ctrl_options - Used to hold the options specified * with the parsing opts enum. + * @ref: for reference count of the data structure * @mask: Used by the fabrics library to parse through sysfs options * on adding a NVMe controller. * @max_reconnects: maximum number of allowed reconnect attempts before re= moving @@ -112,6 +113,7 @@ enum { * @fast_io_fail_tmo: Fast I/O fail timeout in seconds */ struct nvmf_ctrl_options { + struct kref ref; unsigned mask; int max_reconnects; char *transport; @@ -142,6 +144,9 @@ struct nvmf_ctrl_options { int fast_io_fail_tmo; }; =20 +int nvmf_ctrl_options_get(struct nvmf_ctrl_options *opts); +void nvmf_ctrl_options_put(struct nvmf_ctrl_options *opts); + /* * struct nvmf_transport_ops - used to register a specific * fabric implementation of NVMe fabrics. @@ -225,7 +230,6 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl); int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid); int nvmf_register_transport(struct nvmf_transport_ops *ops); void nvmf_unregister_transport(struct nvmf_transport_ops *ops); -void nvmf_free_options(struct nvmf_ctrl_options *opts); int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size); bool nvmf_should_reconnect(struct nvme_ctrl *ctrl, int status); bool nvmf_ip_options_match(struct nvme_ctrl *ctrl, diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c index 3e12d4683ac72f9ef9c6dff64d22d5d97e6f3d60..db5dbbf6aee21db45e91ea6f0c1= 22681b9bdaf6f 100644 --- a/drivers/nvme/host/fc.c +++ b/drivers/nvme/host/fc.c @@ -2372,8 +2372,7 @@ nvme_fc_ctrl_free(struct kref *ref) nvme_fc_rport_put(ctrl->rport); =20 ida_free(&nvme_fc_ctrl_cnt, ctrl->cnum); - if (ctrl->ctrl.opts) - nvmf_free_options(ctrl->ctrl.opts); + nvmf_ctrl_options_put(ctrl->ctrl.opts); kfree(ctrl); } =20 @@ -3437,10 +3436,15 @@ nvme_fc_alloc_ctrl(struct device *dev, struct nvmf_= ctrl_options *opts, goto out_fail; } =20 + if (!nvmf_ctrl_options_get(opts)) { + ret =3D -ENOLCK; + goto out_free_ctrl; + } + idx =3D ida_alloc(&nvme_fc_ctrl_cnt, GFP_KERNEL); if (idx < 0) { ret =3D -ENOSPC; - goto out_free_ctrl; + goto out_free_opts; } =20 /* @@ -3512,6 +3516,8 @@ nvme_fc_alloc_ctrl(struct device *dev, struct nvmf_ct= rl_options *opts, out_free_ida: put_device(ctrl->dev); ida_free(&nvme_fc_ctrl_cnt, ctrl->cnum); +out_free_opts: + nvmf_ctrl_options_put(opts); out_free_ctrl: kfree(ctrl); out_fail: @@ -3573,8 +3579,6 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctr= l_options *opts, cancel_work_sync(&ctrl->ctrl.reset_work); cancel_delayed_work_sync(&ctrl->connect_work); =20 - ctrl->ctrl.opts =3D NULL; - /* initiate nvme ctrl ref counting teardown */ nvme_uninit_ctrl(&ctrl->ctrl); =20 diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 190a4cfa8a5ee2e6b97a5a1b304c1338dcd748fa..f0692f1132aaef2f8f8c30e9d78= 24e4bf1f91117 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -976,8 +976,8 @@ static void nvme_rdma_free_ctrl(struct nvme_ctrl *nctrl) list_del(&ctrl->list); mutex_unlock(&nvme_rdma_ctrl_mutex); =20 - nvmf_free_options(nctrl->opts); free_ctrl: + nvmf_ctrl_options_put(nctrl->opts); kfree(ctrl->queues); kfree(ctrl); } @@ -2242,6 +2242,12 @@ static struct nvme_rdma_ctrl *nvme_rdma_alloc_ctrl(s= truct device *dev, ctrl =3D kzalloc(sizeof(*ctrl), GFP_KERNEL); if (!ctrl) return ERR_PTR(-ENOMEM); + + if (!nvmf_ctrl_options_get(opts)) { + ret =3D -ENODEV; + goto out_free_ctrl; + } + ctrl->ctrl.opts =3D opts; INIT_LIST_HEAD(&ctrl->list); =20 @@ -2250,7 +2256,7 @@ static struct nvme_rdma_ctrl *nvme_rdma_alloc_ctrl(st= ruct device *dev, kstrdup(__stringify(NVME_RDMA_IP_PORT), GFP_KERNEL); if (!opts->trsvcid) { ret =3D -ENOMEM; - goto out_free_ctrl; + goto out_free_opts; } opts->mask |=3D NVMF_OPT_TRSVCID; } @@ -2269,13 +2275,13 @@ static struct nvme_rdma_ctrl *nvme_rdma_alloc_ctrl(= struct device *dev, if (ret) { pr_err("malformed src address passed: %s\n", opts->host_traddr); - goto out_free_ctrl; + goto out_free_opts; } } =20 if (!opts->duplicate_connect && nvme_rdma_existing_controller(opts)) { ret =3D -EALREADY; - goto out_free_ctrl; + goto out_free_opts; } =20 INIT_DELAYED_WORK(&ctrl->reconnect_work, @@ -2292,7 +2298,7 @@ static struct nvme_rdma_ctrl *nvme_rdma_alloc_ctrl(st= ruct device *dev, ctrl->queues =3D kcalloc(ctrl->ctrl.queue_count, sizeof(*ctrl->queues), GFP_KERNEL); if (!ctrl->queues) - goto out_free_ctrl; + goto out_free_opts; =20 ret =3D nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_rdma_ctrl_ops, 0 /* no quirks, we're perfect! */); @@ -2303,6 +2309,8 @@ static struct nvme_rdma_ctrl *nvme_rdma_alloc_ctrl(st= ruct device *dev, =20 out_kfree_queues: kfree(ctrl->queues); +out_free_opts: + nvmf_ctrl_options_put(opts); out_free_ctrl: kfree(ctrl); return ERR_PTR(ret); diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index c0fe8cfb7229e16af41286f5edb01586ad617291..df445b4da39ef134c605ca7ef1c= cbeda99a23862 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -2554,8 +2554,8 @@ static void nvme_tcp_free_ctrl(struct nvme_ctrl *nctr= l) list_del(&ctrl->list); mutex_unlock(&nvme_tcp_ctrl_mutex); =20 - nvmf_free_options(nctrl->opts); free_ctrl: + nvmf_ctrl_options_put(nctrl->opts); kfree(ctrl->queues); kfree(ctrl); } @@ -2888,6 +2888,11 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(str= uct device *dev, if (!ctrl) return ERR_PTR(-ENOMEM); =20 + if (!nvmf_ctrl_options_get(opts)) { + ret =3D -ENODEV; + goto out_free_ctrl; + } + INIT_LIST_HEAD(&ctrl->list); ctrl->ctrl.opts =3D opts; ctrl->ctrl.queue_count =3D opts->nr_io_queues + opts->nr_write_queues + @@ -2905,7 +2910,7 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(stru= ct device *dev, kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL); if (!opts->trsvcid) { ret =3D -ENOMEM; - goto out_free_ctrl; + goto out_free_opts; } opts->mask |=3D NVMF_OPT_TRSVCID; } @@ -2915,7 +2920,7 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(stru= ct device *dev, if (ret) { pr_err("malformed address passed: %s:%s\n", opts->traddr, opts->trsvcid); - goto out_free_ctrl; + goto out_free_opts; } =20 if (opts->mask & NVMF_OPT_HOST_TRADDR) { @@ -2924,7 +2929,7 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(stru= ct device *dev, if (ret) { pr_err("malformed src address passed: %s\n", opts->host_traddr); - goto out_free_ctrl; + goto out_free_opts; } } =20 @@ -2933,20 +2938,20 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(st= ruct device *dev, pr_err("invalid interface passed: %s\n", opts->host_iface); ret =3D -ENODEV; - goto out_free_ctrl; + goto out_free_opts; } } =20 if (!opts->duplicate_connect && nvme_tcp_existing_controller(opts)) { ret =3D -EALREADY; - goto out_free_ctrl; + goto out_free_opts; } =20 ctrl->queues =3D kcalloc(ctrl->ctrl.queue_count, sizeof(*ctrl->queues), GFP_KERNEL); if (!ctrl->queues) { ret =3D -ENOMEM; - goto out_free_ctrl; + goto out_free_opts; } =20 ret =3D nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_tcp_ctrl_ops, 0); @@ -2956,6 +2961,8 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(stru= ct device *dev, return ctrl; out_kfree_queues: kfree(ctrl->queues); +out_free_opts: + nvmf_ctrl_options_put(opts); out_free_ctrl: kfree(ctrl); return ERR_PTR(ret); diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c index f85a8441bcc6ecdcfb17e9cf1e883d2817cce9fc..22039535ff9d3c8b2b42ecaa843= b947446a8ba99 100644 --- a/drivers/nvme/target/loop.c +++ b/drivers/nvme/target/loop.c @@ -291,8 +291,8 @@ static void nvme_loop_free_ctrl(struct nvme_ctrl *nctrl) if (nctrl->tagset) nvme_remove_io_tag_set(nctrl); kfree(ctrl->queues); - nvmf_free_options(nctrl->opts); free_ctrl: + nvmf_ctrl_options_put(nctrl->opts); kfree(ctrl); } =20 @@ -567,6 +567,12 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct = device *dev, ctrl =3D kzalloc(sizeof(*ctrl), GFP_KERNEL); if (!ctrl) return ERR_PTR(-ENOMEM); + + if (!nvmf_ctrl_options_get(opts)) { + ret =3D -ENOLCK; + goto out_free_ctrl; + } + ctrl->ctrl.opts =3D opts; INIT_LIST_HEAD(&ctrl->list); =20 @@ -574,10 +580,8 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct = device *dev, =20 ret =3D nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_loop_ctrl_ops, 0 /* no quirks, we're perfect! */); - if (ret) { - kfree(ctrl); - goto out; - } + if (ret) + goto out_free_opts; =20 ret =3D nvme_add_ctrl(&ctrl->ctrl); if (ret) @@ -641,7 +645,10 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct = device *dev, nvme_uninit_ctrl(&ctrl->ctrl); out_put_ctrl: nvme_put_ctrl(&ctrl->ctrl); -out: +out_free_opts: + nvmf_ctrl_options_put(opts); +out_free_ctrl: + kfree(ctrl); if (ret > 0) ret =3D -EIO; return ERR_PTR(ret); --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D81B53375CD for ; Fri, 29 Aug 2025 15:37:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756481857; cv=none; b=AmuTVclKrXK13XYyjgXohMh/2hX3n6CzziiWUG4v5RfDDcOhbpbkUvnDmxa4ibvzB/CscA/cvFHytL2LsRRxZ5+VuJcPABvYQh7m4aR16m5AHCiwcgSv4Lf91oyHO28zPyuNioYlq4NLoCZ45tOM+cFxquojZAeHrUlDQdObEZk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756481857; c=relaxed/simple; bh=Mr2XxeVwhgDsdCkBNqqoxxvgifdIeVB4y93gZzDxi3Y=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=pfgA2aQvCDECZO+j9jqLN1KHJFJXJlaTcWU3IWoaVGwgcVKMDFCtBcYAOT6CoLDU7eypCgRYtci5ylqNIBkzQBZLRsuhF429nlJ3/mPOmF7GTE5Cinoc9qbX2Tam+GSbiUftnML9CzH3W3yGFKytfQAYP6OjqOGxUzGtu5/o/Qk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=mJoMFQE9; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="mJoMFQE9" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 26274C4CEF0; Fri, 29 Aug 2025 15:37:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756481857; bh=Mr2XxeVwhgDsdCkBNqqoxxvgifdIeVB4y93gZzDxi3Y=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=mJoMFQE9dGTgCVsCM38gB1OW7WaOH2xFgw4JND7AWDUd7iB8aF7ypLr3f7+didaA1 rYZ9t5uV9gv3G4i6V3GckegmWAMWr0VlerZbGZu9g/HSxX4XN3aLXzo2Q7pnBDG6lB sAAr8StkcZ5q/ABDZObMPjU0SDE+JEkW8t/YbSJIJzWhiAp0POGFLGPyeGsAAfHVQA hlpb89cjZ9A4+ygGOV1y1ZB8jR17KzcjH+tZ/kflcWzRJYgSOR4UEnKbQqOd4cVZ+0 DlhLqVaT67pT6lktztK9qTFHsdYTGFIU2488KaKraGNJB4JOWNT1Lb36Wkn6gS0yBg ze4BcIMJr/2Ug== From: Daniel Wagner Date: Fri, 29 Aug 2025 17:37:26 +0200 Subject: [PATCH v3 2/4] nvme-fc: reorganize ctrl ref-counting and cleanup code Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20250829-nvme-fc-sync-v3-2-d69c87e63aee@kernel.org> References: <20250829-nvme-fc-sync-v3-0-d69c87e63aee@kernel.org> In-Reply-To: <20250829-nvme-fc-sync-v3-0-d69c87e63aee@kernel.org> To: Keith Busch , Christoph Hellwig , Sagi Grimberg , James Smart Cc: Shinichiro Kawasaki , Hannes Reinecke , linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, Daniel Wagner X-Mailer: b4 0.14.2 The lifetime of the controller is managed by the upper layers. This ensures that the controller does not go away as long as there are commands in flight. Thus, there is no need to take references in the command handling code. Currently, the refcounting code is partially open-coded for handling error paths. By moving the cleanup code fully under refcounting and releasing references when necessary, the error handling can be simplified. Signed-off-by: Daniel Wagner Reviewed-by: Hannes Reinecke --- drivers/nvme/host/fc.c | 102 ++++++++++++++++-----------------------------= ---- 1 file changed, 32 insertions(+), 70 deletions(-) diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c index db5dbbf6aee21db45e91ea6f0c122681b9bdaf6f..3e6c79bf0bfd3884867c9ebeb24= 771323a619934 100644 --- a/drivers/nvme/host/fc.c +++ b/drivers/nvme/host/fc.c @@ -228,6 +228,9 @@ static struct device *fc_udev_device; =20 static void nvme_fc_complete_rq(struct request *rq); =20 +static void nvme_fc_ctrl_put(struct nvme_fc_ctrl *); +static int nvme_fc_ctrl_get(struct nvme_fc_ctrl *); + /* *********************** FC-NVME Port Management ***********************= * */ =20 static void __nvme_fc_delete_hw_queue(struct nvme_fc_ctrl *, @@ -826,7 +829,7 @@ nvme_fc_unregister_remoteport(struct nvme_fc_remote_por= t *portptr) dev_warn(ctrl->ctrl.device, "NVME-FC{%d}: controller connectivity lost.\n", ctrl->cnum); - nvme_delete_ctrl(&ctrl->ctrl); + nvme_fc_ctrl_put(ctrl); } else nvme_fc_ctrl_connectivity_loss(ctrl); } @@ -980,9 +983,6 @@ fc_dma_unmap_sg(struct device *dev, struct scatterlist = *sg, int nents, =20 /* *********************** FC-NVME LS Handling ***************************= * */ =20 -static void nvme_fc_ctrl_put(struct nvme_fc_ctrl *); -static int nvme_fc_ctrl_get(struct nvme_fc_ctrl *); - static void nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg= ); =20 static void @@ -1476,8 +1476,6 @@ nvme_fc_match_disconn_ls(struct nvme_fc_rport *rport, spin_lock_irqsave(&rport->lock, flags); =20 list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) { - if (!nvme_fc_ctrl_get(ctrl)) - continue; spin_lock(&ctrl->lock); if (association_id =3D=3D ctrl->association_id) { oldls =3D ctrl->rcv_disconn; @@ -1485,10 +1483,6 @@ nvme_fc_match_disconn_ls(struct nvme_fc_rport *rport, ret =3D ctrl; } spin_unlock(&ctrl->lock); - if (ret) - /* leave the ctrl get reference */ - break; - nvme_fc_ctrl_put(ctrl); } =20 spin_unlock_irqrestore(&rport->lock, flags); @@ -1567,9 +1561,6 @@ nvme_fc_ls_disconnect_assoc(struct nvmefc_ls_rcv_op *= lsop) /* fail the association */ nvme_fc_error_recovery(ctrl, "Disconnect Association LS received"); =20 - /* release the reference taken by nvme_fc_match_disconn_ls() */ - nvme_fc_ctrl_put(ctrl); - return false; } =20 @@ -2036,7 +2027,6 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req) __nvme_fc_fcpop_chk_teardowns(ctrl, op, opstate); atomic_set(&op->state, FCPOP_STATE_IDLE); op->flags =3D FCOP_FLAGS_AEN; /* clear other flags */ - nvme_fc_ctrl_put(ctrl); goto check_error; } =20 @@ -2349,37 +2339,18 @@ nvme_fc_init_io_queues(struct nvme_fc_ctrl *ctrl) } =20 static void -nvme_fc_ctrl_free(struct kref *ref) +nvme_fc_ctrl_delete(struct kref *ref) { struct nvme_fc_ctrl *ctrl =3D container_of(ref, struct nvme_fc_ctrl, ref); - unsigned long flags; - - if (ctrl->ctrl.tagset) - nvme_remove_io_tag_set(&ctrl->ctrl); =20 - /* remove from rport list */ - spin_lock_irqsave(&ctrl->rport->lock, flags); - list_del(&ctrl->ctrl_list); - spin_unlock_irqrestore(&ctrl->rport->lock, flags); - - nvme_unquiesce_admin_queue(&ctrl->ctrl); - nvme_remove_admin_tag_set(&ctrl->ctrl); - - kfree(ctrl->queues); - - put_device(ctrl->dev); - nvme_fc_rport_put(ctrl->rport); - - ida_free(&nvme_fc_ctrl_cnt, ctrl->cnum); - nvmf_ctrl_options_put(ctrl->ctrl.opts); - kfree(ctrl); + nvme_delete_ctrl(&ctrl->ctrl); } =20 static void nvme_fc_ctrl_put(struct nvme_fc_ctrl *ctrl) { - kref_put(&ctrl->ref, nvme_fc_ctrl_free); + kref_put(&ctrl->ref, nvme_fc_ctrl_delete); } =20 static int @@ -2397,9 +2368,20 @@ nvme_fc_free_ctrl(struct nvme_ctrl *nctrl) { struct nvme_fc_ctrl *ctrl =3D to_fc_ctrl(nctrl); =20 - WARN_ON(nctrl !=3D &ctrl->ctrl); + if (ctrl->ctrl.tagset) + nvme_remove_io_tag_set(&ctrl->ctrl); + + nvme_unquiesce_admin_queue(&ctrl->ctrl); + nvme_remove_admin_tag_set(&ctrl->ctrl); =20 - nvme_fc_ctrl_put(ctrl); + kfree(ctrl->queues); + + put_device(ctrl->dev); + nvme_fc_rport_put(ctrl->rport); + + ida_free(&nvme_fc_ctrl_cnt, ctrl->cnum); + nvmf_ctrl_options_put(ctrl->ctrl.opts); + kfree(ctrl); } =20 /* @@ -2649,9 +2631,6 @@ nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struc= t nvme_fc_queue *queue, if (ctrl->rport->remoteport.port_state !=3D FC_OBJSTATE_ONLINE) return BLK_STS_RESOURCE; =20 - if (!nvme_fc_ctrl_get(ctrl)) - return BLK_STS_IOERR; - /* format the FC-NVME CMD IU and fcp_req */ cmdiu->connection_id =3D cpu_to_be64(queue->connection_id); cmdiu->data_len =3D cpu_to_be32(data_len); @@ -2696,7 +2675,6 @@ nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struc= t nvme_fc_queue *queue, ret =3D nvme_fc_map_data(ctrl, op->rq, op); if (ret < 0) { nvme_cleanup_cmd(op->rq); - nvme_fc_ctrl_put(ctrl); if (ret =3D=3D -ENOMEM || ret =3D=3D -EAGAIN) return BLK_STS_RESOURCE; return BLK_STS_IOERR; @@ -2737,8 +2715,6 @@ nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struc= t nvme_fc_queue *queue, nvme_cleanup_cmd(op->rq); } =20 - nvme_fc_ctrl_put(ctrl); - if (ctrl->rport->remoteport.port_state =3D=3D FC_OBJSTATE_ONLINE && ret !=3D -EBUSY) return BLK_STS_IOERR; @@ -2822,7 +2798,6 @@ nvme_fc_complete_rq(struct request *rq) =20 nvme_fc_unmap_data(ctrl, rq, op); nvme_complete_rq(rq); - nvme_fc_ctrl_put(ctrl); } =20 static void nvme_fc_map_queues(struct blk_mq_tag_set *set) @@ -3251,9 +3226,16 @@ static void nvme_fc_delete_ctrl(struct nvme_ctrl *nctrl) { struct nvme_fc_ctrl *ctrl =3D to_fc_ctrl(nctrl); + unsigned long flags; =20 cancel_work_sync(&ctrl->ioerr_work); cancel_delayed_work_sync(&ctrl->connect_work); + + /* remove from rport list */ + spin_lock_irqsave(&ctrl->rport->lock, flags); + list_del(&ctrl->ctrl_list); + spin_unlock_irqrestore(&ctrl->rport->lock, flags); + /* * kill the association on the link side. this will block * waiting for io to terminate @@ -3307,7 +3289,7 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl= , int status) ctrl->cnum, min_t(int, portptr->dev_loss_tmo, (ctrl->ctrl.opts->max_reconnects * ctrl->ctrl.opts->reconnect_delay))); - WARN_ON(nvme_delete_ctrl(&ctrl->ctrl)); + nvme_fc_ctrl_put(ctrl); } } =20 @@ -3521,6 +3503,7 @@ nvme_fc_alloc_ctrl(struct device *dev, struct nvmf_ct= rl_options *opts, out_free_ctrl: kfree(ctrl); out_fail: + nvme_fc_rport_put(rport); /* exit via here doesn't follow ctlr ref points */ return ERR_PTR(ret); } @@ -3539,7 +3522,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctr= l_options *opts, =20 ret =3D nvme_add_ctrl(&ctrl->ctrl); if (ret) - goto out_put_ctrl; + goto fail_ctrl; =20 ret =3D nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set, &nvme_fc_admin_mq_ops, @@ -3574,26 +3557,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ct= rl_options *opts, return &ctrl->ctrl; =20 fail_ctrl: - nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING); - cancel_work_sync(&ctrl->ioerr_work); - cancel_work_sync(&ctrl->ctrl.reset_work); - cancel_delayed_work_sync(&ctrl->connect_work); - - /* initiate nvme ctrl ref counting teardown */ - nvme_uninit_ctrl(&ctrl->ctrl); - -out_put_ctrl: - /* Remove core ctrl ref. */ - nvme_put_ctrl(&ctrl->ctrl); - - /* as we're past the point where we transition to the ref - * counting teardown path, if we return a bad pointer here, - * the calling routine, thinking it's prior to the - * transition, will do an rport put. Since the teardown - * path also does a rport put, we do an extra get here to - * so proper order/teardown happens. - */ - nvme_fc_rport_get(rport); + nvme_fc_ctrl_put(ctrl); =20 return ERR_PTR(-EIO); } @@ -3703,8 +3667,6 @@ nvme_fc_create_ctrl(struct device *dev, struct nvmf_c= trl_options *opts) spin_unlock_irqrestore(&nvme_fc_lock, flags); =20 ctrl =3D nvme_fc_init_ctrl(dev, opts, lport, rport); - if (IS_ERR(ctrl)) - nvme_fc_rport_put(rport); return ctrl; } } @@ -3929,7 +3891,7 @@ nvme_fc_delete_controllers(struct nvme_fc_rport *rpor= t) dev_warn(ctrl->ctrl.device, "NVME-FC{%d}: transport unloading: deleting ctrl\n", ctrl->cnum); - nvme_delete_ctrl(&ctrl->ctrl); + nvme_fc_ctrl_put(ctrl); } spin_unlock(&rport->lock); } --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6FADB337697 for ; Fri, 29 Aug 2025 15:37:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756481860; cv=none; b=MOh2k7Csr1cWQoC7q3NfhiFW6mTJ6F2/ycOX/ZlZ1yj5jK+yaw3xYSUw078XIvLNn8w22YSyxsZ/choa/ai5gCwojM8UPPozM5NSm/2NJNuL/aGm0L6ifp1Ep8sDurKwdgWM+aF754FO1Q4pJEWLOVBXj3MQ+mcrU3KJ/ggYJvU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756481860; c=relaxed/simple; bh=1D+KKEVMrSYTaxz25aM5XfBdCuC1VdSpDaU+c8DEwFg=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=VI8i6qJVhmqhVtOOYQbBTCQnP+lL7L3AqdE+Qfmco2cOhYsE6ewcyCG2eYwYnCJuFJL88ae819KJJqIs8Ur+16C3sX74dRd8xq81tJcuWCd21y7FV2TbNeVYrIMb4Mv8ERfFz06SNwVNfZ32p/pbXoPIvV/j82Mfwi3DojdYHVU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=R6q5PNE+; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="R6q5PNE+" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9C2FEC4CEF0; Fri, 29 Aug 2025 15:37:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756481860; bh=1D+KKEVMrSYTaxz25aM5XfBdCuC1VdSpDaU+c8DEwFg=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=R6q5PNE+cEAEH+7JIhevtWU+ym8Utaq6iEFv5FWmZ+cSV7dBwzkXl2JUC9U8ujVBS A1/IvYT5RvY3xbiWgP7kypXeoVaPYGRV6B87iFlJr+4wZGIaqIPZBZbGZFbf/rvWqf RfhK55t1LINEALEbKziQ/dT0DoogSbhgG0dIj2GCNjREfM45XMZpSHe4/84FLC8GE3 oZCSWibHTXUqhwdpBly9Hf25/CnncwuqOecUx6Penb7+j9zXtPvnBrgXlzCujynP7P GHdpSQQeAOzo7BuF7EFk8vTBPxFfcfXjBCIKlt2XxM3rTQFA2oOTvF4wNhJ3Ndph8S I++CPRlY+nXxg== From: Daniel Wagner Date: Fri, 29 Aug 2025 17:37:27 +0200 Subject: [PATCH v3 3/4] nvme-fc: refactore nvme_fc_reconnect_or_delete Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20250829-nvme-fc-sync-v3-3-d69c87e63aee@kernel.org> References: <20250829-nvme-fc-sync-v3-0-d69c87e63aee@kernel.org> In-Reply-To: <20250829-nvme-fc-sync-v3-0-d69c87e63aee@kernel.org> To: Keith Busch , Christoph Hellwig , Sagi Grimberg , James Smart Cc: Shinichiro Kawasaki , Hannes Reinecke , linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, Daniel Wagner X-Mailer: b4 0.14.2 Instead using if else blocks, apply the 'early return' and 'goto out' pattern. This reduces the overall complexity of this function as the conditions can be read in a linear order. The only function change is that always the next reconnect attempt message will be printed. Signed-off-by: Daniel Wagner Reviewed-by: Hannes Reinecke --- drivers/nvme/host/fc.c | 64 +++++++++++++++++++++++++++-------------------= ---- 1 file changed, 34 insertions(+), 30 deletions(-) diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c index 3e6c79bf0bfd3884867c9ebeb24771323a619934..5d9e193bd2525ae1a2f08e2a0a1= 6ca41e08a7304 100644 --- a/drivers/nvme/host/fc.c +++ b/drivers/nvme/host/fc.c @@ -3249,7 +3249,6 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl= , int status) struct nvme_fc_rport *rport =3D ctrl->rport; struct nvme_fc_remote_port *portptr =3D &rport->remoteport; unsigned long recon_delay =3D ctrl->ctrl.opts->reconnect_delay * HZ; - bool recon =3D true; =20 if (nvme_ctrl_state(&ctrl->ctrl) !=3D NVME_CTRL_CONNECTING) return; @@ -3259,38 +3258,43 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ct= rl, int status) "NVME-FC{%d}: reset: Reconnect attempt failed (%d)\n", ctrl->cnum, status); } else if (time_after_eq(jiffies, rport->dev_loss_end)) - recon =3D false; + goto delete_log; =20 - if (recon && nvmf_should_reconnect(&ctrl->ctrl, status)) { - if (portptr->port_state =3D=3D FC_OBJSTATE_ONLINE) - dev_info(ctrl->ctrl.device, - "NVME-FC{%d}: Reconnect attempt in %ld " - "seconds\n", - ctrl->cnum, recon_delay / HZ); - else if (time_after(jiffies + recon_delay, rport->dev_loss_end)) - recon_delay =3D rport->dev_loss_end - jiffies; + if (!nvmf_should_reconnect(&ctrl->ctrl, status)) + goto delete_log; =20 - queue_delayed_work(nvme_wq, &ctrl->connect_work, recon_delay); - } else { - if (portptr->port_state =3D=3D FC_OBJSTATE_ONLINE) { - if (status > 0 && (status & NVME_STATUS_DNR)) - dev_warn(ctrl->ctrl.device, - "NVME-FC{%d}: reconnect failure\n", - ctrl->cnum); - else - dev_warn(ctrl->ctrl.device, - "NVME-FC{%d}: Max reconnect attempts " - "(%d) reached.\n", - ctrl->cnum, ctrl->ctrl.nr_reconnects); - } else + if (portptr->port_state !=3D FC_OBJSTATE_ONLINE && + time_after(jiffies + recon_delay, rport->dev_loss_end)) + recon_delay =3D rport->dev_loss_end - jiffies; + + dev_info(ctrl->ctrl.device, + "NVME-FC{%d}: Reconnect attempt in %ld seconds\n", + ctrl->cnum, recon_delay / HZ); + + queue_delayed_work(nvme_wq, &ctrl->connect_work, recon_delay); + + return; + +delete_log: + if (portptr->port_state =3D=3D FC_OBJSTATE_ONLINE) { + if (status > 0 && (status & NVME_STATUS_DNR)) dev_warn(ctrl->ctrl.device, - "NVME-FC{%d}: dev_loss_tmo (%d) expired " - "while waiting for remoteport connectivity.\n", - ctrl->cnum, min_t(int, portptr->dev_loss_tmo, - (ctrl->ctrl.opts->max_reconnects * - ctrl->ctrl.opts->reconnect_delay))); - nvme_fc_ctrl_put(ctrl); - } + "NVME-FC{%d}: reconnect failure\n", + ctrl->cnum); + else + dev_warn(ctrl->ctrl.device, + "NVME-FC{%d}: Max reconnect attempts " + "(%d) reached.\n", + ctrl->cnum, ctrl->ctrl.nr_reconnects); + } else + dev_warn(ctrl->ctrl.device, + "NVME-FC{%d}: dev_loss_tmo (%d) expired " + "while waiting for remoteport connectivity.\n", + ctrl->cnum, min_t(int, portptr->dev_loss_tmo, + (ctrl->ctrl.opts->max_reconnects * + ctrl->ctrl.opts->reconnect_delay))); + + nvme_fc_ctrl_put(ctrl); } =20 static void --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E9268338F5C for ; Fri, 29 Aug 2025 15:37:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756481863; cv=none; b=KiYdqe8SefO49Z1CvDpn7dKejFIoR6k9OMh4HCE/DUo7QOGWswYHiEAzPaLBj6iqE7DBnP+G/TSsBk//EFV/SFya+qTKHDHgj4q0wI27YJoz2HzgqtGizy1ePU38dAkjt5y2GwYKSpmzTh7BxO0n7XU7Fjq8KoXreBtau+tGbz0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756481863; c=relaxed/simple; bh=jOI5q3SLcBs+QZqt5jDcPT92riWlRIK70SQlJd0iU9I=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=kOZ/nJZuM4RKGplEWd56X1j+AxyJD0E1VOXmJEaVQLcSMfFm+Fi1k9zp+HIJ6cCNyrx6fdmLYoBbI/B3DqpvoqyYUr2Y7/5m49b4ZVYk3yBTSl9zjez61L1oxfJu//FKkl46QzwJIz1/oVKo718KVv5yT6vqmRM/JJlpiNcKbgg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=A6QYG25O; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="A6QYG25O" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4DD34C4CEF5; Fri, 29 Aug 2025 15:37:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756481862; bh=jOI5q3SLcBs+QZqt5jDcPT92riWlRIK70SQlJd0iU9I=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=A6QYG25ONY8hVufJssfXNil6CZxQms/EnL1n1pWE9PBrzTc0NTLlEU9IkJvEr7Pjh f9G2XGTFUFAyvRDxVJctBr3wHPcVybksPycmqbm9GagvCgibp59JcxNSjKjNc9L//x 7Tmq9HZpw4C0aNHWnbzV8A1MCuwU2ik5MI2JfISlfrihaRjIYXy+OYUPjxZZ/kWAe9 iErEjnRdP2w7fJXpLxqJQr6sQW0n/r7DpbwEJU1fU6/82xOn10+qLTaGJu5nxaVC99 JBnXs9k0dIsA/9lzbVezqgLWUxYJ6EgFaqQU2X0uuH5ps68mrwuxZX1JYzcZGQcT6c lEbrP2/7uD+eA== From: Daniel Wagner Date: Fri, 29 Aug 2025 17:37:28 +0200 Subject: [PATCH v3 4/4] nvme-fc: wait for initial connect attempt to finish Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20250829-nvme-fc-sync-v3-4-d69c87e63aee@kernel.org> References: <20250829-nvme-fc-sync-v3-0-d69c87e63aee@kernel.org> In-Reply-To: <20250829-nvme-fc-sync-v3-0-d69c87e63aee@kernel.org> To: Keith Busch , Christoph Hellwig , Sagi Grimberg , James Smart Cc: Shinichiro Kawasaki , Hannes Reinecke , linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, Daniel Wagner X-Mailer: b4 0.14.2 The TCP and RDMA transport are doing synchronous connects. Thus the write to /dev/nvme-fabrics blocks and will return either success or fail. The FC transport offloads the connect attempt to a workqueue and thus it's an asynchronous operation. The write will succeeds even though the connection attempt is ongoing. This async connect feature was introduced to mitigate problems with transient connect errors and the task to coordinate retries with userspace (nvme-cli). Unfortunately, this makes the transports behave differently. Streamline nvme-fc to wait for the initial connection attempt. In order to support also the async connection attempt introduce a new flag for userspace, which is an opt-in feature. This avoids breaking existing users which are not ready for the FC transport behavior change. Link: https://lore.kernel.org/linux-nvme/0605ac36-16d5-2026-d3c6-62d346db6d= fb@gmail.com/ Signed-off-by: Daniel Wagner --- drivers/nvme/host/fabrics.c | 18 +++++++++++++++++- drivers/nvme/host/fabrics.h | 3 +++ drivers/nvme/host/fc.c | 36 +++++++++++++++++++++++++++++++++++- 3 files changed, 55 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c index cb9311c399856de54a2e4d41d41d295dd1aef31e..73e7a1e7925ef2e8ad633e8e2bd= 36207181dcbb6 100644 --- a/drivers/nvme/host/fabrics.c +++ b/drivers/nvme/host/fabrics.c @@ -709,6 +709,7 @@ static const match_table_t opt_tokens =3D { { NVMF_OPT_TLS, "tls" }, { NVMF_OPT_CONCAT, "concat" }, #endif + { NVMF_OPT_CONNECT_ASYNC, "connect_async=3D%d" }, { NVMF_OPT_ERR, NULL } }; =20 @@ -738,6 +739,8 @@ static int nvmf_parse_options(struct nvmf_ctrl_options = *opts, opts->tls_key =3D NULL; opts->keyring =3D NULL; opts->concat =3D false; + /* keep backward compatibility with older nvme-cli */ + opts->connect_async =3D true; =20 options =3D o =3D kstrdup(buf, GFP_KERNEL); if (!options) @@ -1064,6 +1067,19 @@ static int nvmf_parse_options(struct nvmf_ctrl_optio= ns *opts, } opts->concat =3D true; break; + case NVMF_OPT_CONNECT_ASYNC: + if (match_int(args, &token)) { + ret =3D -EINVAL; + goto out; + } + if (token < 0 || token > 1) { + pr_err("Invalid connect_async %d value\n", + token); + ret =3D -EINVAL; + goto out; + } + opts->connect_async =3D token; + break; default: pr_warn("unknown parameter or missing value '%s' in ctrl creation reque= st\n", p); @@ -1316,7 +1332,7 @@ EXPORT_SYMBOL_GPL(nvmf_ctrl_options_put); NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\ NVMF_OPT_DISABLE_SQFLOW | NVMF_OPT_DISCOVERY |\ NVMF_OPT_FAIL_FAST_TMO | NVMF_OPT_DHCHAP_SECRET |\ - NVMF_OPT_DHCHAP_CTRL_SECRET) + NVMF_OPT_DHCHAP_CTRL_SECRET | NVMF_OPT_CONNECT_ASYNC) =20 static struct nvme_ctrl * nvmf_create_ctrl(struct device *dev, const char *buf) diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h index bcc1d5f97ee5a03852830bf3ba0a15f82c7c2c03..9015bb3f63e1d0c412cf4af5194= a42411fbb516c 100644 --- a/drivers/nvme/host/fabrics.h +++ b/drivers/nvme/host/fabrics.h @@ -67,6 +67,7 @@ enum { NVMF_OPT_KEYRING =3D 1 << 26, NVMF_OPT_TLS_KEY =3D 1 << 27, NVMF_OPT_CONCAT =3D 1 << 28, + NVMF_OPT_CONNECT_ASYNC =3D 1 << 29, }; =20 /** @@ -111,6 +112,7 @@ enum { * @nr_poll_queues: number of queues for polling I/O * @tos: type of service * @fast_io_fail_tmo: Fast I/O fail timeout in seconds + * @connect_async: Don't wait for the initial connect attempt to succeed o= r fail */ struct nvmf_ctrl_options { struct kref ref; @@ -142,6 +144,7 @@ struct nvmf_ctrl_options { unsigned int nr_poll_queues; int tos; int fast_io_fail_tmo; + bool connect_async; }; =20 int nvmf_ctrl_options_get(struct nvmf_ctrl_options *opts); diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c index 5d9e193bd2525ae1a2f08e2a0a16ca41e08a7304..ccd67acb55e216602010933914a= fca77248b3de0 100644 --- a/drivers/nvme/host/fc.c +++ b/drivers/nvme/host/fc.c @@ -148,6 +148,7 @@ struct nvme_fc_rport { #define ASSOC_ACTIVE 0 #define ASSOC_FAILED 1 #define FCCTRL_TERMIO 2 +#define INITIAL_SYNC_CONNECT 3 =20 struct nvme_fc_ctrl { spinlock_t lock; @@ -168,6 +169,7 @@ struct nvme_fc_ctrl { =20 struct work_struct ioerr_work; struct delayed_work connect_work; + struct completion connect_completion; =20 struct kref ref; unsigned long flags; @@ -829,6 +831,7 @@ nvme_fc_unregister_remoteport(struct nvme_fc_remote_por= t *portptr) dev_warn(ctrl->ctrl.device, "NVME-FC{%d}: controller connectivity lost.\n", ctrl->cnum); + complete(&ctrl->connect_completion); nvme_fc_ctrl_put(ctrl); } else nvme_fc_ctrl_connectivity_loss(ctrl); @@ -3253,6 +3256,9 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl= , int status) if (nvme_ctrl_state(&ctrl->ctrl) !=3D NVME_CTRL_CONNECTING) return; =20 + if (test_bit(INITIAL_SYNC_CONNECT, &ctrl->flags)) + goto delete; + if (portptr->port_state =3D=3D FC_OBJSTATE_ONLINE) { dev_info(ctrl->ctrl.device, "NVME-FC{%d}: reset: Reconnect attempt failed (%d)\n", @@ -3294,6 +3300,8 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl= , int status) (ctrl->ctrl.opts->max_reconnects * ctrl->ctrl.opts->reconnect_delay))); =20 +delete: + complete(&ctrl->connect_completion); nvme_fc_ctrl_put(ctrl); } =20 @@ -3353,10 +3361,14 @@ nvme_fc_connect_ctrl_work(struct work_struct *work) ret =3D nvme_fc_create_association(ctrl); if (ret) nvme_fc_reconnect_or_delete(ctrl, ret); - else + else { dev_info(ctrl->ctrl.device, "NVME-FC{%d}: controller connect complete\n", ctrl->cnum); + pr_info("%s:%d clear initial sync connect bit\n", __func__, __LINE__); + clear_bit(INITIAL_SYNC_CONNECT, &ctrl->flags); + complete(&ctrl->connect_completion); + } } =20 =20 @@ -3461,6 +3473,7 @@ nvme_fc_alloc_ctrl(struct device *dev, struct nvmf_ct= rl_options *opts, =20 INIT_WORK(&ctrl->ctrl.reset_work, nvme_fc_reset_ctrl_work); INIT_DELAYED_WORK(&ctrl->connect_work, nvme_fc_connect_ctrl_work); + init_completion(&ctrl->connect_completion); INIT_WORK(&ctrl->ioerr_work, nvme_fc_ctrl_ioerr_work); spin_lock_init(&ctrl->lock); =20 @@ -3539,6 +3552,12 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ct= rl_options *opts, list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list); spin_unlock_irqrestore(&rport->lock, flags); =20 + if (!opts->connect_async) { + pr_info("%s:%d set initial connect bit\n", __func__, __LINE__); + set_bit(INITIAL_SYNC_CONNECT, &ctrl->flags); + nvme_fc_ctrl_get(ctrl); + } + if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) { dev_err(ctrl->ctrl.device, "NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum); @@ -3554,6 +3573,20 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ct= rl_options *opts, =20 flush_delayed_work(&ctrl->connect_work); =20 + if (!opts->connect_async) { + enum nvme_ctrl_state state; + + wait_for_completion(&ctrl->connect_completion); + state =3D nvme_ctrl_state(&ctrl->ctrl); + nvme_fc_ctrl_put(ctrl); + + if (state !=3D NVME_CTRL_LIVE) { + /* Cleanup is handled by the connect state machine */ + pr_info("%s:%d ctrl state %d\n", __func__, __LINE__, state); + return ERR_PTR(-EIO); + } + } + dev_info(ctrl->ctrl.device, "NVME-FC{%d}: new ctrl: NQN \"%s\", hostnqn: %s\n", ctrl->cnum, nvmf_ctrl_subsysnqn(&ctrl->ctrl), opts->host->nqn); @@ -3895,6 +3928,7 @@ nvme_fc_delete_controllers(struct nvme_fc_rport *rpor= t) dev_warn(ctrl->ctrl.device, "NVME-FC{%d}: transport unloading: deleting ctrl\n", ctrl->cnum); + complete(&ctrl->connect_completion); nvme_fc_ctrl_put(ctrl); } spin_unlock(&rport->lock); --=20 2.51.0