From nobody Tue Dec 16 16:35:51 2025
From: Daniel Wagner
Date: Thu, 09 Jan 2025 14:30:47 +0100
Subject: [PATCH v4 1/3] nvme-fc: go straight to connecting state when initializing
Message-Id: <20250109-nvme-fc-handle-com-lost-v4-1-fe5cae17b492@kernel.org>
References: <20250109-nvme-fc-handle-com-lost-v4-0-fe5cae17b492@kernel.org>
In-Reply-To: <20250109-nvme-fc-handle-com-lost-v4-0-fe5cae17b492@kernel.org>
To: James Smart, Keith Busch, Christoph Hellwig, Sagi Grimberg, Hannes Reinecke, Paul Ely
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, Daniel Wagner
X-Mailer: b4 0.14.2

The initial controller initialization mimics the reconnect loop
behavior by switching from NEW to RESETTING and then to CONNECTING.
Since NEW to CONNECTING is a valid transition on its own, there is no
point entering the RESETTING state first. TCP and RDMA also transition
directly to the CONNECTING state.
Reviewed-by: Sagi Grimberg
Reviewed-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Signed-off-by: Daniel Wagner
---
 drivers/nvme/host/fc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 094be164ffdc0fb79050cfb92c32dfaee8d15622..7409da42b9ee580cdd6fe78c0f93e78c4ad08675 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3578,8 +3578,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 	list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
 	spin_unlock_irqrestore(&rport->lock, flags);
 
-	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
-	    !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
 		dev_err(ctrl->ctrl.device,
 			"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
 		goto fail_ctrl;
-- 
2.47.1

From nobody Tue Dec 16 16:35:51 2025
From: Daniel Wagner
Date: Thu, 09 Jan 2025 14:30:48 +0100
Subject: [PATCH v4 2/3] nvme: handle connectivity loss in nvme_set_queue_count
Message-Id: <20250109-nvme-fc-handle-com-lost-v4-2-fe5cae17b492@kernel.org>
References: <20250109-nvme-fc-handle-com-lost-v4-0-fe5cae17b492@kernel.org>
In-Reply-To: <20250109-nvme-fc-handle-com-lost-v4-0-fe5cae17b492@kernel.org>
To: James Smart, Keith Busch, Christoph Hellwig, Sagi Grimberg, Hannes Reinecke, Paul Ely
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, Daniel Wagner
X-Mailer: b4 0.14.2

When the Set Features attempt fails with any NVMe status code in
nvme_set_queue_count, the function still reports success, even though
the number of queues is set to 0. This is intentional, to support
controllers in a degraded state (the admin queue is still up and
running, but there are no I/O queues).

There is one exception, though: when nvme_set_features reports a host
path error, nvme_set_queue_count should propagate this error, because
connectivity is lost, which means the admin queue is not working
anymore either.

Fixes: 9a0be7abb62f ("nvme: refactor set_queue_count")
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Sagi Grimberg
Signed-off-by: Daniel Wagner
---
 drivers/nvme/host/core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b72b0e06801490f2c14e7fa09a45a870045a221d..a08ff8b362d0b7bd1a8a343e26d564b089c6bad3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1695,7 +1695,13 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
 
 	status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count, NULL, 0,
 			&result);
-	if (status < 0)
+
+	/*
+	 * It's either a kernel error or the host observed a connection
+	 * loss. In either case it is not possible to communicate with
+	 * the controller, thus enter the error code path.
+	 */
+	if (status < 0 || status == NVME_SC_HOST_PATH_ERROR)
 		return status;
 
 	/*
-- 
2.47.1
From nobody Tue Dec 16 16:35:51 2025
From: Daniel Wagner
Date: Thu, 09 Jan 2025 14:30:49 +0100
Subject: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
Message-Id: <20250109-nvme-fc-handle-com-lost-v4-3-fe5cae17b492@kernel.org>
References: <20250109-nvme-fc-handle-com-lost-v4-0-fe5cae17b492@kernel.org>
In-Reply-To: <20250109-nvme-fc-handle-com-lost-v4-0-fe5cae17b492@kernel.org>
To: James Smart, Keith Busch, Christoph Hellwig, Sagi Grimberg, Hannes Reinecke, Paul Ely
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, Daniel Wagner
X-Mailer: b4 0.14.2

When a connectivity loss occurs while nvme_fc_create_association is
being executed, it's possible that the ctrl ends up stuck in the LIVE
state:

 1) nvme nvme10: NVME-FC{10}: create association : ...
 2) nvme nvme10: NVME-FC{10}: controller connectivity lost.
    Awaiting Reconnect
    nvme nvme10: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
 3) nvme nvme10: Could not set queue count (880)
    nvme nvme10: Failed to configure AEN (cfg 900)
 4) nvme nvme10: NVME-FC{10}: controller connect complete
 5) nvme nvme10: failed nvme_keep_alive_end_io error=4

A connection attempt starts 1) and the ctrl is in the CONNECTING
state. Shortly after, the LLDD detects a connectivity loss event and
calls nvme_fc_ctrl_connectivity_loss 2). Because the ctrl is still in
the CONNECTING state, this event is ignored. nvme_fc_create_association
continues to run in parallel and tries to communicate with the
controller; these commands will fail.
Though these errors are filtered out, e.g. in 3) setting the number of
I/O queues fails, which leads to an early exit in
nvme_fc_create_io_queues. Because the number of I/O queues is 0 at
this point, there is nothing left in nvme_fc_create_association which
could detect the connection drop. Thus the ctrl enters the LIVE state
4).

Eventually the keep-alive handler times out 5), but because nothing is
being done, the ctrl stays in the LIVE state.

There is already the ASSOC_FAILED flag to track connectivity loss
events, but this bit is set too late in the recovery code path. Move
it into the connectivity loss event handler and synchronize it with
the state change. This ensures that the ASSOC_FAILED flag is seen by
nvme_fc_create_io_queues and the ctrl does not enter the LIVE state
after a connectivity loss event. If the connectivity loss event
happens after we have entered the LIVE state, the normal error
recovery path is executed.

Signed-off-by: Daniel Wagner
Reviewed-by: Hannes Reinecke
Reviewed-by: Sagi Grimberg
---
 drivers/nvme/host/fc.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 7409da42b9ee580cdd6fe78c0f93e78c4ad08675..55884d3df6f291cfddb4742e135b54a72f1cfa05 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -781,11 +781,19 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
 static void
 nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
 {
+	enum nvme_ctrl_state state;
+	unsigned long flags;
+
 	dev_info(ctrl->ctrl.device,
 		"NVME-FC{%d}: controller connectivity lost. Awaiting "
 		"Reconnect", ctrl->cnum);
 
-	switch (nvme_ctrl_state(&ctrl->ctrl)) {
+	spin_lock_irqsave(&ctrl->lock, flags);
+	set_bit(ASSOC_FAILED, &ctrl->flags);
+	state = nvme_ctrl_state(&ctrl->ctrl);
+	spin_unlock_irqrestore(&ctrl->lock, flags);
+
+	switch (state) {
 	case NVME_CTRL_NEW:
 	case NVME_CTRL_LIVE:
 		/*
@@ -2542,7 +2550,6 @@ nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
 	 */
 	if (ctrl->ctrl.state == NVME_CTRL_CONNECTING) {
 		__nvme_fc_abort_outstanding_ios(ctrl, true);
-		set_bit(ASSOC_FAILED, &ctrl->flags);
 		dev_warn(ctrl->ctrl.device,
 			"NVME-FC{%d}: transport error during (re)connect\n",
 			ctrl->cnum);
@@ -3167,12 +3174,18 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 		else
 			ret = nvme_fc_recreate_io_queues(ctrl);
 	}
-	if (!ret && test_bit(ASSOC_FAILED, &ctrl->flags))
-		ret = -EIO;
 	if (ret)
 		goto out_term_aen_ops;
 
-	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+	spin_lock_irqsave(&ctrl->lock, flags);
+	if (!test_bit(ASSOC_FAILED, &ctrl->flags))
+		changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+	else
+		ret = -EIO;
+	spin_unlock_irqrestore(&ctrl->lock, flags);
+
+	if (ret)
+		goto out_term_aen_ops;
 
 	ctrl->ctrl.nr_reconnects = 0;
 
-- 
2.47.1