From: Mohamed Khalfella
To: Chaitanya Kulkarni, Christoph Hellwig, Jens Axboe, Keith Busch,
 Sagi Grimberg
Cc: Aaron Dailey, Randy Jennings, John Meneghini, Hannes Reinecke,
 linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
 Mohamed Khalfella
Subject: [RFC PATCH 10/14] nvme-tcp: Use CCR to recover controller that hits an error
Date: Tue, 25 Nov 2025 18:11:57 -0800
Message-ID: <20251126021250.2583630-11-mkhalfella@purestorage.com>
X-Mailer: git-send-email 2.51.2
In-Reply-To: <20251126021250.2583630-1-mkhalfella@purestorage.com>
References: <20251126021250.2583630-1-mkhalfella@purestorage.com>

A live nvme controller that hits an error now moves to the RECOVERING
state instead of the RESETTING state. In the RECOVERING state,
ctrl->err_work attempts to use cross-controller recovery (CCR) to
terminate inflight IOs on the controller. If CCR succeeds, switch to the
RESETTING state and continue error recovery as usual by tearing down the
controller and attempting to reconnect to the target. If CCR fails, the
behavior of recovery depends on whether CQT is supported. If CQT is
supported, switch to time-based recovery by holding inflight IOs until
it is safe for them to be retried.
If CQT is not supported, proceed to retry requests immediately, as the
code currently does.

To support time-based recovery, turn ctrl->err_work into delayed work.
Update nvme_tcp_timeout() to not complete inflight IOs while the
controller is in the RECOVERING state.

Signed-off-by: Mohamed Khalfella
---
 drivers/nvme/host/tcp.c | 52 +++++++++++++++++++++++++++++++++++------
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 9a96df1a511c..ec9a713490a9 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -193,7 +193,7 @@ struct nvme_tcp_ctrl {
 	struct sockaddr_storage src_addr;
 	struct nvme_ctrl ctrl;
 
-	struct work_struct err_work;
+	struct delayed_work err_work;
 	struct delayed_work connect_work;
 	struct nvme_tcp_request async_req;
 	u32 io_queues[HCTX_MAX_TYPES];
@@ -611,11 +611,12 @@ static void nvme_tcp_init_recv_ctx(struct nvme_tcp_queue *queue)
 
 static void nvme_tcp_error_recovery(struct nvme_ctrl *ctrl)
 {
-	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
+	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RECOVERING) &&
+	    !nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
 		return;
 
 	dev_warn(ctrl->device, "starting error recovery\n");
-	queue_work(nvme_reset_wq, &to_tcp_ctrl(ctrl)->err_work);
+	queue_delayed_work(nvme_reset_wq, &to_tcp_ctrl(ctrl)->err_work, 0);
 }
 
 static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
@@ -2470,12 +2471,48 @@ static void nvme_tcp_reconnect_ctrl_work(struct work_struct *work)
 	nvme_tcp_reconnect_or_remove(ctrl, ret);
 }
 
+static int nvme_tcp_recover_ctrl(struct nvme_ctrl *ctrl)
+{
+	unsigned long rem;
+
+	if (test_and_clear_bit(NVME_CTRL_RECOVERED, &ctrl->flags)) {
+		dev_info(ctrl->device, "completed time-based recovery\n");
+		goto done;
+	}
+
+	rem = nvme_recover_ctrl(ctrl);
+	if (!rem)
+		goto done;
+
+	if (!ctrl->cqt) {
+		dev_info(ctrl->device,
+			 "CCR failed, CQT not supported, skip time-based recovery\n");
+		goto done;
+	}
+
+	dev_info(ctrl->device,
+		 "CCR failed, switch to time-based recovery, timeout = %ums\n",
+		 jiffies_to_msecs(rem));
+	set_bit(NVME_CTRL_RECOVERED, &ctrl->flags);
+	queue_delayed_work(nvme_reset_wq, &to_tcp_ctrl(ctrl)->err_work, rem);
+	return -EAGAIN;
+
+done:
+	nvme_end_ctrl_recovery(ctrl);
+	return 0;
+}
+
 static void nvme_tcp_error_recovery_work(struct work_struct *work)
 {
-	struct nvme_tcp_ctrl *tcp_ctrl = container_of(work,
+	struct nvme_tcp_ctrl *tcp_ctrl = container_of(to_delayed_work(work),
 				struct nvme_tcp_ctrl, err_work);
 	struct nvme_ctrl *ctrl = &tcp_ctrl->ctrl;
 
+	if (nvme_ctrl_state(ctrl) == NVME_CTRL_RECOVERING) {
+		if (nvme_tcp_recover_ctrl(ctrl))
+			return;
+	}
+
 	if (nvme_tcp_key_revoke_needed(ctrl))
 		nvme_auth_revoke_tls_key(ctrl);
 	nvme_stop_keep_alive(ctrl);
@@ -2545,7 +2582,7 @@ static void nvme_reset_ctrl_work(struct work_struct *work)
 
 static void nvme_tcp_stop_ctrl(struct nvme_ctrl *ctrl)
 {
-	flush_work(&to_tcp_ctrl(ctrl)->err_work);
+	flush_delayed_work(&to_tcp_ctrl(ctrl)->err_work);
 	cancel_delayed_work_sync(&to_tcp_ctrl(ctrl)->connect_work);
 }
 
@@ -2640,6 +2677,7 @@ static enum blk_eh_timer_return nvme_tcp_timeout(struct request *rq)
 {
 	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl;
+	enum nvme_ctrl_state state = nvme_ctrl_state(ctrl);
 	struct nvme_tcp_cmd_pdu *pdu = nvme_tcp_req_cmd_pdu(req);
 	struct nvme_command *cmd = &pdu->cmd;
 	int qid = nvme_tcp_queue_id(req->queue);
@@ -2649,7 +2687,7 @@ static enum blk_eh_timer_return nvme_tcp_timeout(struct request *rq)
 		rq->tag, nvme_cid(rq), pdu->hdr.type, cmd->common.opcode,
 		nvme_fabrics_opcode_str(qid, cmd), qid);
 
-	if (nvme_ctrl_state(ctrl) != NVME_CTRL_LIVE) {
+	if (state != NVME_CTRL_LIVE && state != NVME_CTRL_RECOVERING) {
 		/*
 		 * If we are resetting, connecting or deleting we should
 		 * complete immediately because we may block controller
@@ -2903,7 +2941,7 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev,
 
 	INIT_DELAYED_WORK(&ctrl->connect_work,
 			nvme_tcp_reconnect_ctrl_work);
-	INIT_WORK(&ctrl->err_work, nvme_tcp_error_recovery_work);
+	INIT_DELAYED_WORK(&ctrl->err_work, nvme_tcp_error_recovery_work);
 	INIT_WORK(&ctrl->ctrl.reset_work, nvme_reset_ctrl_work);
 
 	if (!(opts->mask & NVMF_OPT_TRSVCID)) {
-- 
2.51.2
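
For readers who want to reason about the recovery flow outside the
kernel, the decision made in nvme_tcp_recover_ctrl() can be approximated
by the small user-space model below. This is only a sketch under stated
assumptions: model_ctrl, model_recover(), and the STEP_* names are
hypothetical and exist only for illustration, and the return value of
nvme_recover_ctrl() is read here as "0 means nothing left to wait for,
non-zero is the remaining delay", which is how the patch uses it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the patch's decision flow; not kernel code. */
enum next_step {
	STEP_TEARDOWN,	/* end recovery, continue with teardown and reconnect */
	STEP_WAIT,	/* hold inflight IOs and re-run err_work after a delay */
};

struct model_ctrl {
	bool recovered_flag;	/* stands in for the NVME_CTRL_RECOVERED bit */
	bool cqt_supported;	/* stands in for ctrl->cqt being non-zero */
};

/*
 * ccr_remaining stands in for the value returned by nvme_recover_ctrl().
 * On STEP_WAIT, *delay is the delay err_work would be re-queued with.
 */
static enum next_step model_recover(struct model_ctrl *c,
				    unsigned long ccr_remaining,
				    unsigned long *delay)
{
	assert(c != NULL && delay != NULL);
	*delay = 0;

	/* Second pass: the time-based wait has already elapsed. */
	if (c->recovered_flag) {
		c->recovered_flag = false;
		return STEP_TEARDOWN;
	}

	/* CCR succeeded, nothing to wait for. */
	if (ccr_remaining == 0)
		return STEP_TEARDOWN;

	/* CCR failed and CQT is not supported: retry immediately. */
	if (!c->cqt_supported)
		return STEP_TEARDOWN;

	/* CCR failed and CQT is supported: wait, then run again. */
	c->recovered_flag = true;
	*delay = ccr_remaining;
	return STEP_WAIT;
}
```

In this model, STEP_WAIT corresponds to the -EAGAIN return in the patch,
where nvme_tcp_error_recovery_work() returns early and the delayed work
fires again once the remaining time has passed.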