From: Mohamed Khalfella
To: Chaitanya Kulkarni, Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
Cc: Aaron Dailey, Randy Jennings, John Meneghini, Hannes Reinecke, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, Mohamed Khalfella
Subject: [RFC PATCH 11/14] nvme-rdma: Use CCR to recover controller that hits an error
Date: Tue, 25 Nov 2025 18:11:58 -0800
Message-ID: <20251126021250.2583630-12-mkhalfella@purestorage.com>
In-Reply-To: <20251126021250.2583630-1-mkhalfella@purestorage.com>
References: <20251126021250.2583630-1-mkhalfella@purestorage.com>

An alive nvme controller that hits an error will now move to the
RECOVERING state instead of the RESETTING state. In RECOVERING state,
ctrl->err_work attempts to use cross-controller recovery (CCR) to
terminate inflight IOs on the controller. If CCR succeeds, switch to
RESETTING state and continue error recovery as usual by tearing down the
controller and attempting to reconnect to the target. If CCR fails, the
recovery behavior depends on whether CQT is supported. If CQT is
supported, switch to time-based recovery by holding inflight IOs until it
is safe for them to be retried.
If CQT is not supported, proceed to retry requests immediately, as the
code currently does.

To support time-based recovery, turn ctrl->err_work into delayed work.
Update nvme_rdma_timeout() to not complete inflight IOs while the
controller is in RECOVERING state.

Signed-off-by: Mohamed Khalfella
---
 drivers/nvme/host/rdma.c | 51 ++++++++++++++++++++++++++++++++++------
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 190a4cfa8a5e..4a8bb2614468 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -106,7 +106,7 @@ struct nvme_rdma_ctrl {
 
 	/* other member variables */
 	struct blk_mq_tag_set	tag_set;
-	struct work_struct	err_work;
+	struct delayed_work	err_work;
 
 	struct nvme_rdma_qe	async_event_sqe;
 
@@ -961,7 +961,7 @@ static void nvme_rdma_stop_ctrl(struct nvme_ctrl *nctrl)
 {
 	struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
 
-	flush_work(&ctrl->err_work);
+	flush_delayed_work(&ctrl->err_work);
 	cancel_delayed_work_sync(&ctrl->reconnect_work);
 }
 
@@ -1120,11 +1120,46 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
 	nvme_rdma_reconnect_or_remove(ctrl, ret);
 }
 
+static int nvme_rdma_recover_ctrl(struct nvme_ctrl *ctrl)
+{
+	unsigned long rem;
+
+	if (test_and_clear_bit(NVME_CTRL_RECOVERED, &ctrl->flags)) {
+		dev_info(ctrl->device, "completed time-based recovery\n");
+		goto done;
+	}
+
+	rem = nvme_recover_ctrl(ctrl);
+	if (!rem)
+		goto done;
+
+	if (!ctrl->cqt) {
+		dev_info(ctrl->device,
+			 "CCR failed, CQT not supported, skip time-based recovery\n");
+		goto done;
+	}
+
+	dev_info(ctrl->device,
+		 "CCR failed, switch to time-based recovery, timeout = %ums\n",
+		 jiffies_to_msecs(rem));
+	set_bit(NVME_CTRL_RECOVERED, &ctrl->flags);
+	queue_delayed_work(nvme_reset_wq, &to_rdma_ctrl(ctrl)->err_work, rem);
+	return -EAGAIN;
+
+done:
+	nvme_end_ctrl_recovery(ctrl);
+	return 0;
+}
 static void nvme_rdma_error_recovery_work(struct work_struct *work)
 {
-	struct nvme_rdma_ctrl *ctrl = container_of(work,
+	struct nvme_rdma_ctrl *ctrl = container_of(to_delayed_work(work),
 			struct nvme_rdma_ctrl, err_work);
 
+	if (nvme_ctrl_state(&ctrl->ctrl) == NVME_CTRL_RECOVERING) {
+		if (nvme_rdma_recover_ctrl(&ctrl->ctrl))
+			return;
+	}
+
 	nvme_stop_keep_alive(&ctrl->ctrl);
 	flush_work(&ctrl->ctrl.async_event_work);
 	nvme_rdma_teardown_io_queues(ctrl, false);
@@ -1147,11 +1182,12 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 
 static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
 {
-	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECOVERING) &&
+	    !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
 		return;
 
 	dev_warn(ctrl->ctrl.device, "starting error recovery\n");
-	queue_work(nvme_reset_wq, &ctrl->err_work);
+	queue_delayed_work(nvme_reset_wq, &ctrl->err_work, 0);
 }
 
 static void nvme_rdma_end_request(struct nvme_rdma_request *req)
@@ -1955,6 +1991,7 @@ static enum blk_eh_timer_return nvme_rdma_timeout(struct request *rq)
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_rdma_queue *queue = req->queue;
 	struct nvme_rdma_ctrl *ctrl = queue->ctrl;
+	enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
 	struct nvme_command *cmd = req->req.cmd;
 	int qid = nvme_rdma_queue_idx(queue);
 
@@ -1963,7 +2000,7 @@ static enum blk_eh_timer_return nvme_rdma_timeout(struct request *rq)
 		 rq->tag, nvme_cid(rq), cmd->common.opcode,
 		 nvme_fabrics_opcode_str(qid, cmd), qid);
 
-	if (nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_LIVE) {
+	if (state != NVME_CTRL_LIVE && state != NVME_CTRL_RECOVERING) {
 		/*
 		 * If we are resetting, connecting or deleting we should
 		 * complete immediately because we may block controller
@@ -2280,7 +2317,7 @@ static struct nvme_rdma_ctrl *nvme_rdma_alloc_ctrl(struct device *dev,
 
 	INIT_DELAYED_WORK(&ctrl->reconnect_work,
 			nvme_rdma_reconnect_ctrl_work);
-	INIT_WORK(&ctrl->err_work, nvme_rdma_error_recovery_work);
+	INIT_DELAYED_WORK(&ctrl->err_work, nvme_rdma_error_recovery_work);
 	INIT_WORK(&ctrl->ctrl.reset_work, nvme_rdma_reset_ctrl_work);
 
 	ctrl->ctrl.queue_count = opts->nr_io_queues + opts->nr_write_queues +
-- 
2.51.2
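The decision that nvme_rdma_recover_ctrl() makes on each run of err_work, and the branch nvme_rdma_timeout() takes per controller state, can be modelled in plain userspace C as below. This is a sketch, not kernel code: every model_* name is hypothetical, and it assumes, based on how the diff uses it, that nvme_recover_ctrl() (presumably introduced earlier in this series) returns 0 when CCR succeeds and otherwise the remaining time, in jiffies, to hold inflight IOs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_state { MODEL_LIVE, MODEL_RECOVERING, MODEL_RESETTING,
		   MODEL_CONNECTING, MODEL_DELETING };

struct model_ctrl {
	bool recovered;              /* stands in for NVME_CTRL_RECOVERED */
	bool cqt_supported;          /* stands in for ctrl->cqt != 0 */
	unsigned long ccr_result;    /* ASSUMED nvme_recover_ctrl() result:
				      * 0 = CCR succeeded, else time to wait */
	unsigned long requeue_delay; /* delay given to the re-queued work */
	int end_recovery_calls;      /* counts nvme_end_ctrl_recovery() */
};

/*
 * Mirrors the control flow of nvme_rdma_recover_ctrl():
 * returns 0 to continue teardown/reconnect, -EAGAIN to wait and re-run.
 */
static int model_recover_ctrl(struct model_ctrl *c)
{
	unsigned long rem;

	/* Second run, after the time-based wait has elapsed. */
	if (c->recovered) {
		c->recovered = false;
		goto done;
	}

	rem = c->ccr_result;
	if (!rem)                    /* CCR succeeded */
		goto done;

	if (!c->cqt_supported)       /* CCR failed, no CQT: retry right away */
		goto done;

	/* CCR failed, CQT supported: hold inflight IOs, run again later. */
	c->recovered = true;
	c->requeue_delay = rem;
	return -EAGAIN;

done:
	c->end_recovery_calls++;
	return 0;
}

/* True when nvme_rdma_timeout() takes its "complete immediately" branch. */
static bool model_timeout_completes_now(enum model_state s)
{
	return s != MODEL_LIVE && s != MODEL_RECOVERING;
}
```

Because the RECOVERED flag is tested first, the re-queued err_work skips CCR on its second run and, assuming the controller is still in RECOVERING state when the delayed work fires, falls through to the usual teardown and reconnect path. The delayed work therefore acts as a one-shot timer for time-based recovery.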