From: Mohamed Khalfella
To: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
    Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
    James Smart, Hannes Reinecke
Cc: Aaron Dailey, Randy Jennings, Dhaval Giani,
    linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
    Mohamed Khalfella
Subject: [PATCH v3 08/21] nvme: Implement cross-controller reset recovery
Date: Fri, 13 Feb 2026 20:25:09 -0800
Message-ID: <20260214042753.4073668-9-mkhalfella@purestorage.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260214042753.4073668-1-mkhalfella@purestorage.com>
References: <20260214042753.4073668-1-mkhalfella@purestorage.com>

A host that has more than one path to an nvme subsystem typically has
an nvme controller associated with every path. This is mostly applicable
to nvmeof. If one path goes down, inflight IOs on that path should not
be retried immediately on another path, because this could lead to data
corruption as described in TP4129. TP8028 defines a cross-controller
reset mechanism that the host can use to terminate IOs on the failed
path through one of the remaining healthy paths. Inflight IOs should be
retried on another path only after they have been terminated, or after
enough time has passed as defined by TP4129.

Implement core cross-controller reset shared logic to be used by the
transports.

Signed-off-by: Mohamed Khalfella
---
 drivers/nvme/host/constants.c |   1 +
 drivers/nvme/host/core.c      | 141 ++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h      |   9 +++
 3 files changed, 151 insertions(+)

diff --git a/drivers/nvme/host/constants.c b/drivers/nvme/host/constants.c
index dc90df9e13a2..f679efd5110e 100644
--- a/drivers/nvme/host/constants.c
+++ b/drivers/nvme/host/constants.c
@@ -46,6 +46,7 @@ static const char * const nvme_admin_ops[] = {
 	[nvme_admin_virtual_mgmt] = "Virtual Management",
 	[nvme_admin_nvme_mi_send] = "NVMe Send MI",
 	[nvme_admin_nvme_mi_recv] = "NVMe Receive MI",
+	[nvme_admin_cross_ctrl_reset] = "Cross Controller Reset",
 	[nvme_admin_dbbuf] = "Doorbell Buffer Config",
 	[nvme_admin_format_nvm] = "Format NVM",
 	[nvme_admin_security_send] = "Security Send",
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 231d402e9bfb..765b1524b3ed 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -554,6 +554,146 @@ void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_cancel_admin_tagset);
 
+static struct nvme_ctrl *nvme_find_ctrl_ccr(struct nvme_ctrl *ictrl,
+					    u32 min_cntlid)
+{
+	struct nvme_subsystem *subsys = ictrl->subsys;
+	struct nvme_ctrl *ctrl, *sctrl = NULL;
+	unsigned long flags;
+
+	mutex_lock(&nvme_subsystems_lock);
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		if (ctrl->cntlid < min_cntlid)
+			continue;
+
+		if (atomic_dec_if_positive(&ctrl->ccr_limit) < 0)
+			continue;
+
+		spin_lock_irqsave(&ctrl->lock, flags);
+		if (ctrl->state != NVME_CTRL_LIVE) {
+			spin_unlock_irqrestore(&ctrl->lock, flags);
+			atomic_inc(&ctrl->ccr_limit);
+			continue;
+		}
+
+		/*
+		 * We got a good candidate source controller that is locked and
+		 * LIVE. However, no guarantee ctrl will not be deleted after
+		 * ctrl->lock is released. Get a ref of both ctrl and admin_q
+		 * so they do not disappear until we are done with them.
+		 */
+		WARN_ON_ONCE(!blk_get_queue(ctrl->admin_q));
+		nvme_get_ctrl(ctrl);
+		spin_unlock_irqrestore(&ctrl->lock, flags);
+		sctrl = ctrl;
+		break;
+	}
+	mutex_unlock(&nvme_subsystems_lock);
+	return sctrl;
+}
+
+static void nvme_put_ctrl_ccr(struct nvme_ctrl *sctrl)
+{
+	atomic_inc(&sctrl->ccr_limit);
+	blk_put_queue(sctrl->admin_q);
+	nvme_put_ctrl(sctrl);
+}
+
+static int nvme_issue_wait_ccr(struct nvme_ctrl *sctrl, struct nvme_ctrl *ictrl)
+{
+	struct nvme_ccr_entry ccr = { };
+	union nvme_result res = { 0 };
+	struct nvme_command c = { };
+	unsigned long flags, tmo;
+	bool completed = false;
+	int ret = 0;
+	u32 result;
+
+	init_completion(&ccr.complete);
+	ccr.ictrl = ictrl;
+
+	spin_lock_irqsave(&sctrl->lock, flags);
+	list_add_tail(&ccr.list, &sctrl->ccr_list);
+	spin_unlock_irqrestore(&sctrl->lock, flags);
+
+	c.ccr.opcode = nvme_admin_cross_ctrl_reset;
+	c.ccr.ciu = ictrl->ciu;
+	c.ccr.icid = cpu_to_le16(ictrl->cntlid);
+	c.ccr.cirn = cpu_to_le64(ictrl->cirn);
+	ret = __nvme_submit_sync_cmd(sctrl->admin_q, &c, &res,
+				     NULL, 0, NVME_QID_ANY, 0);
+	if (ret) {
+		ret = -EIO;
+		goto out;
+	}
+
+	result = le32_to_cpu(res.u32);
+	if (result & 0x01) /* Immediate Reset Successful */
+		goto out;
+
+	tmo = secs_to_jiffies(ictrl->kato);
+	if (!wait_for_completion_timeout(&ccr.complete, tmo)) {
+		ret = -ETIMEDOUT;
+		goto out;
+	}
+
+	completed = true;
+
+out:
+	spin_lock_irqsave(&sctrl->lock, flags);
+	list_del(&ccr.list);
+	spin_unlock_irqrestore(&sctrl->lock, flags);
+	if (completed) {
+		if (ccr.ccrs == NVME_CCR_STATUS_SUCCESS)
+			return 0;
+		return -EREMOTEIO;
+	}
+	return ret;
+}
+
+unsigned long nvme_fence_ctrl(struct nvme_ctrl *ictrl)
+{
+	unsigned long deadline, now, timeout;
+	struct nvme_ctrl *sctrl;
+	u32 min_cntlid = 0;
+	int ret;
+
+	timeout = nvme_fence_timeout_ms(ictrl);
+	dev_info(ictrl->device, "attempting CCR, timeout %lums\n", timeout);
+
+	now = jiffies;
+	deadline = now + msecs_to_jiffies(timeout);
+	while (time_before(now, deadline)) {
+		sctrl = nvme_find_ctrl_ccr(ictrl, min_cntlid);
+		if (!sctrl) {
+			/* CCR failed, switch to time-based recovery */
+			return deadline - now;
+		}
+
+		ret = nvme_issue_wait_ccr(sctrl, ictrl);
+		if (!ret) {
+			dev_info(ictrl->device, "CCR succeeded using %s\n",
+				 dev_name(sctrl->device));
+			nvme_put_ctrl_ccr(sctrl);
+			return 0;
+		}
+
+		min_cntlid = sctrl->cntlid + 1;
+		nvme_put_ctrl_ccr(sctrl);
+		now = jiffies;
+
+		if (ret == -EIO) /* CCR command failed */
+			continue;
+
+		/* CCR operation failed or timed out */
+		return time_before(now, deadline) ? deadline - now : 0;
+	}
+
+	dev_info(ictrl->device, "CCR reached timeout, call it done\n");
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvme_fence_ctrl);
+
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		enum nvme_ctrl_state new_state)
 {
@@ -5121,6 +5261,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 
 	mutex_init(&ctrl->scan_lock);
 	INIT_LIST_HEAD(&ctrl->namespaces);
+	INIT_LIST_HEAD(&ctrl->ccr_list);
 	xa_init(&ctrl->cels);
 	ctrl->dev = dev;
 	ctrl->ops = ops;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index b1c37eb3379e..f3ab9411cac5 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -279,6 +279,13 @@ enum nvme_ctrl_flags {
 	NVME_CTRL_FROZEN = 6,
 };
 
+struct nvme_ccr_entry {
+	struct list_head list;
+	struct completion complete;
+	struct nvme_ctrl *ictrl;
+	u8 ccrs;
+};
+
 struct nvme_ctrl {
 	bool comp_seen;
 	bool identified;
@@ -296,6 +303,7 @@ struct nvme_ctrl {
 	struct blk_mq_tag_set *tagset;
 	struct blk_mq_tag_set *admin_tagset;
 	struct list_head namespaces;
+	struct list_head ccr_list;
 	struct mutex namespaces_lock;
 	struct srcu_struct srcu;
 	struct device ctrl_device;
@@ -813,6 +821,7 @@ blk_status_t nvme_host_path_error(struct request *req);
 bool nvme_cancel_request(struct request *req, void *data);
 void nvme_cancel_tagset(struct nvme_ctrl *ctrl);
 void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl);
+unsigned long nvme_fence_ctrl(struct nvme_ctrl *ctrl);
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		enum nvme_ctrl_state new_state);
 int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown);
-- 
2.52.0
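
Reading aid, not part of the patch: the return value of nvme_fence_ctrl()
appears to follow a small contract. It returns 0 when no further waiting
is needed, either because CCR succeeded or because the deadline has
already passed, and otherwise returns the time left for time-based
recovery. The userspace sketch below models only that control flow, under
stated assumptions. The names fence_model, ccr_attempt and ccr_outcome are
hypothetical and do not exist in the kernel. Time is a plain millisecond
counter rather than jiffies, and each attempt is given a simulated cost
instead of issuing a real admin command.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of one CCR attempt through a source controller. */
enum ccr_outcome {
	CCR_OK,		/* impacted controller was reset, fencing is done */
	CCR_CMD_FAILED,	/* command itself failed (-EIO in the patch) */
	CCR_OP_FAILED,	/* operation failed or timed out */
};

/* One simulated attempt: its outcome and how long it took. */
struct ccr_attempt {
	enum ccr_outcome outcome;
	unsigned long cost_ms;
};

/*
 * Model of the retry loop. Source controllers are tried in order.
 * Returns 0 when no further waiting is needed, otherwise the number
 * of milliseconds left until the fencing deadline.
 */
unsigned long fence_model(const struct ccr_attempt *src, size_t nr_src,
			  unsigned long timeout_ms)
{
	unsigned long now = 0, deadline = timeout_ms;
	size_t next = 0;

	assert(src != NULL || nr_src == 0);

	while (now < deadline) {
		/* No source controller left: fall back to waiting. */
		if (next == nr_src)
			return deadline - now;

		now += src[next].cost_ms;

		switch (src[next].outcome) {
		case CCR_OK:
			return 0;
		case CCR_CMD_FAILED:
			/* Try the next candidate while time remains. */
			next++;
			continue;
		default:
			/* Operation failed or timed out: stop trying. */
			return now < deadline ? deadline - now : 0;
		}
	}

	/* Deadline reached: enough time has passed. */
	return 0;
}
```

For example, with a 1000 ms timeout, a single source controller whose
command fails after 100 ms leaves 900 ms to wait, while a failed command
followed by a successful attempt on a second controller yields 0. How a
transport consumes the return value is not shown in this patch, so any
caller-side behavior is outside the scope of this sketch.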