From: Mohamed Khalfella
To: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
 Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
Cc: Aaron Dailey, Randy Jennings, Dhaval Giani, Hannes Reinecke,
 linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
 Mohamed Khalfella
Subject: [PATCH v2 13/14] nvme-fc: Use CCR to recover controller that hits an error
Date: Fri, 30 Jan 2026 14:34:17 -0800
Message-ID: <20260130223531.2478849-14-mkhalfella@purestorage.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260130223531.2478849-1-mkhalfella@purestorage.com>
References: <20260130223531.2478849-1-mkhalfella@purestorage.com>

A live NVMe controller that hits an error now moves to the FENCING state
instead of the RESETTING state. ctrl->fencing_work attempts CCR to
terminate inflight I/Os. If CCR succeeds, switch to FENCED -> RESETTING
and continue error recovery as usual. If CCR fails, the behavior depends
on whether the subsystem supports CQT. If CQT is not supported, reset the
controller immediately, as if CCR had succeeded, in order to maintain the
current behavior. If CQT is supported, switch to time-based recovery:
schedule ctrl->fenced_work to reset the controller when time-based
recovery finishes.

Either ctrl->ioerr_work or ctrl->reset_work can run after a controller is fenced.
Flush the fencing work when either work runs.

Signed-off-by: Mohamed Khalfella
---
 drivers/nvme/host/fc.c | 60 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index f8f6071b78ed..3a01aeb39081 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -166,6 +166,8 @@ struct nvme_fc_ctrl {
 	struct blk_mq_tag_set	admin_tag_set;
 	struct blk_mq_tag_set	tag_set;
 
+	struct work_struct	fencing_work;
+	struct delayed_work	fenced_work;
 	struct work_struct	ioerr_work;
 	struct delayed_work	connect_work;
 
@@ -1866,12 +1868,59 @@ __nvme_fc_fcpop_chk_teardowns(struct nvme_fc_ctrl *ctrl,
 	}
 }
 
+static void nvme_fc_fenced_work(struct work_struct *work)
+{
+	struct nvme_fc_ctrl *fc_ctrl = container_of(to_delayed_work(work),
+			struct nvme_fc_ctrl, fenced_work);
+	struct nvme_ctrl *ctrl = &fc_ctrl->ctrl;
+
+	nvme_change_ctrl_state(ctrl, NVME_CTRL_FENCED);
+	if (nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
+		queue_work(nvme_reset_wq, &fc_ctrl->ioerr_work);
+}
+
+static void nvme_fc_fencing_work(struct work_struct *work)
+{
+	struct nvme_fc_ctrl *fc_ctrl =
+			container_of(work, struct nvme_fc_ctrl, fencing_work);
+	struct nvme_ctrl *ctrl = &fc_ctrl->ctrl;
+	unsigned long rem;
+
+	rem = nvme_fence_ctrl(ctrl);
+	if (!rem)
+		goto done;
+
+	if (!ctrl->cqt) {
+		dev_info(ctrl->device,
+			 "CCR failed, CQT not supported, skip time-based recovery\n");
+		goto done;
+	}
+
+	dev_info(ctrl->device,
+		 "CCR failed, switch to time-based recovery, timeout = %ums\n",
+		 jiffies_to_msecs(rem));
+	queue_delayed_work(nvme_wq, &fc_ctrl->fenced_work, rem);
+	return;
+
+done:
+	nvme_change_ctrl_state(ctrl, NVME_CTRL_FENCED);
+	if (nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
+		queue_work(nvme_reset_wq, &fc_ctrl->ioerr_work);
+}
+
+static void nvme_fc_flush_fencing_work(struct nvme_fc_ctrl *ctrl)
+{
+	flush_work(&ctrl->fencing_work);
+	flush_delayed_work(&ctrl->fenced_work);
+}
+
 static void
 nvme_fc_ctrl_ioerr_work(struct work_struct *work)
 {
 	struct nvme_fc_ctrl *ctrl =
 			container_of(work, struct nvme_fc_ctrl, ioerr_work);
 
+	nvme_fc_flush_fencing_work(ctrl);
 	nvme_fc_error_recovery(ctrl);
 }
 
@@ -1896,6 +1945,14 @@ EXPORT_SYMBOL_GPL(nvme_fc_io_getuuid);
 static void
 nvme_fc_start_ioerr_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
 {
+	if (nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_FENCING)) {
+		dev_warn(ctrl->ctrl.device,
+			 "NVME-FC{%d}: starting controller fencing %s\n",
+			 ctrl->cnum, errmsg);
+		queue_work(nvme_wq, &ctrl->fencing_work);
+		return;
+	}
+
 	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
 		return;
 
@@ -3297,6 +3354,7 @@ nvme_fc_reset_ctrl_work(struct work_struct *work)
 	struct nvme_fc_ctrl *ctrl =
 		container_of(work, struct nvme_fc_ctrl, ctrl.reset_work);
 
+	nvme_fc_flush_fencing_work(ctrl);
 	nvme_stop_ctrl(&ctrl->ctrl);
 
 	/* will block will waiting for io to terminate */
@@ -3471,6 +3529,8 @@ nvme_fc_alloc_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 
 	INIT_WORK(&ctrl->ctrl.reset_work, nvme_fc_reset_ctrl_work);
 	INIT_DELAYED_WORK(&ctrl->connect_work, nvme_fc_connect_ctrl_work);
+	INIT_DELAYED_WORK(&ctrl->fenced_work, nvme_fc_fenced_work);
+	INIT_WORK(&ctrl->fencing_work, nvme_fc_fencing_work);
 	INIT_WORK(&ctrl->ioerr_work, nvme_fc_ctrl_ioerr_work);
 	spin_lock_init(&ctrl->lock);
 
-- 
2.52.0
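
Illustrative note, not part of the patch: the decision made in
nvme_fc_fencing_work() can be modelled in user space as the rough sketch
below. Every name in it (model_ctrl, model_state, model_action and the
model_* functions) is hypothetical and exists only for this example. The
sketch assumes, based solely on how the patch uses the return value, that
nvme_fence_ctrl() returns 0 when CCR succeeds and the remaining wait time
otherwise, and it assumes that only a live controller can enter FENCING.
It does not model workqueues, flushing, or the real controller state
machine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the controller states named in the patch. */
enum model_state { MODEL_LIVE, MODEL_FENCING, MODEL_FENCED, MODEL_RESETTING };

/* What the modelled fencing step decides to do next. */
enum model_action { MODEL_RESET_NOW, MODEL_WAIT_THEN_RESET };

struct model_ctrl {
	enum model_state state;
	bool cqt_supported;	/* models a nonzero ctrl->cqt */
	unsigned long delay;	/* wait before reset when deferred, else 0 */
};

/* Models nvme_fc_start_ioerr_recovery(): a live controller enters FENCING. */
static bool model_start_recovery(struct model_ctrl *c)
{
	if (c->state != MODEL_LIVE)
		return false;
	c->state = MODEL_FENCING;
	return true;
}

/* Models the FENCING -> FENCED -> RESETTING steps taken before the reset. */
static void model_finish_fence(struct model_ctrl *c)
{
	assert(c->state == MODEL_FENCING);
	c->state = MODEL_FENCED;
	c->state = MODEL_RESETTING;
}

/*
 * Models nvme_fc_fencing_work(). 'rem' plays the role of the value returned
 * by nvme_fence_ctrl(): assumed 0 on CCR success, else the time left to wait.
 */
static enum model_action model_fencing_work(struct model_ctrl *c,
					    unsigned long rem)
{
	assert(c->state == MODEL_FENCING);
	c->delay = 0;

	/* CCR succeeded, or CQT is unsupported: reset immediately. */
	if (rem == 0 || !c->cqt_supported) {
		model_finish_fence(c);
		return MODEL_RESET_NOW;
	}

	/* CCR failed and CQT is supported: defer the reset by 'rem'. */
	c->delay = rem;
	return MODEL_WAIT_THEN_RESET;
}
```

In this model a controller that returns MODEL_WAIT_THEN_RESET stays in
MODEL_FENCING until the caller invokes model_finish_fence() after the
delay, which loosely corresponds to nvme_fc_fenced_work() running once the
delayed work fires.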