From: Alex Tran
Date: Sun, 21 Dec 2025 13:26:11 -0800
Subject: [PATCH 2/2] nvme/host: add delayed retries upon non-fatal error during ns validation
Message-Id: <20251221-nvme_ns_validation-v1-2-9f7a385707af@gmail.com>
References: <20251221-nvme_ns_validation-v1-0-9f7a385707af@gmail.com>
In-Reply-To: <20251221-nvme_ns_validation-v1-0-9f7a385707af@gmail.com>
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, Alex Tran

If a non-fatal error is received during NVMe namespace validation, it
should not be ignored, nor should the namespace be removed immediately.
Instead, schedule delayed retries of the namespace validation. This
handles non-fatal issues more robustly by retrying a few times before
giving up; once the retries are exhausted, an error is logged.

The number of retries is set to 3 and the interval between retries is
set to 3 seconds.

Signed-off-by: Alex Tran
---
 drivers/nvme/host/core.c | 43 +++++++++++++++++++++++++++++++++++++++----
 drivers/nvme/host/nvme.h |  9 +++++++++
 2 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fab321e79b7cdbb89d96d950c1cc8c1128906770..2e208d894b27f85f7f6358eb697be262ce45aed6 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -139,6 +139,7 @@ static void nvme_update_keep_alive(struct nvme_ctrl *ctrl,
 				   struct nvme_command *cmd);
 static int nvme_get_log_lsi(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page,
 		u8 lsp, u8 csi, void *log, size_t size, u64 offset, u16 lsi);
+static void nvme_validate_ns_work(struct work_struct *work);
 
 void nvme_queue_scan(struct nvme_ctrl *ctrl)
 {
@@ -4118,6 +4119,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
 	ns->ctrl = ctrl;
 	kref_init(&ns->kref);
 
+	INIT_DELAYED_WORK(&ns->validate_work, nvme_validate_ns_work);
+
 	if (nvme_init_ns_head(ns, info))
 		goto out_cleanup_disk;
 
@@ -4215,6 +4218,8 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 {
 	bool last_path = false;
 
+	cancel_delayed_work_sync(&ns->validate_work);
+
 	if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
 		return;
 
@@ -4285,12 +4290,42 @@ static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
 out:
 	/*
 	 * Only remove the namespace if we got a fatal error back from the
-	 * device, otherwise ignore the error and just move on.
-	 *
-	 * TODO: we should probably schedule a delayed retry here.
+	 * device, otherwise delayed retries are performed.
 	 */
-	if (ret > 0 && (ret & NVME_STATUS_DNR))
+	if (ret > 0 && (ret & NVME_STATUS_DNR)) {
 		nvme_ns_remove(ns);
+	} else if (ret > 0) {
+		if (ns->validate_retries < NVME_NS_VALIDATION_MAX_RETRIES) {
+			ns->validate_retries++;
+
+			if (!nvme_get_ns(ns))
+				return;
+
+			dev_warn(
+				ns->ctrl->device,
+				"validation failed for nsid %d, retry %d/%d in %ds\n",
+				ns->head->ns_id, ns->validate_retries,
+				NVME_NS_VALIDATION_MAX_RETRIES,
+				NVME_NS_VALIDATION_RETRY_INTERVAL);
+			memcpy(&ns->pending_info, info, sizeof(*info));
+			schedule_delayed_work(
+				&ns->validate_work,
+				NVME_NS_VALIDATION_RETRY_INTERVAL * HZ);
+		} else {
+			dev_err(ns->ctrl->device,
+				"validation failed for nsid %d after %d retries\n",
+				ns->head->ns_id,
+				NVME_NS_VALIDATION_MAX_RETRIES);
+		}
+	}
+}
+
+static void nvme_validate_ns_work(struct work_struct *work)
+{
+	struct nvme_ns *ns = container_of(to_delayed_work(work), struct nvme_ns,
+					  validate_work);
+	nvme_validate_ns(ns, &ns->pending_info);
+	nvme_put_ns(ns);
 }
 
 static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ff4e7213131298a1a019eaa3822ca26f857b2443..17a4123e5e4da9828ef5662acca54e6aa9fd3cb9 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -46,6 +46,12 @@ extern unsigned int admin_timeout;
 #define NVME_CTRL_PAGE_SHIFT	12
 #define NVME_CTRL_PAGE_SIZE	(1 << NVME_CTRL_PAGE_SHIFT)
 
+/*
+ * Default to 3 retries in intervals of 3 seconds for namespace validation
+ */
+#define NVME_NS_VALIDATION_MAX_RETRIES 3
+#define NVME_NS_VALIDATION_RETRY_INTERVAL 3
+
 extern struct workqueue_struct *nvme_wq;
 extern struct workqueue_struct *nvme_reset_wq;
 extern struct workqueue_struct *nvme_delete_wq;
@@ -565,6 +571,9 @@ struct nvme_ns {
 	struct device		cdev_device;
 
 	struct nvme_fault_inject fault_inject;
+	struct delayed_work	validate_work;
+	struct nvme_ns_info	pending_info;
+	unsigned int		validate_retries;
 };
 
 /* NVMe ns supports metadata actions by the controller (generate/strip) */
-- 
2.51.0
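
For readers who want to follow the decision made at the out: label
without reading the kernel context, the branch structure can be modeled
in userspace roughly as below. This is only a sketch and is not part of
the patch: struct ns_model, enum ns_action, ns_validation_action() and
MODEL_STATUS_DNR are hypothetical names invented for illustration, and
the DNR bit value is a placeholder assumption standing in for the
kernel's NVME_STATUS_DNR. Reference counting and the delayed work itself
are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same values as the constants added to nvme.h by the patch. */
#define NVME_NS_VALIDATION_MAX_RETRIES 3
#define NVME_NS_VALIDATION_RETRY_INTERVAL 3 /* seconds */

/* Placeholder (assumption) for the kernel's NVME_STATUS_DNR bit. */
#define MODEL_STATUS_DNR 0x4000

/* Hypothetical outcome labels for the branches at the out: label. */
enum ns_action {
	NS_NO_ACTION,	/* ret <= 0: neither branch of the patch runs */
	NS_REMOVE,	/* fatal status (DNR set): namespace is removed */
	NS_RETRY,	/* non-fatal status: a delayed retry is scheduled */
	NS_GIVE_UP,	/* retries exhausted: only an error is logged */
};

/* Hypothetical reduction of struct nvme_ns to the one field modeled. */
struct ns_model {
	unsigned int validate_retries;
};

/*
 * Model of the decision only. The counter is incremented on each
 * scheduled retry and, as in the patch as posted, never reset here.
 */
static enum ns_action ns_validation_action(struct ns_model *ns, int ret)
{
	assert(ns);

	if (ret <= 0)
		return NS_NO_ACTION;
	if (ret & MODEL_STATUS_DNR)
		return NS_REMOVE;
	if (ns->validate_retries < NVME_NS_VALIDATION_MAX_RETRIES) {
		ns->validate_retries++;
		return NS_RETRY;
	}
	return NS_GIVE_UP;
}
```

Feeding the same non-fatal status to this model four times in a row
yields three NS_RETRY results followed by NS_GIVE_UP, which matches the
"retry %d/%d" and "after %d retries" messages in the diff.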