From: Alex Tran
Date: Sat, 03 Jan 2026 16:52:00 -0800
Subject: [PATCH v3] nvme/host: add delayed retries upon non-fatal error during ns validation
Message-Id: <20260103-nvme_ns_validation-v3-1-c9ce67393261@gmail.com>
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, Alex Tran

If a non-fatal error is received during nvme namespace validation, it
should not be ignored and the namespace should not be removed
immediately. Rather, delayed retries should be performed on the
namespace validation process.

This handles non-fatal issues more robustly, by retrying a few times
before giving up and removing the namespace. The number of retries is
set to 3 and the interval between retries is set to 3 seconds.

The retries are handled locally. On success, validation ends. On a
fatal error, the namespace is removed immediately. On a non-fatal
error, validation is retried until the maximum retry count is reached,
after which the namespace is removed.

Signed-off-by: Alex Tran
Reviewed-by: Sagi Grimberg
---
Changes in v3:
- Simplify retry loop to i < NVME_NS_VALIDATION_MAX_RETRIES
- Use dev_dbg print
- Remove ns when still receiving non-fatal error after max retries
- Link to v2: https://lore.kernel.org/r/20251226-nvme_ns_validation-v2-1-0e1b3a142553@gmail.com

Changes in v2:
- Simplify retry logic with local loop instead of delayed work.
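
The control flow described in the commit message can be sketched outside
the kernel as a small C model. This is a minimal illustration, not the
driver code: the sketch_* names, the callback type, and the result struct
are hypothetical, SKETCH_STATUS_DNR merely stands in for NVME_STATUS_DNR,
and the sleep is modeled by adding to a counter instead of calling
msleep(). As in the diff, the wait also follows the last failed attempt,
so three non-fatal failures account for 9000 ms of waiting before the
namespace is removed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the patch's constants; values copied from the nvme.h hunk. */
#define SKETCH_MAX_RETRIES       3
#define SKETCH_RETRY_INTERVAL_MS 3000
/* Hypothetical "do not retry" flag, standing in for NVME_STATUS_DNR. */
#define SKETCH_STATUS_DNR        0x4000

struct sketch_result {
	bool removed;          /* true if the namespace would be removed */
	unsigned int attempts; /* calls made to the update callback */
	unsigned int slept_ms; /* total simulated wait time */
};

/* Hypothetical callback standing in for nvme_update_ns_info(). */
typedef int (*sketch_update_fn)(void *ctx);

static struct sketch_result sketch_validate(sketch_update_fn update, void *ctx)
{
	struct sketch_result res = { false, 0, 0 };
	unsigned int i;
	int ret;

	assert(update != NULL);

	for (i = 0; i < SKETCH_MAX_RETRIES; i++) {
		ret = update(ctx);
		res.attempts++;
		if (ret == 0)
			return res; /* success: keep the namespace */

		if (ret > 0 && (ret & SKETCH_STATUS_DNR))
			break; /* fatal: remove without waiting */

		/* Non-fatal: wait before the next pass (msleep in the patch). */
		res.slept_ms += SKETCH_RETRY_INTERVAL_MS;
	}

	res.removed = true; /* fatal error, or all attempts used up */
	return res;
}

/* Always reports a non-fatal, errno-style failure. */
static int sketch_always_busy(void *ctx)
{
	(void)ctx;
	return -EBUSY;
}

/* Reports an arbitrary positive status with the DNR stand-in set. */
static int sketch_fatal(void *ctx)
{
	(void)ctx;
	return 0x0b | SKETCH_STATUS_DNR;
}

/* Fails non-fatally *ctx times, then succeeds. */
static int sketch_succeeds_after(void *ctx)
{
	unsigned int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -EBUSY;
	}
	return 0;
}
```

Passing sketch_always_busy models a device that never recovers,
sketch_fatal models a status with the do-not-retry bit set, and
sketch_succeeds_after with a counter of 2 models a transient failure
that clears on the third attempt.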
---
 drivers/nvme/host/core.c | 31 +++++++++++++++++++++++--------
 drivers/nvme/host/nvme.h |  6 ++++++
 2 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7bf228df6001f1f4d0b3c570de285a5eb17bb08e..ed28abb6462e46ca1203a6a8b16894d36817db2c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4289,23 +4289,38 @@ static void nvme_ns_remove_by_nsid(struct nvme_ctrl *ctrl, u32 nsid)
 static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
 {
 	int ret = NVME_SC_INVALID_NS | NVME_STATUS_DNR;
+	unsigned int i;
 
 	if (!nvme_ns_ids_equal(&ns->head->ids, &info->ids)) {
-		dev_err(ns->ctrl->device,
-			"identifiers changed for nsid %d\n", ns->head->ns_id);
+		dev_err(ns->ctrl->device, "identifiers changed for nsid %d\n",
+			ns->head->ns_id);
 		goto out;
 	}
 
-	ret = nvme_update_ns_info(ns, info);
+	for (i = 0; i < NVME_NS_VALIDATION_MAX_RETRIES; i++) {
+		ret = nvme_update_ns_info(ns, info);
+		if (ret == 0)
+			return;
+
+		if (ret > 0 && (ret & NVME_STATUS_DNR))
+			goto out;
+
+		dev_dbg(ns->ctrl->device,
+			"validation failed for nsid %d, retry %d/%d in %dms\n",
+			ns->head->ns_id, i + 1, NVME_NS_VALIDATION_MAX_RETRIES,
+			NVME_NS_VALIDATION_RETRY_INTERVAL);
+		msleep(NVME_NS_VALIDATION_RETRY_INTERVAL);
+	}
+
+	dev_err(ns->ctrl->device,
+		"validation failed for nsid %d after %d retries\n",
+		ns->head->ns_id, NVME_NS_VALIDATION_MAX_RETRIES);
 out:
 	/*
 	 * Only remove the namespace if we got a fatal error back from the
-	 * device, otherwise ignore the error and just move on.
-	 *
-	 * TODO: we should probably schedule a delayed retry here.
+	 * device or if a non-fatal error still appears after max retries.
 	 */
-	if (ret > 0 && (ret & NVME_STATUS_DNR))
-		nvme_ns_remove(ns);
+	nvme_ns_remove(ns);
 }
 
 static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9a5f28c5103c5c42777bd9309a983ef0196c1b95..dcbdc7fa8af0cb838b3f1a774d4c67fa69a00050 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -46,6 +46,12 @@ extern unsigned int admin_timeout;
 #define NVME_CTRL_PAGE_SHIFT	12
 #define NVME_CTRL_PAGE_SIZE	(1 << NVME_CTRL_PAGE_SHIFT)
 
+/*
+ * Default to 3 retries in intervals of 3000ms for namespace validation
+ */
+#define NVME_NS_VALIDATION_MAX_RETRIES		3
+#define NVME_NS_VALIDATION_RETRY_INTERVAL	3000
+
 extern struct workqueue_struct *nvme_wq;
 extern struct workqueue_struct *nvme_reset_wq;
 extern struct workqueue_struct *nvme_delete_wq;

---
base-commit: fa084c35afa13ab07a860ef0936cd987f9aa0460
change-id: 20251225-nvme_ns_validation-310a07aa8b2b

Best regards,
-- 
Alex Tran