[PATCH] nvme: Force ns info updates on validation if NID is bogus

Yihan Xin posted 1 patch 2 months, 2 weeks ago
drivers/nvme/host/core.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
[PATCH] nvme: Force ns info updates on validation if NID is bogus
Posted by Yihan Xin 2 months, 2 weeks ago
When validating a namespace, nvme_update_ns_info()
would be skipped if nsid changed. However, this
happens everytime the in-use controller is
reattached if NID is bogus, causing nsid not being
restored to the previous one, eg /dev/nvme0n2 ->
/dev/nvme0n1.

Don't skip ns info updates on this circumstance.

Signed-off-by: Yihan Xin <xyh1996@gmail.com>
---
 drivers/nvme/host/core.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1236e3aa00ed..c0875fb93b8d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3979,11 +3979,24 @@ static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
 	int ret = NVME_SC_INVALID_NS | NVME_STATUS_DNR;
 
 	if (!nvme_ns_ids_equal(&ns->head->ids, &info->ids)) {
-		dev_err(ns->ctrl->device,
-			"identifiers changed for nsid %d\n", ns->head->ns_id);
-		goto out;
+		/*
+		 * Don't skip ns info updates if the NID is bogus as it
+		 * changes everytime the in-use controller is reattached
+		 * to the bus and thus the namespace is recognized as
+		 * another one.
+		 */
+		if (ns->ctrl->quirks & NVME_QUIRK_BOGUS_NID) {
+			dev_info(ns->ctrl->device,
+				 "Ignoring nsid change for bogus ns\n");
+		} else {
+			dev_err(ns->ctrl->device,
+				"identifiers changed for nsid %d\n",
+				ns->head->ns_id);
+			goto out;
+		}
 	}
 
+
 	ret = nvme_update_ns_info(ns, info);
 out:
 	/*
-- 
2.46.0
Re: [PATCH] nvme: Force ns info updates on validation if NID is bogus
Posted by Christoph Hellwig 2 months, 2 weeks ago
On Tue, Sep 10, 2024 at 05:50:06PM +0800, Yihan Xin wrote:
> When validating a namespace, nvme_update_ns_info()
> would be skipped if nsid changed. However, this
> happens everytime the in-use controller is
> reattached if NID is bogus, causing nsid not being
> restored to the previous one, eg /dev/nvme0n2 ->
> /dev/nvme0n1.

What do you mean with restoring the nsid?
Re: [PATCH] nvme: Force ns info updates on validation if NID is bogus
Posted by Yihan Xin 2 months, 2 weeks ago
Dear Christoph and others,

I mean that, for example, there is nsid 1 on the device,
the kernel discovers it as nvme0n1. When the device
had been reconnected (in my scenario, after resuming
from an s2idle), the kernel would recognize the namespace
as nsid 2, although it is still nsid 1 on the device,
which makes the block device not found.

As Hannes said, it shall be myself misunderstood the
reason that leads to this situation.
I will try looking into the real reason. Maybe there is
something wrong in the controller?

I am sorry for the inconvenience causing to you all.
(Please forgive my poor misunderstanding, I'm very new
to kernel development.)

Best regards,
Yihan Xin

Christoph Hellwig 於 2024/9/12 17:40 寫道:
> On Tue, Sep 10, 2024 at 05:50:06PM +0800, Yihan Xin wrote:
>> When validating a namespace, nvme_update_ns_info()
>> would be skipped if nsid changed. However, this
>> happens everytime the in-use controller is
>> reattached if NID is bogus, causing nsid not being
>> restored to the previous one, eg /dev/nvme0n2 ->
>> /dev/nvme0n1.
> 
> What do you mean with restoring the nsid?
> 
Re: [PATCH] nvme: Force ns info updates on validation if NID is bogus
Posted by Hannes Reinecke 2 months, 2 weeks ago
On 9/10/24 11:50, Yihan Xin wrote:
> When validating a namespace, nvme_update_ns_info()
> would be skipped if nsid changed. However, this
> happens everytime the in-use controller is
> reattached if NID is bogus, causing nsid not being
> restored to the previous one, eg /dev/nvme0n2 ->
> /dev/nvme0n1.
> 
> Don't skip ns info updates on this circumstance.
> 
> Signed-off-by: Yihan Xin <xyh1996@gmail.com>
> ---
>   drivers/nvme/host/core.c | 19 ++++++++++++++++---
>   1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 1236e3aa00ed..c0875fb93b8d 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3979,11 +3979,24 @@ static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
>   	int ret = NVME_SC_INVALID_NS | NVME_STATUS_DNR;
>   
>   	if (!nvme_ns_ids_equal(&ns->head->ids, &info->ids)) {
> -		dev_err(ns->ctrl->device,
> -			"identifiers changed for nsid %d\n", ns->head->ns_id);
> -		goto out;
> +		/*
> +		 * Don't skip ns info updates if the NID is bogus as it
> +		 * changes everytime the in-use controller is reattached
> +		 * to the bus and thus the namespace is recognized as
> +		 * another one.
> +		 */
> +		if (ns->ctrl->quirks & NVME_QUIRK_BOGUS_NID) {
> +			dev_info(ns->ctrl->device,
> +				 "Ignoring nsid change for bogus ns\n");
> +		} else {
> +			dev_err(ns->ctrl->device,
> +				"identifiers changed for nsid %d\n",
> +				ns->head->ns_id);
> +			goto out;
> +		}
>   	}
>   
> +
>   	ret = nvme_update_ns_info(ns, info);
>   out:
>   	/*

Nope. A namespace is identified by both, the NSID and the namespace 
identifiers (GUID, UUID, you name it).
In our implementation we identify the namespace by NSID, and then
validate that the identifers match.
If you have a device which changes NSIDs you would need to swizzle
that logic around, ie identify the namespace by GUID/UUID, and then
check if the NSID matches.

But you'll run into issues here, as the NSID is stored in head->ns_id,
so if you attach a namespace with a different NSID you'll end up 
addressing the wrong namespace when sending commands.

So this 'fix' is actually wrong ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich