[PATCH v2 2/5] nvme-fc: do not retry when auth fails or connection is refused

Daniel Wagner posted 5 patches 1 year, 11 months ago
There is a newer version of this series
[PATCH v2 2/5] nvme-fc: do not retry when auth fails or connection is refused
Posted by Daniel Wagner 1 year, 11 months ago
There is no point in retrying to connect if the authentication fails.

Connection refused is also issued from the authentication path, thus
also do not retry.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/host/fc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index a5b29e9ad342..b81046c9f171 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3312,6 +3312,8 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
 			ctrl->cnum, status);
 		if (status > 0 && (status & NVME_SC_DNR))
 			recon = false;
+		if (status == NVME_SC_AUTH_REQUIRED || status == -ECONNREFUSED)
+			recon = false;
 	} else if (time_after_eq(jiffies, rport->dev_loss_end))
 		recon = false;
 
-- 
2.43.1
Re: [PATCH v2 2/5] nvme-fc: do not retry when auth fails or connection is refused
Posted by Hannes Reinecke 1 year, 11 months ago
On 2/21/24 14:24, Daniel Wagner wrote:
> There is no point in retrying to connect if the authentication fails.
> 
> Connection refused is also issued from the authentication path, thus
> also do not retry.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>   drivers/nvme/host/fc.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index a5b29e9ad342..b81046c9f171 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -3312,6 +3312,8 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
>   			ctrl->cnum, status);
>   		if (status > 0 && (status & NVME_SC_DNR))
>   			recon = false;
> +		if (status == NVME_SC_AUTH_REQUIRED || status == -ECONNREFUSED)
> +			recon = false;
>   	} else if (time_after_eq(jiffies, rport->dev_loss_end))
>   		recon = false;
>   
Is this still required after the patchset to retry authentication errors?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

Re: Re: [PATCH v2 2/5] nvme-fc: do not retry when auth fails or connection is refused
Posted by Daniel Wagner 1 year, 11 months ago
On Wed, Feb 21, 2024 at 04:53:44PM +0100, Hannes Reinecke wrote:
> On 2/21/24 14:24, Daniel Wagner wrote:
> > There is no point in retrying to connect if the authentication fails.
> > 
> > Connection refused is also issued from the authentication path, thus
> > also do not retry.
> > 
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Daniel Wagner <dwagner@suse.de>
> > ---
> >   drivers/nvme/host/fc.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> > index a5b29e9ad342..b81046c9f171 100644
> > --- a/drivers/nvme/host/fc.c
> > +++ b/drivers/nvme/host/fc.c
> > @@ -3312,6 +3312,8 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
> >   			ctrl->cnum, status);
> >   		if (status > 0 && (status & NVME_SC_DNR))
> >   			recon = false;
> > +		if (status == NVME_SC_AUTH_REQUIRED || status == -ECONNREFUSED)
> > +			recon = false;
> >   	} else if (time_after_eq(jiffies, rport->dev_loss_end))
> >   		recon = false;
> Is this still required after the patchset to retry authentication errors?

Do you mean

  48dae46676d1 ("nvme: enable retries for authentication commands")

?

In this case yes, I've tested on top of this patch. This breaks the loop
where the provided key is invalid or is missing. The loop would happy
retry until reaching max of retries.
Re: [PATCH v2 2/5] nvme-fc: do not retry when auth fails or connection is refused
Posted by Hannes Reinecke 1 year, 11 months ago
On 2/21/24 17:37, Daniel Wagner wrote:
> On Wed, Feb 21, 2024 at 04:53:44PM +0100, Hannes Reinecke wrote:
>> On 2/21/24 14:24, Daniel Wagner wrote:
>>> There is no point in retrying to connect if the authentication fails.
>>>
>>> Connection refused is also issued from the authentication path, thus
>>> also do not retry.
>>>
>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>> Signed-off-by: Daniel Wagner <dwagner@suse.de>
>>> ---
>>>    drivers/nvme/host/fc.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
>>> index a5b29e9ad342..b81046c9f171 100644
>>> --- a/drivers/nvme/host/fc.c
>>> +++ b/drivers/nvme/host/fc.c
>>> @@ -3312,6 +3312,8 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
>>>    			ctrl->cnum, status);
>>>    		if (status > 0 && (status & NVME_SC_DNR))
>>>    			recon = false;
>>> +		if (status == NVME_SC_AUTH_REQUIRED || status == -ECONNREFUSED)
>>> +			recon = false;
>>>    	} else if (time_after_eq(jiffies, rport->dev_loss_end))
>>>    		recon = false;
>> Is this still required after the patchset to retry authentication errors?
> 
> Do you mean
> 
>    48dae46676d1 ("nvme: enable retries for authentication commands")
> 
> ?
Yes.

> 
> In this case yes, I've tested on top of this patch. This breaks the loop
> where the provided key is invalid or is missing. The loop would happy
> retry until reaching max of retries.

But that's to be expected, no? After all, that's _precisely_ what 
NVME_SC_DNR is for; if you shouldn't retry, that bit is set.
If it's not set, you should.
So why is NVME_SC_AUTH_REQUIRED an exception?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

Re: [PATCH v2 2/5] nvme-fc: do not retry when auth fails or connection is refused
Posted by Daniel Wagner 1 year, 11 months ago
On Thu, Feb 22, 2024 at 07:46:12AM +0100, Hannes Reinecke wrote:
> On 2/21/24 17:37, Daniel Wagner wrote:
> > On Wed, Feb 21, 2024 at 04:53:44PM +0100, Hannes Reinecke wrote:
> > In this case yes, I've tested on top of this patch. This breaks the loop
> > where the provided key is invalid or is missing. The loop would happy
> > retry until reaching max of retries.
> 
> But that's to be expected, no?

Why? If the key is wrong/missing it will be likely wrong/missing the
next retry again. So what's the point in retrying?

> After all, that's _precisely_ what
> NVME_SC_DNR is for;
> if you shouldn't retry, that bit is set.
> If it's not set, you should.

Okay, in this case there is a bug in the auth code somewhere.

> So why is NVME_SC_AUTH_REQUIRED an exception?

status is either NVME_SC_AUTH_REQUIRED or -ECONNREFUSED for the first
part of the nvme/041 (no key set). In this case DNR bit is not set.

Note, when -ECONNREFUSED is return

	if (status > 0 && (status & NVME_SC_DNR))

will not catch it either.

Glad we have these tests now.
Re: [PATCH v2 2/5] nvme-fc: do not retry when auth fails or connection is refused
Posted by Daniel Wagner 1 year, 11 months ago
On Thu, Feb 22, 2024 at 08:45:04AM +0100, Daniel Wagner wrote:
> On Thu, Feb 22, 2024 at 07:46:12AM +0100, Hannes Reinecke wrote:
> > On 2/21/24 17:37, Daniel Wagner wrote:
> > > On Wed, Feb 21, 2024 at 04:53:44PM +0100, Hannes Reinecke wrote:
> > > In this case yes, I've tested on top of this patch. This breaks the loop
> > > where the provided key is invalid or is missing. The loop would happy
> > > retry until reaching max of retries.
> > 
> > But that's to be expected, no?
> 
> Why? If the key is wrong/missing it will be likely wrong/missing the
> next retry again. So what's the point in retrying?
> 
> > After all, that's _precisely_ what
> > NVME_SC_DNR is for;
> > if you shouldn't retry, that bit is set.
> > If it's not set, you should.
> 
> Okay, in this case there is a bug in the auth code somewhere.

With the change below nvme/041 also passes:

modified   drivers/nvme/host/fabrics.c
@@ -467,7 +467,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
 		if (result & NVME_CONNECT_AUTHREQ_ASCR) {
 			dev_warn(ctrl->device,
 				 "qid 0: secure concatenation is not supported\n");
-			ret = NVME_SC_AUTH_REQUIRED;
+			ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
 			goto out_free_data;
 		}
 		/* Authentication required */
@@ -475,7 +475,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
 		if (ret) {
 			dev_warn(ctrl->device,
 				 "qid 0: authentication setup failed\n");
-			ret = NVME_SC_AUTH_REQUIRED;
+			ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
 			goto out_free_data;
 		}
 		ret = nvme_auth_wait(ctrl, 0);
@@ -540,8 +540,9 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
 		/* Secure concatenation is not implemented */
 		if (result & NVME_CONNECT_AUTHREQ_ASCR) {
 			dev_warn(ctrl->device,
-				 "qid 0: secure concatenation is not supported\n");
-			ret = NVME_SC_AUTH_REQUIRED;
+				 "qid %d: secure concatenation is not supported\n",
+				 qid);
+			ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
 			goto out_free_data;
 		}
 		/* Authentication required */
@@ -549,7 +550,7 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
 		if (ret) {
 			dev_warn(ctrl->device,
 				 "qid %d: authentication setup failed\n", qid);
-			ret = NVME_SC_AUTH_REQUIRED;
+			ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
 		} else {
 			ret = nvme_auth_wait(ctrl, qid);
 			if (ret)

Is this what you had in mind?
Re: [PATCH v2 2/5] nvme-fc: do not retry when auth fails or connection is refused
Posted by Hannes Reinecke 1 year, 11 months ago
On 2/22/24 18:02, Daniel Wagner wrote:
> On Thu, Feb 22, 2024 at 08:45:04AM +0100, Daniel Wagner wrote:
>> On Thu, Feb 22, 2024 at 07:46:12AM +0100, Hannes Reinecke wrote:
>>> On 2/21/24 17:37, Daniel Wagner wrote:
>>>> On Wed, Feb 21, 2024 at 04:53:44PM +0100, Hannes Reinecke wrote:
>>>> In this case yes, I've tested on top of this patch. This breaks the loop
>>>> where the provided key is invalid or is missing. The loop would happy
>>>> retry until reaching max of retries.
>>>
>>> But that's to be expected, no?
>>
>> Why? If the key is wrong/missing it will be likely wrong/missing the
>> next retry again. So what's the point in retrying?
>>
>>> After all, that's _precisely_ what
>>> NVME_SC_DNR is for;
>>> if you shouldn't retry, that bit is set.
>>> If it's not set, you should.
>>
>> Okay, in this case there is a bug in the auth code somewhere.
> 
> With the change below nvme/041 also passes:
> 
> modified   drivers/nvme/host/fabrics.c
> @@ -467,7 +467,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
>   		if (result & NVME_CONNECT_AUTHREQ_ASCR) {
>   			dev_warn(ctrl->device,
>   				 "qid 0: secure concatenation is not supported\n");
> -			ret = NVME_SC_AUTH_REQUIRED;
> +			ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>   			goto out_free_data;
>   		}
>   		/* Authentication required */
> @@ -475,7 +475,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
>   		if (ret) {
>   			dev_warn(ctrl->device,
>   				 "qid 0: authentication setup failed\n");
> -			ret = NVME_SC_AUTH_REQUIRED;
> +			ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>   			goto out_free_data;
>   		}
>   		ret = nvme_auth_wait(ctrl, 0);
> @@ -540,8 +540,9 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
>   		/* Secure concatenation is not implemented */
>   		if (result & NVME_CONNECT_AUTHREQ_ASCR) {
>   			dev_warn(ctrl->device,
> -				 "qid 0: secure concatenation is not supported\n");
> -			ret = NVME_SC_AUTH_REQUIRED;
> +				 "qid %d: secure concatenation is not supported\n",
> +				 qid);
> +			ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>   			goto out_free_data;
>   		}
>   		/* Authentication required */
> @@ -549,7 +550,7 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
>   		if (ret) {
>   			dev_warn(ctrl->device,
>   				 "qid %d: authentication setup failed\n", qid);
> -			ret = NVME_SC_AUTH_REQUIRED;
> +			ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>   		} else {
>   			ret = nvme_auth_wait(ctrl, qid);
>   			if (ret)
> 
> Is this what you had in mind?

Which, incidentally, is basically the patch I just posted.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

Re: [PATCH v2 2/5] nvme-fc: do not retry when auth fails or connection is refused
Posted by Sagi Grimberg 1 year, 11 months ago

On 23/02/2024 13:58, Hannes Reinecke wrote:
> On 2/22/24 18:02, Daniel Wagner wrote:
>> On Thu, Feb 22, 2024 at 08:45:04AM +0100, Daniel Wagner wrote:
>>> On Thu, Feb 22, 2024 at 07:46:12AM +0100, Hannes Reinecke wrote:
>>>> On 2/21/24 17:37, Daniel Wagner wrote:
>>>>> On Wed, Feb 21, 2024 at 04:53:44PM +0100, Hannes Reinecke wrote:
>>>>> In this case yes, I've tested on top of this patch. This breaks 
>>>>> the loop
>>>>> where the provided key is invalid or is missing. The loop would happy
>>>>> retry until reaching max of retries.
>>>>
>>>> But that's to be expected, no?
>>>
>>> Why? If the key is wrong/missing it will be likely wrong/missing the
>>> next retry again. So what's the point in retrying?
>>>
>>>> After all, that's _precisely_ what
>>>> NVME_SC_DNR is for;
>>>> if you shouldn't retry, that bit is set.
>>>> If it's not set, you should.
>>>
>>> Okay, in this case there is a bug in the auth code somewhere.
>>
>> With the change below nvme/041 also passes:
>>
>> modified   drivers/nvme/host/fabrics.c
>> @@ -467,7 +467,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
>>           if (result & NVME_CONNECT_AUTHREQ_ASCR) {
>>               dev_warn(ctrl->device,
>>                    "qid 0: secure concatenation is not supported\n");
>> -            ret = NVME_SC_AUTH_REQUIRED;
>> +            ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>>               goto out_free_data;
>>           }
>>           /* Authentication required */
>> @@ -475,7 +475,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
>>           if (ret) {
>>               dev_warn(ctrl->device,
>>                    "qid 0: authentication setup failed\n");
>> -            ret = NVME_SC_AUTH_REQUIRED;
>> +            ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>>               goto out_free_data;
>>           }
>>           ret = nvme_auth_wait(ctrl, 0);
>> @@ -540,8 +540,9 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, 
>> u16 qid)
>>           /* Secure concatenation is not implemented */
>>           if (result & NVME_CONNECT_AUTHREQ_ASCR) {
>>               dev_warn(ctrl->device,
>> -                 "qid 0: secure concatenation is not supported\n");
>> -            ret = NVME_SC_AUTH_REQUIRED;
>> +                 "qid %d: secure concatenation is not supported\n",
>> +                 qid);
>> +            ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>>               goto out_free_data;
>>           }
>>           /* Authentication required */
>> @@ -549,7 +550,7 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, 
>> u16 qid)
>>           if (ret) {
>>               dev_warn(ctrl->device,
>>                    "qid %d: authentication setup failed\n", qid);
>> -            ret = NVME_SC_AUTH_REQUIRED;
>> +            ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>>           } else {
>>               ret = nvme_auth_wait(ctrl, qid);
>>               if (ret)
>>
>> Is this what you had in mind?
>
> Which, incidentally, is basically the patch I just posted.

Reading this, the patchset from Hannes now is clearer.
Isn't the main issue is that nvme-fc tries to periodicly reconnect
when failing the first connect? This is exactly what the test expects
it to do right?
Re: [PATCH v2 2/5] nvme-fc: do not retry when auth fails or connection is refused
Posted by Daniel Wagner 1 year, 11 months ago
On Thu, Mar 07, 2024 at 12:25:16PM +0200, Sagi Grimberg wrote:
 > > Is this what you had in mind?
> > 
> > Which, incidentally, is basically the patch I just posted.
> 
> Reading this, the patchset from Hannes now is clearer.
> Isn't the main issue is that nvme-fc tries to periodicly reconnect
> when failing the first connect? This is exactly what the test expects
> it to do right?

Yes, the test expects that the initial connect attempt fails. nvme-fc is
using one connect path and doesn't distinguish between the initial
connect attempt and a reconnect.

All fabric transport share the same problem when the connection has been
established and later one a connection drop happens and a reconnnect is
executed. The blktest nvme/048 case extension I've posted tests the
reconnect attempt after the controller key has changed. In this
scenario, nvme-tcp, nvme-rdma will also do a reconnect attempt although
DNR is set.