[PATCH v3 2/3] nvme: trigger reset when keep alive fails

[PATCH v3 2/3] nvme: trigger reset when keep alive fails
Posted by Daniel Wagner 1 year ago
nvme_keep_alive_work sets up a keep alive command and uses
blk_execute_rq_nowait to send the command out asynchronously.
Eventually, nvme_keep_alive_end_io is called. If the status argument is
0, a new keep alive is sent out. When the status argument is not 0,
only an error is logged. The keep alive machinery does not trigger
error recovery.
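
A simplified sketch of that completion path as described above (not the
exact upstream function; the re-arm helper and other details may differ
between kernel versions):

        static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
                                                          blk_status_t status)
        {
                struct nvme_ctrl *ctrl = rq->end_io_data;

                blk_mq_free_request(rq);

                if (status) {
                        /* Only log the failure; no error recovery is triggered. */
                        dev_err(ctrl->device,
                                "failed nvme_keep_alive_end_io error=%d\n",
                                status);
                        return RQ_END_IO_NONE;
                }

                /* Success: schedule the next keep alive. */
                nvme_queue_keep_alive_work(ctrl);
                return RQ_END_IO_NONE;
        }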

The FC driver relies on the keep alive machinery to trigger recovery
when an error is detected. Whenever an error happens during the
creation of the association, the idea is that the operation is aborted
and retried later. However, there is a window where an error happens
and nvme_fc_create_association cannot detect it.

         1) nvme nvme10: NVME-FC{10}: create association : ...
         2) nvme nvme10: NVME-FC{10}: controller connectivity lost. Awaiting Reconnect
            nvme nvme10: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
         3) nvme nvme10: Could not set queue count (880)
            nvme nvme10: Failed to configure AEN (cfg 900)
         4) nvme nvme10: NVME-FC{10}: controller connect complete
         5) nvme nvme10: failed nvme_keep_alive_end_io error=4

A connection attempt starts 1) and the ctrl is in the CONNECTING state.
Shortly after, the LLDD detects a connectivity loss event and calls
nvme_fc_ctrl_connectivity_loss 2). Because we are still in the
CONNECTING state, this event is ignored.

nvme_fc_create_association continues to run in parallel and tries to
communicate with the controller, and those commands fail. However,
these errors are filtered out; e.g. in 3) setting the I/O queue count
fails, which leads to an early exit in nvme_fc_create_io_queues.
Because the number of I/O queues is 0 at this point, there is nothing
left in nvme_fc_create_association which could detect the connection
drop. Thus the ctrl enters LIVE state 4).

The keep alive timer fires and a keep alive command is sent off but
gets rejected by nvme_fc_queue_rq, and the rq status is set to
NVME_SC_HOST_PATH_ERROR. The nvme status is then mapped to the block
layer status BLK_STS_TRANSPORT/4 in nvme_end_req. Eventually,
nvme_keep_alive_end_io sees the status != 0 and just logs an error 5).
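
The helper doing this mapping on the completion path looks roughly like
the following (trimmed to the cases relevant here; an illustrative
sketch, the exact code varies between kernel versions):

        static blk_status_t nvme_error_status(u16 status)
        {
                switch (status & 0x7ff) {
                case NVME_SC_SUCCESS:
                        return BLK_STS_OK;
                case NVME_SC_HOST_PATH_ERROR:
                        /* BLK_STS_TRANSPORT has the numeric value 4, hence "error=4" in 5) */
                        return BLK_STS_TRANSPORT;
                default:
                        return BLK_STS_IOERR;
                }
        }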

We should obviously detect the problem in 3) and abort there (this will
be addressed later), but that still leaves a race window open in
nvme_fc_create_association between starting the I/O queues and setting
the ctrl state to LIVE.

Thus trigger a reset from the keep alive handler when an error is
reported.

Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
 drivers/nvme/host/core.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bfd71511c85f8b1a9508c6ea062475ff51bf27fe..2a07c2c540b26c8cbe886711abaf6f0afbe6c4df 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1320,6 +1320,12 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
 		dev_err(ctrl->device,
 			"failed nvme_keep_alive_end_io error=%d\n",
 				status);
+		/*
+		 * The driver reports that we lost the connection,
+		 * trigger a recovery.
+		 */
+		if (status == BLK_STS_TRANSPORT)
+			nvme_reset_ctrl(ctrl);
 		return RQ_END_IO_NONE;
 	}
 

-- 
2.47.0
Re: [PATCH v3 2/3] nvme: trigger reset when keep alive fails
Posted by Sagi Grimberg 11 months, 3 weeks ago


On 29/11/2024 11:28, Daniel Wagner wrote:
> nvme_keep_alive_work sets up a keep alive command and uses
> blk_execute_rq_nowait to send the command out asynchronously.
> Eventually, nvme_keep_alive_end_io is called. If the status argument is
> 0, a new keep alive is sent out. When the status argument is not 0,
> only an error is logged. The keep alive machinery does not trigger
> error recovery.
>
> The FC driver relies on the keep alive machinery to trigger recovery
> when an error is detected. Whenever an error happens during the
> creation of the association, the idea is that the operation is aborted
> and retried later. However, there is a window where an error happens
> and nvme_fc_create_association cannot detect it.
>
>           1) nvme nvme10: NVME-FC{10}: create association : ...
>           2) nvme nvme10: NVME-FC{10}: controller connectivity lost. Awaiting Reconnect
>              nvme nvme10: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
>           3) nvme nvme10: Could not set queue count (880)
>              nvme nvme10: Failed to configure AEN (cfg 900)
>           4) nvme nvme10: NVME-FC{10}: controller connect complete
>           5) nvme nvme10: failed nvme_keep_alive_end_io error=4
>
> A connection attempt starts 1) and the ctrl is in the CONNECTING state.
> Shortly after, the LLDD detects a connectivity loss event and calls
> nvme_fc_ctrl_connectivity_loss 2). Because we are still in the
> CONNECTING state, this event is ignored.
>
> nvme_fc_create_association continues to run in parallel and tries to
> communicate with the controller, and those commands fail. However,
> these errors are filtered out; e.g. in 3) setting the I/O queue count
> fails, which leads to an early exit in nvme_fc_create_io_queues.
> Because the number of I/O queues is 0 at this point, there is nothing
> left in nvme_fc_create_association which could detect the connection
> drop. Thus the ctrl enters LIVE state 4).
>
> The keep alive timer fires and a keep alive command is sent off but
> gets rejected by nvme_fc_queue_rq, and the rq status is set to
> NVME_SC_HOST_PATH_ERROR. The nvme status is then mapped to the block
> layer status BLK_STS_TRANSPORT/4 in nvme_end_req. Eventually,
> nvme_keep_alive_end_io sees the status != 0 and just logs an error 5).
>
> We should obviously detect the problem in 3) and abort there (this
> will be addressed later), but that still leaves a race window open in
> nvme_fc_create_association between starting the I/O queues and setting
> the ctrl state to LIVE.
>
> Thus trigger a reset from the keep alive handler when an error is
> reported.
>
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
>   drivers/nvme/host/core.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index bfd71511c85f8b1a9508c6ea062475ff51bf27fe..2a07c2c540b26c8cbe886711abaf6f0afbe6c4df 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1320,6 +1320,12 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
>   		dev_err(ctrl->device,
>   			"failed nvme_keep_alive_end_io error=%d\n",
>   				status);
> +		/*
> +		 * The driver reports that we lost the connection,
> +		 * trigger a recovery.
> +		 */
> +		if (status == BLK_STS_TRANSPORT)
> +			nvme_reset_ctrl(ctrl);
>   		return RQ_END_IO_NONE;
>   	}
>   
>

A lengthy explanation that results in nvme core behavior that assumes a
very specific driver behavior. Isn't the root of the problem that FC is
willing to live peacefully with a controller without any
queues/connectivity to it without periodically reconnecting?
Re: [PATCH v3 2/3] nvme: trigger reset when keep alive fails
Posted by Daniel Wagner 11 months, 1 week ago
On Tue, Dec 24, 2024 at 12:31:35PM +0200, Sagi Grimberg wrote:
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index bfd71511c85f8b1a9508c6ea062475ff51bf27fe..2a07c2c540b26c8cbe886711abaf6f0afbe6c4df 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -1320,6 +1320,12 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
> >   		dev_err(ctrl->device,
> >   			"failed nvme_keep_alive_end_io error=%d\n",
> >   				status);
> > +		/*
> > +		 * The driver reports that we lost the connection,
> > +		 * trigger a recovery.
> > +		 */
> > +		if (status == BLK_STS_TRANSPORT)
> > +			nvme_reset_ctrl(ctrl);
> >   		return RQ_END_IO_NONE;
> >   	}
> > 
> 
> A lengthy explanation that results in nvme core behavior that assumes a very
> specific driver behavior.

I tried to explain exactly what's going on, so we can discuss possible
solutions without talking past each other.

In the meantime I have started on a patch set for the TP4129-related
changes in the spec (KATO Corrections and Clarifications). These
changes would also depend on the KATO timeout handler triggering a
reset.

I am fine with dropping this change for now and discussing it in the
light of TP4129, if that is what you prefer?

> Isn't the root of the problem that FC is willing to live
> peacefully with a controller
> without any queues/connectivity to it without periodically reconnecting?

The root problem is that the connectivity lost event gets ignored in
the CONNECTING state during the first connection attempt. Everything
works fine for the RECONNECTING state.

Maybe something like this instead? (untested)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index c4cbe3ce81f7..1f1d1d62a978 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -148,6 +148,7 @@ struct nvme_fc_rport {
 #define ASSOC_ACTIVE		0
 #define ASSOC_FAILED		1
 #define FCCTRL_TERMIO		2
+#define CONNECTIVITY_LOST	3

 struct nvme_fc_ctrl {
 	spinlock_t		lock;
@@ -785,6 +786,8 @@ nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
 		"NVME-FC{%d}: controller connectivity lost. Awaiting "
 		"Reconnect", ctrl->cnum);

+	set_bit(CONNECTIVITY_LOST, &ctrl->flags);
+
 	switch (nvme_ctrl_state(&ctrl->ctrl)) {
 	case NVME_CTRL_NEW:
 	case NVME_CTRL_LIVE:
@@ -3071,6 +3074,8 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 	if (nvme_fc_ctlr_active_on_rport(ctrl))
 		return -ENOTUNIQ;

+	clear_bit(CONNECTIVITY_LOST, &ctrl->flags);
+
 	dev_info(ctrl->ctrl.device,
 		"NVME-FC{%d}: create association : host wwpn 0x%016llx "
 		" rport wwpn 0x%016llx: NQN \"%s\"\n",
@@ -3174,6 +3179,11 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)

 	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);

+	if (test_bit(CONNECTIVITY_LOST, &ctrl->flags)) {
+		ret = -EIO;
+		goto out_term_aeo_ops;
+	}
+
 	ctrl->ctrl.nr_reconnects = 0;

 	if (changed)
Re: [PATCH v3 2/3] nvme: trigger reset when keep alive fails
Posted by Sagi Grimberg 11 months, 1 week ago


On 07/01/2025 16:38, Daniel Wagner wrote:
> On Tue, Dec 24, 2024 at 12:31:35PM +0200, Sagi Grimberg wrote:
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index bfd71511c85f8b1a9508c6ea062475ff51bf27fe..2a07c2c540b26c8cbe886711abaf6f0afbe6c4df 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -1320,6 +1320,12 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
>>>    		dev_err(ctrl->device,
>>>    			"failed nvme_keep_alive_end_io error=%d\n",
>>>    				status);
>>> +		/*
>>> +		 * The driver reports that we lost the connection,
>>> +		 * trigger a recovery.
>>> +		 */
>>> +		if (status == BLK_STS_TRANSPORT)
>>> +			nvme_reset_ctrl(ctrl);
>>>    		return RQ_END_IO_NONE;
>>>    	}
>>>
>> A lengthy explanation that results in nvme core behavior that assumes a very
>> specific driver behavior.
> I tried to explain exactly what's going on, so we can discuss possible
> solutions without talking past each other.
>
> In the meantime I have started on a patch set for the TP4129-related
> changes in the spec (KATO Corrections and Clarifications). These
> changes would also depend on the KATO timeout handler triggering a
> reset.
>
> I am fine with dropping this change for now and discussing it in the
> light of TP4129, if that is what you prefer?
>
>> Isn't the root of the problem that FC is willing to live
>> peacefully with a controller
>> without any queues/connectivity to it without periodically reconnecting?
> The root problem is that the connectivity lost event gets ignored in
> the CONNECTING state during the first connection attempt. Everything
> works fine for the RECONNECTING state.
>
> Maybe something like this instead? (untested)
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index c4cbe3ce81f7..1f1d1d62a978 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -148,6 +148,7 @@ struct nvme_fc_rport {
>   #define ASSOC_ACTIVE		0
>   #define ASSOC_FAILED		1
>   #define FCCTRL_TERMIO		2
> +#define CONNECTIVITY_LOST	3
>
>   struct nvme_fc_ctrl {
>   	spinlock_t		lock;
> @@ -785,6 +786,8 @@ nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
>   		"NVME-FC{%d}: controller connectivity lost. Awaiting "
>   		"Reconnect", ctrl->cnum);
>
> +	set_bit(CONNECTIVITY_LOST, &ctrl->flags);
> +
>   	switch (nvme_ctrl_state(&ctrl->ctrl)) {
>   	case NVME_CTRL_NEW:
>   	case NVME_CTRL_LIVE:
> @@ -3071,6 +3074,8 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
>   	if (nvme_fc_ctlr_active_on_rport(ctrl))
>   		return -ENOTUNIQ;
>
> +	clear_bit(CONNECTIVITY_LOST, &ctrl->flags);
> +
>   	dev_info(ctrl->ctrl.device,
>   		"NVME-FC{%d}: create association : host wwpn 0x%016llx "
>   		" rport wwpn 0x%016llx: NQN \"%s\"\n",
> @@ -3174,6 +3179,11 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
>
>   	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
>
> +	if (test_bit(CONNECTIVITY_LOST, &ctrl->flags)) {
> +		ret = -EIO;
> +		goto out_term_aeo_ops;
> +	}
> +
>   	ctrl->ctrl.nr_reconnects = 0;
>
>   	if (changed)

This looks a lot better to me.
Re: [PATCH v3 2/3] nvme: trigger reset when keep alive fails
Posted by Christoph Hellwig 1 year ago
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Re: [PATCH v3 2/3] nvme: trigger reset when keep alive fails
Posted by Hannes Reinecke 1 year ago
On 11/29/24 10:28, Daniel Wagner wrote:
> nvme_keep_alive_work sets up a keep alive command and uses
> blk_execute_rq_nowait to send the command out asynchronously.
> Eventually, nvme_keep_alive_end_io is called. If the status argument is
> 0, a new keep alive is sent out. When the status argument is not 0,
> only an error is logged. The keep alive machinery does not trigger
> error recovery.
> 
> The FC driver relies on the keep alive machinery to trigger recovery
> when an error is detected. Whenever an error happens during the
> creation of the association, the idea is that the operation is aborted
> and retried later. However, there is a window where an error happens
> and nvme_fc_create_association cannot detect it.
> 
>           1) nvme nvme10: NVME-FC{10}: create association : ...
>           2) nvme nvme10: NVME-FC{10}: controller connectivity lost. Awaiting Reconnect
>              nvme nvme10: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
>           3) nvme nvme10: Could not set queue count (880)
>              nvme nvme10: Failed to configure AEN (cfg 900)
>           4) nvme nvme10: NVME-FC{10}: controller connect complete
>           5) nvme nvme10: failed nvme_keep_alive_end_io error=4
> 
> A connection attempt starts 1) and the ctrl is in the CONNECTING state.
> Shortly after, the LLDD detects a connectivity loss event and calls
> nvme_fc_ctrl_connectivity_loss 2). Because we are still in the
> CONNECTING state, this event is ignored.
> 
> nvme_fc_create_association continues to run in parallel and tries to
> communicate with the controller, and those commands fail. However,
> these errors are filtered out; e.g. in 3) setting the I/O queue count
> fails, which leads to an early exit in nvme_fc_create_io_queues.
> Because the number of I/O queues is 0 at this point, there is nothing
> left in nvme_fc_create_association which could detect the connection
> drop. Thus the ctrl enters LIVE state 4).
> 
> The keep alive timer fires and a keep alive command is sent off but
> gets rejected by nvme_fc_queue_rq, and the rq status is set to
> NVME_SC_HOST_PATH_ERROR. The nvme status is then mapped to the block
> layer status BLK_STS_TRANSPORT/4 in nvme_end_req. Eventually,
> nvme_keep_alive_end_io sees the status != 0 and just logs an error 5).
> 
> We should obviously detect the problem in 3) and abort there (this
> will be addressed later), but that still leaves a race window open in
> nvme_fc_create_association between starting the I/O queues and setting
> the ctrl state to LIVE.
> 
> Thus trigger a reset from the keep alive handler when an error is
> reported.
> 
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
>   drivers/nvme/host/core.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index bfd71511c85f8b1a9508c6ea062475ff51bf27fe..2a07c2c540b26c8cbe886711abaf6f0afbe6c4df 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1320,6 +1320,12 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
>   		dev_err(ctrl->device,
>   			"failed nvme_keep_alive_end_io error=%d\n",
>   				status);
> +		/*
> +		 * The driver reports that we lost the connection,
> +		 * trigger a recovery.
> +		 */
> +		if (status == BLK_STS_TRANSPORT)
> +			nvme_reset_ctrl(ctrl);
>   		return RQ_END_IO_NONE;
>   	}
>   
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich