[PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting

Posted by Daniel Wagner 11 months, 1 week ago
When a connectivity loss occurs while nvme_fc_create_association is
being executed, it's possible that the ctrl ends up stuck in the LIVE
state:

  1) nvme nvme10: NVME-FC{10}: create association : ...
  2) nvme nvme10: NVME-FC{10}: controller connectivity lost.
                  Awaiting Reconnect
     nvme nvme10: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
  3) nvme nvme10: Could not set queue count (880)
     nvme nvme10: Failed to configure AEN (cfg 900)
  4) nvme nvme10: NVME-FC{10}: controller connect complete
  5) nvme nvme10: failed nvme_keep_alive_end_io error=4

A connection attempt starts 1) and the ctrl is in the CONNECTING state.
Shortly after, the LLDD detects a connectivity loss event and calls
nvme_fc_ctrl_connectivity_loss 2). Because the ctrl is still in the
CONNECTING state, this event is ignored.

nvme_fc_create_association continues to run in parallel and tries to
communicate with the controller; these commands fail. The errors are
filtered out, though: e.g. in 3) setting the number of I/O queues fails,
which leads to an early exit from nvme_fc_create_io_queues. Because the
number of I/O queues is 0 at this point, there is nothing left in
nvme_fc_create_association that could detect the connection drop. Thus
the ctrl enters the LIVE state 4).
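
For illustration, the early-exit shape looks roughly like this (a
simplified sketch for this commit message, not the actual fc.c code):

  static int nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
  {
      struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
      unsigned int nr_io_queues = opts->nr_io_queues;
      int ret;

      /*
       * With connectivity already gone the Set Features command is
       * rejected; nvme_set_queue_count() then reports zero queues
       * but returns success.
       */
      ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
      if (ret)
          return ret;

      ctrl->ctrl.queue_count = nr_io_queues + 1;
      if (!nr_io_queues)
          return 0;    /* early exit, reported as success */

      /* ... allocate the tag set and connect the I/O queues ... */
      return 0;
  }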

Eventually the keep-alive handler times out 5), but because nothing is
done about it, the ctrl stays in the LIVE state.

There is already an ASSOC_FAILED flag to track connectivity loss events,
but this bit is set too late in the error recovery code path. Move
setting it into the connectivity loss event handler and synchronize it
with the controller state change. This ensures that the ASSOC_FAILED
flag is seen by nvme_fc_create_association and the ctrl does not enter
the LIVE state after a connectivity loss event. If the connectivity loss
event happens after we entered the LIVE state, the normal error recovery
path is executed.

Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
 drivers/nvme/host/fc.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 7409da42b9ee580cdd6fe78c0f93e78c4ad08675..55884d3df6f291cfddb4742e135b54a72f1cfa05 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -781,11 +781,19 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
 static void
 nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
 {
+	enum nvme_ctrl_state state;
+	unsigned long flags;
+
 	dev_info(ctrl->ctrl.device,
 		"NVME-FC{%d}: controller connectivity lost. Awaiting "
 		"Reconnect", ctrl->cnum);
 
-	switch (nvme_ctrl_state(&ctrl->ctrl)) {
+	spin_lock_irqsave(&ctrl->lock, flags);
+	set_bit(ASSOC_FAILED, &ctrl->flags);
+	state = nvme_ctrl_state(&ctrl->ctrl);
+	spin_unlock_irqrestore(&ctrl->lock, flags);
+
+	switch (state) {
 	case NVME_CTRL_NEW:
 	case NVME_CTRL_LIVE:
 		/*
@@ -2542,7 +2550,6 @@ nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
 	 */
 	if (ctrl->ctrl.state == NVME_CTRL_CONNECTING) {
 		__nvme_fc_abort_outstanding_ios(ctrl, true);
-		set_bit(ASSOC_FAILED, &ctrl->flags);
 		dev_warn(ctrl->ctrl.device,
 			"NVME-FC{%d}: transport error during (re)connect\n",
 			ctrl->cnum);
@@ -3167,12 +3174,18 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 		else
 			ret = nvme_fc_recreate_io_queues(ctrl);
 	}
-	if (!ret && test_bit(ASSOC_FAILED, &ctrl->flags))
-		ret = -EIO;
 	if (ret)
 		goto out_term_aen_ops;
 
-	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+	spin_lock_irqsave(&ctrl->lock, flags);
+	if (!test_bit(ASSOC_FAILED, &ctrl->flags))
+		changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+	else
+		ret = -EIO;
+	spin_unlock_irqrestore(&ctrl->lock, flags);
+
+	if (ret)
+		goto out_term_aen_ops;
 
 	ctrl->ctrl.nr_reconnects = 0;
 

-- 
2.47.1
Re: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
Posted by Shinichiro Kawasaki 10 months ago
On Jan 09, 2025 / 14:30, Daniel Wagner wrote:
> When a connectivity loss occurs while nvme_fc_create_association is
> being executed, it's possible that the ctrl ends up stuck in the LIVE
> state:
>
[...]
> 
> There is already the ASSOC_FAILED flag to track connectivity loss event
> but this bit is set too late in the recovery code path. Move this into
> the connectivity loss event handler and synchronize it with the state
> change. This ensures that the ASSOC_FAILED flag is seen by
> nvme_fc_create_io_queues and it does not enter the LIVE state after a
> connectivity loss event. If the connectivity loss event happens after we
> entered the LIVE state the normal error recovery path is executed.

I found that many test cases in the blktests nvme group fail with the
v6.14-rc2 kernel and the fc transport. I bisected and found that this patch,
merged as commit ee59e3820ca9, is the trigger. When I revert the commit from
v6.14-rc2, the failures disappear.

Here I share the kernel message log observed with the test case nvme/003. The
kernel reported "BUG: sleeping function called from invalid context".


Feb 12 12:03:08 testnode1 unknown: run blktests nvme/003 at 2025-02-12 12:03:08
Feb 12 12:03:08 testnode1 kernel: loop0: detected capacity change from 0 to 2097152
Feb 12 12:03:08 testnode1 kernel: nvmet: adding nsid 1 to subsystem blktests-subsystem-1
Feb 12 12:03:08 testnode1 kernel: nvme nvme1: NVME-FC{0}: create association : host wwpn 0x20001100aa000001  rport wwpn 0x20001100ab000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
Feb 12 12:03:08 testnode1 kernel: (NULL device *): {0:0} Association created
Feb 12 12:03:08 testnode1 kernel: nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
Feb 12 12:03:08 testnode1 kernel: BUG: sleeping function called from invalid context at kernel/workqueue.c:4355
Feb 12 12:03:08 testnode1 kernel: in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 68705, name: kworker/u16:14
Feb 12 12:03:08 testnode1 kernel: preempt_count: 1, expected: 0
Feb 12 12:03:08 testnode1 kernel: RCU nest depth: 0, expected: 0
Feb 12 12:03:08 testnode1 kernel: 3 locks held by kworker/u16:14/68705:
Feb 12 12:03:08 testnode1 kernel:  #0: ffff88813a90b148 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: process_one_work+0xf20/0x1460
Feb 12 12:03:08 testnode1 kernel:  #1: ffff88813b397d38 ((work_completion)(&(&ctrl->connect_work)->work)){+.+.}-{0:0}, at: process_one_work+0x7de/0x1460
Feb 12 12:03:08 testnode1 kernel:  #2: ffff8881ab5bc018 (&ctrl->lock#2){....}-{3:3}, at: nvme_fc_connect_ctrl_work.cold+0x2200/0x2c9d [nvme_fc]
Feb 12 12:03:08 testnode1 kernel: irq event stamp: 140088
Feb 12 12:03:08 testnode1 kernel: hardirqs last  enabled at (140087): [<ffffffff90e222ec>] _raw_spin_unlock_irqrestore+0x4c/0x60
Feb 12 12:03:08 testnode1 kernel: hardirqs last disabled at (140088): [<ffffffff90e21fef>] _raw_spin_lock_irqsave+0x5f/0x70
Feb 12 12:03:08 testnode1 kernel: softirqs last  enabled at (139874): [<ffffffff8e5158b9>] __irq_exit_rcu+0x109/0x210
Feb 12 12:03:08 testnode1 kernel: softirqs last disabled at (139867): [<ffffffff8e5158b9>] __irq_exit_rcu+0x109/0x210
Feb 12 12:03:08 testnode1 kernel: Preemption disabled at:
Feb 12 12:03:08 testnode1 kernel: [<0000000000000000>] 0x0
Feb 12 12:03:08 testnode1 kernel: CPU: 0 UID: 0 PID: 68705 Comm: kworker/u16:14 Not tainted 6.14.0-rc2 #256
Feb 12 12:03:08 testnode1 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
Feb 12 12:03:08 testnode1 kernel: Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
Feb 12 12:03:08 testnode1 kernel: Call Trace:
Feb 12 12:03:08 testnode1 kernel:  <TASK>
Feb 12 12:03:08 testnode1 kernel:  dump_stack_lvl+0x6a/0x90
Feb 12 12:03:08 testnode1 kernel:  __might_resched.cold+0x1f7/0x23d
Feb 12 12:03:08 testnode1 kernel:  ? __pfx___might_resched+0x10/0x10
Feb 12 12:03:08 testnode1 kernel:  cancel_delayed_work_sync+0x67/0xb0
Feb 12 12:03:08 testnode1 kernel:  nvme_change_ctrl_state+0x104/0x2f0 [nvme_core]
Feb 12 12:03:08 testnode1 kernel:  nvme_fc_connect_ctrl_work.cold+0x2252/0x2c9d [nvme_fc]
Feb 12 12:03:08 testnode1 kernel:  ? __pfx_nvme_fc_connect_ctrl_work+0x10/0x10 [nvme_fc]
Feb 12 12:03:08 testnode1 kernel:  process_one_work+0x85a/0x1460
Feb 12 12:03:08 testnode1 kernel:  ? __pfx_lock_acquire+0x10/0x10
Feb 12 12:03:08 testnode1 kernel:  ? __pfx_process_one_work+0x10/0x10
Feb 12 12:03:08 testnode1 kernel:  ? assign_work+0x16c/0x240
Feb 12 12:03:08 testnode1 kernel:  ? lock_is_held_type+0xd5/0x130
Feb 12 12:03:08 testnode1 kernel:  worker_thread+0x5e2/0xfc0
Feb 12 12:03:08 testnode1 kernel:  ? __kthread_parkme+0xb1/0x1d0
Feb 12 12:03:08 testnode1 kernel:  ? __pfx_worker_thread+0x10/0x10
Feb 12 12:03:08 testnode1 kernel:  ? __pfx_worker_thread+0x10/0x10
Feb 12 12:03:08 testnode1 kernel:  kthread+0x39d/0x750
Feb 12 12:03:08 testnode1 kernel:  ? __pfx_kthread+0x10/0x10
Feb 12 12:03:08 testnode1 kernel:  ? __pfx_kthread+0x10/0x10
Feb 12 12:03:08 testnode1 kernel:  ? __pfx_kthread+0x10/0x10
Feb 12 12:03:08 testnode1 kernel:  ret_from_fork+0x30/0x70
Feb 12 12:03:08 testnode1 kernel:  ? __pfx_kthread+0x10/0x10
Feb 12 12:03:08 testnode1 kernel:  ret_from_fork_asm+0x1a/0x30
Feb 12 12:03:08 testnode1 kernel:  </TASK>
Feb 12 12:03:08 testnode1 kernel: nvme nvme1: NVME-FC{0}: controller connect complete
Feb 12 12:03:08 testnode1 kernel: nvme nvme1: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
Feb 12 12:03:08 testnode1 kernel: nvme nvme2: NVME-FC{1}: create association : host wwpn 0x20001100aa000001  rport wwpn 0x20001100ab000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
Feb 12 12:03:08 testnode1 kernel: (NULL device *): {0:1} Association created
Feb 12 12:03:08 testnode1 kernel: nvmet: Created discovery controller 2 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:f35c1212-8678-4a7f-99e5-05e61788783b.
Feb 12 12:03:08 testnode1 kernel: nvme nvme2: NVME-FC{1}: controller connect complete
Feb 12 12:03:08 testnode1 kernel: nvme nvme2: NVME-FC{1}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:f35c1212-8678-4a7f-99e5-05e61788783b
Feb 12 12:03:08 testnode1 kernel: nvme nvme2: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Feb 12 12:03:08 testnode1 kernel: (NULL device *): {0:1} Association deleted
Feb 12 12:03:08 testnode1 kernel: (NULL device *): {0:1} Association freed
Feb 12 12:03:08 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Feb 12 12:03:18 testnode1 kernel: nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Feb 12 12:03:18 testnode1 kernel: (NULL device *): {0:0} Association deleted
Feb 12 12:03:18 testnode1 kernel: (NULL device *): {0:0} Association freed
Feb 12 12:03:18 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Feb 12 12:03:18 testnode1 kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x10001100ab000001:pn-0x20001100ab000001 - nn-0x10001100aa000001:pn-0x20001100aa000001 combination not found
Re: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
Posted by Daniel Wagner 10 months ago
On Thu, Feb 13, 2025 at 07:16:31AM +0000, Shinichiro Kawasaki wrote:
> On Jan 09, 2025 / 14:30, Daniel Wagner wrote:
> > When a connectivity loss occurs while nvme_fc_create_association is
> > being executed, it's possible that the ctrl ends up stuck in the LIVE
> > state:
> >
> [...]
> > 
> > There is already the ASSOC_FAILED flag to track connectivity loss event
> > but this bit is set too late in the recovery code path. Move this into
> > the connectivity loss event handler and synchronize it with the state
> > change. This ensures that the ASSOC_FAILED flag is seen by
> > nvme_fc_create_io_queues and it does not enter the LIVE state after a
> > connectivity loss event. If the connectivity loss event happens after we
> > entered the LIVE state the normal error recovery path is executed.
> 
> I found many test cases in blktests nvme group fail with v6.14-rc2 kernel with
> fc transport. I bisected and found this patch, as the commit ee59e3820ca9, is
> the trigger. When I revert the commit from v6.14-rc2, the failure disappears.
> 
> Here I share the kernel message log observed at the test case nvme/003. The
> kernel reported "BUG: sleeping function called from invalid context".

Thanks for digging into it.

The problem is that the patch moves the nvme_change_ctrl_state call
under the ctrl->lock, and nvme_change_ctrl_state also ends up calling
cancel_delayed_work_sync, which doesn't work in this context:

  spin_lock_irqsave
    nvme_change_ctrl_state
      nvme_stop_failfast_work
        cancel_delayed_work_sync

I've checked whether any other caller of nvme_change_ctrl_state takes a
lock; all of them look fine. If there were a way to move the failfast
bits out of the nvme_change_ctrl_state function, that would be a nice
solution.
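
To make the conflict concrete, the v4 patch ends up with roughly this
shape (sketch only, annotated with the sleeping call chain):

  /*
   * nvme_change_ctrl_state() calls nvme_stop_failfast_work(), which
   * calls cancel_delayed_work_sync() and therefore may sleep -- but
   * here it runs under a spinlock with interrupts disabled.
   */
  spin_lock_irqsave(&ctrl->lock, flags);
  if (!test_bit(ASSOC_FAILED, &ctrl->flags))
      changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
  else
      ret = -EIO;
  spin_unlock_irqrestore(&ctrl->lock, flags);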

That means the idea to synchronize the state change with the
ASSOC_FAILED bit under the lock is not going to work. I am trying to
figure out a solution.
Re: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
Posted by Daniel Wagner 10 months ago
On Thu, Feb 13, 2025 at 10:14:46AM +0100, Daniel Wagner wrote:
> That means the idea to synchronize the state change with the
> ASSOC_FAILED bit under the lock is not going to work. I am trying to
> figure out a solution.

I found a possible solution. The only 'catch' is that it makes the state
machine a bit more restrictive: the only valid transition to LIVE would
be from CONNECTING. We can do this because even the PCI driver does all
the state transitions NEW -> CONNECTING -> LIVE. It's important that we
can't enter LIVE from RESETTING for the patch below to work.

We don't have to rely on another variable to figure out which state the
ctrl is in. The nvme_fc_ctrl_connectivity_loss callback always needs to
trigger a reset; if the ctrl is not in the LIVE state, it is a no-op.
This makes it possible to drop the lock around setting ASSOC_FAILED and
reading the state.

In nvme_fc_create_association we only have to check whether we can enter
the LIVE state (which means we were in CONNECTING); if this fails, we
have entered the RESETTING state and should return an error.


diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 818d4e49aab5..f028913e2e62 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -564,8 +564,6 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 	switch (new_state) {
 	case NVME_CTRL_LIVE:
 		switch (old_state) {
-		case NVME_CTRL_NEW:
-		case NVME_CTRL_RESETTING:
 		case NVME_CTRL_CONNECTING:
 			changed = true;
 			fallthrough;
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index f4f1866fbd5b..e740814fd1ea 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -781,61 +781,12 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
 static void
 nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
 {
-	enum nvme_ctrl_state state;
-	unsigned long flags;
-
 	dev_info(ctrl->ctrl.device,
 		"NVME-FC{%d}: controller connectivity lost. Awaiting "
 		"Reconnect", ctrl->cnum);

-	spin_lock_irqsave(&ctrl->lock, flags);
 	set_bit(ASSOC_FAILED, &ctrl->flags);
-	state = nvme_ctrl_state(&ctrl->ctrl);
-	spin_unlock_irqrestore(&ctrl->lock, flags);
-
-	switch (state) {
-	case NVME_CTRL_NEW:
-	case NVME_CTRL_LIVE:
-		/*
-		 * Schedule a controller reset. The reset will terminate the
-		 * association and schedule the reconnect timer.  Reconnects
-		 * will be attempted until either the ctlr_loss_tmo
-		 * (max_retries * connect_delay) expires or the remoteport's
-		 * dev_loss_tmo expires.
-		 */
-		if (nvme_reset_ctrl(&ctrl->ctrl)) {
-			dev_warn(ctrl->ctrl.device,
-				"NVME-FC{%d}: Couldn't schedule reset.\n",
-				ctrl->cnum);
-			nvme_delete_ctrl(&ctrl->ctrl);
-		}
-		break;
-
-	case NVME_CTRL_CONNECTING:
-		/*
-		 * The association has already been terminated and the
-		 * controller is attempting reconnects.  No need to do anything
-		 * futher.  Reconnects will be attempted until either the
-		 * ctlr_loss_tmo (max_retries * connect_delay) expires or the
-		 * remoteport's dev_loss_tmo expires.
-		 */
-		break;
-
-	case NVME_CTRL_RESETTING:
-		/*
-		 * Controller is already in the process of terminating the
-		 * association.  No need to do anything further. The reconnect
-		 * step will kick in naturally after the association is
-		 * terminated.
-		 */
-		break;
-
-	case NVME_CTRL_DELETING:
-	case NVME_CTRL_DELETING_NOIO:
-	default:
-		/* no action to take - let it delete */
-		break;
-	}
+	nvme_reset_ctrl(&ctrl->ctrl);
 }

 /**
@@ -3177,23 +3128,18 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 		else
 			ret = nvme_fc_recreate_io_queues(ctrl);
 	}
+	if (!ret && test_bit(ASSOC_FAILED, &ctrl->flags))
+		ret = -EIO;
 	if (ret)
 		goto out_term_aen_ops;

-	spin_lock_irqsave(&ctrl->lock, flags);
-	if (!test_bit(ASSOC_FAILED, &ctrl->flags))
-		changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
-	else
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE)) {
 		ret = -EIO;
-	spin_unlock_irqrestore(&ctrl->lock, flags);
-
-	if (ret)
 		goto out_term_aen_ops;
+	}

 	ctrl->ctrl.nr_reconnects = 0;
-
-	if (changed)
-		nvme_start_ctrl(&ctrl->ctrl);
+	nvme_start_ctrl(&ctrl->ctrl);

 	return 0;	/* Success */
Re: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
Posted by Shinichiro Kawasaki 10 months ago
On Feb 13, 2025 / 15:22, Daniel Wagner wrote:
> On Thu, Feb 13, 2025 at 10:14:46AM +0100, Daniel Wagner wrote:
> > That means the idea to synchronize the state change with the
> > ASSOC_FAILED bit under the lock is not going to work. I am trying to
> > figure out a solution.
> 
> I found a possible solution. The only 'catch' is to make the state
> machine a bit more restrictive. The only valid transition to LIVE would
> be from CONNECTING. We can do this because even the PCI driver is doing
> all the state transitions NEW -> CONNECTING -> LIVE. It's important that
> we can't enter LIVE from RESETTING to get the patch below working.
> 
> We don't have to rely on another variable to figure in which state the
> ctrl is. The nvme_fc_ctrl_connectivity_loss callback needs always to
> trigger a reset. If the ctrl is not in LIVE it is a no-op. This makes it
> possible to remove the lock around the ASSOC_FAILED and the state read
> operation.
> 
> In nvme_fc_create_association we only have to check if we can enter the
> LIVE state (thus we were in CONNECTING) and if this fails we entered the
> RESETTING state and should return an error.
> 
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 818d4e49aab5..f028913e2e62 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -564,8 +564,6 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>  	switch (new_state) {
>  	case NVME_CTRL_LIVE:
>  		switch (old_state) {
> -		case NVME_CTRL_NEW:
> -		case NVME_CTRL_RESETTING:
>  		case NVME_CTRL_CONNECTING:
>  			changed = true;
>  			fallthrough;
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index f4f1866fbd5b..e740814fd1ea 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -781,61 +781,12 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
>  static void
>  nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
>  {
> -	enum nvme_ctrl_state state;
> -	unsigned long flags;
> -
>  	dev_info(ctrl->ctrl.device,
>  		"NVME-FC{%d}: controller connectivity lost. Awaiting "
>  		"Reconnect", ctrl->cnum);
> 
> -	spin_lock_irqsave(&ctrl->lock, flags);
>  	set_bit(ASSOC_FAILED, &ctrl->flags);
> -	state = nvme_ctrl_state(&ctrl->ctrl);
> -	spin_unlock_irqrestore(&ctrl->lock, flags);
> -
> -	switch (state) {
> -	case NVME_CTRL_NEW:
> -	case NVME_CTRL_LIVE:
> -		/*
> -		 * Schedule a controller reset. The reset will terminate the
> -		 * association and schedule the reconnect timer.  Reconnects
> -		 * will be attempted until either the ctlr_loss_tmo
> -		 * (max_retries * connect_delay) expires or the remoteport's
> -		 * dev_loss_tmo expires.
> -		 */
> -		if (nvme_reset_ctrl(&ctrl->ctrl)) {
> -			dev_warn(ctrl->ctrl.device,
> -				"NVME-FC{%d}: Couldn't schedule reset.\n",
> -				ctrl->cnum);
> -			nvme_delete_ctrl(&ctrl->ctrl);
> -		}
> -		break;
> -
> -	case NVME_CTRL_CONNECTING:
> -		/*
> -		 * The association has already been terminated and the
> -		 * controller is attempting reconnects.  No need to do anything
> -		 * futher.  Reconnects will be attempted until either the
> -		 * ctlr_loss_tmo (max_retries * connect_delay) expires or the
> -		 * remoteport's dev_loss_tmo expires.
> -		 */
> -		break;
> -
> -	case NVME_CTRL_RESETTING:
> -		/*
> -		 * Controller is already in the process of terminating the
> -		 * association.  No need to do anything further. The reconnect
> -		 * step will kick in naturally after the association is
> -		 * terminated.
> -		 */
> -		break;
> -
> -	case NVME_CTRL_DELETING:
> -	case NVME_CTRL_DELETING_NOIO:
> -	default:
> -		/* no action to take - let it delete */
> -		break;
> -	}
> +	nvme_reset_ctrl(&ctrl->ctrl);
>  }
> 
>  /**
> @@ -3177,23 +3128,18 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
>  		else
>  			ret = nvme_fc_recreate_io_queues(ctrl);
>  	}
> +	if (!ret && test_bit(ASSOC_FAILED, &ctrl->flags))
> +		ret = -EIO;
>  	if (ret)
>  		goto out_term_aen_ops;
> 
> -	spin_lock_irqsave(&ctrl->lock, flags);
> -	if (!test_bit(ASSOC_FAILED, &ctrl->flags))
> -		changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
> -	else
> +	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE)) {
>  		ret = -EIO;
> -	spin_unlock_irqrestore(&ctrl->lock, flags);
> -
> -	if (ret)
>  		goto out_term_aen_ops;
> +	}
> 
>  	ctrl->ctrl.nr_reconnects = 0;
> -
> -	if (changed)
> -		nvme_start_ctrl(&ctrl->ctrl);
> +	nvme_start_ctrl(&ctrl->ctrl);
> 
>  	return 0;	/* Success */

Thanks Daniel. I applied the patch on top of v6.14-rc2, and it avoided the
blktests failures. I ran the nvme test group with the other transports (loop,
rdma and tcp) and observed no regressions. It looks good from a test run point
of view.

Of note: I observed a compiler warning during the kernel build:

drivers/nvme/host/fc.c: In function ‘nvme_fc_create_association’:
drivers/nvme/host/fc.c:3025:14: warning: unused variable ‘changed’ [-Wunused-variable]
 3025 |         bool changed;
      |              ^~~~~~~
Re: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
Posted by Hannes Reinecke 11 months ago
On 1/9/25 14:30, Daniel Wagner wrote:
> When a connectivity loss occurs while nvme_fc_create_association is
> being executed, it's possible that the ctrl ends up stuck in the LIVE
> state:
> 
>    1) nvme nvme10: NVME-FC{10}: create association : ...
>    2) nvme nvme10: NVME-FC{10}: controller connectivity lost.
>                    Awaiting Reconnect
>       nvme nvme10: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
>    3) nvme nvme10: Could not set queue count (880)
>       nvme nvme10: Failed to configure AEN (cfg 900)
>    4) nvme nvme10: NVME-FC{10}: controller connect complete
>    5) nvme nvme10: failed nvme_keep_alive_end_io error=4
> 
> A connection attempt starts 1) and the ctrl is in state CONNECTING.
> Shortly after the LLDD driver detects a connection lost event and calls
> nvme_fc_ctrl_connectivity_loss 2). Because we are still in CONNECTING
> state, this event is ignored.
> 
> nvme_fc_create_association continues to run in parallel and tries to
> communicate with the controller and these commands will fail. Though
> these errors are filtered out, e.g in 3) setting the I/O queues numbers
> fails which leads to an early exit in nvme_fc_create_io_queues. Because
> the number of IO queues is 0 at this point, there is nothing left in
> nvme_fc_create_association which could detected the connection drop.
> Thus the ctrl enters LIVE state 4).
> 
> Eventually the keep alive handler times out 5) but because nothing is
> being done, the ctrl stays in LIVE state.
> 
> There is already the ASSOC_FAILED flag to track connectivity loss event
> but this bit is set too late in the recovery code path. Move this into
> the connectivity loss event handler and synchronize it with the state
> change. This ensures that the ASSOC_FAILED flag is seen by
> nvme_fc_create_io_queues and it does not enter the LIVE state after a
> connectivity loss event. If the connectivity loss event happens after we
> entered the LIVE state the normal error recovery path is executed.
> 
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
>   drivers/nvme/host/fc.c | 23 ++++++++++++++++++-----
>   1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index 7409da42b9ee580cdd6fe78c0f93e78c4ad08675..55884d3df6f291cfddb4742e135b54a72f1cfa05 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -781,11 +781,19 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
>   static void
>   nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
>   {
> +	enum nvme_ctrl_state state;
> +	unsigned long flags;
> +
>   	dev_info(ctrl->ctrl.device,
>   		"NVME-FC{%d}: controller connectivity lost. Awaiting "
>   		"Reconnect", ctrl->cnum);
>   
> -	switch (nvme_ctrl_state(&ctrl->ctrl)) {
> +	spin_lock_irqsave(&ctrl->lock, flags);
> +	set_bit(ASSOC_FAILED, &ctrl->flags);
> +	state = nvme_ctrl_state(&ctrl->ctrl);
> +	spin_unlock_irqrestore(&ctrl->lock, flags);
> +
> +	switch (state) {
>   	case NVME_CTRL_NEW:
>   	case NVME_CTRL_LIVE:
>   		/*
> @@ -2542,7 +2550,6 @@ nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
>   	 */
>   	if (ctrl->ctrl.state == NVME_CTRL_CONNECTING) {
>   		__nvme_fc_abort_outstanding_ios(ctrl, true);
> -		set_bit(ASSOC_FAILED, &ctrl->flags);
>   		dev_warn(ctrl->ctrl.device,
>   			"NVME-FC{%d}: transport error during (re)connect\n",
>   			ctrl->cnum);
> @@ -3167,12 +3174,18 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
>   		else
>   			ret = nvme_fc_recreate_io_queues(ctrl);
>   	}
> -	if (!ret && test_bit(ASSOC_FAILED, &ctrl->flags))
> -		ret = -EIO;
>   	if (ret)
>   		goto out_term_aen_ops;
>   
> -	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
> +	spin_lock_irqsave(&ctrl->lock, flags);
> +	if (!test_bit(ASSOC_FAILED, &ctrl->flags))
> +		changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
> +	else
> +		ret = -EIO;
> +	spin_unlock_irqrestore(&ctrl->lock, flags);
> +
> +	if (ret)
> +		goto out_term_aen_ops;
>   
>   	ctrl->ctrl.nr_reconnects = 0;
>   
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
Posted by Sagi Grimberg 11 months, 1 week ago
Looks better to me, but FC folks should probably review this one

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>