The TCP and RDMA transports perform a synchronous connect, meaning the
syscall returns with the final result, that is, it either failed or
succeeded.

This isn't the case for FC. This transport just sets up and triggers
the connect and returns without waiting for the result. Introduce a flag
to allow user space to control the behavior: wait or don't wait.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
drivers/nvme/host/fabrics.c | 6 +++++-
drivers/nvme/host/fabrics.h | 3 +++
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 3499acbf6a82..7d33f0f5824f 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -678,6 +678,7 @@ static const match_table_t opt_tokens = {
#ifdef CONFIG_NVME_TCP_TLS
{ NVMF_OPT_TLS, "tls" },
#endif
+ { NVMF_OPT_CONNECT_SYNC, "connect_sync" },
{ NVMF_OPT_ERR, NULL }
};
@@ -1024,6 +1025,9 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
}
opts->tls = true;
break;
+ case NVMF_OPT_CONNECT_SYNC:
+ opts->connect_sync = true;
+ break;
default:
pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
p);
@@ -1245,7 +1249,7 @@ EXPORT_SYMBOL_GPL(nvmf_free_options);
NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
NVMF_OPT_DISABLE_SQFLOW | NVMF_OPT_DISCOVERY |\
NVMF_OPT_FAIL_FAST_TMO | NVMF_OPT_DHCHAP_SECRET |\
- NVMF_OPT_DHCHAP_CTRL_SECRET)
+ NVMF_OPT_DHCHAP_CTRL_SECRET | NVMF_OPT_CONNECT_SYNC)
static struct nvme_ctrl *
nvmf_create_ctrl(struct device *dev, const char *buf)
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 06cc54851b1b..01d3ef545f14 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -73,6 +73,7 @@ enum {
NVMF_OPT_TLS = 1 << 25,
NVMF_OPT_KEYRING = 1 << 26,
NVMF_OPT_TLS_KEY = 1 << 27,
+ NVMF_OPT_CONNECT_SYNC = 1 << 28,
};
/**
@@ -115,6 +116,7 @@ enum {
* @nr_poll_queues: number of queues for polling I/O
* @tos: type of service
* @fast_io_fail_tmo: Fast I/O fail timeout in seconds
+ * @connect_sync: wait for connect attempt(s) to succeed or fail
*/
struct nvmf_ctrl_options {
unsigned mask;
@@ -144,6 +146,7 @@ struct nvmf_ctrl_options {
unsigned int nr_poll_queues;
int tos;
int fast_io_fail_tmo;
+ bool connect_sync;
};
/*
--
2.43.0
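
[ For illustration: a minimal user-space sketch of how the new option
  could be exercised through the /dev/nvme-fabrics interface. The FC
  addresses and NQN below are placeholders, and a real invocation would
  carry additional options (hostnqn etc.); this is not part of the
  patch. ]

/* Sketch: request a synchronous FC connect via the new connect_sync
 * option. All addresses/NQNs are placeholder values. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *opts = "transport=fc,"
		"traddr=nn-0x200000109b579ef6:pn-0x100000109b579ef6,"
		"host_traddr=nn-0x200000109b579ef5:pn-0x100000109b579ef5,"
		"nqn=nqn.2014-08.org.nvmexpress.discovery,connect_sync";
	int fd = open("/dev/nvme-fabrics", O_RDWR);

	if (fd < 0) {
		perror("open /dev/nvme-fabrics");
		return 1;
	}
	/* The kernel parses this string via nvmf_parse_options(); with
	 * connect_sync set, the write would only return once the initial
	 * connect attempt has finally succeeded or failed. */
	if (write(fd, opts, strlen(opts)) < 0)
		perror("write");
	close(fd);
	return 0;
}
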
On Fri, Feb 16, 2024 at 09:45:21AM +0100, Daniel Wagner wrote:
> The TCP and RDMA transports perform a synchronous connect, meaning the
> syscall returns with the final result, that is, it either failed or
> succeeded.
>
> This isn't the case for FC. This transport just sets up and triggers
> the connect and returns without waiting for the result.

That's really weird and unexpected. James, can you explain the reason
behind this?

> Introduce a flag
> to allow user space to control the behavior: wait or don't wait.

I'd expect this to be the default, but I'll wait to hear more about
the rationale. If we keep the async default the option looks sensible.
On 2/16/24 10:49, Christoph Hellwig wrote:
> On Fri, Feb 16, 2024 at 09:45:21AM +0100, Daniel Wagner wrote:
>> The TCP and RDMA transports perform a synchronous connect, meaning the
>> syscall returns with the final result, that is, it either failed or
>> succeeded.
>>
>> This isn't the case for FC. This transport just sets up and triggers
>> the connect and returns without waiting for the result.
>
> That's really weird and unexpected. James, can you explain the reason
> behind this?
>
The reason is that the initial connect attempt might fail with a
temporary failure and will need to be retried. Rather than implementing
two methods for handling this (one for the initial connect, and another
one for reconnect, where one _has_ to use a workqueue), as e.g. TCP and
RDMA have done, FC uses a single code path for handling both.

A temporary failure on the initial connect is far more likely on FC
than on other transports due to the way FC-NVMe is modelled:
essentially one has to log into the remote port for each protocol. So
if you run a dual fabric (with both FCP and NVMe) you'll need to log
into the same remote port twice. Depending on the implementation, the
target might only be capable of handling one port login at a time, so
the other one will be failed with a temporary error.

That's why it's a common issue with FC. It _might_ happen with TCP,
too, but apparently not regularly, otherwise we would have seen quite
some failures here; TCP can't really handle temporary failures for the
initial connect.

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
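
[ Sketch for illustration: how a transport built around a single
  (re)connect work item, as Hannes describes, could honor connect_sync
  by flushing that work. The fc_ctrl type, field names, and function
  below are invented for this sketch, not the actual nvme-fc code. ]

/* Illustrative only: one code path for initial connect and reconnect. */
#include <linux/workqueue.h>

struct fc_ctrl {
	struct nvmf_ctrl_options *opts;	/* carries opts->connect_sync */
	struct delayed_work connect_work; /* shared connect/reconnect work */
	int connect_err;		/* result of the last attempt */
};

static int fc_create_association(struct fc_ctrl *ctrl)
{
	/* Queue the connect attempt; in the async case report success
	 * immediately, as FC does today. */
	queue_delayed_work(system_wq, &ctrl->connect_work, 0);

	if (!ctrl->opts->connect_sync)
		return 0;

	/* connect_sync: block until the queued attempt has run, then
	 * hand its verdict back to the connecting syscall. */
	flush_delayed_work(&ctrl->connect_work);
	return ctrl->connect_err;
}
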
On Fri, Feb 16, 2024 at 10:49:09AM +0100, Christoph Hellwig wrote:
> On Fri, Feb 16, 2024 at 09:45:21AM +0100, Daniel Wagner wrote:
> > The TCP and RDMA transports perform a synchronous connect, meaning the
> > syscall returns with the final result, that is, it either failed or
> > succeeded.
> >
> > This isn't the case for FC. This transport just sets up and triggers
> > the connect and returns without waiting for the result.
>
> That's really weird and unexpected. James, can you explain the reason
> behind this?

James answered this point on my attempt to make this synchronous:

https://lore.kernel.org/linux-nvme/0605ac36-16d5-2026-d3c6-62d346db6dfb@gmail.com/

> > Introduce a flag
> > to allow user space to control the behavior: wait or don't wait.
>
> I'd expect this to be the default, but I'll wait to hear more about
> the rationale. If we keep the async default the option looks sensible.

Ideally, we could agree on behavior which is the same for all
transports.
On Fri, Feb 16, 2024 at 05:44:02PM +0100, Daniel Wagner wrote:
> James answered this point on my attempt to make this synchronous:
>
> https://lore.kernel.org/linux-nvme/0605ac36-16d5-2026-d3c6-62d346db6dfb@gmail.com/

That needs to go into the commit log. And I call complete BS on that
to be honest.

> Ideally, we could agree on behavior which is the same for all
> transports.

Sync and async opt-in it is. I'm still pissed FC did this differently
without any proper discussion of the tradeoffs.