The TCP and RDMA transports perform a synchronous connect, meaning the
syscall returns with the final result, that is, it either failed or
succeeded.

This isn't the case for FC. This transport just sets up and triggers
the connect and returns without waiting for the result. Introduce a flag
to allow user space to control the behavior: wait or don't wait.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
drivers/nvme/host/fabrics.c | 6 +++++-
drivers/nvme/host/fabrics.h | 3 +++
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 3499acbf6a82..7d33f0f5824f 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -678,6 +678,7 @@ static const match_table_t opt_tokens = {
#ifdef CONFIG_NVME_TCP_TLS
{ NVMF_OPT_TLS, "tls" },
#endif
+ { NVMF_OPT_CONNECT_SYNC, "connect_sync" },
{ NVMF_OPT_ERR, NULL }
};
@@ -1024,6 +1025,9 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
}
opts->tls = true;
break;
+ case NVMF_OPT_CONNECT_SYNC:
+ opts->connect_sync = true;
+ break;
default:
pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
p);
@@ -1245,7 +1249,7 @@ EXPORT_SYMBOL_GPL(nvmf_free_options);
NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
NVMF_OPT_DISABLE_SQFLOW | NVMF_OPT_DISCOVERY |\
NVMF_OPT_FAIL_FAST_TMO | NVMF_OPT_DHCHAP_SECRET |\
- NVMF_OPT_DHCHAP_CTRL_SECRET)
+ NVMF_OPT_DHCHAP_CTRL_SECRET | NVMF_OPT_CONNECT_SYNC)
static struct nvme_ctrl *
nvmf_create_ctrl(struct device *dev, const char *buf)
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 06cc54851b1b..01d3ef545f14 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -73,6 +73,7 @@ enum {
NVMF_OPT_TLS = 1 << 25,
NVMF_OPT_KEYRING = 1 << 26,
NVMF_OPT_TLS_KEY = 1 << 27,
+ NVMF_OPT_CONNECT_SYNC = 1 << 28,
};
/**
@@ -115,6 +116,7 @@ enum {
* @nr_poll_queues: number of queues for polling I/O
* @tos: type of service
* @fast_io_fail_tmo: Fast I/O fail timeout in seconds
+ * @connect_sync: wait for connect attempt(s) to succeed or fail
*/
struct nvmf_ctrl_options {
unsigned mask;
@@ -144,6 +146,7 @@ struct nvmf_ctrl_options {
unsigned int nr_poll_queues;
int tos;
int fast_io_fail_tmo;
+ bool connect_sync;
};
/*
--
2.43.0
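
[ For illustration: a minimal user-space sketch of how the new option
  could be exercised through the /dev/nvme-fabrics interface. The FC
  addresses and NQN below are placeholders, and a real invocation would
  carry additional options (hostnqn etc.); this is not part of the
  patch. ]

/* Sketch: request a synchronous FC connect via the new connect_sync
 * option. All addresses/NQNs are placeholder values. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *opts = "transport=fc,"
		"traddr=nn-0x200000109b579ef6:pn-0x100000109b579ef6,"
		"host_traddr=nn-0x200000109b579ef5:pn-0x100000109b579ef5,"
		"nqn=nqn.2014-08.org.nvmexpress.discovery,connect_sync";
	int fd = open("/dev/nvme-fabrics", O_RDWR);

	if (fd < 0) {
		perror("open /dev/nvme-fabrics");
		return 1;
	}
	/* The kernel parses this string via nvmf_parse_options(); with
	 * connect_sync set, the write would only return once the initial
	 * connect attempt has finally succeeded or failed. */
	if (write(fd, opts, strlen(opts)) < 0)
		perror("write");
	close(fd);
	return 0;
}
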
On Fri, Feb 16, 2024 at 09:45:21AM +0100, Daniel Wagner wrote:
> The TCP and RDMA transports perform a synchronous connect, meaning the
> syscall returns with the final result, that is, it either failed or
> succeeded.
>
> This isn't the case for FC. This transport just sets up and triggers
> the connect and returns without waiting for the result.

That's really weird and unexpected. James, can you explain the reason
behind this?

> Introduce a flag
> to allow user space to control the behavior: wait or don't wait.

I'd expect this to be the default, but I'll wait to hear more about
the rationale. If we keep the async default the option looks sensible.
On 2/16/24 10:49, Christoph Hellwig wrote:
> On Fri, Feb 16, 2024 at 09:45:21AM +0100, Daniel Wagner wrote:
>> The TCP and RDMA transports perform a synchronous connect, meaning the
>> syscall returns with the final result, that is, it either failed or
>> succeeded.
>>
>> This isn't the case for FC. This transport just sets up and triggers
>> the connect and returns without waiting for the result.
>
> That's really weird and unexpected. James, can you explain the reason
> behind this?
>
The reason is that the initial connect attempt might fail with a
temporary failure and will need to be retried. Rather than implementing
two methods for handling this (one for the initial connect, and another
one for reconnect, where one _has_ to use a workqueue), as e.g. TCP and
RDMA have done, FC uses a single code path for handling both.

A temporary failure on the initial connect is far more likely on FC
than on other transports due to the way FC-NVMe is modelled:
essentially one has to log into the remote port for each protocol. So
if you run a dual fabric (with both FCP and NVMe) you'll need to log
into the same remote port twice. Depending on the implementation, the
target might only be capable of handling one port login at a time, so
the other one will be failed with a temporary error.

That's why it's a common issue with FC. It _might_ happen with TCP,
too, but apparently not regularly, otherwise we would have seen quite
some failures here; TCP can't really handle temporary failures for the
initial connect.

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
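
[ Sketch for illustration: how a transport built around a single
  (re)connect work item, as Hannes describes, could honor connect_sync
  by flushing that work. The fc_ctrl type, field names, and function
  below are invented for this sketch, not the actual nvme-fc code. ]

/* Illustrative only: one code path for initial connect and reconnect. */
#include <linux/workqueue.h>

struct fc_ctrl {
	struct nvmf_ctrl_options *opts;	/* carries opts->connect_sync */
	struct delayed_work connect_work; /* shared connect/reconnect work */
	int connect_err;		/* result of the last attempt */
};

static int fc_create_association(struct fc_ctrl *ctrl)
{
	/* Queue the connect attempt; in the async case report success
	 * immediately, as FC does today. */
	queue_delayed_work(system_wq, &ctrl->connect_work, 0);

	if (!ctrl->opts->connect_sync)
		return 0;

	/* connect_sync: block until the queued attempt has run, then
	 * hand its verdict back to the connecting syscall. */
	flush_delayed_work(&ctrl->connect_work);
	return ctrl->connect_err;
}
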
On Fri, Feb 16, 2024 at 10:49:09AM +0100, Christoph Hellwig wrote:
> On Fri, Feb 16, 2024 at 09:45:21AM +0100, Daniel Wagner wrote:
> > The TCP and RDMA transports perform a synchronous connect, meaning the
> > syscall returns with the final result, that is, it either failed or
> > succeeded.
> >
> > This isn't the case for FC. This transport just sets up and triggers
> > the connect and returns without waiting for the result.
>
> That's really weird and unexpected. James, can you explain the reason
> behind this?

James answered this point on my attempt to make this synchronous:

https://lore.kernel.org/linux-nvme/0605ac36-16d5-2026-d3c6-62d346db6dfb@gmail.com/

> > Introduce a flag
> > to allow user space to control the behavior: wait or don't wait.
>
> I'd expect this to be the default, but I'll wait to hear more about
> the rationale. If we keep the async default the option looks sensible.

Ideally, we could agree on behavior which is the same for all
transports.
On Fri, Feb 16, 2024 at 05:44:02PM +0100, Daniel Wagner wrote:
> James answered this point on my attempt to make this synchronous:
>
> https://lore.kernel.org/linux-nvme/0605ac36-16d5-2026-d3c6-62d346db6dfb@gmail.com/

That needs to go into the commit log. And I call complete BS on that
to be honest.

> Ideally, we could agree on behavior which is the same for all
> transports.

Sync and async opt-in it is. I'm still pissed FC did this differently
without any proper discussion of the tradeoffs.