From: Chunguang Xu <chunguang.xu@shopee.com>
We hit an issue in a production environment using NVMe over RDMA:
admin_q failed to reconnect forever even though the remote target and
the network were fine. After digging into it, we found it was caused by
an ABBA-style deadlock on tag allocation. In our case, the tag was held
by a keep alive request waiting inside admin_q; since admin_q is
quiesced during controller reset, the request is marked idle and will
not be processed until the reset succeeds. Because fabrics_q shares its
tagset with admin_q, reconnecting to the remote target needs a tag for
the connect command, but the only reserved tag was held by the keep
alive command waiting inside admin_q. As a result, reconnecting admin_q
failed forever. Fix this by keeping two reserved tags for the admin
queue.
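To make the tag starvation concrete, here is a minimal sketch of the two
competing reserved-tag users (illustrative only, not verbatim kernel
code; the function names are invented for the example). Both
allocations draw from the same reserved pool because fabrics_q and
admin_q share one tagset:

#include <linux/blk-mq.h>

/* Keep alive path: takes the reserved tag, then waits in the quiesced
 * admin_q until the controller reset completes. */
static struct request *ka_rq_sketch(struct request_queue *admin_q)
{
	return blk_mq_alloc_request(admin_q, REQ_OP_DRV_OUT,
				    BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
}

/* Reconnect path: also needs a reserved tag for the connect command,
 * but with NVMF_RESERVED_TAGS == 1 the only reserved tag is already
 * held by the keep alive request above, so connect can never be issued
 * and the reset never finishes. */
static struct request *connect_rq_sketch(struct request_queue *fabrics_q)
{
	return blk_mq_alloc_request(fabrics_q, REQ_OP_DRV_OUT,
				    BLK_MQ_REQ_RESERVED);
}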
Fixes: ed01fee283a0 ("nvme-fabrics: only reserve a single tag")
Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
---
v3: rearrange commit log, no code change
v2: keep two reserved tags for admin_q instead of dropping the keep alive request
drivers/nvme/host/core.c | 4 ++--
drivers/nvme/host/fabrics.h | 10 ++++------
2 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0a96362912ce..3d394a075d20 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4359,7 +4359,7 @@ int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
set->ops = ops;
set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
if (ctrl->ops->flags & NVME_F_FABRICS)
- set->reserved_tags = NVMF_RESERVED_TAGS;
+ set->reserved_tags = NVMF_ADMIN_RESERVED_TAGS;
set->numa_node = ctrl->numa_node;
set->flags = BLK_MQ_F_NO_SCHED;
if (ctrl->ops->flags & NVME_F_BLOCKING)
@@ -4428,7 +4428,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
if (ctrl->quirks & NVME_QUIRK_SHARED_TAGS)
set->reserved_tags = NVME_AQ_DEPTH;
else if (ctrl->ops->flags & NVME_F_FABRICS)
- set->reserved_tags = NVMF_RESERVED_TAGS;
+ set->reserved_tags = NVMF_IO_RESERVED_TAGS;
set->numa_node = ctrl->numa_node;
set->flags = BLK_MQ_F_SHOULD_MERGE;
if (ctrl->ops->flags & NVME_F_BLOCKING)
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 06cc54851b1b..a4def76a182d 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -18,12 +18,10 @@
/* default is -1: the fail fast mechanism is disabled */
#define NVMF_DEF_FAIL_FAST_TMO -1
-/*
- * Reserved one command for internal usage. This command is used for sending
- * the connect command, as well as for the keep alive command on the admin
- * queue once live.
- */
-#define NVMF_RESERVED_TAGS 1
+/* Reserved for connect */
+#define NVMF_IO_RESERVED_TAGS 1
+/* Reserved for connect and keep alive */
+#define NVMF_ADMIN_RESERVED_TAGS 2
/*
* Define a host as seen by the target. We allocate one at boot, but also
--
2.25.1
Thanks, applied to nvme-6.9.
The fix looks good: Reviewed-by: Christoph Hellwig <hch@lst.de> But given that we consolidate to a single place each for setting up the tagsets for admin and I/O queues, what about just killing the symbolic name and moving the assignments and comments directly into the only users?
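For concreteness, the suggested cleanup might look like this (a sketch of the two call sites in core.c under that proposal, not an actual patch):

/* In nvme_alloc_admin_tag_set(): */
	if (ctrl->ops->flags & NVME_F_FABRICS)
		/* Reserved for the connect command and for the keep
		 * alive command on the admin queue once live. */
		set->reserved_tags = 2;

/* In nvme_alloc_io_tag_set(): */
	else if (ctrl->ops->flags & NVME_F_FABRICS)
		/* Reserved for the connect command. */
		set->reserved_tags = 1;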
Christoph Hellwig <hch@lst.de> wrote on Fri, Mar 8, 2024 at 22:42: > > The fix looks good: > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > But given that we consolidate to a single place each for setting > up the tagsets for admin and I/O queues, what about just killing > the symbolic name and moving the assignments and comments directly > into the only users? This works now, but I don't know whether nvme_alloc_admin_tag_set()/nvme_alloc_io_tag_set() will be suitable for all drivers in the future; for example, the driver for Apple devices does not use these two functions to init its tagset (and it does not use these two macros either), so a new driver might want to use these values elsewhere.
On Sat, Mar 09, 2024 at 12:29:27AM +0800, 许春光 wrote: > This works now, but I don't know whether > nvme_alloc_admin_tag_set()/nvme_alloc_io_tag_set() will be suitable > for all drivers in the future; for example, the driver for Apple > devices does not use these two functions to init its tagset (and it > does not use these two macros either), so a new driver might want to > use these values elsewhere. nvme-apple should really be converted to use the generic helpers, I just need some help from the maintainers. I'll ping them. But I'm fine with just taking this bug fix as-is and cleaning this up later.
Christoph Hellwig <hch@lst.de> wrote on Sat, Mar 9, 2024 at 00:31: > > On Sat, Mar 09, 2024 at 12:29:27AM +0800, 许春光 wrote: > > This works now, but I don't know whether > > nvme_alloc_admin_tag_set()/nvme_alloc_io_tag_set() will be suitable > > for all drivers in the future; for example, the driver for Apple > > devices does not use these two functions to init its tagset (and it > > does not use these two macros either), so a new driver might want > > to use these values elsewhere. > > nvme-apple should really be converted to use the generic helpers, > I just need some help from the maintainers. I'll ping them. Noted, that sounds great. > > But I'm fine with just taking this bug fix as-is and cleaning this > up later. Sorry for the delay in replying; I see the patch was applied about 10 minutes ago. Given what you plan to do, I think as-is is fine, but if needed I will send another patch for the cleanup, thanks.
On Sat, Mar 09, 2024 at 12:43:12AM +0800, 许春光 wrote: > Sorry for the delay in replying; I see the patch was applied about 10 > minutes ago. Given what you plan to do, I think as-is is fine, but if > needed I will send another patch for the cleanup, thanks. The next pull request will be late next week, so I'll back out the commit if you want to submit something else before then. Or you can just submit an incremental improvement against the current tree, and that's also fine.
Keith Busch <kbusch@kernel.org> wrote on Sat, Mar 9, 2024 at 00:48: > > On Sat, Mar 09, 2024 at 12:43:12AM +0800, 许春光 wrote: > > Sorry for the delay in replying; I see the patch was applied about > > 10 minutes ago. Given what you plan to do, I think as-is is fine, > > but if needed I will send another patch for the cleanup, thanks. > > The next pull request will be late next week, so I'll back out the > commit if you want to submit something else before then. Or you can > just submit an incremental improvement against the current tree, and > that's also fine. Got it, I will send v4 following Christoph's suggestion, thanks.
On 08/03/2024 16:42, Christoph Hellwig wrote: > The fix looks good: > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > But given that we consolidate to a single place each for setting > up the tagsets for admin and I/O queues, what about just killing > the symbolic name and moving the assignments and comments directly > into the only users? > That works too