[PATCH] nvme-fabrics: use reserved tag for reg read/write command

brookxu.cn posted 1 patch 1 year, 7 months ago
There is a newer version of this series
drivers/nvme/host/fabrics.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
[PATCH] nvme-fabrics: use reserved tag for reg read/write command
Posted by brookxu.cn 1 year, 7 months ago
From: Chunguang Xu <chunguang.xu@shopee.com>

In some scenarios, if too many commands are issued by nvme command in
the same time by user tasks, this may exhaust all tags of admin_q. If
a reset (nvme reset or IO timeout) occurs before these commands finish,
reconnect routine may fail to update nvme regs due to insufficient tags,
which will cause kernel hang forever. In order to workaround this issue,
maybe we can let reg_read32()/reg_read64()/reg_write32() use reserved
tags. This maybe safe for nvmf:

1. For the disable ctrl path,  we will not issue connect command
2. For the enable ctrl / fw activate path, since connect and reg_xx()
   are called serially.

So the reserved tags may still be enough while reg_xx() use reserved tags.

Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
---
 drivers/nvme/host/fabrics.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 1f0ea1f32d22..f6416f8553f0 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -180,7 +180,7 @@ int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
 	cmd.prop_get.offset = cpu_to_le32(off);
 
 	ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0,
-			NVME_QID_ANY, 0);
+			NVME_QID_ANY, NVME_SUBMIT_RESERVED);
 
 	if (ret >= 0)
 		*val = le64_to_cpu(res.u64);
@@ -226,7 +226,7 @@ int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
 	cmd.prop_get.offset = cpu_to_le32(off);
 
 	ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0,
-			NVME_QID_ANY, 0);
+			NVME_QID_ANY, NVME_SUBMIT_RESERVED);
 
 	if (ret >= 0)
 		*val = le64_to_cpu(res.u64);
@@ -271,7 +271,7 @@ int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
 	cmd.prop_set.value = cpu_to_le64(val);
 
 	ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, NULL, NULL, 0,
-			NVME_QID_ANY, 0);
+			NVME_QID_ANY, NVME_SUBMIT_RESERVED);
 	if (unlikely(ret))
 		dev_err(ctrl->device,
 			"Property Set error: %d, offset %#x\n",
-- 
2.25.1
Re: [PATCH] nvme-fabrics: use reserved tag for reg read/write command
Posted by Chaitanya Kulkarni 1 year, 7 months ago
On 4/29/24 19:17, brookxu.cn wrote:
> From: Chunguang Xu <chunguang.xu@shopee.com>
>
> In some scenarios, if too many commands are issued by nvme command in
> the same time by user tasks, this may exhaust all tags of admin_q. If
> a reset (nvme reset or IO timeout) occurs before these commands finish,
> reconnect routine may fail to update nvme regs due to insufficient tags,
> which will cause kernel hang forever. In order to workaround this issue,
> maybe we can let reg_read32()/reg_read64()/reg_write32() use reserved
> tags. This maybe safe for nvmf:
>
> 1. For the disable ctrl path,  we will not issue connect command
> 2. For the enable ctrl / fw activate path, since connect and reg_xx()
>     are called serially.
>
> So the reserved tags may still be enough while reg_xx() use reserved tags.
>
> Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
> ---
>

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck


Re: [PATCH] nvme-fabrics: use reserved tag for reg read/write command
Posted by Sagi Grimberg 1 year, 7 months ago
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Re: [PATCH] nvme-fabrics: use reserved tag for reg read/write command
Posted by Chaitanya Kulkarni 1 year, 7 months ago
On 4/29/2024 7:17 PM, brookxu.cn wrote:
> From: Chunguang Xu <chunguang.xu@shopee.com>
> 
> In some scenarios, if too many commands are issued by nvme command in
> the same time by user tasks, this may exhaust all tags of admin_q. If
> a reset (nvme reset or IO timeout) occurs before these commands finish,
> reconnect routine may fail to update nvme regs due to insufficient tags,
> which will cause kernel hang forever. In order to workaround this issue,
> maybe we can let reg_read32()/reg_read64()/reg_write32() use reserved
> tags. This maybe safe for nvmf:
> 
> 1. For the disable ctrl path,  we will not issue connect command
> 2. For the enable ctrl / fw activate path, since connect and reg_xx()
>     are called serially.
> 

Given the complexity of the scenario described above, is it possible to 
write a script for this scenario that will trigger this and submit to 
blktest ? not that this is a blocker to get this patch reviewed, but 
believe it is needed in long run, WDYT ?

> So the reserved tags may still be enough while reg_xx() use reserved tags.
> 
> Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
> ---

-ck


Re: [PATCH] nvme-fabrics: use reserved tag for reg read/write command
Posted by 许春光 1 year, 7 months ago
Chaitanya Kulkarni <chaitanyak@nvidia.com> 于2024年4月30日周二 11:47写道:
>
> On 4/29/2024 7:17 PM, brookxu.cn wrote:
> > From: Chunguang Xu <chunguang.xu@shopee.com>
> >
> > In some scenarios, if too many commands are issued by nvme command in
> > the same time by user tasks, this may exhaust all tags of admin_q. If
> > a reset (nvme reset or IO timeout) occurs before these commands finish,
> > reconnect routine may fail to update nvme regs due to insufficient tags,
> > which will cause kernel hang forever. In order to workaround this issue,
> > maybe we can let reg_read32()/reg_read64()/reg_write32() use reserved
> > tags. This maybe safe for nvmf:
> >
> > 1. For the disable ctrl path,  we will not issue connect command
> > 2. For the enable ctrl / fw activate path, since connect and reg_xx()
> >     are called serially.
> >
>
> Given the complexity of the scenario described above, is it possible to
> write a script for this scenario that will trigger this and submit to
> blktest ? not that this is a blocker to get this patch reviewed, but
> believe it is needed in long run, WDYT ?

Thanks for you reply, I can easily reproduce it in my VMs by following steps:
STEP 1. In order to reduce the complexity of reproduction, I reduce
NVME_AQ_MQ_TAG_DEPTH from 31 to 8

STEP 2. Create a nvme device by NVMe over tcp, such as following command:
nvme connect -t tcp -a 192.168.122.20 -s 4420 -n
nqn.2014-08.org.nvmexpress.mytest

STEP 3. Buind and run the c++ program issues nvme commands as followed:
#include <sys/types.h>
#include <signal.h>
#include <unistd.h>
#include <vector>
#include <set>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

const int concurrency = 64;
std::set<pid_t> chlds;

int __exit = 0;
void  sigint_proc(int signo)
{
        __exit = 1;
}

int main(int argc, char **argv)
{
        signal(SIGINT, sigint_proc);

        for (auto i  = 0; i < concurrency; i++) {
                auto pid = fork();
                if (!pid) {
                        while (true) {
                                system("nvme list -o json 2>&1 > /dev/null");
                        }
                }

                chlds.insert(pid);
        }

        while (!__exit) {
                if (chlds.empty())
                        break;

                for (auto pid : chlds) {
                        int wstatus, ret;
                        ret = waitpid(pid, &wstatus, WNOWAIT);
                        if (ret > 0) {
                                chlds.erase(pid);
                                break;
                        }
                }
                usleep(1000);
        }

        // exit
        for (auto pid : chlds)
                kill(pid, SIGKILL);

        return 0;
}

STEP 4. Open a new console, running the followed command:
while [ true ]; do nvme reset /dev/nvme0; sleep `echo "$RANDOM%1" | bc`; done

We will reproduce this issue soon.
>
> > So the reserved tags may still be enough while reg_xx() use reserved tags.
> >
> > Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
> > ---
>
> -ck
>
>