From: Yu Kuai <yukuai3@huawei.com>
It was found that the nvme mpath IO inflight counter can be decremented to
a negative value via the following stack:
CPU: 12 UID: 0 PID: 466 Comm: kworker/12:1H Tainted: G
6.16.0-rc3.yu+ #2 PREEMPT(voluntary)
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:bdev_end_io_acct+0x494/0x5c0
Call Trace:
<TASK>
nvme_end_req+0x4d/0x70 [nvme_core]
nvme_failover_req+0x3bd/0x530 [nvme_core]
nvme_fail_nonready_command+0x12c/0x170 [nvme_core]
nvme_fc_queue_rq+0x463/0x720 [nvme_fc]
blk_mq_dispatch_rq_list+0x358/0x1260
__blk_mq_sched_dispatch_requests+0x2dd/0x480
blk_mq_sched_dispatch_requests+0xa6/0x140
blk_mq_run_work_fn+0x1bb/0x2a0
process_one_work+0x8ca/0x1950
worker_thread+0x58d/0xcf0
kthread+0x3d5/0x7a0
ret_from_fork+0x403/0x510
ret_from_fork_asm+0x1a/0x30
</TASK>
On this path the IO inflight counter has not yet been incremented when
nvme_fail_nonready_command() runs, so decrementing it makes it negative.
This is not a problem for the blk-mq request, because it is already
initialized before being issued. The nvme request, however, is only
initialized later, in nvme_setup_cmd(). Fix the problem by clearing the
nvme request in nvme_fail_nonready_command().
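
The accounting imbalance described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: every
name below (model_request, MODEL_IO_STATS, model_run() and so on) is
hypothetical and not a kernel definition, and the model assumes that the
driver-private flags survive from a previous use of the same request slot
while the block-layer flags are reset on each allocation.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only, not kernel definitions. */
#define MODEL_DONTPREP	(1u << 0)	/* driver-private state was initialized */
#define MODEL_IO_STATS	(1u << 0)	/* start accounting ran for this request */

struct model_request {
	unsigned int rq_flags;	/* assumed reset on every allocation */
	unsigned int drv_flags;	/* assumed reset only by model_clear() */
};

static int inflight;

/* A new use of the request slot: only the block-layer flags are reset. */
static void model_alloc(struct model_request *rq)
{
	rq->rq_flags = 0;
}

/* Reset the driver-private state and remember that it was done. */
static void model_clear(struct model_request *rq)
{
	rq->drv_flags = 0;
	rq->rq_flags |= MODEL_DONTPREP;
}

/* Normal submission: initialize once, then start accounting. */
static void model_setup(struct model_request *rq)
{
	if (!(rq->rq_flags & MODEL_DONTPREP))
		model_clear(rq);
	rq->drv_flags |= MODEL_IO_STATS;
	inflight++;
}

/* Completion: end accounting only if the flag says it was started. */
static void model_end(struct model_request *rq)
{
	if (rq->drv_flags & MODEL_IO_STATS)
		inflight--;
}

/* Early failure before setup; 'fixed' selects the proposed clearing step. */
static void model_fail_nonready(struct model_request *rq, bool fixed)
{
	if (fixed && !(rq->rq_flags & MODEL_DONTPREP))
		model_clear(rq);
	model_end(rq);
}

/* Use one slot twice: a normal IO, then an IO rejected before setup. */
static int model_run(bool fixed)
{
	struct model_request rq = { 0, 0 };

	inflight = 0;
	model_alloc(&rq);
	model_setup(&rq);
	model_end(&rq);		/* counter is balanced, drv_flags stays stale */

	model_alloc(&rq);
	model_fail_nonready(&rq, fixed);
	return inflight;
}
```

In this model, model_run(false) returns -1 because the stale
MODEL_IO_STATS bit triggers a decrement that was never paired with an
increment, while model_run(true) returns 0.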
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/all/CAHj4cs_+dauobyYyP805t33WMJVzOWj=7+51p4_j9rA63D9sog@mail.gmail.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/nvme/host/core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 92697f98c601..8caafa25c010 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -764,6 +764,9 @@ blk_status_t nvme_fail_nonready_command(struct nvme_ctrl *ctrl,
 	    !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
 		return BLK_STS_RESOURCE;
+
+	if (!(rq->rq_flags & RQF_DONTPREP))
+		nvme_clear_nvme_request(rq);
 	return nvme_host_path_error(rq);
 }
 EXPORT_SYMBOL_GPL(nvme_fail_nonready_command);
--
2.39.2
Are you going to resend this with the feedback from Sagi taken into account?
Hi,

On 2025/07/14 21:42, Christoph Hellwig wrote:
> Are you going to resend this with the feedback from Sagi taken into
> account?
>

Sure, sorry that I totally forgot about this patch.

Thanks,
Kuai
First, we need to change the patch title to clarify that it fixes a bug,
i.e. something like:
nvme: fix nvme-mpath misaccounting of inflight active IO

Second, we need to add a Fixes tag (i.e. for the addition of nvme-mpath
nr_active accounting).

Third, we need a code comment that explains this subtlety, because it is
not trivial.

On 28/06/2025 9:46, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> It's found nvme mpath IO inflight counter can be decreased to negtive by
> following stack:
>
> CPU: 12 UID: 0 PID: 466 Comm: kworker/12:1H Tainted: G
> 6.16.0-rc3.yu+ #2 PREEMPT(voluntary)
> Workqueue: kblockd blk_mq_run_work_fn
> RIP: 0010:bdev_end_io_acct+0x494/0x5c0
> Call Trace:
> <TASK>
> nvme_end_req+0x4d/0x70 [nvme_core]
> nvme_failover_req+0x3bd/0x530 [nvme_core]
> nvme_fail_nonready_command+0x12c/0x170 [nvme_core]
> nvme_fc_queue_rq+0x463/0x720 [nvme_fc]
> blk_mq_dispatch_rq_list+0x358/0x1260
> __blk_mq_sched_dispatch_requests+0x2dd/0x480
> blk_mq_sched_dispatch_requests+0xa6/0x140
> blk_mq_run_work_fn+0x1bb/0x2a0
> process_one_work+0x8ca/0x1950
> worker_thread+0x58d/0xcf0
> kthread+0x3d5/0x7a0
> ret_from_fork+0x403/0x510
> ret_from_fork_asm+0x1a/0x30
> </TASK>
>
> The IO inflight counter is not increased from nvme_fail_nonready_command()
> yet, hence decrease it will cause it to be negative.
>
> This is not a problem for blk-mq request because it's already
> initialized before issuing, however, nvme request is only initialized from
> following nvme_setup_cmd(). Fix the problem by clearing it in
> nvme_fail_nonready_command().
>
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Closes: https://lore.kernel.org/all/CAHj4cs_+dauobyYyP805t33WMJVzOWj=7+51p4_j9rA63D9sog@mail.gmail.com/
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
> drivers/nvme/host/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 92697f98c601..8caafa25c010 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -764,6 +764,9 @@ blk_status_t nvme_fail_nonready_command(struct nvme_ctrl *ctrl,
>  	    !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
>  	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>  		return BLK_STS_RESOURCE;
> +
> +	if (!(rq->rq_flags & RQF_DONTPREP))
> +		nvme_clear_nvme_request(rq);
>  	return nvme_host_path_error(rq);
>  }
>  EXPORT_SYMBOL_GPL(nvme_fail_nonready_command);