From: Liu Song <liusong@linux.alibaba.com>
NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
so when "nr_ctx" is equal to 1, there is no possibility of remote
request, so the corresponding process can be simplified.
Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
drivers/nvme/host/nvme.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 216acbe..cc21896 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -672,6 +672,10 @@ static inline bool nvme_try_complete_req(struct request *req, __le16 status,
nvme_should_fail(req);
if (unlikely(blk_should_fake_timeout(req->q)))
return true;
+ if (likely(req->mq_hctx->nr_ctx == 1)) {
+ WRITE_ONCE(req->state, MQ_RQ_COMPLETE);
+ return false;
+ }
return blk_mq_complete_request_remote(req);
}
--
1.8.3.1
On 9/17/22 10:40 AM, Liu Song wrote:
> From: Liu Song <liusong@linux.alibaba.com>
>
> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
> so when "nr_ctx" is equal to 1, there is no possibility of remote
> request, so the corresponding process can be simplified.

If the worry is the call overhead of blk_mq_complete_request_remote(),
why don't we just make that available as an inline instead? That seems
vastly superior to providing a random shortcut in a driver to avoid
calling it.

--
Jens Axboe
On 2022/9/18 00:50, Jens Axboe wrote:
> On 9/17/22 10:40 AM, Liu Song wrote:
>> From: Liu Song <liusong@linux.alibaba.com>
>>
>> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
>> so when "nr_ctx" is equal to 1, there is no possibility of remote
>> request, so the corresponding process can be simplified.
> If the worry is the call overhead of blk_mq_complete_request_remote(),
> why don't we just make that available as an inline instead? That seems
> vastly superior to providing a random shortcut in a driver to avoid
> calling it.

Hi

Here is my thinking. On an SSD with only one hw queue, remote requests
will appear occasionally. As a real multi-queue device, nvme usually
does not see remote requests, so we don't need to care about them. So
even when "blk_mq_complete_request_remote" is called, it will most
likely return false, and what is saved by replacing the call with a
check of whether "req->mq_hctx->nr_ctx" is 1 is not only the function
call, but also the extra judgments inside
"blk_mq_complete_request_remote". And if "blk_mq_complete_request_remote"
is marked inline, the result still depends on the compiler, so there is
uncertainty.

Thanks
On 9/18/22 10:10 AM, Liu Song wrote:
> On 2022/9/18 00:50, Jens Axboe wrote:
>> On 9/17/22 10:40 AM, Liu Song wrote:
>>> From: Liu Song <liusong@linux.alibaba.com>
>>>
>>> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
>>> so when "nr_ctx" is equal to 1, there is no possibility of remote
>>> request, so the corresponding process can be simplified.
>> If the worry is the call overhead of blk_mq_complete_request_remote(),
>> why don't we just make that available as an inline instead? That seems
>> vastly superior to providing a random shortcut in a driver to avoid
>> calling it.
>
> Hi
>
> This is what I think about it. If it is an SSD with only one hw queue,
> remote requests will appear occasionally. As a real multi-queue
> device, nvme usually does not have remote requests, so we don't need
> to care about it. So even if "blk_mq_complete_request_remote" is
> called, there is a high probability that it will return false, and the
> cost of changing the call to "blk_mq_complete_request_remote" to
> determine whether "req->mq_hctx->nr_ctx" is 1 is not only a function
> call, but more judgments in "blk_mq_complete_request_remote". If
> "blk_mq_complete_request_remote" is decorated as inline, it also
> depends on the compiler, there is uncertainty.

I'm not disagreeing with any of that, my point is just that you're
hacking around this in the nvme driver. This is problematic whenever
core changes happen, because now we have to touch individual drivers.
While the expectation is that there are no remote IPI completions for
NVMe, queue starved devices do exist and those do see remote
completions.

This optimization belongs in the blk-mq core, not in nvme. I do think it
makes sense, you just need to solve it in blk-mq rather than in the nvme
driver.

--
Jens Axboe
On Mon, Sep 19, 2022 at 08:10:31AM -0600, Jens Axboe wrote:
> I'm not disagreeing with any of that, my point is just that you're
> hacking around this in the nvme driver. This is problematic whenever
> core changes happen, because now we have to touch individual drivers.
> While the expectation is that there are no remote IPI completions for
> NVMe, queue starved devices do exist and those do see remote
> completions.
>
> This optimization belongs in the blk-mq core, not in nvme. I do think it
> makes sense, you just need to solve it in blk-mq rather than in the nvme
> driver.

I'd also really like to see solid numbers to justify it.

And btw, having more than one core per queue is quite common in
nvme. Even many enterprise SSDs only have 64 queues, and some of
the low-end consumer ones have very low numbers of queues that are
not enough for the number of cores in even mid-end desktop CPUs.
On 2022/9/19 22:35, Christoph Hellwig wrote:
> On Mon, Sep 19, 2022 at 08:10:31AM -0600, Jens Axboe wrote:
>> I'm not disagreeing with any of that, my point is just that you're
>> hacking around this in the nvme driver. This is problematic whenever
>> core changes happen, because now we have to touch individual drivers.
>> While the expectation is that there are no remote IPI completions for
>> NVMe, queue starved devices do exist and those do see remote
>> completions.
>>
>> This optimization belongs in the blk-mq core, not in nvme. I do think it
>> makes sense, you just need to solve it in blk-mq rather than in the nvme
>> driver.
> I'd also really see solid numbers to justify it.
>
> And btw, having more than one core per queue is quite common in
> nvme. Even many enterprise SSDs only have 64 queues, and some of
> the low-end consumers ones have very low number of queues that are
> not enough for the number of cores in even mid-end desktop CPUs.

Hi

Thank you for your suggestion. Here is what I think about it. NVMe
devices that can support one core per queue are usually
high-performance, so I prefer to make more optimizations for
high-performance devices, while for ordinary consumer-class NVMe
devices this modification will not bring any special additional
overhead.

Thanks
From: Liu Song <liusong@linux.alibaba.com>
High-performance NVMe devices usually support a large hw queue, which
ensures a 1:1 mapping of hctx and ctx. In this case there will be no
remote request, so we don't need to care about it.
Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
block/blk-mq.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11949d..9626ea1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1093,10 +1093,12 @@ bool blk_mq_complete_request_remote(struct request *rq)
WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
/*
- * For a polled request, always complete locally, it's pointless
- * to redirect the completion.
+ * For request which hctx has only one ctx mapping,
+ * or a polled request, always complete locally,
+ * it's pointless to redirect the completion.
*/
- if (rq->cmd_flags & REQ_POLLED)
+ if (rq->mq_hctx->nr_ctx == 1 ||
+ rq->cmd_flags & REQ_POLLED)
return false;
if (blk_mq_complete_need_ipi(rq)) {
--
1.8.3.1
On Wed, 21 Sep 2022 11:32:03 +0800, Liu Song wrote:
> From: Liu Song <liusong@linux.alibaba.com>
>
> High-performance NVMe devices usually support a large hw queue, which
> ensures a 1:1 mapping of hctx and ctx. In this case there will be no
> remote request, so we don't need to care about it.
>
>
> [...]
Applied, thanks!
[1/1] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
commit: f168420c62e7a106961f2489a89f6ade84fe3f27
Best regards,
--
Jens Axboe
On Wed, Sep 21, 2022 at 11:32:03AM +0800, Liu Song wrote:
> From: Liu Song <liusong@linux.alibaba.com>
>
> High-performance NVMe devices usually support a large hw queue, which

a larger number of?

> /*
> - * For a polled request, always complete locally, it's pointless
> - * to redirect the completion.
> + * For request which hctx has only one ctx mapping,
> + * or a polled request, always complete locally,
> + * it's pointless to redirect the completion.
> */
> - if (rq->cmd_flags & REQ_POLLED)
> + if (rq->mq_hctx->nr_ctx == 1 ||
> + rq->cmd_flags & REQ_POLLED)

Some very odd comment formatting and indentation here.
On 2022/9/22 14:20, Christoph Hellwig wrote:
> On Wed, Sep 21, 2022 at 11:32:03AM +0800, Liu Song wrote:
>> From: Liu Song <liusong@linux.alibaba.com>
>>
>> High-performance NVMe devices usually support a large hw queue, which
> a larger number of?
>
>> /*
>> - * For a polled request, always complete locally, it's pointless
>> - * to redirect the completion.
>> + * For request which hctx has only one ctx mapping,
>> + * or a polled request, always complete locally,
>> + * it's pointless to redirect the completion.
>> */
>> - if (rq->cmd_flags & REQ_POLLED)
>> + if (rq->mq_hctx->nr_ctx == 1 ||
>> + rq->cmd_flags & REQ_POLLED)
> Some very odd comment formatting and and indentation here.

Thanks, I will issue a V2 patch as suggested.
From: Liu Song <liusong@linux.alibaba.com>
High-performance NVMe devices usually support a larger number of hw queues,
which ensures a 1:1 mapping of hctx and ctx. In this case there will be no
remote request, so we don't need to care about it.
Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
block/blk-mq.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11949d..7f9be13 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1093,10 +1093,11 @@ bool blk_mq_complete_request_remote(struct request *rq)
WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
/*
- * For a polled request, always complete locally, it's pointless
- * to redirect the completion.
+ * If the request's hctx has only one ctx mapping,
+ * or it is a polled request, always complete locally;
+ * it's pointless to redirect the completion.
*/
- if (rq->cmd_flags & REQ_POLLED)
+ if (rq->mq_hctx->nr_ctx == 1 || rq->cmd_flags & REQ_POLLED)
return false;
if (blk_mq_complete_need_ipi(rq)) {
--
1.8.3.1