From: Liu Song <liusong@linux.alibaba.com>
NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
so when "nr_ctx" is equal to 1, there is no possibility of remote
request, so the corresponding process can be simplified.
Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
drivers/nvme/host/nvme.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 216acbe..cc21896 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -672,6 +672,10 @@ static inline bool nvme_try_complete_req(struct request *req, __le16 status,
nvme_should_fail(req);
if (unlikely(blk_should_fake_timeout(req->q)))
return true;
+ if (likely(req->mq_hctx->nr_ctx == 1)) {
+ WRITE_ONCE(req->state, MQ_RQ_COMPLETE);
+ return false;
+ }
return blk_mq_complete_request_remote(req);
}
--
1.8.3.1
On 9/17/22 10:40 AM, Liu Song wrote:
> From: Liu Song <liusong@linux.alibaba.com>
>
> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
> so when "nr_ctx" is equal to 1, there is no possibility of remote
> request, so the corresponding process can be simplified.

If the worry is the call overhead of blk_mq_complete_request_remote(),
why don't we just make that available as an inline instead? That seems
vastly superior to providing a random shortcut in a driver to avoid
calling it.

--
Jens Axboe
On 2022/9/18 00:50, Jens Axboe wrote:
> On 9/17/22 10:40 AM, Liu Song wrote:
>> From: Liu Song <liusong@linux.alibaba.com>
>>
>> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
>> so when "nr_ctx" is equal to 1, there is no possibility of remote
>> request, so the corresponding process can be simplified.
> If the worry is the call overhead of blk_mq_complete_request_remote(),
> why don't we just make that available as an inline instead? That seems
> vastly superior to providing a random shortcut in a driver to avoid
> calling it.

Hi

Here is my thinking. On an SSD with only one hw queue, remote requests
will appear occasionally. As a real multi-queue device, nvme usually
does not see remote requests, so we don't need to care about them. So
even when "blk_mq_complete_request_remote" is called, it will most
likely return false, and what is saved by replacing the call with a
check of whether "req->mq_hctx->nr_ctx" is 1 is not only the function
call, but also the extra judgments inside
"blk_mq_complete_request_remote". And if "blk_mq_complete_request_remote"
is marked inline, the result still depends on the compiler, so there is
uncertainty.

Thanks
On 9/18/22 10:10 AM, Liu Song wrote:
> On 2022/9/18 00:50, Jens Axboe wrote:
>> On 9/17/22 10:40 AM, Liu Song wrote:
>>> From: Liu Song <liusong@linux.alibaba.com>
>>>
>>> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
>>> so when "nr_ctx" is equal to 1, there is no possibility of remote
>>> request, so the corresponding process can be simplified.
>> If the worry is the call overhead of blk_mq_complete_request_remote(),
>> why don't we just make that available as an inline instead? That seems
>> vastly superior to providing a random shortcut in a driver to avoid
>> calling it.
>
> Hi
>
> This is what I think about it. If it is an SSD with only one hw queue,
> remote requests will appear occasionally. As a real multi-queue
> device, nvme usually does not have remote requests, so we don't need
> to care about it. So even if "blk_mq_complete_request_remote" is
> called, there is a high probability that it will return false, and the
> cost of changing the call to "blk_mq_complete_request_remote" to
> determine whether "req->mq_hctx->nr_ctx" is 1 is not only a function
> call, but more judgments in "blk_mq_complete_request_remote". If
> "blk_mq_complete_request_remote" is decorated as inline, it also
> depends on the compiler, there is uncertainty.

I'm not disagreeing with any of that, my point is just that you're
hacking around this in the nvme driver. This is problematic whenever
core changes happen, because now we have to touch individual drivers.
While the expectation is that there are no remote IPI completions for
NVMe, queue starved devices do exist and those do see remote
completions.

This optimization belongs in the blk-mq core, not in nvme. I do think it
makes sense, you just need to solve it in blk-mq rather than in the nvme
driver.

--
Jens Axboe
On Mon, Sep 19, 2022 at 08:10:31AM -0600, Jens Axboe wrote:
> I'm not disagreeing with any of that, my point is just that you're
> hacking around this in the nvme driver. This is problematic whenever
> core changes happen, because now we have to touch individual drivers.
> While the expectation is that there are no remote IPI completions for
> NVMe, queue starved devices do exist and those do see remote
> completions.
>
> This optimization belongs in the blk-mq core, not in nvme. I do think it
> makes sense, you just need to solve it in blk-mq rather than in the nvme
> driver.

I'd also really like to see solid numbers to justify it.

And btw, having more than one core per queue is quite common in
nvme. Even many enterprise SSDs only have 64 queues, and some of
the low-end consumer ones have very low numbers of queues that are
not enough for the number of cores in even mid-end desktop CPUs.
On 2022/9/19 22:35, Christoph Hellwig wrote:
> On Mon, Sep 19, 2022 at 08:10:31AM -0600, Jens Axboe wrote:
>> I'm not disagreeing with any of that, my point is just that you're
>> hacking around this in the nvme driver. This is problematic whenever
>> core changes happen, because now we have to touch individual drivers.
>> While the expectation is that there are no remote IPI completions for
>> NVMe, queue starved devices do exist and those do see remote
>> completions.
>>
>> This optimization belongs in the blk-mq core, not in nvme. I do think it
>> makes sense, you just need to solve it in blk-mq rather than in the nvme
>> driver.
> I'd also really see solid numbers to justify it.
>
> And btw, having more than one core per queue is quite common in
> nvme. Even many enterprise SSDs only have 64 queues, and some of
> the low-end consumers ones have very low number of queues that are
> not enough for the number of cores in even mid-end desktop CPUs.

Hi

Thank you for your suggestion. Here is what I think about it. NVMe
devices that can support one core per queue are usually
high-performance, so I prefer to make more optimizations for
high-performance devices, while for ordinary consumer-class NVMe
devices this modification will not bring any special additional
overhead.

Thanks
From: Liu Song <liusong@linux.alibaba.com>
High-performance NVMe devices usually support a large hw queue, which
ensures a 1:1 mapping of hctx and ctx. In this case there will be no
remote request, so we don't need to care about it.
Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
block/blk-mq.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11949d..9626ea1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1093,10 +1093,12 @@ bool blk_mq_complete_request_remote(struct request *rq)
WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
/*
- * For a polled request, always complete locally, it's pointless
- * to redirect the completion.
+ * For request which hctx has only one ctx mapping,
+ * or a polled request, always complete locally,
+ * it's pointless to redirect the completion.
*/
- if (rq->cmd_flags & REQ_POLLED)
+ if (rq->mq_hctx->nr_ctx == 1 ||
+ rq->cmd_flags & REQ_POLLED)
return false;
if (blk_mq_complete_need_ipi(rq)) {
--
1.8.3.1
On Wed, 21 Sep 2022 11:32:03 +0800, Liu Song wrote:
> From: Liu Song <liusong@linux.alibaba.com>
>
> High-performance NVMe devices usually support a large hw queue, which
> ensures a 1:1 mapping of hctx and ctx. In this case there will be no
> remote request, so we don't need to care about it.
>
>
> [...]
Applied, thanks!
[1/1] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
commit: f168420c62e7a106961f2489a89f6ade84fe3f27
Best regards,
--
Jens Axboe
On Wed, Sep 21, 2022 at 11:32:03AM +0800, Liu Song wrote:
> From: Liu Song <liusong@linux.alibaba.com>
>
> High-performance NVMe devices usually support a large hw queue, which

a larger number of?

> /*
> - * For a polled request, always complete locally, it's pointless
> - * to redirect the completion.
> + * For request which hctx has only one ctx mapping,
> + * or a polled request, always complete locally,
> + * it's pointless to redirect the completion.
> */
> - if (rq->cmd_flags & REQ_POLLED)
> + if (rq->mq_hctx->nr_ctx == 1 ||
> + rq->cmd_flags & REQ_POLLED)

Some very odd comment formatting and indentation here.
On 2022/9/22 14:20, Christoph Hellwig wrote:
> On Wed, Sep 21, 2022 at 11:32:03AM +0800, Liu Song wrote:
>> From: Liu Song <liusong@linux.alibaba.com>
>>
>> High-performance NVMe devices usually support a large hw queue, which
> a larger number of?
>
>> /*
>> - * For a polled request, always complete locally, it's pointless
>> - * to redirect the completion.
>> + * For request which hctx has only one ctx mapping,
>> + * or a polled request, always complete locally,
>> + * it's pointless to redirect the completion.
>> */
>> - if (rq->cmd_flags & REQ_POLLED)
>> + if (rq->mq_hctx->nr_ctx == 1 ||
>> + rq->cmd_flags & REQ_POLLED)
> Some very odd comment formatting and and indentation here.

Thanks, I will issue a V2 patch as suggested.
From: Liu Song <liusong@linux.alibaba.com>
High-performance NVMe devices usually support a larger number of hw queues,
which ensures a 1:1 mapping of hctx and ctx. In this case there will be no
remote request, so we don't need to care about it.
Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
block/blk-mq.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11949d..7f9be13 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1093,10 +1093,11 @@ bool blk_mq_complete_request_remote(struct request *rq)
WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
/*
- * For a polled request, always complete locally, it's pointless
- * to redirect the completion.
+ * If the request's hctx has only one ctx mapping,
+ * or it is a polled request, always complete locally;
+ * it's pointless to redirect the completion.
*/
- if (rq->cmd_flags & REQ_POLLED)
+ if (rq->mq_hctx->nr_ctx == 1 || rq->cmd_flags & REQ_POLLED)
return false;
if (blk_mq_complete_need_ipi(rq)) {
--
1.8.3.1