[v2] lib/sbitmap: fix shallow_depth tag allocation

[PATCH RFC v2 4/4] block/mq-deadline: introduce min_async_depth

Posted by Yu Kuai 1 year, 1 month ago

From: Yu Kuai <yukuai3@huawei.com>

min_shallow_depth must be less than or equal to any shallow_depth value.
It is currently 1, which changes the default wake_batch to 1 and causes
performance degradation for fast disks with high concurrency. This patch
makes the following changes:

- set the default minimal async_depth to 64, to avoid performance
  degradation in the common case. Users can set a lower value if
  necessary.
- disable throttling of asynchronous requests by default, to prevent
  performance degradation in some special setups. Users must set
  async_depth explicitly to enable it.
- if async_depth is already set, don't reset it when the user sets a new
  nr_requests.

Fixes: 07757588e507 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/mq-deadline.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 1f0d175a941e..9be0a33985ce 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -24,6 +24,16 @@
 #include "blk-mq-debugfs.h"
 #include "blk-mq-sched.h"
 
+/*
+ * async_depth is used to reserve scheduler tags for synchronous requests,
+ * and the value will affect sbitmap wake_batch. The default minimal value is 64
+ * because the corresponding wake_batch is 8, and lower wake_batch may affect
+ * IO performance.
+ */
+static unsigned int min_async_depth = 64;
+module_param(min_async_depth, uint, 0444);
+MODULE_PARM_DESC(min_async_depth, "The minimal number of tags available for asynchronous requests");
+
 /*
  * See Documentation/block/deadline-iosched.rst
  */
@@ -513,9 +523,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
 	struct deadline_data *dd = q->elevator->elevator_data;
 	struct blk_mq_tags *tags = hctx->sched_tags;
 
-	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
+	if (q->nr_requests > min_async_depth)
+		sbitmap_queue_min_shallow_depth(&tags->bitmap_tags,
+						min_async_depth);
 
-	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
+	if (q->nr_requests <= dd->async_depth)
+		dd->async_depth = 0;
 }
 
 /* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
@@ -814,7 +827,7 @@ STORE_JIFFIES(deadline_write_expire_store, &dd->fifo_expire[DD_WRITE], 0, INT_MA
 STORE_JIFFIES(deadline_prio_aging_expire_store, &dd->prio_aging_expire, 0, INT_MAX);
 STORE_INT(deadline_writes_starved_store, &dd->writes_starved, INT_MIN, INT_MAX);
 STORE_INT(deadline_front_merges_store, &dd->front_merges, 0, 1);
-STORE_INT(deadline_async_depth_store, &dd->async_depth, 1, INT_MAX);
+STORE_INT(deadline_async_depth_store, &dd->async_depth, min_async_depth, INT_MAX);
 STORE_INT(deadline_fifo_batch_store, &dd->fifo_batch, 0, INT_MAX);
 #undef STORE_FUNCTION
 #undef STORE_INT
-- 
2.39.2
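
The link between min_shallow_depth and wake_batch described in the commit
message can be illustrated with a small userspace model. This is only a
sketch under stated assumptions: it assumes the shallow depth limits the
bitmap as a whole, and model_wake_batch() is a hypothetical name, not a
kernel function. The in-kernel helper sbq_calc_wake_batch() works per bitmap
word and differs in detail. The two constants mirror SBQ_WAIT_QUEUES and
SBQ_WAKE_BATCH from include/linux/sbitmap.h.

```c
#define SBQ_WAIT_QUEUES 8	/* wait queues per sbitmap_queue */
#define SBQ_WAKE_BATCH  8	/* upper bound for wake_batch */

/*
 * Hypothetical model: the number of usable tags is capped by
 * min_shallow_depth, spread over the wait queues, and the result is
 * clamped to the range [1, SBQ_WAKE_BATCH].
 */
static unsigned int model_wake_batch(unsigned int depth,
				     unsigned int min_shallow_depth)
{
	unsigned int usable = depth < min_shallow_depth ?
			      depth : min_shallow_depth;
	unsigned int batch = usable / SBQ_WAIT_QUEUES;

	if (batch < 1)
		batch = 1;
	if (batch > SBQ_WAKE_BATCH)
		batch = SBQ_WAKE_BATCH;
	return batch;
}
```

Under this model, a queue with 256 tags gets a wake_batch of 1 when
min_shallow_depth is 1 and a wake_batch of 8 when min_shallow_depth is 64,
which matches the reasoning behind the default of 64.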

Re: [PATCH RFC v2 4/4] block/mq-deadline: introduce min_async_depth

Posted by Bart Van Assche 1 year, 1 month ago

On 12/16/24 6:40 PM, Yu Kuai wrote:
> +static unsigned int min_async_depth = 64;
> +module_param(min_async_depth, int, 0444);
> +MODULE_PARM_DESC(min_async_depth, "The minimal number of tags available for asynchronous requests");

Users may not like it that this parameter is read-only.

> @@ -513,9 +523,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
>   	struct deadline_data *dd = q->elevator->elevator_data;
>   	struct blk_mq_tags *tags = hctx->sched_tags;
>   
> -	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);

Shouldn't this assignment be retained instead of removing it? 
Additionally, some time ago a user requested to initialize 
dd->async_depth to q->nr_requests instead of 3/4 of that value because
the lower value introduced a performance regression.

Thanks,

Bart.

Re: [PATCH RFC v2 4/4] block/mq-deadline: introduce min_async_depth

Posted by Yu Kuai 1 year, 1 month ago

Hi,

On 2024/12/18 6:13, Bart Van Assche wrote:
> On 12/16/24 6:40 PM, Yu Kuai wrote:
>> +static unsigned int min_async_depth = 64;
>> +module_param(min_async_depth, int, 0444);
>> +MODULE_PARM_DESC(min_async_depth, "The minimal number of tags 
>> available for asynchronous requests");
> 
> Users may not like it that this parameter is read-only.
> 
>> @@ -513,9 +523,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx 
>> *hctx)
>>       struct deadline_data *dd = q->elevator->elevator_data;
>>       struct blk_mq_tags *tags = hctx->sched_tags;
>> -    dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
> 
> Shouldn't this assignment be retained instead of removing it? 
> Additionally, some time ago a user requested to initialize 
> dd->async_depth to q->nr_requests instead of 3/4 of that value because
> the lower value introduced a performance regression.

dd->async_depth is now initialized to 0, which I think is functionally
the same as q->nr_requests. I did explain this in the commit message;
maybe it's not clear?

BTW, if the user sets a new nr_requests and async_depth < new nr_requests,
async_depth won't be reset after this patch.

Thanks,
Kuai
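
The behaviour described in this reply can be modelled in userspace as
follows. This is a sketch only: struct model_dd, model_depth_updated() and
model_effective_async_limit() are hypothetical names, and the model assumes
that an async_depth of 0 means "no throttling", i.e. the effective limit is
nr_requests.

```c
/* Hypothetical stand-in for the async_depth field of struct deadline_data. */
struct model_dd {
	unsigned int async_depth;	/* 0 means throttling is disabled */
};

/*
 * Mirrors the patched dd_depth_updated() logic: a user-chosen async_depth
 * is kept unless the new nr_requests no longer exceeds it, in which case
 * throttling is switched off.
 */
static void model_depth_updated(struct model_dd *dd, unsigned int nr_requests)
{
	if (nr_requests <= dd->async_depth)
		dd->async_depth = 0;
}

/* Assumption: async_depth == 0 behaves like a limit of nr_requests. */
static unsigned int model_effective_async_limit(const struct model_dd *dd,
						unsigned int nr_requests)
{
	return dd->async_depth ? dd->async_depth : nr_requests;
}
```

For example, with async_depth set to 128, changing nr_requests to 256 keeps
async_depth at 128, while changing nr_requests to 64 resets it to 0.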


Re: [PATCH RFC v2 4/4] block/mq-deadline: introduce min_async_depth

Posted by Bart Van Assche 1 year, 1 month ago

On 12/17/24 5:12 PM, Yu Kuai wrote:
> dd->async_depth is initialized to 0 now, functionally I think
> it's the same as q->nr_requests. And I do explain this in commit
> message, maybe it's not clear?

It would be good to add a comment in the source code that explains that
__blk_mq_get_tag() does not restrict tag allocation if dd->async_depth
is zero because that causes data->shallow_depth to be zero.

Thanks,

Bart.
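
A userspace sketch of the behaviour such a comment would document, under
stated assumptions: all model_* names are hypothetical, and the model
assumes (as I read the current code) that dd_limit_depth() copies
dd->async_depth into data->shallow_depth for everything except synchronous
reads, and that __blk_mq_get_tag() only takes the shallow allocation path
when data->shallow_depth is nonzero.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct blk_mq_alloc_data. */
struct model_alloc_data {
	unsigned int shallow_depth;
};

/*
 * Synchronous reads are never throttled. For all other requests the
 * scheduler's async_depth becomes the shallow depth. If async_depth is
 * zero, shallow_depth stays zero and tag allocation is not restricted.
 */
static void model_limit_depth(bool sync_read, unsigned int async_depth,
			      struct model_alloc_data *data)
{
	if (sync_read)
		return;
	data->shallow_depth = async_depth;
}

/* The tag allocator only limits the search when shallow_depth is nonzero. */
static bool model_uses_shallow_path(const struct model_alloc_data *data)
{
	return data->shallow_depth != 0;
}
```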

Re: [PATCH RFC v2 4/4] block/mq-deadline: introduce min_async_depth

Posted by Yu Kuai 1 year, 1 month ago

Hi,

On 2024/12/19 2:06, Bart Van Assche wrote:
> On 12/17/24 5:12 PM, Yu Kuai wrote:
>> dd->async_depth is initialized to 0 now, functionally I think
>> it's the same as q->nr_requests. And I do explain this in commit
>> message, maybe it's not clear?
> 
> It would be good to add a comment in the source code that explains that
> __blk_mq_get_tag() does not restrict tag allocation if dd->async_depth
> is zero because that causes data->shallow_depth to be zero.
> 

Ok.

Thanks,
Kuai


Re: [PATCH RFC v2 4/4] block/mq-deadline: introduce min_async_depth

Posted by Yu Kuai 1 year, 1 month ago

Hi,

On 2024/12/18 9:12, Yu Kuai wrote:
> 
> Users may not like it that this parameter is read-only.

I can't make this read-write, because setting a lower value would cause
problems for existing elevators: wake_batch has to be updated as well.

Thanks,
Kuai

Re: [PATCH RFC v2 4/4] block/mq-deadline: introduce min_async_depth

Posted by Bart Van Assche 1 year, 1 month ago

On 12/17/24 5:14 PM, Yu Kuai wrote:
> I can't make this read-write, because set lower value will cause
> problems for existing elevator, because wake_batch has to be
> updated as well.

Should the request queue perhaps be frozen before wake_batch is updated?

Thanks,

Bart.

Re: [PATCH RFC v2 4/4] block/mq-deadline: introduce min_async_depth

Posted by Yu Kuai 1 year, 1 month ago

Hi,

On 2024/12/19 2:00, Bart Van Assche wrote:
> On 12/17/24 5:14 PM, Yu Kuai wrote:
>> I can't make this read-write, because set lower value will cause
>> problems for existing elevator, because wake_batch has to be
>> updated as well.
> 
> Should the request queue perhaps be frozen before wake_batch is updated?

Yes, we should. The good thing is that the queue is already frozen in
these contexts:
  - updating nr_requests;
  - switching the elevator;

However, if you mean doing this while writing async_depth, freezing the
queue is not enough: we would also have to ping all the hctxs under
q->sysfs_lock, which is not possible.

Or, if you mean doing this while writing a new min_async_depth, then we
would have to update wake_batch for all the queues in the system, which
is too crazy for me...

Thanks,
Kuai


Re: [PATCH RFC v2 4/4] block/mq-deadline: introduce min_async_depth

Posted by Bart Van Assche 1 year, 1 month ago

On 12/18/24 5:21 PM, Yu Kuai wrote:
> Hi,
> 
> On 2024/12/19 2:00, Bart Van Assche wrote:
>> On 12/17/24 5:14 PM, Yu Kuai wrote:
>>> I can't make this read-write, because set lower value will cause
>>> problems for existing elevator, because wake_batch has to be
>>> updated as well.
>>
>> Should the request queue perhaps be frozen before wake_batch is updated?
> 
> Yes, we should. The good thing is for now it's frozen already:
>   - update nr_requests context;
>   - switch elevator;
> 
> However, if you mean do this while writing async_depth, freeze queue
> is not enough, we have to ping all the hctx as well by q->sysfs_lock,
> which is not possible.
> 
> Or if you mean do this while write the new min_async_depth, then we have
> to update wake_batch for all the queues in the system, too crazy for
> me...

Should min_async_depth perhaps be a request queue attribute instead of
an mq-deadline I/O scheduler attribute?

Thanks,

Bart.

Re: [PATCH RFC v2 4/4] block/mq-deadline: introduce min_async_depth

Posted by Yu Kuai 1 year, 1 month ago

Hi,

On 2024/12/20 3:25, Bart Van Assche wrote:
> On 12/18/24 5:21 PM, Yu Kuai wrote:
>> Hi,
>>
>> On 2024/12/19 2:00, Bart Van Assche wrote:
>>> On 12/17/24 5:14 PM, Yu Kuai wrote:
>>>> I can't make this read-write, because set lower value will cause
>>>> problems for existing elevator, because wake_batch has to be
>>>> updated as well.
>>>
>>> Should the request queue perhaps be frozen before wake_batch is updated?
>>
>> Yes, we should. The good thing is for now it's frozen already:
>>   - update nr_requests context;
>>   - switch elevator;
>>
>> However, if you mean do this while writing async_depth, freeze queue
>> is not enough, we have to ping all the hctx as well by q->sysfs_lock,
>> which is not possible.
>>
>> Or if you mean do this while write the new min_async_depth, then we have
>> to update wake_batch for all the queues in the system, too crazy for
>> me...
> 
> Should min_async_depth perhaps be a request queue attribute instead of
> an mq-deadline I/O scheduler attribute?

Yes, I think this makes sense; at least kyber and deadline can both
benefit from this. I might have to add a new async_depth_updated() API
to the elevator ops.

Thanks,
Kuai
