From: Yu Kuai <yukuai3@huawei.com>
min_shallow_depth must be less than or equal to any shallow_depth value.
It is currently 1, which changes the default wake_batch to 1 and causes
performance degradation for fast disks with high concurrency. This patch
makes the following changes:
- set the default minimal async_depth to 64, to avoid performance
degradation in the common case. Users can set a lower value if
necessary.
- disable throttling of asynchronous requests by default, to prevent
performance degradation in some special setups. Users must set a value
for async_depth to enable it.
- if async_depth is already set, don't reset it when the user sets a new
nr_requests.
Fixes: 07757588e507 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/mq-deadline.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 1f0d175a941e..9be0a33985ce 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -24,6 +24,16 @@
#include "blk-mq-debugfs.h"
#include "blk-mq-sched.h"
+/*
+ * async_depth is used to reserve scheduler tags for synchronous requests,
+ * and the value will affect sbitmap wake_batch. The default minimal value is 64
+ * because the corresponding wake_batch is 8, and lower wake_batch may affect
+ * IO performance.
+ */
+static unsigned int min_async_depth = 64;
+module_param(min_async_depth, int, 0444);
+MODULE_PARM_DESC(min_async_depth, "The minimal number of tags available for asynchronous requests");
+
/*
* See Documentation/block/deadline-iosched.rst
*/
@@ -513,9 +523,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
struct deadline_data *dd = q->elevator->elevator_data;
struct blk_mq_tags *tags = hctx->sched_tags;
- dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
+ if (q->nr_requests > min_async_depth)
+ sbitmap_queue_min_shallow_depth(&tags->bitmap_tags,
+ min_async_depth);
- sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
+ if (q->nr_requests <= dd->async_depth)
+ dd->async_depth = 0;
}
/* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
@@ -814,7 +827,7 @@ STORE_JIFFIES(deadline_write_expire_store, &dd->fifo_expire[DD_WRITE], 0, INT_MA
STORE_JIFFIES(deadline_prio_aging_expire_store, &dd->prio_aging_expire, 0, INT_MAX);
STORE_INT(deadline_writes_starved_store, &dd->writes_starved, INT_MIN, INT_MAX);
STORE_INT(deadline_front_merges_store, &dd->front_merges, 0, 1);
-STORE_INT(deadline_async_depth_store, &dd->async_depth, 1, INT_MAX);
+STORE_INT(deadline_async_depth_store, &dd->async_depth, min_async_depth, INT_MAX);
STORE_INT(deadline_fifo_batch_store, &dd->fifo_batch, 0, INT_MAX);
#undef STORE_FUNCTION
#undef STORE_INT
--
2.39.2
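
The commit message's claim that a min_shallow_depth of 1 drives wake_batch
down to 1, while 64 keeps it at 8, can be checked with a small userspace
model. This is only a sketch: the formula and the two constants are written
from memory of the wake_batch calculation in lib/sbitmap.c and should be
verified against the tree being patched. model_wake_batch() and min_u() are
hypothetical names, not kernel functions.

```c
/* Constants as recalled from include/linux/sbitmap.h (assumption). */
#define SBQ_WAIT_QUEUES 8
#define SBQ_WAKE_BATCH  8

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Userspace model of the sbitmap wake_batch calculation.
 *
 * depth:             total number of bits (tags) in the bitmap
 * shift:             log2 of the number of bits per bitmap word
 * min_shallow_depth: smallest shallow depth any caller may pass
 */
static unsigned int model_wake_batch(unsigned int depth, unsigned int shift,
				     unsigned int min_shallow_depth)
{
	unsigned int bits_per_word = 1U << shift;
	/* Each word can be limited to min_shallow_depth usable bits. */
	unsigned int per_word = min_u(bits_per_word, min_shallow_depth);
	/* Usable bits: full words plus the trailing partial word. */
	unsigned int usable = (depth >> shift) * per_word +
			      min_u(depth & (bits_per_word - 1), per_word);
	unsigned int batch = usable / SBQ_WAIT_QUEUES;

	/* Clamp to [1, SBQ_WAKE_BATCH]. */
	if (batch < 1)
		batch = 1;
	if (batch > SBQ_WAKE_BATCH)
		batch = SBQ_WAKE_BATCH;
	return batch;
}
```

Under this model, 256 tags in 64-bit words (shift 6) give a wake_batch of 1
when min_shallow_depth is 1 and a wake_batch of 8 when it is 64.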
On 12/16/24 6:40 PM, Yu Kuai wrote:
> +static unsigned int min_async_depth = 64;
> +module_param(min_async_depth, int, 0444);
> +MODULE_PARM_DESC(min_async_depth, "The minimal number of tags available for asynchronous requests");

Users may not like it that this parameter is read-only.

> @@ -513,9 +523,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
>  	struct deadline_data *dd = q->elevator->elevator_data;
>  	struct blk_mq_tags *tags = hctx->sched_tags;
>
> -	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);

Shouldn't this assignment be retained instead of removing it?
Additionally, some time ago a user requested to initialize
dd->async_depth to q->nr_requests instead of 3/4 of that value because
the lower value introduced a performance regression.

Thanks,

Bart.
Hi,

On 2024/12/18 6:13, Bart Van Assche wrote:
> On 12/16/24 6:40 PM, Yu Kuai wrote:
>> +static unsigned int min_async_depth = 64;
>> +module_param(min_async_depth, int, 0444);
>> +MODULE_PARM_DESC(min_async_depth, "The minimal number of tags
>> available for asynchronous requests");
>
> Users may not like it that this parameter is read-only.
>
>> @@ -513,9 +523,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx
>> *hctx)
>>  	struct deadline_data *dd = q->elevator->elevator_data;
>>  	struct blk_mq_tags *tags = hctx->sched_tags;
>> -	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
>
> Shouldn't this assignment be retained instead of removing it?
> Additionally, some time ago a user requested to initialize
> dd->async_depth to q->nr_requests instead of 3/4 of that value because
> the lower value introduced a performance regression.

dd->async_depth is initialized to 0 now, functionally I think
it's the same as q->nr_requests. And I do explain this in commit
message, maybe it's not clear?

BTW, if user sets new nr_requests and async_depth < new nr_requests,
async_depth won't be reset after this patch.

Thanks,
Kuai

> Thanks,
>
> Bart.
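
The behaviour described in this reply (async_depth starts at 0, and a
user-set value survives an nr_requests update unless it no longer fits) can
be sketched as a userspace model of the patched dd_depth_updated(). The
struct and function names below are hypothetical stand-ins; only the
comparison itself is taken from the patch.

```c
/* Hypothetical stand-in for the one deadline_data field the patch touches. */
struct model_dd {
	unsigned int async_depth;	/* 0: async throttling is disabled */
};

/*
 * Model of the patched dd_depth_updated(): a user-set async_depth is kept
 * as long as it is still below the new nr_requests; otherwise it falls back
 * to 0, which disables throttling of asynchronous requests.
 */
static void model_depth_updated(struct model_dd *dd, unsigned int nr_requests)
{
	if (nr_requests <= dd->async_depth)
		dd->async_depth = 0;
}
```

With async_depth set to 128, raising nr_requests to 256 leaves it at 128,
while lowering nr_requests to 64 resets it to 0.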
On 12/17/24 5:12 PM, Yu Kuai wrote:
> dd->async_depth is initialized to 0 now, functionally I think
> it's the same as q->nr_requests. And I do explain this in commit
> message, maybe it's not clear?

It would be good to add a comment in the source code that explains that
__blk_mq_get_tag() does not restrict tag allocation if dd->async_depth
is zero because that causes data->shallow_depth to be zero.

Thanks,

Bart.
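
The point about a zero shallow_depth can be illustrated with a simplified
userspace model. It is a sketch only: it treats the whole tag set as a
single bitmap word, whereas the kernel applies the shallow depth per word,
and model_may_allocate() is a hypothetical name rather than a kernel
function.

```c
#include <stdbool.h>

/*
 * Simplified model of the decision made in __blk_mq_get_tag():
 * a shallow_depth of 0 means "no shallow limit", so the full tag set is
 * available; a non-zero shallow_depth caps how many tags may be in use.
 *
 * in_use:        tags currently allocated
 * nr_tags:       total number of scheduler tags
 * shallow_depth: limit for this request, or 0 for unrestricted
 */
static bool model_may_allocate(unsigned int in_use, unsigned int nr_tags,
			       unsigned int shallow_depth)
{
	unsigned int limit = nr_tags;

	if (shallow_depth && shallow_depth < nr_tags)
		limit = shallow_depth;

	return in_use < limit;
}
```

In this model a request with shallow_depth 0 can still get a tag when 200 of
256 tags are in use, while a request limited to 192 cannot.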
Hi,

On 2024/12/19 2:06, Bart Van Assche wrote:
> On 12/17/24 5:12 PM, Yu Kuai wrote:
>> dd->async_depth is initialized to 0 now, functionally I think
>> it's the same as q->nr_requests. And I do explain this in commit
>> message, maybe it's not clear?
>
> It would be good to add a comment in the source code that explains that
> __blk_mq_get_tag() does not restrict tag allocation if dd->async_depth
> is zero because that causes data->shallow_depth to be zero.

Ok.

Thanks,
Kuai

> Thanks,
>
> Bart.
Hi,

On 2024/12/18 9:12, Yu Kuai wrote:
>
> Users may not like it that this parameter is read-only.

I can't make this read-write, because set lower value will cause
problems for existing elevator, because wake_batch has to be
updated as well.

Thanks,
Kuai
On 12/17/24 5:14 PM, Yu Kuai wrote:
> I can't make this read-write, because set lower value will cause
> problems for existing elevator, because wake_batch has to be
> updated as well.

Should the request queue perhaps be frozen before wake_batch is updated?

Thanks,

Bart.
Hi,

On 2024/12/19 2:00, Bart Van Assche wrote:
> On 12/17/24 5:14 PM, Yu Kuai wrote:
>> I can't make this read-write, because set lower value will cause
>> problems for existing elevator, because wake_batch has to be
>> updated as well.
>
> Should the request queue perhaps be frozen before wake_batch is updated?

Yes, we should. The good thing is for now it's frozen already:
- update nr_requests context;
- switch elevator;

However, if you mean do this while writing async_depth, freeze queue
is not enough, we have to ping all the hctx as well by q->sysfs_lock,
which is not possible.

Or if you mean do this while write the new min_async_depth, then we have
to update wake_batch for all the queues in the system, too crazy for
me...

Thanks,
Kuai

> Thanks,
>
> Bart.
On 12/18/24 5:21 PM, Yu Kuai wrote:
> Hi,
>
> On 2024/12/19 2:00, Bart Van Assche wrote:
>> On 12/17/24 5:14 PM, Yu Kuai wrote:
>>> I can't make this read-write, because set lower value will cause
>>> problems for existing elevator, because wake_batch has to be
>>> updated as well.
>>
>> Should the request queue perhaps be frozen before wake_batch is updated?
>
> Yes, we should. The good thing is for now it's frozen already:
> - update nr_requests context;
> - switch elevator;
>
> However, if you mean do this while writing async_depth, freeze queue
> is not enough, we have to ping all the hctx as well by q->sysfs_lock,
> which is not possible.
>
> Or if you mean do this while write the new min_async_depth, then we have
> to update wake_batch for all the queues in the system, too crazy for
> me...

Should min_async_depth perhaps be a request queue attribute instead of
an mq-deadline I/O scheduler attribute?

Thanks,

Bart.
Hi,

On 2024/12/20 3:25, Bart Van Assche wrote:
> On 12/18/24 5:21 PM, Yu Kuai wrote:
>> Hi,
>>
>> On 2024/12/19 2:00, Bart Van Assche wrote:
>>> On 12/17/24 5:14 PM, Yu Kuai wrote:
>>>> I can't make this read-write, because set lower value will cause
>>>> problems for existing elevator, because wake_batch has to be
>>>> updated as well.
>>>
>>> Should the request queue perhaps be frozen before wake_batch is updated?
>>
>> Yes, we should. The good thing is for now it's frozen already:
>> - update nr_requests context;
>> - switch elevator;
>>
>> However, if you mean do this while writing async_depth, freeze queue
>> is not enough, we have to ping all the hctx as well by q->sysfs_lock,
>> which is not possible.
>>
>> Or if you mean do this while write the new min_async_depth, then we have
>> to update wake_batch for all the queues in the system, too crazy for
>> me...
>
> Should min_async_depth perhaps be a request queue attribute instead of
> an mq-deadline I/O scheduler attribute?

Yes, I think this makes sense, at least kyber and deadline can both
benefit from this. And I might have to add a new async_depth_updated()
API to the elevator ops.

Thanks,
Kuai

> Thanks,
>
> Bart.