From nobody Mon Oct 6 12:02:03 2025
From: Yu Kuai
To: dlemoal@kernel.org, hare@suse.de, tj@kernel.org, josef@toxicpanda.com, axboe@kernel.dk, yukuai3@huawei.com
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com
Subject: [PATCH 1/6] mq-deadline: switch to use high layer elevator lock
Date: Tue, 22 Jul 2025 15:24:26 +0800
Message-Id: <20250722072431.610354-2-yukuai1@huaweicloud.com>
In-Reply-To: <20250722072431.610354-1-yukuai1@huaweicloud.com>
References: <20250722072431.610354-1-yukuai1@huaweicloud.com>

From: Yu Kuai

Introduce a new spinlock in elevator_queue, and switch dd->lock to use
the new lock. There are no functional changes.

Signed-off-by: Yu Kuai
---
 block/elevator.c    |  1 +
 block/elevator.h    |  4 ++--
 block/mq-deadline.c | 57 ++++++++++++++++++++++-----------------------
 3 files changed, 31 insertions(+), 31 deletions(-)
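
[ A minimal sketch of the locking pattern this patch creates; the
  *_sketch names below are illustrative only, the real fields are in the
  diff that follows. The scheduler stops owning a spinlock and instead
  keeps a pointer to the one embedded in its elevator_queue: ]

struct elevator_queue_sketch {
	spinlock_t lock;		/* owned by the elevator core */
};

struct scheduler_data_sketch {
	spinlock_t *lock;		/* borrowed from elevator_queue */
};

static void sched_init_sketch(struct scheduler_data_sketch *dd,
			      struct elevator_queue_sketch *eq)
{
	spin_lock_init(&eq->lock);	/* done once, in elevator_alloc() */
	dd->lock = &eq->lock;		/* the scheduler just points at it */
}

static void sched_critical_section_sketch(struct scheduler_data_sketch *dd)
{
	spin_lock(dd->lock);		/* note: no '&', the lock is a pointer */
	/* ... touch scheduler state ... */
	spin_unlock(dd->lock);
}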

diff --git a/block/elevator.c b/block/elevator.c
index ab22542e6cf0..91df270d9d91 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -144,6 +144,7 @@ struct elevator_queue *elevator_alloc(struct request_queue *q,
 	eq->type = e;
 	kobject_init(&eq->kobj, &elv_ktype);
 	mutex_init(&eq->sysfs_lock);
+	spin_lock_init(&eq->lock);
 	hash_init(eq->hash);
 
 	return eq;
diff --git a/block/elevator.h b/block/elevator.h
index a07ce773a38f..cbbac4f7825c 100644
--- a/block/elevator.h
+++ b/block/elevator.h
@@ -110,12 +110,12 @@ struct request *elv_rqhash_find(struct request_queue *q, sector_t offset);
 /*
  * each queue has an elevator_queue associated with it
  */
-struct elevator_queue
-{
+struct elevator_queue {
 	struct elevator_type *type;
 	void *elevator_data;
 	struct kobject kobj;
 	struct mutex sysfs_lock;
+	spinlock_t lock;
 	unsigned long flags;
 	DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
 };
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 2edf1cac06d5..e31da6de7764 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -101,7 +101,7 @@ struct deadline_data {
 	u32 async_depth;
 	int prio_aging_expire;
 
-	spinlock_t lock;
+	spinlock_t *lock;
 };
 
 /*
  * Maps an I/O priority class to a deadline scheduler priority.
  */
@@ -213,7 +213,7 @@ static void dd_merged_requests(struct request_queue *q, struct request *req,
 	const u8 ioprio_class = dd_rq_ioclass(next);
 	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
 
-	lockdep_assert_held(&dd->lock);
+	lockdep_assert_held(dd->lock);
 
 	dd->per_prio[prio].stats.merged++;
 
@@ -253,7 +253,7 @@ static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
 {
 	const struct io_stats_per_prio *stats = &dd->per_prio[prio].stats;
 
-	lockdep_assert_held(&dd->lock);
+	lockdep_assert_held(dd->lock);
 
 	return stats->inserted - atomic_read(&stats->completed);
 }
@@ -323,7 +323,7 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	enum dd_prio prio;
 	u8 ioprio_class;
 
-	lockdep_assert_held(&dd->lock);
+	lockdep_assert_held(dd->lock);
 
 	if (!list_empty(&per_prio->dispatch)) {
 		rq = list_first_entry(&per_prio->dispatch, struct request,
@@ -434,7 +434,7 @@ static struct request *dd_dispatch_prio_aged_requests(struct deadline_data *dd,
 	enum dd_prio prio;
 	int prio_cnt;
 
-	lockdep_assert_held(&dd->lock);
+	lockdep_assert_held(dd->lock);
 
 	prio_cnt = !!dd_queued(dd, DD_RT_PRIO) + !!dd_queued(dd, DD_BE_PRIO) +
 		   !!dd_queued(dd, DD_IDLE_PRIO);
@@ -466,7 +466,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	struct request *rq;
 	enum dd_prio prio;
 
-	spin_lock(&dd->lock);
+	spin_lock(dd->lock);
 	rq = dd_dispatch_prio_aged_requests(dd, now);
 	if (rq)
 		goto unlock;
@@ -482,8 +481,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	}
 
 unlock:
-	spin_unlock(&dd->lock);
-
+	spin_unlock(dd->lock);
 	return rq;
 }
 
@@ -552,9 +551,9 @@ static void dd_exit_sched(struct elevator_queue *e)
 		WARN_ON_ONCE(!list_empty(&per_prio->fifo_list[DD_READ]));
 		WARN_ON_ONCE(!list_empty(&per_prio->fifo_list[DD_WRITE]));
 
-		spin_lock(&dd->lock);
+		spin_lock(dd->lock);
 		queued = dd_queued(dd, prio);
-		spin_unlock(&dd->lock);
+		spin_unlock(dd->lock);
 
 		WARN_ONCE(queued != 0,
 			  "statistics for priority %d: i %u m %u d %u c %u\n",
@@ -601,7 +600,7 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
 	dd->last_dir = DD_WRITE;
 	dd->fifo_batch = fifo_batch;
 	dd->prio_aging_expire = prio_aging_expire;
-	spin_lock_init(&dd->lock);
+	dd->lock = &eq->lock;
 
 	/* We dispatch from request queue wide instead of hw queue */
 	blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
@@ -657,9 +656,9 @@ static bool dd_bio_merge(struct request_queue *q, struct bio *bio,
 	struct request *free = NULL;
 	bool ret;
 
-	spin_lock(&dd->lock);
+	spin_lock(dd->lock);
 	ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
-	spin_unlock(&dd->lock);
+	spin_unlock(dd->lock);
 
 	if (free)
 		blk_mq_free_request(free);
@@ -681,7 +680,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	struct dd_per_prio *per_prio;
 	enum dd_prio prio;
 
-	lockdep_assert_held(&dd->lock);
+	lockdep_assert_held(dd->lock);
 
 	prio = ioprio_class_to_prio[ioprio_class];
 	per_prio = &dd->per_prio[prio];
@@ -725,7 +724,7 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 	struct deadline_data *dd = q->elevator->elevator_data;
 	LIST_HEAD(free);
 
-	spin_lock(&dd->lock);
+	spin_lock(dd->lock);
 	while (!list_empty(list)) {
 		struct request *rq;
 
@@ -733,7 +732,7 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 		list_del_init(&rq->queuelist);
 		dd_insert_request(hctx, rq, flags, &free);
 	}
-	spin_unlock(&dd->lock);
+	spin_unlock(dd->lock);
 
 	blk_mq_free_requests(&free);
 }
@@ -849,13 +848,13 @@ static const struct elv_fs_entry deadline_attrs[] = {
 #define DEADLINE_DEBUGFS_DDIR_ATTRS(prio, data_dir, name)	\
 static void *deadline_##name##_fifo_start(struct seq_file *m,	\
 					  loff_t *pos)	\
-	__acquires(&dd->lock)	\
+	__acquires(dd->lock)	\
 {	\
 	struct request_queue *q = m->private;	\
 	struct deadline_data *dd = q->elevator->elevator_data;	\
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];	\
 	\
-	spin_lock(&dd->lock);	\
+	spin_lock(dd->lock);	\
 	return seq_list_start(&per_prio->fifo_list[data_dir], *pos);	\
 }	\
 	\
@@ -870,12 +869,12 @@ static void *deadline_##name##_fifo_next(struct seq_file *m, void *v,	\
 }	\
 	\
 static void deadline_##name##_fifo_stop(struct seq_file *m, void *v)	\
-	__releases(&dd->lock)	\
+	__releases(dd->lock)	\
 {	\
 	struct request_queue *q = m->private;	\
 	struct deadline_data *dd = q->elevator->elevator_data;	\
 	\
-	spin_unlock(&dd->lock);	\
+	spin_unlock(dd->lock);	\
 }	\
 	\
 static const struct seq_operations deadline_##name##_fifo_seq_ops = {	\
@@ -941,11 +940,11 @@ static int dd_queued_show(void *data, struct seq_file *m)
 	struct deadline_data *dd = q->elevator->elevator_data;
 	u32 rt, be, idle;
 
-	spin_lock(&dd->lock);
+	spin_lock(dd->lock);
 	rt = dd_queued(dd, DD_RT_PRIO);
 	be = dd_queued(dd, DD_BE_PRIO);
 	idle = dd_queued(dd, DD_IDLE_PRIO);
-	spin_unlock(&dd->lock);
+	spin_unlock(dd->lock);
 
 	seq_printf(m, "%u %u %u\n", rt, be, idle);
 
@@ -957,7 +956,7 @@ static u32 dd_owned_by_driver(struct deadline_data *dd, enum dd_prio prio)
 {
 	const struct io_stats_per_prio *stats = &dd->per_prio[prio].stats;
 
-	lockdep_assert_held(&dd->lock);
+	lockdep_assert_held(dd->lock);
 
 	return stats->dispatched + stats->merged -
 		atomic_read(&stats->completed);
@@ -969,11 +968,11 @@ static int dd_owned_by_driver_show(void *data, struct seq_file *m)
 	struct deadline_data *dd = q->elevator->elevator_data;
 	u32 rt, be, idle;
 
-	spin_lock(&dd->lock);
+	spin_lock(dd->lock);
 	rt = dd_owned_by_driver(dd, DD_RT_PRIO);
 	be = dd_owned_by_driver(dd, DD_BE_PRIO);
 	idle = dd_owned_by_driver(dd, DD_IDLE_PRIO);
-	spin_unlock(&dd->lock);
+	spin_unlock(dd->lock);
 
 	seq_printf(m, "%u %u %u\n", rt, be, idle);
 
@@ -983,13 +982,13 @@ static int dd_owned_by_driver_show(void *data, struct seq_file *m)
 #define DEADLINE_DISPATCH_ATTR(prio)	\
 static void *deadline_dispatch##prio##_start(struct seq_file *m,	\
 					     loff_t *pos)	\
-	__acquires(&dd->lock)	\
+	__acquires(dd->lock)	\
 {	\
 	struct request_queue *q = m->private;	\
 	struct deadline_data *dd = q->elevator->elevator_data;	\
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];	\
 	\
-	spin_lock(&dd->lock);	\
+	spin_lock(dd->lock);	\
 	return seq_list_start(&per_prio->dispatch, *pos);	\
 }	\
 	\
@@ -1004,12 +1003,12 @@ static void *deadline_dispatch##prio##_next(struct seq_file *m,	\
 }	\
 	\
 static void deadline_dispatch##prio##_stop(struct seq_file *m, void *v)	\
-	__releases(&dd->lock)	\
+	__releases(dd->lock)	\
 {	\
 	struct request_queue *q = m->private;	\
 	struct deadline_data *dd = q->elevator->elevator_data;	\
 	\
-	spin_unlock(&dd->lock);	\
+	spin_unlock(dd->lock);	\
 }	\
 	\
 static const struct seq_operations deadline_dispatch##prio##_seq_ops = {	\
-- 
2.39.2

From nobody Mon Oct 6 12:02:03 2025
From: Yu Kuai
To: dlemoal@kernel.org, hare@suse.de, tj@kernel.org, josef@toxicpanda.com, axboe@kernel.dk, yukuai3@huawei.com
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com
Subject: [PATCH 2/6] block, bfq: don't grab queue_lock from io path
Date: Tue, 22 Jul 2025 15:24:27 +0800
Message-Id: <20250722072431.610354-3-yukuai1@huaweicloud.com>
In-Reply-To: <20250722072431.610354-1-yukuai1@huaweicloud.com>
References: <20250722072431.610354-1-yukuai1@huaweicloud.com>

From: Yu Kuai

Currently, issuing I/O can grab the queue_lock up to three times, from
bfq_bio_merge(), bfq_limit_depth() and bfq_prepare_request(). The
queue_lock is not necessary if the icq is already created, because:

- q_usage_counter is already grabbed, so the queue won't exit;
- the current thread won't exit while it is issuing I/O;
- if another thread is allocating and inserting a new icq into
  ioc->icq_tree, RCU can be used to protect the icq lookup in the radix
  tree; it is safe to use the looked-up icq until the queue or the
  current thread exits.

If the ioc or icq is not created yet, bfq_prepare_request() will create
it, which means the task is issuing I/O to the queue for the first time.
This can be considered a slow path, and the queue_lock is still held
there to protect inserting the allocated icq into ioc->icq_tree.

Signed-off-by: Yu Kuai
---
 block/bfq-iosched.c | 24 +++++++-----------------
 block/blk-ioc.c     | 43 ++++++++++++++++++++++++++++++++++++++-----
 block/blk.h         |  2 +-
 3 files changed, 46 insertions(+), 23 deletions(-)
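
[ A minimal sketch of the lookup pattern the hot path moves to, modeled
  on bfq_bic_lookup() in the diff below; the name lookup_icq_example()
  is illustrative: ]

static struct io_cq *lookup_icq_example(struct request_queue *q)
{
	struct io_cq *icq;

	/* readers walk ioc->icq_tree under RCU instead of q->queue_lock */
	rcu_read_lock();
	icq = ioc_lookup_icq_rcu(q);	/* NULL on first I/O to @q */
	rcu_read_unlock();

	return icq;
}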

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0cb1e9873aab..58d57c482acd 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -454,17 +454,13 @@ static struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
  */
 static struct bfq_io_cq *bfq_bic_lookup(struct request_queue *q)
 {
-	struct bfq_io_cq *icq;
-	unsigned long flags;
-
-	if (!current->io_context)
-		return NULL;
+	struct io_cq *icq;
 
-	spin_lock_irqsave(&q->queue_lock, flags);
-	icq = icq_to_bic(ioc_lookup_icq(q));
-	spin_unlock_irqrestore(&q->queue_lock, flags);
+	rcu_read_lock();
+	icq = ioc_lookup_icq_rcu(q);
+	rcu_read_unlock();
 
-	return icq;
+	return icq_to_bic(icq);
 }
 
 /*
@@ -2456,16 +2452,10 @@ static void bfq_remove_request(struct request_queue *q,
 static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs)
 {
+	/* bic will not be freed until current or elevator exit */
+	struct bfq_io_cq *bic = bfq_bic_lookup(q);
 	struct bfq_data *bfqd = q->elevator->elevator_data;
 	struct request *free = NULL;
-	/*
-	 * bfq_bic_lookup grabs the queue_lock: invoke it now and
-	 * store its return value for later use, to avoid nesting
-	 * queue_lock inside the bfqd->lock. We assume that the bic
-	 * returned by bfq_bic_lookup does not go away before
-	 * bfqd->lock is taken.
-	 */
-	struct bfq_io_cq *bic = bfq_bic_lookup(q);
 	bool ret;
 
 	spin_lock_irq(&bfqd->lock);
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index ce82770c72ab..0be097a37e22 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -314,7 +314,7 @@ int __copy_io(unsigned long clone_flags, struct task_struct *tsk)
  * Look up io_cq associated with @ioc - @q pair from @ioc. Must be called
  * with @q->queue_lock held.
  */
-struct io_cq *ioc_lookup_icq(struct request_queue *q)
+static struct io_cq *ioc_lookup_icq(struct request_queue *q)
 {
 	struct io_context *ioc = current->io_context;
 	struct io_cq *icq;
@@ -341,7 +341,40 @@ struct io_cq *ioc_lookup_icq(struct request_queue *q)
 	rcu_read_unlock();
 	return icq;
 }
-EXPORT_SYMBOL(ioc_lookup_icq);
+
+/**
+ * ioc_lookup_icq_rcu - lookup io_cq from ioc in the I/O path
+ * @q: the associated request_queue
+ *
+ * Look up the io_cq associated with the current task and @q. Must be
+ * called from the I/O path: returns NULL if the current task is issuing
+ * I/O to @q for the first time, otherwise returns a valid icq.
+ */
+struct io_cq *ioc_lookup_icq_rcu(struct request_queue *q)
+{
+	struct io_context *ioc = current->io_context;
+	struct io_cq *icq;
+
+	WARN_ON_ONCE(percpu_ref_is_zero(&q->q_usage_counter));
+
+	if (!ioc)
+		return NULL;
+
+	icq = rcu_dereference(ioc->icq_hint);
+	if (icq && icq->q == q)
+		return icq;
+
+	icq = radix_tree_lookup(&ioc->icq_tree, q->id);
+	if (!icq)
+		return NULL;
+
+	if (WARN_ON_ONCE(icq->q != q))
+		return NULL;
+
+	rcu_assign_pointer(ioc->icq_hint, icq);
+	return icq;
+}
+EXPORT_SYMBOL(ioc_lookup_icq_rcu);
 
 /**
  * ioc_create_icq - create and link io_cq
@@ -420,9 +453,9 @@ struct io_cq *ioc_find_get_icq(struct request_queue *q)
 	} else {
 		get_io_context(ioc);
 
-		spin_lock_irq(&q->queue_lock);
-		icq = ioc_lookup_icq(q);
-		spin_unlock_irq(&q->queue_lock);
+		rcu_read_lock();
+		icq = ioc_lookup_icq_rcu(q);
+		rcu_read_unlock();
 	}
 
 	if (!icq) {
diff --git a/block/blk.h b/block/blk.h
index 468aa83c5a22..3c078e517d59 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -460,7 +460,7 @@ static inline void req_set_nomerge(struct request_queue *q, struct request *req)
  * Internal io_context interface
  */
 struct io_cq *ioc_find_get_icq(struct request_queue *q);
-struct io_cq *ioc_lookup_icq(struct request_queue *q);
+struct io_cq *ioc_lookup_icq_rcu(struct request_queue *q);
 #ifdef CONFIG_BLK_ICQ
 void ioc_clear_queue(struct request_queue *q);
 #else
-- 
2.39.2

From nobody Mon Oct 6 12:02:03 2025
From: Yu Kuai
To: dlemoal@kernel.org, hare@suse.de, tj@kernel.org, josef@toxicpanda.com, axboe@kernel.dk, yukuai3@huawei.com
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com
Subject: [PATCH 3/6] block, bfq: switch to use elevator lock
Date: Tue, 22 Jul 2025 15:24:28 +0800
Message-Id: <20250722072431.610354-4-yukuai1@huaweicloud.com>
In-Reply-To: <20250722072431.610354-1-yukuai1@huaweicloud.com>
References: <20250722072431.610354-1-yukuai1@huaweicloud.com>

From: Yu Kuai

Replace the internal spinlock bfqd->lock with the new spinlock in
elevator_queue. There are no functional changes.

Signed-off-by: Yu Kuai
Reviewed-by: Damien Le Moal
---
 block/bfq-cgroup.c  |  4 ++--
 block/bfq-iosched.c | 50 ++++++++++++++++++++++-----------------------
 block/bfq-iosched.h |  2 +-
 3 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 9fb9f3533150..1717bac7eccc 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -878,7 +878,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
 	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&bfqd->lock, flags);
+	spin_lock_irqsave(bfqd->lock, flags);
 
 	if (!entity) /* root group */
 		goto put_async_queues;
@@ -923,7 +923,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
 put_async_queues:
 	bfq_put_async_queues(bfqd, bfqg);
 
-	spin_unlock_irqrestore(&bfqd->lock, flags);
+	spin_unlock_irqrestore(bfqd->lock, flags);
 	/*
 	 * @blkg is going offline and will be ignored by
 	 * blkg_[rw]stat_recursive_sum(). Transfer stats to the parent so
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 58d57c482acd..11b81b11242c 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -469,7 +469,7 @@ static struct bfq_io_cq *bfq_bic_lookup(struct request_queue *q)
  */
 void bfq_schedule_dispatch(struct bfq_data *bfqd)
 {
-	lockdep_assert_held(&bfqd->lock);
+	lockdep_assert_held(bfqd->lock);
 
 	if (bfqd->queued != 0) {
 		bfq_log(bfqd, "schedule dispatch");
@@ -594,7 +594,7 @@ static bool bfqq_request_over_limit(struct bfq_data *bfqd,
 	int level;
 
 retry:
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 	bfqq = bic_to_bfqq(bic, op_is_sync(opf), act_idx);
 	if (!bfqq)
 		goto out;
@@ -606,7 +606,7 @@ static bool bfqq_request_over_limit(struct bfq_data *bfqd,
 	/* +1 for bfqq entity, root cgroup not included */
 	depth = bfqg_to_blkg(bfqq_group(bfqq))->blkcg->css.cgroup->level + 1;
 	if (depth > alloc_depth) {
-		spin_unlock_irq(&bfqd->lock);
+		spin_unlock_irq(bfqd->lock);
 		if (entities != inline_entities)
 			kfree(entities);
 		entities = kmalloc_array(depth, sizeof(*entities), GFP_NOIO);
@@ -664,7 +664,7 @@ static bool bfqq_request_over_limit(struct bfq_data *bfqd,
 		}
 	}
 out:
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 	if (entities != inline_entities)
 		kfree(entities);
 	return ret;
@@ -2458,7 +2458,7 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
 	struct request *free = NULL;
 	bool ret;
 
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 
 	if (bic) {
 		/*
@@ -2476,7 +2476,7 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
 
 	ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
 
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 	if (free)
 		blk_mq_free_request(free);
 
@@ -2651,7 +2651,7 @@ static void bfq_end_wr(struct bfq_data *bfqd)
 	struct bfq_queue *bfqq;
 	int i;
 
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 
 	for (i = 0; i < bfqd->num_actuators; i++) {
 		list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list)
@@ -2661,7 +2661,7 @@ static void bfq_end_wr(struct bfq_data *bfqd)
 		bfq_bfqq_end_wr(bfqq);
 	bfq_end_wr_async(bfqd);
 
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 }
 
 static sector_t bfq_io_struct_pos(void *io_struct, bool request)
@@ -5307,7 +5307,7 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	struct bfq_queue *in_serv_queue;
 	bool waiting_rq, idle_timer_disabled = false;
 
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 
 	in_serv_queue = bfqd->in_service_queue;
 	waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
@@ -5318,7 +5318,7 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 		waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
 	}
 
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 	bfq_update_dispatch_stats(hctx->queue, rq,
 				  idle_timer_disabled ? in_serv_queue : NULL,
 				  idle_timer_disabled);
@@ -5496,9 +5496,9 @@ static void bfq_exit_icq(struct io_cq *icq)
 	 * this is the last time these queues are accessed.
 	 */
 	if (bfqd) {
-		spin_lock_irqsave(&bfqd->lock, flags);
+		spin_lock_irqsave(bfqd->lock, flags);
 		_bfq_exit_icq(bic, bfqd->num_actuators);
-		spin_unlock_irqrestore(&bfqd->lock, flags);
+		spin_unlock_irqrestore(bfqd->lock, flags);
 	} else {
 		_bfq_exit_icq(bic, BFQ_MAX_ACTUATORS);
 	}
@@ -6254,10 +6254,10 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	if (!cgroup_subsys_on_dfl(io_cgrp_subsys) && rq->bio)
 		bfqg_stats_update_legacy_io(q, rq);
 #endif
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 	bfqq = bfq_init_rq(rq);
 	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
-		spin_unlock_irq(&bfqd->lock);
+		spin_unlock_irq(bfqd->lock);
 		blk_mq_free_requests(&free);
 		return;
 	}
@@ -6290,7 +6290,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	 * merge).
 	 */
 	cmd_flags = rq->cmd_flags;
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 
 	bfq_update_insert_stats(q, bfqq, idle_timer_disabled,
 				cmd_flags);
@@ -6671,7 +6671,7 @@ static void bfq_finish_requeue_request(struct request *rq)
 					     rq->io_start_time_ns,
 					     rq->cmd_flags);
 
-	spin_lock_irqsave(&bfqd->lock, flags);
+	spin_lock_irqsave(bfqd->lock, flags);
 	if (likely(rq->rq_flags & RQF_STARTED)) {
 		if (rq == bfqd->waited_rq)
 			bfq_update_inject_limit(bfqd, bfqq);
@@ -6681,7 +6681,7 @@ static void bfq_finish_requeue_request(struct request *rq)
 	bfqq_request_freed(bfqq);
 	bfq_put_queue(bfqq);
 	RQ_BIC(rq)->requests--;
-	spin_unlock_irqrestore(&bfqd->lock, flags);
+	spin_unlock_irqrestore(bfqd->lock, flags);
 
 	/*
 	 * Reset private fields. In case of a requeue, this allows
@@ -7012,7 +7012,7 @@ bfq_idle_slice_timer_body(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 	enum bfqq_expiration reason;
 	unsigned long flags;
 
-	spin_lock_irqsave(&bfqd->lock, flags);
+	spin_lock_irqsave(bfqd->lock, flags);
 
 	/*
 	 * Considering that bfqq may be in race, we should firstly check
@@ -7022,7 +7022,7 @@ bfq_idle_slice_timer_body(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 	 * been cleared in __bfq_bfqd_reset_in_service func.
 	 */
 	if (bfqq != bfqd->in_service_queue) {
-		spin_unlock_irqrestore(&bfqd->lock, flags);
+		spin_unlock_irqrestore(bfqd->lock, flags);
 		return;
 	}
 
@@ -7050,7 +7050,7 @@ bfq_idle_slice_timer_body(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 
 schedule_dispatch:
 	bfq_schedule_dispatch(bfqd);
-	spin_unlock_irqrestore(&bfqd->lock, flags);
+	spin_unlock_irqrestore(bfqd->lock, flags);
 }
 
 /*
@@ -7176,10 +7176,10 @@ static void bfq_exit_queue(struct elevator_queue *e)
 
 	hrtimer_cancel(&bfqd->idle_slice_timer);
 
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 	list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
 		bfq_deactivate_bfqq(bfqd, bfqq, false, false);
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 
 	for (actuator = 0; actuator < bfqd->num_actuators; actuator++)
 		WARN_ON_ONCE(bfqd->rq_in_driver[actuator]);
@@ -7193,10 +7193,10 @@ static void bfq_exit_queue(struct elevator_queue *e)
 #ifdef CONFIG_BFQ_GROUP_IOSCHED
 	blkcg_deactivate_policy(bfqd->queue->disk, &blkcg_policy_bfq);
 #else
-	spin_lock_irq(&bfqd->lock);
+	spin_lock_irq(bfqd->lock);
 	bfq_put_async_queues(bfqd, bfqd->root_group);
 	kfree(bfqd->root_group);
-	spin_unlock_irq(&bfqd->lock);
+	spin_unlock_irq(bfqd->lock);
 #endif
 
 	blk_stat_disable_accounting(bfqd->queue);
@@ -7361,7 +7361,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
 	/* see comments on the definition of next field inside bfq_data */
 	bfqd->actuator_load_threshold = 4;
 
-	spin_lock_init(&bfqd->lock);
+	bfqd->lock = &eq->lock;
 
 	/*
 	 * The invocation of the next bfq_create_group_hierarchy
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 687a3a7ba784..d70eb6529dab 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -795,7 +795,7 @@ struct bfq_data {
 	/* fallback dummy bfqq for extreme OOM conditions */
 	struct bfq_queue oom_bfqq;
 
-	spinlock_t lock;
+	spinlock_t *lock;
 
 	/*
 	 * bic associated with the task issuing current bio for
-- 
2.39.2

From nobody Mon Oct 6 12:02:03 2025
From: Yu Kuai
To: dlemoal@kernel.org, hare@suse.de, tj@kernel.org, josef@toxicpanda.com, axboe@kernel.dk, yukuai3@huawei.com
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com
Subject: [PATCH 4/6] elevator: factor elevator lock out of dispatch_request method
Date: Tue, 22 Jul 2025 15:24:29 +0800
Message-Id: <20250722072431.610354-5-yukuai1@huaweicloud.com>
In-Reply-To: <20250722072431.610354-1-yukuai1@huaweicloud.com>
References: <20250722072431.610354-1-yukuai1@huaweicloud.com>

From: Yu Kuai

Currently, both mq-deadline and bfq have a global spinlock that is
grabbed inside elevator methods like dispatch_request, insert_requests
and bio_merge, and this global lock is the main reason mq-deadline and
bfq can't scale very well.

For the dispatch_request method, the current behavior is to dispatch
one request at a time. With multiple dispatching contexts, this causes
heavy lock contention and messes up the request dispatching order;
following patches will support batch dispatching of requests to fix
those problems.

While dispatching a request, blk_mq_get_dispatch_budget() and
blk_mq_get_driver_tag() must be called, and they are not ready to be
called inside elevator methods, hence introducing a new method like
dispatch_requests is not possible.

In conclusion, this patch factors the global lock out of the
dispatch_request method; following patches will support batch dispatch
by calling the method multiple times while holding the lock.

Signed-off-by: Yu Kuai
---
 block/bfq-iosched.c  | 3 ---
 block/blk-mq-sched.c | 6 ++++++
 block/mq-deadline.c  | 5 +----
 3 files changed, 7 insertions(+), 7 deletions(-)
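
[ A sketch of the calling convention after this patch, condensed from
  the blk-mq-sched.c hunk below; dispatch_one_sketch() is illustrative.
  For single-queue schedulers the elevator core, not the scheduler,
  wraps ->dispatch_request() in eq->lock, which is what later allows
  pulling several requests under one lock hold: ]

static struct request *dispatch_one_sketch(struct blk_mq_hw_ctx *hctx)
{
	struct elevator_queue *e = hctx->queue->elevator;
	bool sq_sched = blk_queue_sq_sched(hctx->queue);
	struct request *rq;

	if (sq_sched)
		spin_lock_irq(&e->lock);
	rq = e->type->ops.dispatch_request(hctx);	/* now lock-free inside */
	if (sq_sched)
		spin_unlock_irq(&e->lock);

	return rq;
}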

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 11b81b11242c..9f8a256e43f2 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5307,8 +5307,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	struct bfq_queue *in_serv_queue;
 	bool waiting_rq, idle_timer_disabled = false;
 
-	spin_lock_irq(bfqd->lock);
-
 	in_serv_queue = bfqd->in_service_queue;
 	waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
 
@@ -5318,7 +5316,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 		waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
 	}
 
-	spin_unlock_irq(bfqd->lock);
 	bfq_update_dispatch_stats(hctx->queue, rq,
 				  idle_timer_disabled ? in_serv_queue : NULL,
 				  idle_timer_disabled);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 55a0fd105147..82c4f4eef9ed 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -98,6 +98,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		max_dispatch = hctx->queue->nr_requests;
 
 	do {
+		bool sq_sched = blk_queue_sq_sched(q);
 		struct request *rq;
 		int budget_token;
 
@@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		if (budget_token < 0)
 			break;
 
+		if (sq_sched)
+			spin_lock_irq(&e->lock);
 		rq = e->type->ops.dispatch_request(hctx);
+		if (sq_sched)
+			spin_unlock_irq(&e->lock);
+
 		if (!rq) {
 			blk_mq_put_dispatch_budget(q, budget_token);
 			/*
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index e31da6de7764..a008e41bc861 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -466,10 +466,9 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	struct request *rq;
 	enum dd_prio prio;
 
-	spin_lock(dd->lock);
 	rq = dd_dispatch_prio_aged_requests(dd, now);
 	if (rq)
-		goto unlock;
+		return rq;
 
 	/*
 	 * Next, dispatch requests in priority order. Ignore lower priority
@@ -481,8 +480,6 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 		break;
 	}
 
-unlock:
-	spin_unlock(dd->lock);
 	return rq;
 }
 
-- 
2.39.2

From nobody Mon Oct 6 12:02:03 2025
From: Yu Kuai
To: dlemoal@kernel.org, hare@suse.de, tj@kernel.org, josef@toxicpanda.com, axboe@kernel.dk, yukuai3@huawei.com
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com
Subject: [PATCH 5/6] blk-mq-sched: refactor __blk_mq_do_dispatch_sched()
Date: Tue, 22 Jul 2025 15:24:30 +0800
Message-Id: <20250722072431.610354-6-yukuai1@huaweicloud.com>
In-Reply-To: <20250722072431.610354-1-yukuai1@huaweicloud.com>
References: <20250722072431.610354-1-yukuai1@huaweicloud.com>

From: Yu Kuai

Introduce struct sched_dispatch_ctx, and split the helper into
elevator_dispatch_one_request() and elevator_finish_dispatch(). Also add
comments about the non-error return values. This makes the code cleaner,
and makes it easier to add a new branch that dispatches a batch of
requests at a time in the next patch.

Signed-off-by: Yu Kuai
---
 block/blk-mq-sched.c | 196 ++++++++++++++++++++++++++-----------------
 1 file changed, 119 insertions(+), 77 deletions(-)
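
[ How the documented return values are meant to be consumed; the loop
  below is an illustrative caller (the real one is
  blk_mq_do_dispatch_sched()):
    -EAGAIN: hctx->dispatch was non-empty, run again to avoid starving
             flushes;
    0:       nothing was dispatched;
    1:       at least one request was dispatched. ]

static void dispatch_loop_sketch(struct blk_mq_hw_ctx *hctx)
{
	int ret;

	do {
		ret = __blk_mq_do_dispatch_sched(hctx);
	} while (ret == 1);		/* keep going while progress is made */

	if (ret == -EAGAIN)
		blk_mq_run_hw_queue(hctx, true);	/* re-kick the queue */
}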
Holding the - * budget could have blocked any "hctx"s with the - * same queue and if we didn't dispatch then there's - * no guarantee anyone will kick the queue. Kick it - * ourselves. - */ - run_queue =3D true; - break; - } + if (!elevator_can_dispatch(ctx)) + return false; =20 - blk_mq_set_rq_budget_token(rq, budget_token); + budget_token =3D blk_mq_get_dispatch_budget(ctx->q); + if (budget_token < 0) + return false; =20 - /* - * Now this rq owns the budget which has to be released - * if this rq won't be queued to driver via .queue_rq() - * in blk_mq_dispatch_rq_list(). - */ - list_add_tail(&rq->queuelist, &rq_list); - count++; - if (rq->mq_hctx !=3D hctx) - multi_hctxs =3D true; + if (sq_sched) + spin_lock_irq(&ctx->e->lock); + rq =3D ctx->e->type->ops.dispatch_request(ctx->hctx); + if (sq_sched) + spin_unlock_irq(&ctx->e->lock); =20 + if (!rq) { + blk_mq_put_dispatch_budget(ctx->q, budget_token); /* - * If we cannot get tag for the request, stop dequeueing - * requests from the IO scheduler. We are unlikely to be able - * to submit them anyway and it creates false impression for - * scheduling heuristics that the device can take more IO. + * We're releasing without dispatching. Holding the + * budget could have blocked any "hctx"s with the + * same queue and if we didn't dispatch then there's + * no guarantee anyone will kick the queue. Kick it + * ourselves. */ - if (!blk_mq_get_driver_tag(rq)) - break; - } while (count < max_dispatch); + ctx->run_queue =3D true; + return false; + } =20 - if (!count) { - if (run_queue) - blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY); - } else if (multi_hctxs) { + blk_mq_set_rq_budget_token(rq, budget_token); + + /* + * Now this rq owns the budget which has to be released + * if this rq won't be queued to driver via .queue_rq() + * in blk_mq_dispatch_rq_list(). + */ + list_add_tail(&rq->queuelist, &ctx->rq_list); + ctx->count++; + if (rq->mq_hctx !=3D ctx->hctx) + ctx->multi_hctxs =3D true; + + /* + * If we cannot get tag for the request, stop dequeueing + * requests from the IO scheduler. We are unlikely to be able + * to submit them anyway and it creates false impression for + * scheduling heuristics that the device can take more IO. + */ + return blk_mq_get_driver_tag(rq); +} + +/* + * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has = to + * be run again. This is necessary to avoid starving flushes. + * Return 0 if no request is dispatched. + * Return 1 if at least one request is dispatched. + */ +static int elevator_finish_dispatch(struct sched_dispatch_ctx *ctx) +{ + bool dispatched =3D false; + + if (!ctx->count) { + if (ctx->run_queue) + blk_mq_delay_run_hw_queues(ctx->q, BLK_MQ_BUDGET_DELAY); + } else if (ctx->multi_hctxs) { /* * Requests from different hctx may be dequeued from some * schedulers, such as bfq and deadline. @@ -166,19 +175,52 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_h= w_ctx *hctx) * Sort the requests in the list according to their hctx, * dispatch batching requests from same hctx at a time. 
 		 */
-		list_sort(NULL, &rq_list, sched_rq_cmp);
+		list_sort(NULL, &ctx->rq_list, sched_rq_cmp);
 		do {
-			dispatched |= blk_mq_dispatch_hctx_list(&rq_list);
-		} while (!list_empty(&rq_list));
+			dispatched |= blk_mq_dispatch_hctx_list(&ctx->rq_list);
+		} while (!list_empty(&ctx->rq_list));
 	} else {
-		dispatched = blk_mq_dispatch_rq_list(hctx, &rq_list, false);
+		dispatched = blk_mq_dispatch_rq_list(ctx->hctx, &ctx->rq_list,
+						     false);
 	}
 
-	if (busy)
+	if (ctx->busy)
 		return -EAGAIN;
+
 	return !!dispatched;
 }
 
+/*
+ * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
+ * its queue by itself in its completion handler, so we don't need to
+ * restart queue if .get_budget() fails to get the budget.
+ *
+ * See elevator_finish_dispatch() for return values.
+ */
+static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+{
+	unsigned int max_dispatch;
+	struct sched_dispatch_ctx ctx = {
+		.hctx	= hctx,
+		.q	= hctx->queue,
+		.e	= hctx->queue->elevator,
+	};
+
+	INIT_LIST_HEAD(&ctx.rq_list);
+
+	if (hctx->dispatch_busy)
+		max_dispatch = 1;
+	else
+		max_dispatch = hctx->queue->nr_requests;
+
+	do {
+		if (!elevator_dispatch_one_request(&ctx))
+			break;
+	} while (ctx.count < max_dispatch);
+
+	return elevator_finish_dispatch(&ctx);
+}
+
 static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
 	unsigned long end = jiffies + HZ;
-- 
2.39.2

From nobody Mon Oct 6 12:02:03 2025
From: Yu Kuai
To: dlemoal@kernel.org, hare@suse.de, tj@kernel.org, josef@toxicpanda.com, axboe@kernel.dk, yukuai3@huawei.com
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com
Subject: [PATCH 6/6] blk-mq-sched: support request batch dispatching for sq elevator
Date: Tue, 22 Jul 2025 15:24:31 +0800
Message-Id: <20250722072431.610354-7-yukuai1@huaweicloud.com>
In-Reply-To: <20250722072431.610354-1-yukuai1@huaweicloud.com>
References: <20250722072431.610354-1-yukuai1@huaweicloud.com>

From: Yu Kuai

For the dispatch_request method, the current behavior is to dispatch one
request at a time. In the case of multiple dispatching contexts, this
behavior, on the one hand, introduces intense lock contention:

  t1:                   t2:                     t3:
  lock
                        lock
                                                lock
  // grab lock
  ops.dispatch_request
  unlock
                        // grab lock
                        ops.dispatch_request
                        unlock
                                                // grab lock
                                                ops.dispatch_request
                                                unlock

and, on the other hand, messes up the request dispatching order:

  t1:                           t2:
  lock
  rq1 = ops.dispatch_request
  unlock
                                lock
                                rq2 = ops.dispatch_request
                                unlock
  lock
  rq3 = ops.dispatch_request
  unlock
                                lock
                                rq4 = ops.dispatch_request
                                unlock
  // rq1, rq3 issued to disk
                                // rq2, rq4 issued to disk

In this case the elevator dispatch order is rq 1-2-3-4; however, the
order seen by the disk is rq 1-3-2-4, so the order of rq2 and rq3 is
inverted.

Fix those problems by introducing elevator_dispatch_requests(): this
helper grabs the lock once and dispatches a batch of requests while
holding it.

Signed-off-by: Yu Kuai
---
 block/blk-mq-sched.c | 60 +++++++++++++++++++++++++++++++++++++++++---
 block/blk-mq.h       | 21 ++++++++++++++++
 2 files changed, 77 insertions(+), 4 deletions(-)
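
[ A condensed sketch of the batched pattern introduced below by
  elevator_dispatch_requests(); error handling and the driver-tag step
  are omitted for brevity, and batch_dispatch_sketch() is illustrative.
  A batch of budget tokens is grabbed up front, then several requests
  are pulled under a single lock hold, so the order the elevator
  dispatches in is the order the disk sees: ]

static void batch_dispatch_sketch(struct sched_dispatch_ctx *ctx)
{
	int budget_token[BUDGET_TOKEN_BATCH];
	int count, i;

	count = blk_mq_get_dispatch_budgets(ctx->q, budget_token);
	if (count <= 0)
		return;

	spin_lock_irq(&ctx->e->lock);
	for (i = 0; i < count; i++) {
		struct request *rq = ctx->e->type->ops.dispatch_request(ctx->hctx);

		if (!rq)
			break;
		blk_mq_set_rq_budget_token(rq, budget_token[i]);
		list_add_tail(&rq->queuelist, &ctx->rq_list);
		ctx->count++;
	}
	spin_unlock_irq(&ctx->e->lock);

	for (; i < count; i++)		/* return any unused budgets */
		blk_mq_put_dispatch_budget(ctx->q, budget_token[i]);
}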

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index f18aecf710ad..c4450b73ab25 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -101,6 +101,54 @@ static bool elevator_can_dispatch(struct sched_dispatch_ctx *ctx)
 	return true;
 }
 
+static void elevator_dispatch_requests(struct sched_dispatch_ctx *ctx)
+{
+	struct request *rq;
+	bool has_get_budget = ctx->q->mq_ops->get_budget != NULL;
+	int budget_token[BUDGET_TOKEN_BATCH];
+	int count = ctx->q->nr_requests;
+	int i;
+
+	while (true) {
+		if (!elevator_can_dispatch(ctx))
+			return;
+
+		if (has_get_budget) {
+			count = blk_mq_get_dispatch_budgets(ctx->q, budget_token);
+			if (count <= 0)
+				return;
+		}
+
+		spin_lock_irq(&ctx->e->lock);
+		for (i = 0; i < count; ++i) {
+			rq = ctx->e->type->ops.dispatch_request(ctx->hctx);
+			if (!rq) {
+				ctx->run_queue = true;
+				goto err_free_budgets;
+			}
+
+			if (has_get_budget)
+				blk_mq_set_rq_budget_token(rq, budget_token[i]);
+			list_add_tail(&rq->queuelist, &ctx->rq_list);
+			ctx->count++;
+			if (rq->mq_hctx != ctx->hctx)
+				ctx->multi_hctxs = true;
+
+			if (!blk_mq_get_driver_tag(rq)) {
+				i++;
+				goto err_free_budgets;
+			}
+		}
+		spin_unlock_irq(&ctx->e->lock);
+	}
+
+err_free_budgets:
+	spin_unlock_irq(&ctx->e->lock);
+	if (has_get_budget)
+		for (; i < count; ++i)
+			blk_mq_put_dispatch_budget(ctx->q, budget_token[i]);
+}
+
 static bool elevator_dispatch_one_request(struct sched_dispatch_ctx *ctx)
 {
 	bool sq_sched = blk_queue_sq_sched(ctx->q);
@@ -213,10 +261,14 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 	else
 		max_dispatch = hctx->queue->nr_requests;
 
-	do {
-		if (!elevator_dispatch_one_request(&ctx))
-			break;
-	} while (ctx.count < max_dispatch);
+	if (!hctx->dispatch_busy && blk_queue_sq_sched(ctx.q))
+		elevator_dispatch_requests(&ctx);
+	else {
+		do {
+			if (!elevator_dispatch_one_request(&ctx))
+				break;
+		} while (ctx.count < max_dispatch);
+	}
 
 	return elevator_finish_dispatch(&ctx);
 }
diff --git a/block/blk-mq.h b/block/blk-mq.h
index affb2e14b56e..450c16a07841 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -37,6 +37,7 @@ enum {
 };
 
 #define BLK_MQ_CPU_WORK_BATCH	(8)
+#define BUDGET_TOKEN_BATCH	(8)
 
 typedef unsigned int __bitwise blk_insert_t;
 #define BLK_MQ_INSERT_AT_HEAD	((__force blk_insert_t)0x01)
@@ -262,6 +263,26 @@ static inline int blk_mq_get_dispatch_budget(struct request_queue *q)
 	return 0;
 }
 
+static inline int blk_mq_get_dispatch_budgets(struct request_queue *q,
+					      int *budget_token)
+{
+	int count = 0;
+
+	while (count < BUDGET_TOKEN_BATCH) {
+		int token = 0;
+
+		if (q->mq_ops->get_budget)
+			token = q->mq_ops->get_budget(q);
+
+		if (token < 0)
+			return count;
+
+		budget_token[count++] = token;
+	}
+
+	return count;
+}
+
 static inline void blk_mq_set_rq_budget_token(struct request *rq, int token)
 {
 	if (token < 0)
-- 
2.39.2