From nobody Wed Apr 15 13:22:52 2026
From: Yu Kuai
To: Tejun Heo, Josef Bacik, Jens Axboe
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Zheng Qixing, Ming Lei, Nilay Shroff
Subject: [PATCH v3 1/7] blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with blkcg_mutex
Date: Wed, 4 Mar 2026 15:38:02 +0800
Message-ID: <20260304073809.3438679-2-yukuai@fnnas.com>
In-Reply-To: <20260304073809.3438679-1-yukuai@fnnas.com>
References: <20260304073809.3438679-1-yukuai@fnnas.com>
blkg_destroy_all() iterates q->blkg_list without holding blkcg_mutex,
which can race with blkg_free_workfn() that removes blkgs from the list
while holding blkcg_mutex. Add blkcg_mutex protection around the
q->blkg_list iteration to prevent potential list corruption or
use-after-free issues.

Signed-off-by: Yu Kuai
---
 block/blk-cgroup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 3cffb68ba5d8..0bc7b19399b6 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -574,6 +574,7 @@ static void blkg_destroy_all(struct gendisk *disk)
 	int i;
 
 restart:
+	mutex_lock(&q->blkcg_mutex);
 	spin_lock_irq(&q->queue_lock);
 	list_for_each_entry(blkg, &q->blkg_list, q_node) {
 		struct blkcg *blkcg = blkg->blkcg;
@@ -592,6 +593,7 @@ static void blkg_destroy_all(struct gendisk *disk)
 		if (!(--count)) {
 			count = BLKG_DESTROY_BATCH_SIZE;
 			spin_unlock_irq(&q->queue_lock);
+			mutex_unlock(&q->blkcg_mutex);
 			cond_resched();
 			goto restart;
 		}
@@ -611,6 +613,7 @@ static void blkg_destroy_all(struct gendisk *disk)
 
 	q->root_blkg = NULL;
 	spin_unlock_irq(&q->queue_lock);
+	mutex_unlock(&q->blkcg_mutex);
 }
 
 static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src)
-- 
2.51.0
From: Yu Kuai
To: Tejun Heo, Josef Bacik, Jens Axboe
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Zheng Qixing, Ming Lei, Nilay Shroff
Subject: [PATCH v3 2/7] bfq: protect q->blkg_list iteration in bfq_end_wr_async() with blkcg_mutex
Date: Wed, 4 Mar 2026 15:38:03 +0800
Message-ID: <20260304073809.3438679-3-yukuai@fnnas.com>
In-Reply-To: <20260304073809.3438679-1-yukuai@fnnas.com>
References: <20260304073809.3438679-1-yukuai@fnnas.com>

bfq_end_wr_async() iterates q->blkg_list while only holding bfqd->lock,
but not blkcg_mutex. This can race with blkg_free_workfn() that removes
blkgs from the list while holding blkcg_mutex.

Add blkcg_mutex protection in bfq_end_wr() before taking bfqd->lock to
ensure proper synchronization when iterating q->blkg_list.
Signed-off-by: Yu Kuai
---
 block/bfq-cgroup.c  | 3 ++-
 block/bfq-iosched.c | 6 ++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 6a75fe1c7a5c..839d266a6aa6 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -940,7 +940,8 @@ void bfq_end_wr_async(struct bfq_data *bfqd)
 	list_for_each_entry(blkg, &bfqd->queue->blkg_list, q_node) {
 		struct bfq_group *bfqg = blkg_to_bfqg(blkg);
 
-		bfq_end_wr_async_queues(bfqd, bfqg);
+		if (bfqg)
+			bfq_end_wr_async_queues(bfqd, bfqg);
 	}
 	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
 }
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index b180ce583951..14ca67a616b7 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2645,6 +2645,9 @@ static void bfq_end_wr(struct bfq_data *bfqd)
 	struct bfq_queue *bfqq;
 	int i;
 
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+	mutex_lock(&bfqd->queue->blkcg_mutex);
+#endif
 	spin_lock_irq(&bfqd->lock);
 
 	for (i = 0; i < bfqd->num_actuators; i++) {
@@ -2656,6 +2659,9 @@ static void bfq_end_wr(struct bfq_data *bfqd)
 	bfq_end_wr_async(bfqd);
 
 	spin_unlock_irq(&bfqd->lock);
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+	mutex_unlock(&bfqd->queue->blkcg_mutex);
+#endif
 }
 
 static sector_t bfq_io_struct_pos(void *io_struct, bool request)
-- 
2.51.0
From: Yu Kuai
To: Tejun Heo, Josef Bacik, Jens Axboe
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Zheng Qixing, Ming Lei, Nilay Shroff
Subject: [PATCH v3 3/7] blk-cgroup: fix race between policy activation and blkg destruction
Date: Wed, 4 Mar 2026 15:38:04 +0800
Message-ID: <20260304073809.3438679-4-yukuai@fnnas.com>
In-Reply-To: <20260304073809.3438679-1-yukuai@fnnas.com>
References: <20260304073809.3438679-1-yukuai@fnnas.com>

From: Zheng Qixing

When switching an IO scheduler on a block device, blkcg_activate_policy()
allocates blkg_policy_data (pd) for all blkgs attached to the queue.
However, blkcg_activate_policy() may race with concurrent blkcg deletion,
leading to use-after-free and memory leak issues.
The use-after-free occurs in the following race:

T1 (blkcg_activate_policy):
- Successfully allocates pd for blkg1 (loop0->queue, blkcgA)
- Fails to allocate pd for blkg2 (loop0->queue, blkcgB)
- Enters the enomem rollback path to release blkg1 resources

T2 (blkcg deletion):
- blkcgA is deleted concurrently
- blkg1 is freed via blkg_free_workfn()
- blkg1->pd is freed

T1 (continued):
- Rollback path accesses blkg1->pd->online after pd is freed
- Triggers use-after-free

In addition, blkg_free_workfn() frees pd before removing the blkg from
q->blkg_list. This allows blkcg_activate_policy() to allocate a new pd
for a blkg that is being destroyed, leaving the newly allocated pd
unreachable when the blkg is finally freed.

Fix these races by extending blkcg_mutex coverage to serialize
blkcg_activate_policy() rollback and blkg destruction, ensuring pd
lifecycle is synchronized with blkg list visibility.

Link: https://lore.kernel.org/all/20260108014416.3656493-3-zhengqixing@huaweicloud.com/
Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Signed-off-by: Zheng Qixing
Signed-off-by: Yu Kuai
---
 block/blk-cgroup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0bc7b19399b6..a6ac6ba9430d 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1599,6 +1599,8 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
 
 	if (queue_is_mq(q))
 		memflags = blk_mq_freeze_queue(q);
+
+	mutex_lock(&q->blkcg_mutex);
 retry:
 	spin_lock_irq(&q->queue_lock);
 
@@ -1661,6 +1663,7 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
 
 	spin_unlock_irq(&q->queue_lock);
 out:
+	mutex_unlock(&q->blkcg_mutex);
 	if (queue_is_mq(q))
 		blk_mq_unfreeze_queue(q, memflags);
 	if (pinned_blkg)
-- 
2.51.0
From: Yu Kuai
To: Tejun Heo, Josef Bacik, Jens Axboe
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Zheng Qixing, Ming Lei, Nilay Shroff
Subject: [PATCH v3 4/7] blk-cgroup: skip dying blkg in blkcg_activate_policy()
Date: Wed, 4 Mar 2026 15:38:05 +0800
Message-ID: <20260304073809.3438679-5-yukuai@fnnas.com>
In-Reply-To: <20260304073809.3438679-1-yukuai@fnnas.com>
References: <20260304073809.3438679-1-yukuai@fnnas.com>

From: Zheng Qixing

When switching IO schedulers on a block device, blkcg_activate_policy() can
race with concurrent blkcg deletion, leading to a use-after-free in
rcu_accelerate_cbs.

T1:				T2:
blkg_destroy
 kill(&blkg->refcnt)
 // blkg->refcnt=1->0
 blkg_release
 // call_rcu(__blkg_release)
 ...
 blkg_free_workfn
  ->pd_free_fn(pd)
				elv_iosched_store
				 elevator_switch
				 ...
				  iterate blkg list
				  blkg_get(blkg)
				  // blkg->refcnt=0->1
 list_del_init(&blkg->q_node)
				  blkg_put(pinned_blkg)
				  // blkg->refcnt=1->0
				  blkg_release
				  // call_rcu again
				  rcu_accelerate_cbs // uaf

Fix this by checking hlist_unhashed(&blkg->blkcg_node) before getting a
reference to the blkg. This is the same check used in blkg_destroy() to
detect if a blkg has already been destroyed. If the blkg is already
unhashed, skip processing it since it's being destroyed.

Link: https://lore.kernel.org/all/20260108014416.3656493-4-zhengqixing@huaweicloud.com/
Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Signed-off-by: Zheng Qixing
Signed-off-by: Yu Kuai
---
 block/blk-cgroup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a6ac6ba9430d..f5b14a1d6973 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1610,6 +1610,8 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
 
 		if (blkg->pd[pol->plid])
 			continue;
+		if (hlist_unhashed(&blkg->blkcg_node))
+			continue;
 
 		/* If prealloc matches, use it; otherwise try GFP_NOWAIT */
 		if (blkg == pinned_blkg) {
-- 
2.51.0
From: Yu Kuai
To: Tejun Heo, Josef Bacik, Jens Axboe
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Zheng Qixing, Ming Lei, Nilay Shroff
Subject: [PATCH v3 5/7] blk-cgroup: factor policy pd teardown loop into helper
Date: Wed, 4 Mar 2026 15:38:06 +0800
Message-ID: <20260304073809.3438679-6-yukuai@fnnas.com>
In-Reply-To: <20260304073809.3438679-1-yukuai@fnnas.com>
References: <20260304073809.3438679-1-yukuai@fnnas.com>

From: Zheng Qixing

Move the teardown sequence which offlines and frees per-policy
blkg_policy_data (pd) into a helper for readability. No functional
change intended.
Signed-off-by: Zheng Qixing
Reviewed-by: Christoph Hellwig
Reviewed-by: Yu Kuai
Signed-off-by: Yu Kuai
---
 block/blk-cgroup.c | 58 +++++++++++++++++++++-------------------------
 1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index f5b14a1d6973..0206050f81ea 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1562,6 +1562,31 @@ struct cgroup_subsys io_cgrp_subsys = {
 };
 EXPORT_SYMBOL_GPL(io_cgrp_subsys);
 
+/*
+ * Tear down per-blkg policy data for @pol on @q.
+ */
+static void blkcg_policy_teardown_pds(struct request_queue *q,
+				      const struct blkcg_policy *pol)
+{
+	struct blkcg_gq *blkg;
+
+	list_for_each_entry(blkg, &q->blkg_list, q_node) {
+		struct blkcg *blkcg = blkg->blkcg;
+		struct blkg_policy_data *pd;
+
+		spin_lock(&blkcg->lock);
+		pd = blkg->pd[pol->plid];
+		if (pd) {
+			if (pd->online && pol->pd_offline_fn)
+				pol->pd_offline_fn(pd);
+			pd->online = false;
+			pol->pd_free_fn(pd);
+			blkg->pd[pol->plid] = NULL;
+		}
+		spin_unlock(&blkcg->lock);
+	}
+}
+
 /**
  * blkcg_activate_policy - activate a blkcg policy on a gendisk
  * @disk: gendisk of interest
@@ -1677,21 +1702,7 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
 enomem:
 	/* alloc failed, take down everything */
 	spin_lock_irq(&q->queue_lock);
-	list_for_each_entry(blkg, &q->blkg_list, q_node) {
-		struct blkcg *blkcg = blkg->blkcg;
-		struct blkg_policy_data *pd;
-
-		spin_lock(&blkcg->lock);
-		pd = blkg->pd[pol->plid];
-		if (pd) {
-			if (pd->online && pol->pd_offline_fn)
-				pol->pd_offline_fn(pd);
-			pd->online = false;
-			pol->pd_free_fn(pd);
-			blkg->pd[pol->plid] = NULL;
-		}
-		spin_unlock(&blkcg->lock);
-	}
+	blkcg_policy_teardown_pds(q, pol);
 	spin_unlock_irq(&q->queue_lock);
 	ret = -ENOMEM;
 	goto out;
@@ -1710,7 +1721,6 @@ void blkcg_deactivate_policy(struct gendisk *disk,
 			     const struct blkcg_policy *pol)
 {
 	struct request_queue *q = disk->queue;
-	struct blkcg_gq *blkg;
 	unsigned int memflags;
 
 	if (!blkcg_policy_enabled(q, pol))
 		return;
@@ -1721,22 +1731,8 @@ void blkcg_deactivate_policy(struct gendisk *disk,
 
 	mutex_lock(&q->blkcg_mutex);
 	spin_lock_irq(&q->queue_lock);
-
 	__clear_bit(pol->plid, q->blkcg_pols);
-
-	list_for_each_entry(blkg, &q->blkg_list, q_node) {
-		struct blkcg *blkcg = blkg->blkcg;
-
-		spin_lock(&blkcg->lock);
-		if (blkg->pd[pol->plid]) {
-			if (blkg->pd[pol->plid]->online && pol->pd_offline_fn)
-				pol->pd_offline_fn(blkg->pd[pol->plid]);
-			pol->pd_free_fn(blkg->pd[pol->plid]);
-			blkg->pd[pol->plid] = NULL;
-		}
-		spin_unlock(&blkcg->lock);
-	}
-
+	blkcg_policy_teardown_pds(q, pol);
 	spin_unlock_irq(&q->queue_lock);
 	mutex_unlock(&q->blkcg_mutex);
 
-- 
2.51.0
From: Yu Kuai
To: Tejun Heo, Josef Bacik, Jens Axboe
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Zheng Qixing, Ming Lei, Nilay Shroff
Subject: [PATCH v3 6/7] blk-cgroup: allocate pds before freezing queue in blkcg_activate_policy()
Date: Wed, 4 Mar 2026 15:38:07 +0800
Message-ID: <20260304073809.3438679-7-yukuai@fnnas.com>
In-Reply-To: <20260304073809.3438679-1-yukuai@fnnas.com>
References: <20260304073809.3438679-1-yukuai@fnnas.com>

Some policies like iocost and iolatency perform percpu allocation in
pd_alloc_fn(). Percpu allocation with queue frozen can cause deadlock
because percpu memory reclaim may issue IO.

Now that q->blkg_list is protected by blkcg_mutex, restructure
blkcg_activate_policy() to allocate all pds before freezing the queue:

1. Allocate all pds with GFP_KERNEL before freezing the queue
2. Freeze the queue
3. Initialize and online all pds

Note: Future work is to remove all queue freezing before
blkcg_activate_policy() to fix the deadlocks thoroughly.
Signed-off-by: Yu Kuai
---
 block/blk-cgroup.c | 95 +++++++++++++++++-----------------------------
 1 file changed, 35 insertions(+), 60 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0206050f81ea..1620be75f124 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1606,8 +1606,7 @@ static void blkcg_policy_teardown_pds(struct request_queue *q,
 int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
 {
 	struct request_queue *q = disk->queue;
-	struct blkg_policy_data *pd_prealloc = NULL;
-	struct blkcg_gq *blkg, *pinned_blkg = NULL;
+	struct blkcg_gq *blkg;
 	unsigned int memflags;
 	int ret;
 
@@ -1622,90 +1621,65 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
 	if (WARN_ON_ONCE(!pol->pd_alloc_fn || !pol->pd_free_fn))
 		return -EINVAL;
 
-	if (queue_is_mq(q))
-		memflags = blk_mq_freeze_queue(q);
-
+	/*
+	 * Allocate all pds before freezing queue. Some policies like iocost
+	 * and iolatency do percpu allocation in pd_alloc_fn(), which can
+	 * deadlock with queue frozen because percpu memory reclaim may issue
+	 * IO. blkcg_mutex protects q->blkg_list iteration.
+	 */
 	mutex_lock(&q->blkcg_mutex);
-retry:
-	spin_lock_irq(&q->queue_lock);
-
-	/* blkg_list is pushed at the head, reverse walk to initialize parents first */
 	list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
 		struct blkg_policy_data *pd;
 
-		if (blkg->pd[pol->plid])
-			continue;
+		/* Skip dying blkg */
 		if (hlist_unhashed(&blkg->blkcg_node))
 			continue;
 
-		/* If prealloc matches, use it; otherwise try GFP_NOWAIT */
-		if (blkg == pinned_blkg) {
-			pd = pd_prealloc;
-			pd_prealloc = NULL;
-		} else {
-			pd = pol->pd_alloc_fn(disk, blkg->blkcg,
-					      GFP_NOWAIT);
-		}
-
+		pd = pol->pd_alloc_fn(disk, blkg->blkcg, GFP_KERNEL);
 		if (!pd) {
-			/*
-			 * GFP_NOWAIT failed. Free the existing one and
-			 * prealloc for @blkg w/ GFP_KERNEL.
-			 */
-			if (pinned_blkg)
-				blkg_put(pinned_blkg);
-			blkg_get(blkg);
-			pinned_blkg = blkg;
-
-			spin_unlock_irq(&q->queue_lock);
-
-			if (pd_prealloc)
-				pol->pd_free_fn(pd_prealloc);
-			pd_prealloc = pol->pd_alloc_fn(disk, blkg->blkcg,
-						       GFP_KERNEL);
-			if (pd_prealloc)
-				goto retry;
-			else
-				goto enomem;
+			ret = -ENOMEM;
+			goto err_teardown;
 		}
 
-		spin_lock(&blkg->blkcg->lock);
-
 		pd->blkg = blkg;
 		pd->plid = pol->plid;
+		pd->online = false;
 		blkg->pd[pol->plid] = pd;
+	}
+
+	/* Now freeze queue and initialize/online all pds */
+	if (queue_is_mq(q))
+		memflags = blk_mq_freeze_queue(q);
 
+	spin_lock_irq(&q->queue_lock);
+	list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
+		struct blkg_policy_data *pd = blkg->pd[pol->plid];
+
+		/* Skip dying blkg */
+		if (hlist_unhashed(&blkg->blkcg_node))
+			continue;
+
+		spin_lock(&blkg->blkcg->lock);
 		if (pol->pd_init_fn)
 			pol->pd_init_fn(pd);
-
 		if (pol->pd_online_fn)
 			pol->pd_online_fn(pd);
 		pd->online = true;
-
 		spin_unlock(&blkg->blkcg->lock);
 	}
 
 	__set_bit(pol->plid, q->blkcg_pols);
-
 	ret = 0;
-
 	spin_unlock_irq(&q->queue_lock);
-out:
-	mutex_unlock(&q->blkcg_mutex);
+
 	if (queue_is_mq(q))
 		blk_mq_unfreeze_queue(q, memflags);
-	if (pinned_blkg)
-		blkg_put(pinned_blkg);
-	if (pd_prealloc)
-		pol->pd_free_fn(pd_prealloc);
-	return ret;
+	mutex_unlock(&q->blkcg_mutex);
+	return 0;
 
-enomem:
-	/* alloc failed, take down everything */
-	spin_lock_irq(&q->queue_lock);
+err_teardown:
 	blkcg_policy_teardown_pds(q, pol);
-	spin_unlock_irq(&q->queue_lock);
-	ret = -ENOMEM;
-	goto out;
+	mutex_unlock(&q->blkcg_mutex);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(blkcg_activate_policy);
 
@@ -1726,18 +1700,19 @@ void blkcg_deactivate_policy(struct gendisk *disk,
 	if (!blkcg_policy_enabled(q, pol))
 		return;
 
+	/* Same locking order as blkcg_activate_policy(): mutex -> freeze */
+	mutex_lock(&q->blkcg_mutex);
 	if (queue_is_mq(q))
 		memflags = blk_mq_freeze_queue(q);
 
-	mutex_lock(&q->blkcg_mutex);
 	spin_lock_irq(&q->queue_lock);
 	__clear_bit(pol->plid, q->blkcg_pols);
 	blkcg_policy_teardown_pds(q, pol);
 	spin_unlock_irq(&q->queue_lock);
-	mutex_unlock(&q->blkcg_mutex);
 
 	if (queue_is_mq(q))
 		blk_mq_unfreeze_queue(q, memflags);
+	mutex_unlock(&q->blkcg_mutex);
 }
 EXPORT_SYMBOL_GPL(blkcg_deactivate_policy);
 
-- 
2.51.0

From: Yu Kuai
To: Tejun Heo, Josef Bacik, Jens Axboe
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Zheng Qixing, Ming Lei, Nilay Shroff
Subject: [PATCH v3 7/7] blk-rq-qos: move rq_qos_mutex acquisition inside rq_qos_add/del
Date: Wed, 4 Mar 2026 15:38:08 +0800
Message-ID:
 <20260304073809.3438679-8-yukuai@fnnas.com>
In-Reply-To: <20260304073809.3438679-1-yukuai@fnnas.com>
References: <20260304073809.3438679-1-yukuai@fnnas.com>

The current rq_qos_mutex handling has an awkward pattern where callers
must acquire the mutex before calling rq_qos_add()/rq_qos_del(), and
blkg_conf_open_bdev_frozen() had to release and re-acquire the mutex
around queue freezing to maintain proper locking order (freeze queue
before mutex).

On the other hand, with rq_qos_mutex held after blkg_conf_prep(), there
are many possible deadlocks:
- allocating memory with GFP_KERNEL, like blk_throtl_init();
- allocating percpu memory, like pd_alloc_fn() for iocost/iolatency;

This patch refactors the locking by:

1. Moving queue freeze and rq_qos_mutex acquisition inside
   rq_qos_add()/rq_qos_del(), with the correct order: freeze first,
   then acquire mutex.
2. Removing external mutex handling from wbt_init() since rq_qos_add()
   now handles it internally.
3. Removing rq_qos_mutex handling from blkg_conf_open_bdev() entirely,
   making it only responsible for parsing MAJ:MIN and opening the bdev.
4. Removing blkg_conf_open_bdev_frozen() and blkg_conf_exit_frozen()
   functions which are no longer needed.
5. Updating ioc_qos_write() to use the simpler blkg_conf_open_bdev()
   and blkg_conf_exit() functions.

This eliminates the release-and-reacquire pattern and makes
rq_qos_add()/rq_qos_del() self-contained, which is cleaner and reduces
complexity. Each function now properly manages its own locking with the
correct order: queue freeze → mutex acquire → modify → mutex release →
queue unfreeze.
Signed-off-by: Yu Kuai
---
 block/blk-cgroup.c    | 50 -------------------------------------------
 block/blk-cgroup.h    |  2 --
 block/blk-iocost.c    | 11 ++++------
 block/blk-iolatency.c |  5 -----
 block/blk-rq-qos.c    | 31 ++++++++++++++++-----------
 block/blk-wbt.c       |  2 --
 6 files changed, 22 insertions(+), 79 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1620be75f124..02ef8f60f759 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -802,10 +802,8 @@ int blkg_conf_open_bdev(struct blkg_conf_ctx *ctx)
 		return -ENODEV;
 	}
 
-	mutex_lock(&bdev->bd_queue->rq_qos_mutex);
 	if (!disk_live(bdev->bd_disk)) {
 		blkdev_put_no_open(bdev);
-		mutex_unlock(&bdev->bd_queue->rq_qos_mutex);
 		return -ENODEV;
 	}
 
@@ -813,38 +811,6 @@ int blkg_conf_open_bdev(struct blkg_conf_ctx *ctx)
 	ctx->bdev = bdev;
 	return 0;
 }
-/*
- * Similar to blkg_conf_open_bdev, but additionally freezes the queue,
- * ensures the correct locking order between freeze queue and q->rq_qos_mutex.
- *
- * This function returns negative error on failure. On success it returns
- * memflags which must be saved and later passed to blkg_conf_exit_frozen
- * for restoring the memalloc scope.
- */
-unsigned long __must_check blkg_conf_open_bdev_frozen(struct blkg_conf_ctx *ctx)
-{
-	int ret;
-	unsigned long memflags;
-
-	if (ctx->bdev)
-		return -EINVAL;
-
-	ret = blkg_conf_open_bdev(ctx);
-	if (ret < 0)
-		return ret;
-	/*
-	 * At this point, we haven't started protecting anything related to QoS,
-	 * so we release q->rq_qos_mutex here, which was first acquired in blkg_
-	 * conf_open_bdev. Later, we re-acquire q->rq_qos_mutex after freezing
-	 * the queue to maintain the correct locking order.
-	 */
-	mutex_unlock(&ctx->bdev->bd_queue->rq_qos_mutex);
-
-	memflags = blk_mq_freeze_queue(ctx->bdev->bd_queue);
-	mutex_lock(&ctx->bdev->bd_queue->rq_qos_mutex);
-
-	return memflags;
-}
 
 /**
  * blkg_conf_prep - parse and prepare for per-blkg config update
@@ -978,7 +944,6 @@ EXPORT_SYMBOL_GPL(blkg_conf_prep);
  */
 void blkg_conf_exit(struct blkg_conf_ctx *ctx)
 	__releases(&ctx->bdev->bd_queue->queue_lock)
-	__releases(&ctx->bdev->bd_queue->rq_qos_mutex)
 {
 	if (ctx->blkg) {
 		spin_unlock_irq(&bdev_get_queue(ctx->bdev)->queue_lock);
@@ -986,7 +951,6 @@ void blkg_conf_exit(struct blkg_conf_ctx *ctx)
 	}
 
 	if (ctx->bdev) {
-		mutex_unlock(&ctx->bdev->bd_queue->rq_qos_mutex);
 		blkdev_put_no_open(ctx->bdev);
 		ctx->body = NULL;
 		ctx->bdev = NULL;
@@ -994,20 +958,6 @@ void blkg_conf_exit(struct blkg_conf_ctx *ctx)
 }
 EXPORT_SYMBOL_GPL(blkg_conf_exit);
 
-/*
- * Similar to blkg_conf_exit, but also unfreezes the queue. Should be used
- * when blkg_conf_open_bdev_frozen is used to open the bdev.
- */
-void blkg_conf_exit_frozen(struct blkg_conf_ctx *ctx, unsigned long memflags)
-{
-	if (ctx->bdev) {
-		struct request_queue *q = ctx->bdev->bd_queue;
-
-		blkg_conf_exit(ctx);
-		blk_mq_unfreeze_queue(q, memflags);
-	}
-}
-
 static void blkg_iostat_add(struct blkg_iostat *dst, struct blkg_iostat *src)
 {
 	int i;
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 1cce3294634d..d4e7f78ba545 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -219,11 +219,9 @@ struct blkg_conf_ctx {
 
 void blkg_conf_init(struct blkg_conf_ctx *ctx, char *input);
 int blkg_conf_open_bdev(struct blkg_conf_ctx *ctx);
-unsigned long blkg_conf_open_bdev_frozen(struct blkg_conf_ctx *ctx);
 int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
 		   struct blkg_conf_ctx *ctx);
 void blkg_conf_exit(struct blkg_conf_ctx *ctx);
-void blkg_conf_exit_frozen(struct blkg_conf_ctx *ctx, unsigned long memflags);
 
 /**
  * bio_issue_as_root_blkg - see if this bio needs to be issued as root blkg
root blkg diff --git a/block/blk-iocost.c b/block/blk-iocost.c index ef543d163d46..104a9a9f563f 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -3220,16 +3220,13 @@ static ssize_t ioc_qos_write(struct kernfs_open_fil= e *of, char *input, u32 qos[NR_QOS_PARAMS]; bool enable, user; char *body, *p; - unsigned long memflags; int ret; =20 blkg_conf_init(&ctx, input); =20 - memflags =3D blkg_conf_open_bdev_frozen(&ctx); - if (IS_ERR_VALUE(memflags)) { - ret =3D memflags; + ret =3D blkg_conf_open_bdev(&ctx); + if (ret) goto err; - } =20 body =3D ctx.body; disk =3D ctx.bdev->bd_disk; @@ -3346,14 +3343,14 @@ static ssize_t ioc_qos_write(struct kernfs_open_fil= e *of, char *input, =20 blk_mq_unquiesce_queue(disk->queue); =20 - blkg_conf_exit_frozen(&ctx, memflags); + blkg_conf_exit(&ctx); return nbytes; einval: spin_unlock_irq(&ioc->lock); blk_mq_unquiesce_queue(disk->queue); ret =3D -EINVAL; err: - blkg_conf_exit_frozen(&ctx, memflags); + blkg_conf_exit(&ctx); return ret; } =20 diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c index f7434278cd29..3f454fb3ff51 100644 --- a/block/blk-iolatency.c +++ b/block/blk-iolatency.c @@ -842,11 +842,6 @@ static ssize_t iolatency_set_limit(struct kernfs_open_= file *of, char *buf, if (ret) goto out; =20 - /* - * blk_iolatency_init() may fail after rq_qos_add() succeeds which can - * confuse iolat_rq_qos() test. Make the test and init atomic. 
- */ - lockdep_assert_held(&ctx.bdev->bd_queue->rq_qos_mutex); if (!iolat_rq_qos(ctx.bdev->bd_queue)) ret =3D blk_iolatency_init(ctx.bdev->bd_disk); if (ret) diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c index 85cf74402a09..fe96183bcc75 100644 --- a/block/blk-rq-qos.c +++ b/block/blk-rq-qos.c @@ -327,8 +327,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *dis= k, enum rq_qos_id id, { struct request_queue *q =3D disk->queue; unsigned int memflags; - - lockdep_assert_held(&q->rq_qos_mutex); + int ret =3D 0; =20 rqos->disk =3D disk; rqos->id =3D id; @@ -337,20 +336,24 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *d= isk, enum rq_qos_id id, /* * No IO can be in-flight when adding rqos, so freeze queue, which * is fine since we only support rq_qos for blk-mq queue. + * + * Acquire rq_qos_mutex after freezing the queue to ensure proper + * locking order. */ memflags =3D blk_mq_freeze_queue(q); + mutex_lock(&q->rq_qos_mutex); =20 - if (rq_qos_id(q, rqos->id)) - goto ebusy; - rqos->next =3D q->rq_qos; - q->rq_qos =3D rqos; - blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q); + if (rq_qos_id(q, rqos->id)) { + ret =3D -EBUSY; + } else { + rqos->next =3D q->rq_qos; + q->rq_qos =3D rqos; + blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q); + } =20 + mutex_unlock(&q->rq_qos_mutex); blk_mq_unfreeze_queue(q, memflags); - return 0; -ebusy: - blk_mq_unfreeze_queue(q, memflags); - return -EBUSY; + return ret; } =20 void rq_qos_del(struct rq_qos *rqos) @@ -359,9 +362,9 @@ void rq_qos_del(struct rq_qos *rqos) struct rq_qos **cur; unsigned int memflags; =20 - lockdep_assert_held(&q->rq_qos_mutex); - memflags =3D blk_mq_freeze_queue(q); + mutex_lock(&q->rq_qos_mutex); + for (cur =3D &q->rq_qos; *cur; cur =3D &(*cur)->next) { if (*cur =3D=3D rqos) { *cur =3D rqos->next; @@ -370,5 +373,7 @@ void rq_qos_del(struct rq_qos *rqos) } if (!q->rq_qos) blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q); + + mutex_unlock(&q->rq_qos_mutex); blk_mq_unfreeze_queue(q, memflags); } diff 
--git a/block/blk-wbt.c b/block/blk-wbt.c index 6dba71e87387..dde03b9ea074 100644 --- a/block/blk-wbt.c +++ b/block/blk-wbt.c @@ -961,9 +961,7 @@ static int wbt_init(struct gendisk *disk, struct rq_wb = *rwb) /* * Assign rwb and add the stats callback. */ - mutex_lock(&q->rq_qos_mutex); ret =3D rq_qos_add(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops); - mutex_unlock(&q->rq_qos_mutex); if (ret) return ret; =20 --=20 2.51.0