From: Li Lingfeng <lilingfeng3@huawei.com>
When a process migrates to another cgroup and the original cgroup is deleted,
the limits on its already-throttled bios can no longer be removed. If those
limits are set too low, the bios take a long time to complete.
Follow the disk-deletion path: when the cgroup is deleted, remove the limits
and issue the throttled bios.
This changes the behavior of throttled bios:
Before: the limit on the throttled bios cannot be changed, and the bios
complete under that limit;
Now: the limit is canceled and the throttled bios are flushed immediately.
References:
[1] https://lore.kernel.org/r/20220318130144.1066064-4-ming.lei@redhat.com
[2] https://lore.kernel.org/all/da861d63-58c6-3ca0-2535-9089993e9e28@huaweicloud.com/
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
---
v2->v3:
Change "tg_cancel_bios" to "tg_flush_bios";
Add reference of v2 to describe the background.
block/blk-throttle.c | 68 ++++++++++++++++++++++++++++----------------
1 file changed, 44 insertions(+), 24 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 6943ec720f39..cf7f4912c57a 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1526,6 +1526,42 @@ static void throtl_shutdown_wq(struct request_queue *q)
 	cancel_work_sync(&td->dispatch_work);
 }
 
+static void tg_flush_bios(struct throtl_grp *tg)
+{
+	struct throtl_service_queue *sq = &tg->service_queue;
+
+	if (tg->flags & THROTL_TG_CANCELING)
+		return;
+	/*
+	 * Set the flag to make sure throtl_pending_timer_fn() won't
+	 * stop until all throttled bios are dispatched.
+	 */
+	tg->flags |= THROTL_TG_CANCELING;
+
+	/*
+	 * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
+	 * will be inserted to service queue without THROTL_TG_PENDING
+	 * set in tg_update_disptime below. Then IO dispatched from
+	 * child in tg_dispatch_one_bio will trigger double insertion
+	 * and corrupt the tree.
+	 */
+	if (!(tg->flags & THROTL_TG_PENDING))
+		return;
+
+	/*
+	 * Update disptime after setting the above flag to make sure
+	 * throtl_select_dispatch() won't exit without dispatching.
+	 */
+	tg_update_disptime(tg);
+
+	throtl_schedule_pending_timer(sq, jiffies + 1);
+}
+
+static void throtl_pd_offline(struct blkg_policy_data *pd)
+{
+	tg_flush_bios(pd_to_tg(pd));
+}
+
 struct blkcg_policy blkcg_policy_throtl = {
 	.dfl_cftypes		= throtl_files,
 	.legacy_cftypes		= throtl_legacy_files,
@@ -1533,6 +1569,7 @@ struct blkcg_policy blkcg_policy_throtl = {
 	.pd_alloc_fn		= throtl_pd_alloc,
 	.pd_init_fn		= throtl_pd_init,
 	.pd_online_fn		= throtl_pd_online,
+	.pd_offline_fn		= throtl_pd_offline,
 	.pd_free_fn		= throtl_pd_free,
 };
 
@@ -1553,32 +1590,15 @@ void blk_throtl_cancel_bios(struct gendisk *disk)
 	 */
 	rcu_read_lock();
 	blkg_for_each_descendant_post(blkg, pos_css, q->root_blkg) {
-		struct throtl_grp *tg = blkg_to_tg(blkg);
-		struct throtl_service_queue *sq = &tg->service_queue;
-
-		/*
-		 * Set the flag to make sure throtl_pending_timer_fn() won't
-		 * stop until all throttled bios are dispatched.
-		 */
-		tg->flags |= THROTL_TG_CANCELING;
-
 		/*
-		 * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
-		 * will be inserted to service queue without THROTL_TG_PENDING
-		 * set in tg_update_disptime below. Then IO dispatched from
-		 * child in tg_dispatch_one_bio will trigger double insertion
-		 * and corrupt the tree.
+		 * disk_release will call pd_offline_fn to cancel bios.
+		 * However, disk_release can't be called if someone get
+		 * the refcount of device and issued bios which are
+		 * inflight after del_gendisk.
+		 * Cancel bios here to ensure no bios are inflight after
+		 * del_gendisk.
 		 */
-		if (!(tg->flags & THROTL_TG_PENDING))
-			continue;
-
-		/*
-		 * Update disptime after setting the above flag to make sure
-		 * throtl_select_dispatch() won't exit without dispatching.
-		 */
-		tg_update_disptime(tg);
-
-		throtl_schedule_pending_timer(sq, jiffies + 1);
+		tg_flush_bios(blkg_to_tg(blkg));
 	}
 	rcu_read_unlock();
 	spin_unlock_irq(&q->queue_lock);
--
2.39.2
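The branch order in tg_flush_bios() matters because the helper can now be reached from two paths (pd_offline_fn and blk_throtl_cancel_bios()) for the same group. The following is a minimal user-space sketch of that control flow only, not kernel code: struct toy_tg, the TOY_TG_* flag values, toy_flush_bios() and toy_flush_demo() are hypothetical names, and the timers_armed counter merely stands in for the tg_update_disptime() and throtl_schedule_pending_timer() calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for THROTL_TG_PENDING / THROTL_TG_CANCELING. */
enum {
	TOY_TG_PENDING   = 1 << 0,	/* group sits on its parent's service queue */
	TOY_TG_CANCELING = 1 << 1,	/* group is flushing its throttled bios */
};

struct toy_tg {
	unsigned int flags;
	int timers_armed;	/* how often a dispatch was scheduled */
};

/*
 * Same branch order as tg_flush_bios() in the patch. Returns true only
 * when this call scheduled a dispatch.
 */
static bool toy_flush_bios(struct toy_tg *tg)
{
	if (tg->flags & TOY_TG_CANCELING)
		return false;		/* already flushing */
	tg->flags |= TOY_TG_CANCELING;
	if (!(tg->flags & TOY_TG_PENDING))
		return false;		/* nothing queued, only mark the group */
	tg->timers_armed++;		/* stands in for disptime update + timer */
	return true;
}

/* Walks the three cases; returns 0 if all checks pass. */
static int toy_flush_demo(void)
{
	struct toy_tg idle = { .flags = 0 };
	struct toy_tg busy = { .flags = TOY_TG_PENDING };

	assert(!toy_flush_bios(&idle));		/* idle group: flag set, no timer */
	assert(idle.flags & TOY_TG_CANCELING);
	assert(toy_flush_bios(&busy));		/* pending group: timer armed once */
	assert(!toy_flush_bios(&busy));		/* second call is a no-op */
	assert(busy.timers_armed == 1);
	return 0;
}
```

Calling toy_flush_demo() from a main() exercises all three cases. The early return on the CANCELING flag is what makes a second call for the same group harmless in this model.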
On Sat, 17 Aug 2024 15:11:08 +0800, Li Lingfeng wrote:
> When a process migrates to another cgroup and the original cgroup is deleted,
> the restrictions of throttled bios cannot be removed. If the restrictions
> are set too low, it will take a long time to complete these bios.
>
> Refer to the process of deleting a disk to remove the restrictions and
> issue bios when deleting the cgroup.
>
> [...]
Applied, thanks!
[1/1] block: flush all throttled bios when deleting the cgroup
(no commit info)
Best regards,
--
Jens Axboe
Friendly ping ...
Thanks
On 2024/8/17 15:11, Li Lingfeng wrote:
> From: Li Lingfeng <lilingfeng3@huawei.com>
>
> When a process migrates to another cgroup and the original cgroup is deleted,
> the restrictions of throttled bios cannot be removed. If the restrictions
> are set too low, it will take a long time to complete these bios.
>
> Refer to the process of deleting a disk to remove the restrictions and
> issue bios when deleting the cgroup.
>
> This makes difference on the behavior of throttled bios:
> Before: the limit of the throttled bios can't be changed and the bios will
> complete under this limit;
> Now: the limit will be canceled and the throttled bios will be flushed
> immediately.
>
> References:
> [1] https://lore.kernel.org/r/20220318130144.1066064-4-ming.lei@redhat.com
> [2] https://lore.kernel.org/all/da861d63-58c6-3ca0-2535-9089993e9e28@huaweicloud.com/
>
> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> ---
> v2->v3:
> Change "tg_cancel_bios" to "tg_flush_bios";
> Add reference of v2 to describe the background.
> block/blk-throttle.c | 68 ++++++++++++++++++++++++++++----------------
> 1 file changed, 44 insertions(+), 24 deletions(-)
>
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 6943ec720f39..cf7f4912c57a 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -1526,6 +1526,42 @@ static void throtl_shutdown_wq(struct request_queue *q)
> cancel_work_sync(&td->dispatch_work);
> }
>
> +static void tg_flush_bios(struct throtl_grp *tg)
> +{
> + struct throtl_service_queue *sq = &tg->service_queue;
> +
> + if (tg->flags & THROTL_TG_CANCELING)
> + return;
> + /*
> + * Set the flag to make sure throtl_pending_timer_fn() won't
> + * stop until all throttled bios are dispatched.
> + */
> + tg->flags |= THROTL_TG_CANCELING;
> +
> + /*
> + * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
> + * will be inserted to service queue without THROTL_TG_PENDING
> + * set in tg_update_disptime below. Then IO dispatched from
> + * child in tg_dispatch_one_bio will trigger double insertion
> + * and corrupt the tree.
> + */
> + if (!(tg->flags & THROTL_TG_PENDING))
> + return;
> +
> + /*
> + * Update disptime after setting the above flag to make sure
> + * throtl_select_dispatch() won't exit without dispatching.
> + */
> + tg_update_disptime(tg);
> +
> + throtl_schedule_pending_timer(sq, jiffies + 1);
> +}
> +
> +static void throtl_pd_offline(struct blkg_policy_data *pd)
> +{
> + tg_flush_bios(pd_to_tg(pd));
> +}
> +
> struct blkcg_policy blkcg_policy_throtl = {
> .dfl_cftypes = throtl_files,
> .legacy_cftypes = throtl_legacy_files,
> @@ -1533,6 +1569,7 @@ struct blkcg_policy blkcg_policy_throtl = {
> .pd_alloc_fn = throtl_pd_alloc,
> .pd_init_fn = throtl_pd_init,
> .pd_online_fn = throtl_pd_online,
> + .pd_offline_fn = throtl_pd_offline,
> .pd_free_fn = throtl_pd_free,
> };
>
> @@ -1553,32 +1590,15 @@ void blk_throtl_cancel_bios(struct gendisk *disk)
> */
> rcu_read_lock();
> blkg_for_each_descendant_post(blkg, pos_css, q->root_blkg) {
> - struct throtl_grp *tg = blkg_to_tg(blkg);
> - struct throtl_service_queue *sq = &tg->service_queue;
> -
> - /*
> - * Set the flag to make sure throtl_pending_timer_fn() won't
> - * stop until all throttled bios are dispatched.
> - */
> - tg->flags |= THROTL_TG_CANCELING;
> -
> /*
> - * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
> - * will be inserted to service queue without THROTL_TG_PENDING
> - * set in tg_update_disptime below. Then IO dispatched from
> - * child in tg_dispatch_one_bio will trigger double insertion
> - * and corrupt the tree.
> + * disk_release will call pd_offline_fn to cancel bios.
> + * However, disk_release can't be called if someone get
> + * the refcount of device and issued bios which are
> + * inflight after del_gendisk.
> + * Cancel bios here to ensure no bios are inflight after
> + * del_gendisk.
> */
> - if (!(tg->flags & THROTL_TG_PENDING))
> - continue;
> -
> - /*
> - * Update disptime after setting the above flag to make sure
> - * throtl_select_dispatch() won't exit without dispatching.
> - */
> - tg_update_disptime(tg);
> -
> - throtl_schedule_pending_timer(sq, jiffies + 1);
> + tg_flush_bios(blkg_to_tg(blkg));
> }
> rcu_read_unlock();
> spin_unlock_irq(&q->queue_lock);
On Sat, Aug 17, 2024 at 03:11:08PM +0800, Li Lingfeng wrote:
> From: Li Lingfeng <lilingfeng3@huawei.com>
>
> When a process migrates to another cgroup and the original cgroup is deleted,
> the restrictions of throttled bios cannot be removed. If the restrictions
> are set too low, it will take a long time to complete these bios.
>
> Refer to the process of deleting a disk to remove the restrictions and
> issue bios when deleting the cgroup.
>
> This makes difference on the behavior of throttled bios:
> Before: the limit of the throttled bios can't be changed and the bios will
> complete under this limit;
> Now: the limit will be canceled and the throttled bios will be flushed
> immediately.
>
> References:
> [1] https://lore.kernel.org/r/20220318130144.1066064-4-ming.lei@redhat.com
> [2] https://lore.kernel.org/all/da861d63-58c6-3ca0-2535-9089993e9e28@huaweicloud.com/
>
> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Thanks.
--
tejun
Hello,
On Sat, Aug 17, 2024 at 03:11:08PM +0800, Li Lingfeng wrote:
> From: Li Lingfeng <lilingfeng3@huawei.com>
>
> When a process migrates to another cgroup and the original cgroup is deleted,
> the restrictions of throttled bios cannot be removed. If the restrictions
> are set too low, it will take a long time to complete these bios.
>
> Refer to the process of deleting a disk to remove the restrictions and
> issue bios when deleting the cgroup.
>
> This makes difference on the behavior of throttled bios:
> Before: the limit of the throttled bios can't be changed and the bios will
> complete under this limit;
> Now: the limit will be canceled and the throttled bios will be flushed
> immediately.
I still don't see why this behavior is better. Wouldn't this make it easy to
escape IO limits by creating cgroups, doing a bunch of IOs and then deleting
them?
Thanks.
--
tejun
On Mon, Aug 19, 2024 at 11:24:18AM GMT, Tejun Heo <tj@kernel.org> wrote:
> I still don't see why this behavior is better. Wouldn't this make it easy to
> escape IO limits by creating cgroups, doing a bunch of IOs and then deleting
> them?
IIUC, bios are flushed to the parent throttle group, so if there's an
ancestral limit, it should be honored. (I find this similar to memcg
reparenting.)
Mere create + set limit + delete falls under the same delegation scope, so if
that limit is bypassed, it is only shooting oneself in the foot.
Shortening the lifetime of offlined structures is beneficial, no?
Michal
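Michal's reading (flushed bios land in the parent group, so ancestral limits still apply) can be illustrated with a small user-space model. This is a sketch under that assumption, not a description of what blk-throttle does internally: struct toy_group, TOY_NO_LIMIT, toy_effective_bps(), toy_bps_after_flush() and toy_hierarchy_demo() are all hypothetical names, and only a single bytes-per-second limit per group is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NO_LIMIT UINT64_MAX

/* Hypothetical model of a throttle group with one bytes-per-second limit. */
struct toy_group {
	const struct toy_group *parent;	/* NULL for the root */
	uint64_t bps_limit;		/* TOY_NO_LIMIT if unlimited */
};

/* Tightest limit on the path from 'start' up to the root. */
static uint64_t toy_effective_bps(const struct toy_group *start)
{
	uint64_t bps = TOY_NO_LIMIT;

	for (const struct toy_group *g = start; g; g = g->parent)
		if (g->bps_limit < bps)
			bps = g->bps_limit;
	return bps;
}

/*
 * Limit applied to bios flushed out of an offlined group in this model:
 * the group's own limit is skipped, ancestors still count.
 */
static uint64_t toy_bps_after_flush(const struct toy_group *offlined)
{
	return toy_effective_bps(offlined->parent);
}

/* Compares a child under a limited parent with one directly under root. */
static int toy_hierarchy_demo(void)
{
	struct toy_group root   = { NULL, TOY_NO_LIMIT };
	struct toy_group parent = { &root, 10u * 1024 * 1024 };
	struct toy_group child  = { &parent, 1024 };
	struct toy_group lone   = { &root, 1024 };

	assert(toy_effective_bps(&child) == 1024);
	assert(toy_bps_after_flush(&child) == 10u * 1024 * 1024);
	assert(toy_bps_after_flush(&lone) == TOY_NO_LIMIT);
	return 0;
}
```

In this model the child under a limited parent is still bound by the parent's 10 MiB/s after the flush, while the group sitting directly under an unlimited root ends up with no limit at all, which is the case Tejun's question is about.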
On 2024/8/20 5:24, Tejun Heo wrote:
> Hello,
>
> On Sat, Aug 17, 2024 at 03:11:08PM +0800, Li Lingfeng wrote:
>> From: Li Lingfeng <lilingfeng3@huawei.com>
>>
>> When a process migrates to another cgroup and the original cgroup is deleted,
>> the restrictions of throttled bios cannot be removed. If the restrictions
>> are set too low, it will take a long time to complete these bios.
>>
>> Refer to the process of deleting a disk to remove the restrictions and
>> issue bios when deleting the cgroup.
>>
>> This makes difference on the behavior of throttled bios:
>> Before: the limit of the throttled bios can't be changed and the bios will
>> complete under this limit;
>> Now: the limit will be canceled and the throttled bios will be flushed
>> immediately.
> I still don't see why this behavior is better. Wouldn't this make it easy to
> escape IO limits by creating cgroups, doing a bunch of IOs and then deleting
> them?
>
> Thanks.
Yes, this would indeed make it easy to escape IO limits.
As Yu Kuai described in v2, I made this change to prevent an IO hang.
I also think it may be more appropriate to remove the limits in this
scenario, since the limits were set through the cgroup and the cgroup has
been deleted.
Thanks.
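To give a feel for why a very low limit can look like an IO hang, here is a back-of-the-envelope sketch. It assumes a plain bytes-per-second limit and ignores the slice and burst accounting that real throttling does; toy_drain_seconds() and toy_drain_demo() are hypothetical helpers, and the figures are made-up inputs, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seconds needed to dispatch 'queued_bytes' at 'bps_limit' bytes per
 * second, rounded up. A limit of 0 is modeled as "never completes".
 */
static uint64_t toy_drain_seconds(uint64_t queued_bytes, uint64_t bps_limit)
{
	if (bps_limit == 0)
		return UINT64_MAX;
	return queued_bytes / bps_limit + (queued_bytes % bps_limit != 0);
}

/* 64 MiB queued behind a 1 KiB/s limit versus a 1 MiB/s limit. */
static int toy_drain_demo(void)
{
	const uint64_t queued = 64ull << 20;	/* 67108864 bytes */

	assert(toy_drain_seconds(queued, 1024) == 65536);	/* about 18.2 hours */
	assert(toy_drain_seconds(queued, 1ull << 20) == 64);
	return 0;
}
```

Under these assumptions, bios queued in a group whose limit can no longer be changed would keep draining for hours after the cgroup itself is gone.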