[PATCH v3] block: flush all throttled bios when deleting the cgroup

Li Lingfeng posted 1 patch 1 year, 4 months ago
[PATCH v3] block: flush all throttled bios when deleting the cgroup
Posted by Li Lingfeng 1 year, 4 months ago
From: Li Lingfeng <lilingfeng3@huawei.com>

When a process migrates to another cgroup and the original cgroup is deleted,
the restrictions of throttled bios cannot be removed. If the restrictions
are set too low, it will take a long time to complete these bios.

Follow what is already done when a disk is deleted: remove the restrictions
and issue the throttled bios when the cgroup is deleted.

This changes the behavior of throttled bios:
Before: the limit on the throttled bios can't be changed, and the bios
complete under this limit;
Now: the limit is canceled and the throttled bios are flushed
immediately.

References:
[1] https://lore.kernel.org/r/20220318130144.1066064-4-ming.lei@redhat.com
[2] https://lore.kernel.org/all/da861d63-58c6-3ca0-2535-9089993e9e28@huaweicloud.com/

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
---
  v2->v3:
    Change "tg_cancel_bios" to "tg_flush_bios";
    Add a reference to v2 to describe the background.
 block/blk-throttle.c | 68 ++++++++++++++++++++++++++++----------------
 1 file changed, 44 insertions(+), 24 deletions(-)
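
To put a rough number on "a long time", here is a minimal userspace sketch
(not kernel code). It assumes a plain bytes-per-second limit and ignores
slices, bursts and iops limits; drain_seconds() and the figures in the note
below are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: whole seconds needed to drain queued_bytes of
 * throttled IO at bps_limit bytes per second, rounded up.
 * A limit of 0 stands for "no limit" in this sketch.
 */
static uint64_t drain_seconds(uint64_t queued_bytes, uint64_t bps_limit)
{
	if (bps_limit == 0)
		return 0;

	/* Round up without risking overflow in the addition. */
	return queued_bytes / bps_limit + (queued_bytes % bps_limit != 0);
}
```

For example, drain_seconds(64 << 20, 1024) models 64 MiB of throttled IO
under a 1 KiB/s limit and yields 65536 seconds, a little over 18 hours. With
the limit removed (0 in this sketch) the result is 0, which corresponds to
the "flushed immediately" case described above.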

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 6943ec720f39..cf7f4912c57a 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1526,6 +1526,42 @@ static void throtl_shutdown_wq(struct request_queue *q)
 	cancel_work_sync(&td->dispatch_work);
 }
 
+static void tg_flush_bios(struct throtl_grp *tg)
+{
+	struct throtl_service_queue *sq = &tg->service_queue;
+
+	if (tg->flags & THROTL_TG_CANCELING)
+		return;
+	/*
+	 * Set the flag to make sure throtl_pending_timer_fn() won't
+	 * stop until all throttled bios are dispatched.
+	 */
+	tg->flags |= THROTL_TG_CANCELING;
+
+	/*
+	 * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
+	 * will be inserted to service queue without THROTL_TG_PENDING
+	 * set in tg_update_disptime below. Then IO dispatched from
+	 * child in tg_dispatch_one_bio will trigger double insertion
+	 * and corrupt the tree.
+	 */
+	if (!(tg->flags & THROTL_TG_PENDING))
+		return;
+
+	/*
+	 * Update disptime after setting the above flag to make sure
+	 * throtl_select_dispatch() won't exit without dispatching.
+	 */
+	tg_update_disptime(tg);
+
+	throtl_schedule_pending_timer(sq, jiffies + 1);
+}
+
+static void throtl_pd_offline(struct blkg_policy_data *pd)
+{
+	tg_flush_bios(pd_to_tg(pd));
+}
+
 struct blkcg_policy blkcg_policy_throtl = {
 	.dfl_cftypes		= throtl_files,
 	.legacy_cftypes		= throtl_legacy_files,
@@ -1533,6 +1569,7 @@ struct blkcg_policy blkcg_policy_throtl = {
 	.pd_alloc_fn		= throtl_pd_alloc,
 	.pd_init_fn		= throtl_pd_init,
 	.pd_online_fn		= throtl_pd_online,
+	.pd_offline_fn		= throtl_pd_offline,
 	.pd_free_fn		= throtl_pd_free,
 };
 
@@ -1553,32 +1590,15 @@ void blk_throtl_cancel_bios(struct gendisk *disk)
 	 */
 	rcu_read_lock();
 	blkg_for_each_descendant_post(blkg, pos_css, q->root_blkg) {
-		struct throtl_grp *tg = blkg_to_tg(blkg);
-		struct throtl_service_queue *sq = &tg->service_queue;
-
-		/*
-		 * Set the flag to make sure throtl_pending_timer_fn() won't
-		 * stop until all throttled bios are dispatched.
-		 */
-		tg->flags |= THROTL_TG_CANCELING;
-
 		/*
-		 * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
-		 * will be inserted to service queue without THROTL_TG_PENDING
-		 * set in tg_update_disptime below. Then IO dispatched from
-		 * child in tg_dispatch_one_bio will trigger double insertion
-		 * and corrupt the tree.
+		 * disk_release will call pd_offline_fn to cancel bios.
+		 * However, disk_release can't be called while someone
+		 * still holds a reference to the device and has issued
+		 * bios that are in flight after del_gendisk.
+		 * Cancel bios here to ensure no bios are in flight after
+		 * del_gendisk.
 		 */
-		if (!(tg->flags & THROTL_TG_PENDING))
-			continue;
-
-		/*
-		 * Update disptime after setting the above flag to make sure
-		 * throtl_select_dispatch() won't exit without dispatching.
-		 */
-		tg_update_disptime(tg);
-
-		throtl_schedule_pending_timer(sq, jiffies + 1);
+		tg_flush_bios(blkg_to_tg(blkg));
 	}
 	rcu_read_unlock();
 	spin_unlock_irq(&q->queue_lock);
-- 
2.39.2
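
The two early returns in tg_flush_bios() are what allow it to be called from
both the disk-removal path and the new pd_offline path. Below is a minimal
userspace model of that flag logic, based only on what the diff shows. The
struct, the MODEL_* flag values and the timer counter are hypothetical
stand-ins, not the kernel definitions.

```c
#include <stdbool.h>

/* Illustrative flag values; the kernel's actual bit positions may differ. */
#define MODEL_TG_PENDING	(1u << 0)
#define MODEL_TG_CANCELING	(1u << 1)

struct tg_model {
	unsigned int flags;
	int timer_armed;	/* how many times the pending timer was scheduled */
};

/*
 * Mirrors the control flow of tg_flush_bios() in the patch.
 * Returns true only when this call scheduled the dispatch timer.
 */
static bool tg_flush_model(struct tg_model *tg)
{
	/* A second caller finds the flag already set and backs off. */
	if (tg->flags & MODEL_TG_CANCELING)
		return false;

	tg->flags |= MODEL_TG_CANCELING;

	/* A group that is not pending is only marked, never scheduled. */
	if (!(tg->flags & MODEL_TG_PENDING))
		return false;

	tg->timer_armed++;
	return true;
}
```

In this model, calling tg_flush_model() twice on a pending group arms the
timer exactly once, and a group without the pending flag is marked as
canceling without any timer being armed.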
Re: [PATCH v3] block: flush all throttled bios when deleting the cgroup
Posted by Jens Axboe 1 year, 1 month ago
On Sat, 17 Aug 2024 15:11:08 +0800, Li Lingfeng wrote:
> When a process migrates to another cgroup and the original cgroup is deleted,
> the restrictions of throttled bios cannot be removed. If the restrictions
> are set too low, it will take a long time to complete these bios.
> 
> Refer to the process of deleting a disk to remove the restrictions and
> issue bios when deleting the cgroup.
> 
> [...]

Applied, thanks!

[1/1] block: flush all throttled bios when deleting the cgroup
      (no commit info)

Best regards,
-- 
Jens Axboe
Re: [PATCH v3] block: flush all throttled bios when deleting the cgroup
Posted by Li Lingfeng 1 year, 1 month ago
Friendly ping ...

Thanks

On 2024/8/17 15:11, Li Lingfeng wrote:
> From: Li Lingfeng <lilingfeng3@huawei.com>
>
> When a process migrates to another cgroup and the original cgroup is deleted,
> the restrictions of throttled bios cannot be removed. If the restrictions
> are set too low, it will take a long time to complete these bios.
>
> Refer to the process of deleting a disk to remove the restrictions and
> issue bios when deleting the cgroup.
>
> This makes difference on the behavior of throttled bios:
> Before: the limit of the throttled bios can't be changed and the bios will
> complete under this limit;
> Now: the limit will be canceled and the throttled bios will be flushed
> immediately.
>
> References:
> [1] https://lore.kernel.org/r/20220318130144.1066064-4-ming.lei@redhat.com
> [2] https://lore.kernel.org/all/da861d63-58c6-3ca0-2535-9089993e9e28@huaweicloud.com/
>
> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> ---
>    v2->v3:
>      Change "tg_cancel_bios" to "tg_flush_bios";
>      Add reference of v2 to describe the background.
>   block/blk-throttle.c | 68 ++++++++++++++++++++++++++++----------------
>   1 file changed, 44 insertions(+), 24 deletions(-)
>
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 6943ec720f39..cf7f4912c57a 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -1526,6 +1526,42 @@ static void throtl_shutdown_wq(struct request_queue *q)
>   	cancel_work_sync(&td->dispatch_work);
>   }
>   
> +static void tg_flush_bios(struct throtl_grp *tg)
> +{
> +	struct throtl_service_queue *sq = &tg->service_queue;
> +
> +	if (tg->flags & THROTL_TG_CANCELING)
> +		return;
> +	/*
> +	 * Set the flag to make sure throtl_pending_timer_fn() won't
> +	 * stop until all throttled bios are dispatched.
> +	 */
> +	tg->flags |= THROTL_TG_CANCELING;
> +
> +	/*
> +	 * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
> +	 * will be inserted to service queue without THROTL_TG_PENDING
> +	 * set in tg_update_disptime below. Then IO dispatched from
> +	 * child in tg_dispatch_one_bio will trigger double insertion
> +	 * and corrupt the tree.
> +	 */
> +	if (!(tg->flags & THROTL_TG_PENDING))
> +		return;
> +
> +	/*
> +	 * Update disptime after setting the above flag to make sure
> +	 * throtl_select_dispatch() won't exit without dispatching.
> +	 */
> +	tg_update_disptime(tg);
> +
> +	throtl_schedule_pending_timer(sq, jiffies + 1);
> +}
> +
> +static void throtl_pd_offline(struct blkg_policy_data *pd)
> +{
> +	tg_flush_bios(pd_to_tg(pd));
> +}
> +
>   struct blkcg_policy blkcg_policy_throtl = {
>   	.dfl_cftypes		= throtl_files,
>   	.legacy_cftypes		= throtl_legacy_files,
> @@ -1533,6 +1569,7 @@ struct blkcg_policy blkcg_policy_throtl = {
>   	.pd_alloc_fn		= throtl_pd_alloc,
>   	.pd_init_fn		= throtl_pd_init,
>   	.pd_online_fn		= throtl_pd_online,
> +	.pd_offline_fn		= throtl_pd_offline,
>   	.pd_free_fn		= throtl_pd_free,
>   };
>   
> @@ -1553,32 +1590,15 @@ void blk_throtl_cancel_bios(struct gendisk *disk)
>   	 */
>   	rcu_read_lock();
>   	blkg_for_each_descendant_post(blkg, pos_css, q->root_blkg) {
> -		struct throtl_grp *tg = blkg_to_tg(blkg);
> -		struct throtl_service_queue *sq = &tg->service_queue;
> -
> -		/*
> -		 * Set the flag to make sure throtl_pending_timer_fn() won't
> -		 * stop until all throttled bios are dispatched.
> -		 */
> -		tg->flags |= THROTL_TG_CANCELING;
> -
>   		/*
> -		 * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
> -		 * will be inserted to service queue without THROTL_TG_PENDING
> -		 * set in tg_update_disptime below. Then IO dispatched from
> -		 * child in tg_dispatch_one_bio will trigger double insertion
> -		 * and corrupt the tree.
> +		 * disk_release will call pd_offline_fn to cancel bios.
> +		 * However, disk_release can't be called if someone get
> +		 * the refcount of device and issued bios which are
> +		 * inflight after del_gendisk.
> +		 * Cancel bios here to ensure no bios are inflight after
> +		 * del_gendisk.
>   		 */
> -		if (!(tg->flags & THROTL_TG_PENDING))
> -			continue;
> -
> -		/*
> -		 * Update disptime after setting the above flag to make sure
> -		 * throtl_select_dispatch() won't exit without dispatching.
> -		 */
> -		tg_update_disptime(tg);
> -
> -		throtl_schedule_pending_timer(sq, jiffies + 1);
> +		tg_flush_bios(blkg_to_tg(blkg));
>   	}
>   	rcu_read_unlock();
>   	spin_unlock_irq(&q->queue_lock);

Re: [PATCH v3] block: flush all throttled bios when deleting the cgroup
Posted by Tejun Heo 1 year, 3 months ago
On Sat, Aug 17, 2024 at 03:11:08PM +0800, Li Lingfeng wrote:
> From: Li Lingfeng <lilingfeng3@huawei.com>
> 
> When a process migrates to another cgroup and the original cgroup is deleted,
> the restrictions of throttled bios cannot be removed. If the restrictions
> are set too low, it will take a long time to complete these bios.
> 
> Refer to the process of deleting a disk to remove the restrictions and
> issue bios when deleting the cgroup.
> 
> This makes difference on the behavior of throttled bios:
> Before: the limit of the throttled bios can't be changed and the bios will
> complete under this limit;
> Now: the limit will be canceled and the throttled bios will be flushed
> immediately.
> 
> References:
> [1] https://lore.kernel.org/r/20220318130144.1066064-4-ming.lei@redhat.com
> [2] https://lore.kernel.org/all/da861d63-58c6-3ca0-2535-9089993e9e28@huaweicloud.com/
> 
> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun
Re: [PATCH v3] block: flush all throttled bios when deleting the cgroup
Posted by Tejun Heo 1 year, 4 months ago
Hello,

On Sat, Aug 17, 2024 at 03:11:08PM +0800, Li Lingfeng wrote:
> From: Li Lingfeng <lilingfeng3@huawei.com>
> 
> When a process migrates to another cgroup and the original cgroup is deleted,
> the restrictions of throttled bios cannot be removed. If the restrictions
> are set too low, it will take a long time to complete these bios.
> 
> Refer to the process of deleting a disk to remove the restrictions and
> issue bios when deleting the cgroup.
> 
> This makes difference on the behavior of throttled bios:
> Before: the limit of the throttled bios can't be changed and the bios will
> complete under this limit;
> Now: the limit will be canceled and the throttled bios will be flushed
> immediately.

I still don't see why this behavior is better. Wouldn't this make it easy to
escape IO limits by creating cgroups, doing a bunch of IOs and then deleting
them?

Thanks.

-- 
tejun
Re: [PATCH v3] block: flush all throttled bios when deleting the cgroup
Posted by Michal Koutný 1 year, 3 months ago
On Mon, Aug 19, 2024 at 11:24:18AM GMT, Tejun Heo <tj@kernel.org> wrote:
> I still don't see why this behavior is better. Wouldn't this make it easy to
> escape IO limits by creating cgroups, doing a bunch of IOs and then deleting
> them?

IIUC, bios are flushed to the parent throtl group, so if there's an
ancestral limit, it should be honored. (I find this similar to memcg
reparenting.)

Mere create + set limit + delete falls under the same delegation scope,
so if that limit is bypassed, it is only shooting oneself in the foot.
Shortening the lifetime of offlined structures is beneficial, no?
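
A small userspace sketch of the hierarchy argument above, under a stated
assumption: once a group is offlined its own limit stops applying, while the
limits of its online ancestors still do. The struct and effective_bps() are
hypothetical and are not taken from blk-throttle.c.

```c
#include <stddef.h>
#include <stdint.h>

struct grp_model {
	const struct grp_model *parent;	/* NULL for the root group */
	uint64_t bps_limit;		/* 0 means unlimited in this sketch */
	int offline;			/* nonzero once the cgroup is deleted */
};

/*
 * Tightest bytes-per-second limit still applying to a bio queued in g,
 * walking up to the root. Returns 0 when no limit applies at all.
 */
static uint64_t effective_bps(const struct grp_model *g)
{
	uint64_t best = 0;

	for (; g; g = g->parent) {
		/* Offlined or unlimited groups impose nothing here. */
		if (g->offline || g->bps_limit == 0)
			continue;
		if (best == 0 || g->bps_limit < best)
			best = g->bps_limit;
	}
	return best;
}
```

With a parent limited to 10 MiB/s and a child limited to 1 KiB/s, the sketch
returns 1024 while the child is online and 10485760 after it is offlined.
If no ancestor carries a limit, it returns 0, which is the escape scenario
raised earlier in the thread.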

Michal
Re: [PATCH v3] block: flush all throttled bios when deleting the cgroup
Posted by Li Lingfeng 1 year, 3 months ago
On 2024/8/20 5:24, Tejun Heo wrote:
> Hello,
>
> On Sat, Aug 17, 2024 at 03:11:08PM +0800, Li Lingfeng wrote:
>> From: Li Lingfeng <lilingfeng3@huawei.com>
>>
>> When a process migrates to another cgroup and the original cgroup is deleted,
>> the restrictions of throttled bios cannot be removed. If the restrictions
>> are set too low, it will take a long time to complete these bios.
>>
>> Refer to the process of deleting a disk to remove the restrictions and
>> issue bios when deleting the cgroup.
>>
>> This makes difference on the behavior of throttled bios:
>> Before: the limit of the throttled bios can't be changed and the bios will
>> complete under this limit;
>> Now: the limit will be canceled and the throttled bios will be flushed
>> immediately.
> I still don't see why this behavior is better. Wouldn't this make it easy to
> escape IO limits by creating cgroups, doing a bunch of IOs and then deleting
> them?
>
> Thanks.
Yes, this actually would make it easy to escape IO limits.

As described by Yu Kuai in v2, I made this change to prevent an IO hang.
I also think it is more appropriate to remove the limits in this
scenario, since the limits were set on the cgroup and the cgroup has
been deleted.

Thanks.