blk-throttle: enable io throttle for root in cgroup v2

[PATCH -next] blk-throttle: enable io throttle for root in cgroup v2

Posted by Yu Kuai 4 years, 5 months ago

RFC patch: https://lkml.org/lkml/2021/9/9/1432

There is a performance problem in our environment:

A host can provide a remote device to different clients. If one client is
under high io pressure, other clients might be affected.

Limiting the overall iops/bps (io.max) from the client can fix the problem;
however, the config files do not currently exist in the root cgroup, which
makes that impossible.

This patch enables io throttle for root cgroup:
 - enable "io.max" and "io.low" in root
 - don't skip root group in tg_iops_limit() and tg_bps_limit()
 - don't skip root group in tg_conf_updated()

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 7c462c006b26..ac25bfbbfe7f 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -156,9 +156,6 @@ static uint64_t tg_bps_limit(struct throtl_grp *tg, int rw)
 	struct throtl_data *td;
 	uint64_t ret;
 
-	if (cgroup_subsys_on_dfl(io_cgrp_subsys) && !blkg->parent)
-		return U64_MAX;
-
 	td = tg->td;
 	ret = tg->bps[rw][td->limit_index];
 	if (ret == 0 && td->limit_index == LIMIT_LOW) {
@@ -186,9 +183,6 @@ static unsigned int tg_iops_limit(struct throtl_grp *tg, int rw)
 	struct throtl_data *td;
 	unsigned int ret;
 
-	if (cgroup_subsys_on_dfl(io_cgrp_subsys) && !blkg->parent)
-		return UINT_MAX;
-
 	td = tg->td;
 	ret = tg->iops[rw][td->limit_index];
 	if (ret == 0 && tg->td->limit_index == LIMIT_LOW) {
@@ -1284,9 +1278,8 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global)
 		struct throtl_grp *parent_tg;
 
 		tg_update_has_rules(this_tg);
-		/* ignore root/second level */
-		if (!cgroup_subsys_on_dfl(io_cgrp_subsys) || !blkg->parent ||
-		    !blkg->parent->parent)
+		/* ignore root level */
+		if (!cgroup_subsys_on_dfl(io_cgrp_subsys) || !blkg->parent)
 			continue;
 		parent_tg = blkg_to_tg(blkg->parent);
 		/*
@@ -1625,7 +1618,6 @@ static struct cftype throtl_files[] = {
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 	{
 		.name = "low",
-		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = tg_print_limit,
 		.write = tg_set_limit,
 		.private = LIMIT_LOW,
@@ -1633,7 +1625,6 @@ static struct cftype throtl_files[] = {
 #endif
 	{
 		.name = "max",
-		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = tg_print_limit,
 		.write = tg_set_limit,
 		.private = LIMIT_MAX,
-- 
2.31.1
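
For readers following the first two hunks of the diff above, here is a small user-space sketch of the behavioural change in tg_bps_limit(). It is not kernel code: `struct model_grp`, `bps_limit_before()` and `bps_limit_after()` are hypothetical names invented for illustration, and the model deliberately ignores the LIMIT_LOW handling that follows in the real function. It only shows that, on the default (v2) hierarchy, the root group used to report "no limit" regardless of configuration, and with the patch it reports the configured value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct throtl_grp; not kernel code. */
struct model_grp {
	bool has_parent;	/* false models the root group (blkg->parent == NULL) */
	uint64_t bps_max;	/* configured bytes-per-second limit */
};

/* Before the patch: on the default (v2) hierarchy the root is never limited. */
static uint64_t bps_limit_before(const struct model_grp *g, bool on_dfl)
{
	if (on_dfl && !g->has_parent)
		return UINT64_MAX;
	return g->bps_max;
}

/* After the patch: the configured value applies to the root group as well. */
static uint64_t bps_limit_after(const struct model_grp *g, bool on_dfl)
{
	(void)on_dfl;	/* the hierarchy type no longer matters for this check */
	return g->bps_max;
}
```

The same reasoning applies to tg_iops_limit(), with UINT_MAX in place of U64_MAX.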

Re: [PATCH -next] blk-throttle: enable io throttle for root in cgroup v2

Posted by Tejun Heo 4 years, 5 months ago

On Fri, Jan 14, 2022 at 05:30:00PM +0800, Yu Kuai wrote:
> RFC patch: https://lkml.org/lkml/2021/9/9/1432
> 
> There is a performance problem in our environment:
> 
> A host can provide a remote device to different clients. If one client is
> under high io pressure, other clients might be affected.
> 
> Limiting the overall iops/bps (io.max) from the client can fix the problem;
> however, the config files do not currently exist in the root cgroup, which
> makes that impossible.
> 
> This patch enables io throttle for root cgroup:
>  - enable "io.max" and "io.low" in root
>  - don't skip root group in tg_iops_limit() and tg_bps_limit()
>  - don't skip root group in tg_conf_updated()
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Yeah, I'm kinda split. It's a simple change with some utility, but it's also
something which doesn't fit with the cgroup feature or interface. It's
regulating the whole system behavior. There's no reason for any of the
control "groups" to be involved here and semantically the interface would
fit a lot better under /proc, /sys or some other system-wide location. Here
are some points to consider:

* As a comparison, it'd be rather absurd to enable memory.max at system root
  in terms of interface, and it would most likely break a whole lot of mm
  operations.

* Resource control knobs of a cgroup belong to the parent as the parent is
  responsible for divvying up the available resources to its children. Here
  too, the knobs make sense because there's a higher level parent
  (whether that's a hypervisor or some network server).

Is your use case VMs or network attached storage?

Thanks.

-- 
tejun

Re: [PATCH -next] blk-throttle: enable io throttle for root in cgroup v2

Posted by yukuai (C) 4 years, 5 months ago

On 2022/01/27 1:29, Tejun Heo wrote:
> On Fri, Jan 14, 2022 at 05:30:00PM +0800, Yu Kuai wrote:
>> RFC patch: https://lkml.org/lkml/2021/9/9/1432
>>
>> There is a performance problem in our environment:
>>
>> A host can provide a remote device to different clients. If one client is
>> under high io pressure, other clients might be affected.
>>
>> Limiting the overall iops/bps (io.max) from the client can fix the problem;
>> however, the config files do not currently exist in the root cgroup, which
>> makes that impossible.
>>
>> This patch enables io throttle for root cgroup:
>>   - enable "io.max" and "io.low" in root
>>   - don't skip root group in tg_iops_limit() and tg_bps_limit()
>>   - don't skip root group in tg_conf_updated()
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> 
> Yeah, I'm kinda split. It's a simple change with some utility, but it's also
> something which doesn't fit with the cgroup feature or interface. It's
> regulating the whole system behavior. There's no reason for any of the
> control "groups" to be involved here and semantically the interface would
> fit a lot better under /proc, /sys or some other system-wide location. Here
> are some points to consider:
> 
> * As a comparison, it'd be rather absurd to enable memory.max at system root
>    in terms of interface, and it would most likely break a whole lot of mm
>    operations.
> 
> * Resource control knobs of a cgroup belong to the parent as the parent is
>    responsible for divvying up the available resources to its children. Here
>    too, the knobs make sense because there's a higher level parent
>    (whether that's a hypervisor or some network server).
> 
> Is your use case VMs or network attached storage?
> 
Hi,

In our case, the disk is provided by a server, and such a disk can be shared
by multiple clients. Thus, from the client side, the server is a higher
level parent.

Theoretically, limiting the io for each client on the server side is
feasible; however, the main reason we don't want to do this is the
following shortcoming:

The client can still send an unlimited amount of io to the server, and the
server can only limit the amount of io that completes, which might cause
too much pressure on the server side.

Thanks,
Kuai

Re: [PATCH -next] blk-throttle: enable io throttle for root in cgroup v2

Posted by Ming Lei 4 years, 4 months ago

Hello Yu Kuai,

On Fri, Jan 14, 2022 at 05:30:00PM +0800, Yu Kuai wrote:
> RFC patch: https://lkml.org/lkml/2021/9/9/1432
> 
> There is a performance problem in our environment:
> 
> A host can provide a remote device to different clients. If one client is
> under high io pressure, other clients might be affected.

Can you use Linux kernel storage terms to describe the issue?
For example, I guess "host" here means the target server (iSCSI or NVMe
target?) and "client" means the SCSI initiator or NVMe host. If not, can
you provide an actual example of your storage use case?

With common terms, it becomes much easier for people to understand and
solve the issue, and misunderstandings are avoided.

> 
> Limiting the overall iops/bps (io.max) from the client can fix the problem,

Just curious: how can each client figure out the right iops/bps limit,
given that one client doesn't know how many clients are connected to the
target server?

It sounds like the throttling shouldn't be done in a client side cgroup,
given that the throttling has nothing to do with tasks.

Maybe it should be done on the server side, since the server has enough
information to provide a fair iops/bps allocation for each client.

Thanks, 
Ming

[PATCH -next] blk-throttle: enable io throttle for root in cgroup v2

Posted by Ofir Gal 3 years, 4 months ago

From: Ofir Gal <ofir@gal.software>

Hello Ming Lei,

I am trying to use cgroup v2 to throttle a media disk that is controlled by
an NVMe target. Unfortunately, it cannot be done without setting the limit
in the root cgroup. It can be done via cgroup v1. Yu Kuai's patch allows
this to be accomplished.

My setup consists of 3 servers.
Server #1:
    a. SSD media disk (needs to be throttled to 100K IOPS)
    b. NVMe target controlling the SSD (1.a)

Server #2:
    a. NVMe initiator connected to Server #1's NVMe target (1.b)

Server #3:
    a. NVMe initiator connected to Server #1's NVMe target (1.b)

My setup accesses this media from multiple servers using NVMe over TCP,
but the initiator servers' workloads are unknown and can change
dynamically. I need to limit the media disk to 100K IOPS on the target side.

I have tried to limit the SSD on Server #1, but it seems that the NVMe
target kworkers are not affected unless I use Yu Kuai's patch.

Can you elaborate on the issues with this patch, or on how the scenario
described above can be done with cgroup v2?

Best regards, Ofir Gal.
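
As an illustration of the kind of limit described above, here is a minimal sketch that formats one line for a cgroup v2 "io.max" file. The key names riops and wiops and the "MAJ:MIN" device prefix follow the documented io.max format. The helper name `format_io_max_iops()` and the device number 259:0 used in the usage note are hypothetical. Note that io.max limits read and write IOPS separately, so a single combined figure such as 100K IOPS has no direct equivalent and would have to be split or applied to each direction.

```c
#include <stdio.h>

/*
 * Format one line for a cgroup v2 "io.max" file:
 *   "MAJ:MIN riops=N wiops=N"
 * Returns the number of characters written (excluding the trailing NUL),
 * or -1 if the buffer is too small.
 * Hypothetical helper for illustration; not part of any kernel or library API.
 */
static int format_io_max_iops(char *buf, size_t len,
			      unsigned int major, unsigned int minor,
			      unsigned int riops, unsigned int wiops)
{
	int n = snprintf(buf, len, "%u:%u riops=%u wiops=%u",
			 major, minor, riops, wiops);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

Called as format_io_max_iops(buf, sizeof(buf), 259, 0, 100000, 100000), this yields "259:0 riops=100000 wiops=100000". Writing that line to the io.max file of the root cgroup (commonly /sys/fs/cgroup/io.max, assuming the usual mount point) is only possible with a kernel carrying a change like Yu Kuai's patch, since the file is otherwise absent from the root.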

Re: [PATCH -next] blk-throttle: enable io throttle for root in cgroup v2

Posted by Michal Koutný 3 years, 4 months ago

On Sun, Feb 05, 2023 at 05:55:41PM +0200, Ofir Gal <ofir.gal@volumez.com> wrote:
> I have tried to limit the SSD on Server #1, but it seems that the NVMe
> target kworkers are not affected unless I use Yu Kuai's patch.
> 
> Can you elaborate on the issues with this patch or how the scenario
> described above can be done with cgroups v2?

The issue is that if there's a client that doesn't implement
self-throttling, you cannot guarantee anything on the server side.
Hence the mechanism must exist on the server side.

The NVMe target should charge IO to the respective blkcgs (just generic
advice, I'm not familiar with that interface; see also the use of
kthread_associate_blkcg() in the loop device driver).

HTH,
Michal