kernel/sched/core.c | 21 +++++---------------- 1 file changed, 5 insertions(+), 16 deletions(-)
For the handling logic of child_quota, there is no need to distinguish
between cgroup1 and cgroup2, so unify the handling logic here.
Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
kernel/sched/core.c | 21 +++++----------------
1 file changed, 5 insertions(+), 16 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e752146e59a4..8418c67faa69 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9501,23 +9501,12 @@ static int tg_cfs_schedulable_down(struct task_group *tg, void *data)
parent_quota = parent_b->hierarchical_quota;
/*
- * Ensure max(child_quota) <= parent_quota. On cgroup2,
- * always take the non-RUNTIME_INF min. On cgroup1, only
- * inherit when no limit is set. In both cases this is used
- * by the scheduler to determine if a given CFS task has a
- * bandwidth constraint at some higher level.
+ * Ensure max(child_quota) <= parent_quota.
*/
- if (cgroup_subsys_on_dfl(cpu_cgrp_subsys)) {
- if (quota == RUNTIME_INF)
- quota = parent_quota;
- else if (parent_quota != RUNTIME_INF)
- quota = min(quota, parent_quota);
- } else {
- if (quota == RUNTIME_INF)
- quota = parent_quota;
- else if (parent_quota != RUNTIME_INF && quota > parent_quota)
- return -EINVAL;
- }
+ if (quota == RUNTIME_INF)
+ quota = parent_quota;
+ else if (parent_quota != RUNTIME_INF)
+ quota = min(quota, parent_quota);
}
cfs_b->hierarchical_quota = quota;
--
2.19.1.6.gb485710b
On Tue, Sep 10, 2024 at 03:48:32PM +0800, Liu Song wrote: > For the handling logic of child_quota, there is no need to distinguish > between cgroup1 and cgroup2, so unify the handling logic here. > > Signed-off-by: Liu Song <liusong@linux.alibaba.com> It doens't make much sense to change the interface for cgroup1 at this point. Let's please leave it as-is. Thanks. -- tejun
> 2024年9月11日 03:32,Tejun Heo <tj@kernel.org> 写道: > > On Tue, Sep 10, 2024 at 03:48:32PM +0800, Liu Song wrote: >> For the handling logic of child_quota, there is no need to distinguish >> between cgroup1 and cgroup2, so unify the handling logic here. >> >> Signed-off-by: Liu Song <liusong@linux.alibaba.com> > > It doens't make much sense to change the interface for cgroup1 at this > point. Let's please leave it as-is. > > Thanks. > > -- > tejun Hi In scenarios involving secure shared containers (like Kata), where containers are deployed on VMs and constrained by CPU runtime using quotas, the concept of vCPUs comes into play. If the CPU limit set by Kubernetes is less than the actual number of vCPUs, meaning the CPU count derived from the quota is less than the vCPU count, then when a user runs lscpu inside the container, the reported CPU count will be greater than the container's quota. If the user uses this reported count to calculate quota and attempts to set it for their own sub-container, it will result in an error under cgroup1, whereas the same operation will succeed under cgroup2. To avoid imposing extra learning costs on users, unifying the handling logic in this regard is still beneficial. Thanks
Hi, On Tue, Sep 10, 2024 at 03:48:32PM +0800 Liu Song wrote: > For the handling logic of child_quota, there is no need to distinguish > between cgroup1 and cgroup2, so unify the handling logic here. > > Signed-off-by: Liu Song <liusong@linux.alibaba.com> > --- > kernel/sched/core.c | 21 +++++---------------- > 1 file changed, 5 insertions(+), 16 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index e752146e59a4..8418c67faa69 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -9501,23 +9501,12 @@ static int tg_cfs_schedulable_down(struct task_group *tg, void *data) > parent_quota = parent_b->hierarchical_quota; > > /* > - * Ensure max(child_quota) <= parent_quota. On cgroup2, > - * always take the non-RUNTIME_INF min. On cgroup1, only > - * inherit when no limit is set. In both cases this is used > - * by the scheduler to determine if a given CFS task has a > - * bandwidth constraint at some higher level. This comment is here for a reason. Please don't remove it. > + * Ensure max(child_quota) <= parent_quota. > */ > - if (cgroup_subsys_on_dfl(cpu_cgrp_subsys)) { > - if (quota == RUNTIME_INF) > - quota = parent_quota; > - else if (parent_quota != RUNTIME_INF) > - quota = min(quota, parent_quota); > - } else { > - if (quota == RUNTIME_INF) > - quota = parent_quota; > - else if (parent_quota != RUNTIME_INF && quota > parent_quota) > - return -EINVAL; > - } > + if (quota == RUNTIME_INF) > + quota = parent_quota; > + else if (parent_quota != RUNTIME_INF) > + quota = min(quota, parent_quota); > } > cfs_b->hierarchical_quota = quota; > I don't think there is a need to optimize this slow path to allow setting invalid values which have to be handled in fast paths. And this will change expected behavior. So NAK. Cheers, Phil --
> 2024年9月10日 18:49,Phil Auld <pauld@redhat.com> 写道: > > > Hi, > > On Tue, Sep 10, 2024 at 03:48:32PM +0800 Liu Song wrote: >> For the handling logic of child_quota, there is no need to distinguish >> between cgroup1 and cgroup2, so unify the handling logic here. >> >> Signed-off-by: Liu Song <liusong@linux.alibaba.com> >> --- >> kernel/sched/core.c | 21 +++++---------------- >> 1 file changed, 5 insertions(+), 16 deletions(-) >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index e752146e59a4..8418c67faa69 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -9501,23 +9501,12 @@ static int tg_cfs_schedulable_down(struct task_group *tg, void *data) >> parent_quota = parent_b->hierarchical_quota; >> >> /* >> - * Ensure max(child_quota) <= parent_quota. On cgroup2, >> - * always take the non-RUNTIME_INF min. On cgroup1, only >> - * inherit when no limit is set. In both cases this is used >> - * by the scheduler to determine if a given CFS task has a >> - * bandwidth constraint at some higher level. > > This comment is here for a reason. Please don't remove it. Hi I don’t see why cgroup1 needs to impose this restriction while cgroup2 can directly take the non-RUNTIME_INF minimum value. What is the necessity of this? It seems more reasonable to unify the handling logic. Even if the child group quota exceeds the parent group quota, it would not actually take effect. However, if the parent group quota is reset to a larger value, then the child group quota would have actual significance. Therefore, the handling logic should be consistent between cgroup1 and cgroup2. Thanks > >> + * Ensure max(child_quota) <= parent_quota. >> */ >> - if (cgroup_subsys_on_dfl(cpu_cgrp_subsys)) { >> - if (quota == RUNTIME_INF) >> - quota = parent_quota; >> - else if (parent_quota != RUNTIME_INF) >> - quota = min(quota, parent_quota); >> - } else { >> - if (quota == RUNTIME_INF) >> - quota = parent_quota; >> - else if (parent_quota != RUNTIME_INF && quota > parent_quota) >> - return -EINVAL; >> - } >> + if (quota == RUNTIME_INF) >> + quota = parent_quota; >> + else if (parent_quota != RUNTIME_INF) >> + quota = min(quota, parent_quota); >> } >> cfs_b->hierarchical_quota = quota; >> > > I don't think there is a need to optimize this slow path > to allow setting invalid values which have to be handled in > fast paths. And this will change expected behavior. > > So NAK. > > Cheers, > Phil > > --
On Tue, Sep 10, 2024 at 07:13:32PM +0800 刘嵩 wrote: > > > > 2024年9月10日 18:49,Phil Auld <pauld@redhat.com> 写道: > > > > > > Hi, > > > > On Tue, Sep 10, 2024 at 03:48:32PM +0800 Liu Song wrote: > >> For the handling logic of child_quota, there is no need to distinguish > >> between cgroup1 and cgroup2, so unify the handling logic here. > >> > >> Signed-off-by: Liu Song <liusong@linux.alibaba.com> > >> --- > >> kernel/sched/core.c | 21 +++++---------------- > >> 1 file changed, 5 insertions(+), 16 deletions(-) > >> > >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c > >> index e752146e59a4..8418c67faa69 100644 > >> --- a/kernel/sched/core.c > >> +++ b/kernel/sched/core.c > >> @@ -9501,23 +9501,12 @@ static int tg_cfs_schedulable_down(struct task_group *tg, void *data) > >> parent_quota = parent_b->hierarchical_quota; > >> > >> /* > >> - * Ensure max(child_quota) <= parent_quota. On cgroup2, > >> - * always take the non-RUNTIME_INF min. On cgroup1, only > >> - * inherit when no limit is set. In both cases this is used > >> - * by the scheduler to determine if a given CFS task has a > >> - * bandwidth constraint at some higher level. > > > > This comment is here for a reason. Please don't remove it. > > Hi > > I don’t see why cgroup1 needs to impose this restriction while cgroup2 > can directly take the non-RUNTIME_INF minimum value. What is the > necessity of this? > That's how cgroupv1 bandwidth control is defined. See Documentation/scheduler/sched-bcw.rst. > It seems more reasonable to unify the handling logic. Even if the child > group quota exceeds the parent group quota, it would not actually take > effect. > It's not about it taking effect or not. You are not supposed to be allowed to configure a child quota > parent quota. It's supposed to be an error. Also, my comment about the comment specifically is that last sentence, which explains that other parts of the scheduler rely on this being set correctly, needs to remain. But since I don't think this change is right, that should not be an issue. Cheers, Phil > However, if the parent group quota is reset to a larger value, then the > child group quota would have actual significance. Therefore, the handling > logic should be consistent between cgroup1 and cgroup2. > > Thanks > > > > > >> + * Ensure max(child_quota) <= parent_quota. > >> */ > >> - if (cgroup_subsys_on_dfl(cpu_cgrp_subsys)) { > >> - if (quota == RUNTIME_INF) > >> - quota = parent_quota; > >> - else if (parent_quota != RUNTIME_INF) > >> - quota = min(quota, parent_quota); > >> - } else { > >> - if (quota == RUNTIME_INF) > >> - quota = parent_quota; > >> - else if (parent_quota != RUNTIME_INF && quota > parent_quota) > >> - return -EINVAL; > >> - } > >> + if (quota == RUNTIME_INF) > >> + quota = parent_quota; > >> + else if (parent_quota != RUNTIME_INF) > >> + quota = min(quota, parent_quota); > >> } > >> cfs_b->hierarchical_quota = quota; > >> > > > > I don't think there is a need to optimize this slow path > > to allow setting invalid values which have to be handled in > > fast paths. And this will change expected behavior. > > > > So NAK. > > > > Cheers, > > Phil > > > > -- > --
Hello. On Tue, Sep 10, 2024 at 03:48:32PM GMT, Liu Song <liusong@linux.alibaba.com> wrote: > For the handling logic of child_quota, there is no need to distinguish > between cgroup1 and cgroup2, so unify the handling logic here. IIUC, v1 check prevents quota overcommit while v2 doesn't. So this isn't user invisible change (i.e. there may be a need to distinguish the two). Regards, Michal
> 2024年9月10日 17:35,Michal Koutný <mkoutny@suse.com> 写道: > > Hello. > > On Tue, Sep 10, 2024 at 03:48:32PM GMT, Liu Song <liusong@linux.alibaba.com> wrote: >> For the handling logic of child_quota, there is no need to distinguish >> between cgroup1 and cgroup2, so unify the handling logic here. > > IIUC, v1 check prevents quota overcommit while v2 doesn't. So this > isn't user invisible change (i.e. there may be a need to distinguish the > two). > Hi In cgroup2, child task group quota can set exceed parent task group quota. However, at the scheduling level, the cpu runqueue (rq) performs checks on the runtime of the CFS run queue (cfs_rq) in a hierarchical way. Therefore, even though the child group quota appears to be greater than the parent group quota, it will not actually receive a runtime that exceeds the parent's quota. This logic also holds true for cgroup1, so there is no need to differentiate the handling logic in this case. Thanks > Regards, > Michal
© 2016 - 2024 Red Hat, Inc.