The static key __cfs_bandwidth_used indicates whether bandwidth control
is enabled anywhere in the system. Currently, it is only decremented
when a task group explicitly disables bandwidth control (by resetting
its quota to RUNTIME_INF). It is never decremented when a task group
that still has bandwidth control enabled is destroyed, so once any task
group has enabled bandwidth control, __cfs_bandwidth_used can never drop
back to zero, even when no task group is using bandwidth control anymore.
This patch fixes the issue by decreasing the bandwidth usage count in
destroy_cfs_bandwidth().
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b1e07ce90284..7ad50dc31a93 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6447,6 +6447,9 @@ static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
hrtimer_cancel(&cfs_b->period_timer);
hrtimer_cancel(&cfs_b->slack_timer);
+ if (cfs_b->quota != RUNTIME_INF)
+ cfs_bandwidth_usage_dec();
+
/*
* It is possible that we still have some cfs_rq's pending on a CSD
* list, though this race is very rare. In order for this to occur, we
--
2.20.1
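For context on the discussion that follows: the helpers involved here are
thin wrappers around the _cpuslocked static-key primitives. The
definitions below reflect kernel/sched/fair.c in recent mainline kernels
(quoted for reference; minor details may differ between versions):

void cfs_bandwidth_usage_inc(void)
{
	static_key_slow_inc_cpuslocked(&__cfs_bandwidth_used);
}

void cfs_bandwidth_usage_dec(void)
{
	static_key_slow_dec_cpuslocked(&__cfs_bandwidth_used);
}

The _cpuslocked suffix means the caller is expected to already hold the
CPU hotplug read lock (cpus_read_lock()), which is exactly the point the
first reply raises.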
Hi, Chuyi

On 2024/7/21 20:52, Chuyi Zhou wrote:
> The static key __cfs_bandwidth_used indicates whether bandwidth control
> is enabled anywhere in the system. [...]
>
> @@ -6447,6 +6447,9 @@ static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>  	hrtimer_cancel(&cfs_b->period_timer);
>  	hrtimer_cancel(&cfs_b->slack_timer);
>
> +	if (cfs_b->quota != RUNTIME_INF)
> +		cfs_bandwidth_usage_dec();

This calls static_key_slow_dec_cpuslocked(), but destroy_cfs_bandwidth()
isn't holding the hotplug lock [1].

I also sent a patch to fix this issue, but it was not merged into
mainline [2].

[1]: https://lore.kernel.org/all/20210712162655.w3j6uczwbfkzazvt@oracle.com/
[2]: https://lore.kernel.org/all/20210910094139.184582-1-zhangqiao22@huawei.com/

Thanks,
--
Qiao Zhang.
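A minimal sketch of the caller-side contract being pointed out, loosely
modeled on tg_set_cfs_bandwidth() (the surrounding code is elided; treat
this as an illustration rather than an exact quote from the kernel):

	cpus_read_lock();			/* CPU hotplug read lock */

	if (runtime_enabled && !runtime_was_enabled)
		cfs_bandwidth_usage_inc();	/* _cpuslocked: lock held */

	/* ... update quota/period under cfs_b->lock ... */

	cpus_read_unlock();

destroy_cfs_bandwidth() provides no such locking context, which is why
the proposed hunk trips over this requirement.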
Hello,

On 2024/7/22 11:47, Zhang Qiao wrote:
> This calls static_key_slow_dec_cpuslocked(), but destroy_cfs_bandwidth()
> isn't holding the hotplug lock [1].
>
> I also sent a patch to fix this issue, but it was not merged into
> mainline [2].
>
> [1]: https://lore.kernel.org/all/20210712162655.w3j6uczwbfkzazvt@oracle.com/
> [2]: https://lore.kernel.org/all/20210910094139.184582-1-zhangqiao22@huawei.com/

Thanks for the information.

I think cfs_bandwidth_usage_dec() should be moved to a more suitable
place where we can hold the hotplug lock (e.g. cpu_cgroup_css_released()).
I will run some tests to verify this.

Thanks.
Hi,

On 2024/7/22 14:04, Chuyi Zhou wrote:
> Thanks for the information.
>
> I think cfs_bandwidth_usage_dec() should be moved to a more suitable
> place where we can hold the hotplug lock (e.g. cpu_cgroup_css_released()).
> I will run some tests to verify this.

cpu_cgroup_css_released() also doesn't seem to run in a context that
holds the cpu hotplug lock.
Hello,

On 2024/7/22 15:16, Zhang Qiao wrote:
> cpu_cgroup_css_released() also doesn't seem to run in a context that
> holds the cpu hotplug lock.

IIUC, cpus_read_lock()/cpus_read_unlock() can be called in
cpu_cgroup_css_released(), right? But the cfs bandwidth teardown may run
in an RCU callback, since the task group list is protected by RCU, so we
cannot take the lock there. Did I miss something important?
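To spell out the constraint: cpus_read_lock() is a sleeping lock (a
percpu_down_read() on cpu_hotplug_lock), while RCU callbacks typically
run in softirq context, where sleeping is forbidden. An intentionally
wrong sketch of the pattern being ruled out (the callback name matches
kernel/sched/core.c, but the body here is illustrative only):

static void sched_free_group_rcu(struct rcu_head *rhp)
{
	struct task_group *tg = container_of(rhp, struct task_group, rcu);

	cpus_read_lock();	/* BUG: may sleep, but this runs in
				 * softirq context via call_rcu() */
	cfs_bandwidth_usage_dec();
	cpus_read_unlock();

	sched_free_group(tg);
}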
On 2024/7/22 15:46, Chuyi Zhou wrote:
> IIUC, cpus_read_lock()/cpus_read_unlock() can be called in
> cpu_cgroup_css_released(), right? But the cfs bandwidth teardown may run
> in an RCU callback, since the task group list is protected by RCU, so we
> cannot take the lock there. Did I miss something important?

Okay, you're right. I overlooked that we can't take the hotplug lock in
an RCU callback.
Zhang Qiao <zhangqiao22@huawei.com> writes:

> Okay, you're right. I overlooked that we can't take the hotplug lock in
> an RCU callback.

Yeah, cpu_cgroup_css_released()/cpu_cgroup_css_free() are fine I think,
and it should be correct to move the call to destroy_cfs_bandwidth() to
cpu_cgroup_css_free() (it's unfortunate in terms of code organization,
but as far as correctness goes it should be fine).

As far as the diff goes, the _dec should go after the
__cfsb_csd_unthrottle loop.
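A hedged sketch of the direction suggested above, assuming
CONFIG_CFS_BANDWIDTH and ignoring the #ifdefs; this is not a tested
patch. The idea is to do the static-key decrement from
cpu_cgroup_css_free(), which runs in process context where taking the
hotplug read lock is legal. The names used (css_tg(),
sched_unregister_group(), tg->cfs_bandwidth.quota) follow mainline, but
the exact placement would still need the verification discussed in this
thread:

static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
{
	struct task_group *tg = css_tg(css);

	/* Process context: the hotplug read lock may be taken safely. */
	if (tg->cfs_bandwidth.quota != RUNTIME_INF) {
		cpus_read_lock();
		cfs_bandwidth_usage_dec();
		cpus_read_unlock();
	}

	/* The group itself is still torn down via RCU, as before. */
	sched_unregister_group(tg);
}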