[PATCH v2 2/2] sched/core: Avoid unnecessary update in tg_set_cfs_bandwidth

Chuyi Zhou posted 2 patches 1 month, 3 weeks ago
Posted by Chuyi Zhou 1 month, 3 weeks ago
In the Kubernetes production environment, we have observed a high
frequency of writes to cpu.max, approximately every 2-4 seconds for each
cgroup, with the same value being written each time. This can result in
unnecessary overhead, especially on machines with a large number of CPUs
and cgroups.

This is because kubelet and runc try to persist resource configurations
by repeatedly rewriting the same value. While kubelet and runc can be
optimized to avoid this overhead (e.g. by checking the current CPU
request/limit before writing to cpu.max), it is still worth bailing out
of tg_set_cfs_bandwidth() early when the update would not change the
current values.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7720d34bd71b..0cc564f45511 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9090,6 +9090,9 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota,
 	guard(cpus_read_lock)();
 	guard(mutex)(&cfs_constraints_mutex);
 
+	if (cfs_b->period == ns_to_ktime(period) && cfs_b->quota == quota && cfs_b->burst == burst)
+		return 0;
+
 	ret = __cfs_schedulable(tg, period, quota);
 	if (ret)
 		return ret;
-- 
2.20.1
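
The user-space optimization mentioned in the commit message (check the current value before writing to cpu.max) could look roughly like the sketch below. This is only an illustration, not code from kubelet or runc: write_if_changed() is a hypothetical helper name, and it assumes the caller passes the full "$QUOTA $PERIOD" string in the form cpu.max reports when read, since a partial value would never compare equal.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write `value` to the file at `path` only if it
 * differs from the file's current first line (a trailing newline on
 * either side is ignored).
 * Returns 1 if a write was issued, 0 if it was skipped, -1 on error.
 */
static int write_if_changed(const char *path, const char *value)
{
	char cur[128] = "";
	size_t len;
	FILE *f;

	/* Read the current setting; an unreadable file is treated as empty. */
	f = fopen(path, "r");
	if (f) {
		if (!fgets(cur, sizeof(cur), f))
			cur[0] = '\0';
		fclose(f);
	}
	cur[strcspn(cur, "\n")] = '\0';

	/* Compare against the requested value without its trailing newline. */
	len = strcspn(value, "\n");
	if (strlen(cur) == len && strncmp(cur, value, len) == 0)
		return 0;	/* same value: skip the write */

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%.*s\n", (int)len, value) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 1 : -1;
}
```

A caller would use something like write_if_changed(path_to_cpu_max, "50000 100000"), where path_to_cpu_max is the cgroup's cpu.max file. The read-then-write sequence is not atomic, so this only reduces redundant writes and does not replace the kernel-side check.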
Re: [PATCH v2 2/2] sched/core: Avoid unnecessary update in tg_set_cfs_bandwidth
Posted by Chengming Zhou 1 month, 3 weeks ago
On 2024/7/23 20:20, Chuyi Zhou wrote:
> In the Kubernetes production environment, we have observed a high
> frequency of writes to cpu.max, approximately every 2-4 seconds for each
> cgroup, with the same value being written each time. This can result in
> unnecessary overhead, especially on machines with a large number of CPUs
> and cgroups.
> 
> This is because kubelet and runc try to persist resource configurations
> by repeatedly rewriting the same value. While kubelet and runc can be
> optimized to avoid this overhead (e.g. by checking the current CPU
> request/limit before writing to cpu.max), it is still worth bailing out
> of tg_set_cfs_bandwidth() early when the update would not change the
> current values.
> 
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>   kernel/sched/core.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7720d34bd71b..0cc564f45511 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9090,6 +9090,9 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota,
>   	guard(cpus_read_lock)();
>   	guard(mutex)(&cfs_constraints_mutex);
>   
> +	if (cfs_b->period == ns_to_ktime(period) && cfs_b->quota == quota && cfs_b->burst == burst)
> +		return 0;
> +

Should this condition be broken across multiple lines? Feel free to add:

Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>

>   	ret = __cfs_schedulable(tg, period, quota);
>   	if (ret)
>   		return ret;
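
For illustration, the multi-line form suggested above might look like the following stand-alone model. It is a sketch, not the kernel code: struct cfs_bw_model and cfs_bw_unchanged() are hypothetical names, and plain 64-bit nanosecond values stand in for the ktime_t period that the patch compares via ns_to_ktime().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the three struct cfs_bandwidth fields that
 * the patch compares. All values are modelled as plain nanoseconds.
 */
struct cfs_bw_model {
	uint64_t period;
	uint64_t quota;
	uint64_t burst;
};

/* True when the requested settings match the current ones exactly. */
static bool cfs_bw_unchanged(const struct cfs_bw_model *b, uint64_t period,
			     uint64_t quota, uint64_t burst)
{
	return b->period == period &&
	       b->quota == quota &&
	       b->burst == burst;
}
```

With one comparison per line, each field stays within the usual line-length limit and a later change to any single field shows up as a one-line diff.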
Re: [PATCH v2 2/2] sched/core: Avoid unnecessary update in tg_set_cfs_bandwidth
Posted by Benjamin Segall 1 month, 3 weeks ago
Chuyi Zhou <zhouchuyi@bytedance.com> writes:

> In the Kubernetes production environment, we have observed a high
> frequency of writes to cpu.max, approximately every 2-4 seconds for each
> cgroup, with the same value being written each time. This can result in
> unnecessary overhead, especially on machines with a large number of CPUs
> and cgroups.
>
> This is because kubelet and runc try to persist resource configurations
> by repeatedly rewriting the same value. While kubelet and runc can be
> optimized to avoid this overhead (e.g. by checking the current CPU
> request/limit before writing to cpu.max), it is still worth bailing out
> of tg_set_cfs_bandwidth() early when the update would not change the
> current values.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>

Reviewed-by: Ben Segall <bsegall@google.com>

> ---
>  kernel/sched/core.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7720d34bd71b..0cc564f45511 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9090,6 +9090,9 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota,
>  	guard(cpus_read_lock)();
>  	guard(mutex)(&cfs_constraints_mutex);
>  
> +	if (cfs_b->period == ns_to_ktime(period) && cfs_b->quota == quota && cfs_b->burst == burst)
> +		return 0;
> +
>  	ret = __cfs_schedulable(tg, period, quota);
>  	if (ret)
>  		return ret;