[PATCH] psi: Inherit parent cgroup psi enable state

Chuyi Zhou posted 1 patch 1 year, 6 months ago
kernel/cgroup/cgroup.c | 21 +++++++++++++++++++--
kernel/sched/psi.c     |  4 ++--
2 files changed, 21 insertions(+), 4 deletions(-)
[PATCH] psi: Inherit parent cgroup psi enable state
Posted by Chuyi Zhou 1 year, 6 months ago
Currently when a parent cgroup disables psi through cgroup.pressure, newly
created child cgroups do not inherit the psi state of the parent cgroup.

This patch tries to solve this issue. When a child cgroup is created, it
would inherit the psi enabled state of the parent in group_init().
Once the enable state is found to be false in the css_populate_dir(), the
{cpu, io, memory}.pressure files will be hidden using cgroup_file_show().

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/cgroup/cgroup.c | 21 +++++++++++++++++++--
 kernel/sched/psi.c     |  4 ++--
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c8e4b62b436a4..775fe528efcad 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1719,6 +1719,24 @@ static void css_clear_dir(struct cgroup_subsys_state *css)
 	}
 }
 
+static int populate_psi_files(struct cgroup_subsys_state *css)
+{
+	struct cgroup *cgrp = css->cgroup;
+	int ret, i;
+
+	ret = cgroup_addrm_files(css, cgrp, cgroup_psi_files, true);
+	if (ret < 0)
+		return ret;
+
+	if (cgrp->psi && !cgrp->psi->enabled) {
+		for (i = 0; i < NR_PSI_RESOURCES; i++)
+			cgroup_file_show(&cgrp->psi_files[i], 0);
+	}
+
+	return ret;
+}
+
+
 /**
  * css_populate_dir - create subsys files in a cgroup directory
  * @css: target css
@@ -1742,8 +1760,7 @@ static int css_populate_dir(struct cgroup_subsys_state *css)
 				return ret;
 
 			if (cgroup_psi_enabled()) {
-				ret = cgroup_addrm_files(css, cgrp,
-							 cgroup_psi_files, true);
+				ret = populate_psi_files(css);
 				if (ret < 0) {
 					cgroup_addrm_files(css, cgrp,
 							   cgroup_base_files, false);
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 020d58967d4e8..d0aa17b368819 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -180,7 +180,7 @@ static void group_init(struct psi_group *group)
 {
 	int cpu;
 
-	group->enabled = true;
+	group->enabled = group->parent ? group->parent->enabled : true;
 	for_each_possible_cpu(cpu)
 		seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
 	group->avg_last_update = sched_clock();
@@ -1114,8 +1114,8 @@ int psi_cgroup_alloc(struct cgroup *cgroup)
 		kfree(cgroup->psi);
 		return -ENOMEM;
 	}
-	group_init(cgroup->psi);
 	cgroup->psi->parent = cgroup_psi(cgroup_parent(cgroup));
+	group_init(cgroup->psi);
 	return 0;
 }
 
-- 
2.20.1
Re: [PATCH] psi: Inherit parent cgroup psi enable state
Posted by Michal Koutný 1 year, 6 months ago
Hello Chuyi.

On Mon, Jul 29, 2024 at 11:41:00AM GMT, Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> This patch tries to solve this issue. When a child cgroup is created, it
> would inherit the psi enabled state of the parent in group_init().
> Once the enable state is found to be false in the css_populate_dir(), the
> {cpu, io, memory}.pressure files will be hidden using cgroup_file_show().

I'd consider such an inheritance cgroupv1-ism. Notably,
mmemory.swappiness also had inheritance and it caused some headaches
because of its susceptibility to races (cgroup creation vs parent
configuration).

What about adding an option (mount option?) to determine the default of
per-cgroup pressure accounting?
(Based on reading about your use case in other mail.) With that you
could only enable it for the "online" groups of interest.

(I admit I only assume it'd be possible thanks to existence of
psi_cgroup_restart(), I didn't look at how easy it'd be.)

Thanks,
Michal
Re: [PATCH] psi: Inherit parent cgroup psi enable state
Posted by Chuyi Zhou 1 year, 6 months ago
Hello Michal,

在 2024/7/29 19:37, Michal Koutný 写道:
> Hello Chuyi.
> 
> On Mon, Jul 29, 2024 at 11:41:00AM GMT, Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>> This patch tries to solve this issue. When a child cgroup is created, it
>> would inherit the psi enabled state of the parent in group_init().
>> Once the enable state is found to be false in the css_populate_dir(), the
>> {cpu, io, memory}.pressure files will be hidden using cgroup_file_show().
> 
> I'd consider such an inheritance cgroupv1-ism. Notably,
> mmemory.swappiness also had inheritance and it caused some headaches
> because of its susceptibility to races (cgroup creation vs parent
> configuration).

Agree.

> 
> What about adding an option (mount option?) to determine the default of
> per-cgroup pressure accounting?
> (Based on reading about your use case in other mail.) With that you
> could only enable it for the "online" groups of interest.
> 

Yes. We can add a another mount option to disable the default PSI 
accounting and enable the pod we interest.

Thanks.
Re: [PATCH] psi: Inherit parent cgroup psi enable state
Posted by K Prateek Nayak 1 year, 6 months ago
Hello Chuyi,

On 7/29/2024 9:11 AM, Chuyi Zhou wrote:
> Currently when a parent cgroup disables psi through cgroup.pressure, newly
> created child cgroups do not inherit the psi state of the parent cgroup.
> 
> This patch tries to solve this issue. When a child cgroup is created, it
> would inherit the psi enabled state of the parent in group_init().
> Once the enable state is found to be false in the css_populate_dir(), the
> {cpu, io, memory}.pressure files will be hidden using cgroup_file_show().
> 
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>   kernel/cgroup/cgroup.c | 21 +++++++++++++++++++--
>   kernel/sched/psi.c     |  4 ++--
>   2 files changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index c8e4b62b436a4..775fe528efcad 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -1719,6 +1719,24 @@ static void css_clear_dir(struct cgroup_subsys_state *css)
>   	}
>   }
>   
> +static int populate_psi_files(struct cgroup_subsys_state *css)
> +{
> +	struct cgroup *cgrp = css->cgroup;
> +	int ret, i;
> +
> +	ret = cgroup_addrm_files(css, cgrp, cgroup_psi_files, true);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (cgrp->psi && !cgrp->psi->enabled) {
> +		for (i = 0; i < NR_PSI_RESOURCES; i++)
> +			cgroup_file_show(&cgrp->psi_files[i], 0);
> +	}
> +
> +	return ret;
> +}
> +
> +
>   /**
>    * css_populate_dir - create subsys files in a cgroup directory
>    * @css: target css
> @@ -1742,8 +1760,7 @@ static int css_populate_dir(struct cgroup_subsys_state *css)
>   				return ret;
>   
>   			if (cgroup_psi_enabled()) {
> -				ret = cgroup_addrm_files(css, cgrp,
> -							 cgroup_psi_files, true);
> +				ret = populate_psi_files(css);
>   				if (ret < 0) {
>   					cgroup_addrm_files(css, cgrp,
>   							   cgroup_base_files, false);
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index 020d58967d4e8..d0aa17b368819 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -180,7 +180,7 @@ static void group_init(struct psi_group *group)
>   {
>   	int cpu;
>   
> -	group->enabled = true;
> +	group->enabled = group->parent ? group->parent->enabled : true;

Since this is only the init path, if the user later enables PSI
accounting for a parent, should it not re-evaluate it for the groups
down the hierarchy?

Looking at "cgroup_pressure_write()", I could not spot it calling
"css_populate_dir()". Should it not walk the hierarchy and do a
"cgroup_file_show()" considering the changes in you patch?

(P.S. I'm not too familiar with this piece of code so please do let me
  know if I missed something obvious)

>   	for_each_possible_cpu(cpu)
>   		seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
>   	group->avg_last_update = sched_clock();
> @@ -1114,8 +1114,8 @@ int psi_cgroup_alloc(struct cgroup *cgroup)
>   		kfree(cgroup->psi);
>   		return -ENOMEM;
>   	}
> -	group_init(cgroup->psi);
>   	cgroup->psi->parent = cgroup_psi(cgroup_parent(cgroup));
> +	group_init(cgroup->psi);
>   	return 0;
>   }
>   

-- 
Thanks and Regards,
Prateek
Re: [PATCH] psi: Inherit parent cgroup psi enable state
Posted by Chuyi Zhou 1 year, 6 months ago
Hello,

在 2024/7/29 12:45, K Prateek Nayak 写道:
> Hello Chuyi,
> 
> On 7/29/2024 9:11 AM, Chuyi Zhou wrote:
>> Currently when a parent cgroup disables psi through cgroup.pressure, 
>> newly
>> created child cgroups do not inherit the psi state of the parent cgroup.
>>
>> This patch tries to solve this issue. When a child cgroup is created, it
>> would inherit the psi enabled state of the parent in group_init().
>> Once the enable state is found to be false in the css_populate_dir(), the
>> {cpu, io, memory}.pressure files will be hidden using cgroup_file_show().
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>> ---
>>   kernel/cgroup/cgroup.c | 21 +++++++++++++++++++--
>>   kernel/sched/psi.c     |  4 ++--
>>   2 files changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index c8e4b62b436a4..775fe528efcad 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -1719,6 +1719,24 @@ static void css_clear_dir(struct 
>> cgroup_subsys_state *css)
>>       }
>>   }
>> +static int populate_psi_files(struct cgroup_subsys_state *css)
>> +{
>> +    struct cgroup *cgrp = css->cgroup;
>> +    int ret, i;
>> +
>> +    ret = cgroup_addrm_files(css, cgrp, cgroup_psi_files, true);
>> +    if (ret < 0)
>> +        return ret;
>> +
>> +    if (cgrp->psi && !cgrp->psi->enabled) {
>> +        for (i = 0; i < NR_PSI_RESOURCES; i++)
>> +            cgroup_file_show(&cgrp->psi_files[i], 0);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +
>>   /**
>>    * css_populate_dir - create subsys files in a cgroup directory
>>    * @css: target css
>> @@ -1742,8 +1760,7 @@ static int css_populate_dir(struct 
>> cgroup_subsys_state *css)
>>                   return ret;
>>               if (cgroup_psi_enabled()) {
>> -                ret = cgroup_addrm_files(css, cgrp,
>> -                             cgroup_psi_files, true);
>> +                ret = populate_psi_files(css);
>>                   if (ret < 0) {
>>                       cgroup_addrm_files(css, cgrp,
>>                                  cgroup_base_files, false);
>> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
>> index 020d58967d4e8..d0aa17b368819 100644
>> --- a/kernel/sched/psi.c
>> +++ b/kernel/sched/psi.c
>> @@ -180,7 +180,7 @@ static void group_init(struct psi_group *group)
>>   {
>>       int cpu;
>> -    group->enabled = true;
>> +    group->enabled = group->parent ? group->parent->enabled : true;
> 
> Since this is only the init path, if the user later enables PSI
> accounting for a parent, should it not re-evaluate it for the groups
> down the hierarchy?
> 
> Looking at "cgroup_pressure_write()", I could not spot it calling
> "css_populate_dir()". Should it not walk the hierarchy and do a
> "cgroup_file_show()" considering the changes in you patch?
> 
> (P.S. I'm not too familiar with this piece of code so please do let me
>   know if I missed something obvious)

Perhaps my description in the commit log was not clear enough. This 
patch is intended to make child cgroups inherit the state of the parent 
node *during initialization*. The cgroup.pressure interface remains the 
same as before, only changing the enable state at the current level.

In production environments, the overhead of PSI could be significant on 
some machines (with many deep levels of cgroups). For certain tasks 
(such as /sys/fs/cgroup/offline/pod_xxx), we may not need to monitor 
their PSI metrics. With this patch, after disabling 
/sys/fs/cgroup/offline/cgroup.pressure, any subsequently deployed 
offline pods will not enable PSI. Although users can disable PSI by 
traversing all cgroups under /sys/fs/cgroup/offline/, this may not be 
convenient enough.

Thanks.
Re: [PATCH] psi: Inherit parent cgroup psi enable state
Posted by K Prateek Nayak 1 year, 6 months ago
Hello Chuyi,

On 7/29/2024 12:01 PM, Chuyi Zhou wrote:
> Hello,
> 
> 在 2024/7/29 12:45, K Prateek Nayak 写道:
>> Hello Chuyi,
>>
>> On 7/29/2024 9:11 AM, Chuyi Zhou wrote:
>>> Currently when a parent cgroup disables psi through cgroup.pressure, newly
>>> created child cgroups do not inherit the psi state of the parent cgroup.
>>>
>>> This patch tries to solve this issue. When a child cgroup is created, it
>>> would inherit the psi enabled state of the parent in group_init().
>>> Once the enable state is found to be false in the css_populate_dir(), the
>>> {cpu, io, memory}.pressure files will be hidden using cgroup_file_show().
>>>
>>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>>> ---
>>>   kernel/cgroup/cgroup.c | 21 +++++++++++++++++++--
>>>   kernel/sched/psi.c     |  4 ++--
>>>   2 files changed, 21 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>>> index c8e4b62b436a4..775fe528efcad 100644
>>> --- a/kernel/cgroup/cgroup.c
>>> +++ b/kernel/cgroup/cgroup.c
>>> @@ -1719,6 +1719,24 @@ static void css_clear_dir(struct cgroup_subsys_state *css)
>>>       }
>>>   }
>>> +static int populate_psi_files(struct cgroup_subsys_state *css)
>>> +{
>>> +    struct cgroup *cgrp = css->cgroup;
>>> +    int ret, i;
>>> +
>>> +    ret = cgroup_addrm_files(css, cgrp, cgroup_psi_files, true);
>>> +    if (ret < 0)
>>> +        return ret;
>>> +
>>> +    if (cgrp->psi && !cgrp->psi->enabled) {
>>> +        for (i = 0; i < NR_PSI_RESOURCES; i++)
>>> +            cgroup_file_show(&cgrp->psi_files[i], 0);
>>> +    }
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +
>>>   /**
>>>    * css_populate_dir - create subsys files in a cgroup directory
>>>    * @css: target css
>>> @@ -1742,8 +1760,7 @@ static int css_populate_dir(struct cgroup_subsys_state *css)
>>>                   return ret;
>>>               if (cgroup_psi_enabled()) {
>>> -                ret = cgroup_addrm_files(css, cgrp,
>>> -                             cgroup_psi_files, true);
>>> +                ret = populate_psi_files(css);
>>>                   if (ret < 0) {
>>>                       cgroup_addrm_files(css, cgrp,
>>>                                  cgroup_base_files, false);
>>> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
>>> index 020d58967d4e8..d0aa17b368819 100644
>>> --- a/kernel/sched/psi.c
>>> +++ b/kernel/sched/psi.c
>>> @@ -180,7 +180,7 @@ static void group_init(struct psi_group *group)
>>>   {
>>>       int cpu;
>>> -    group->enabled = true;
>>> +    group->enabled = group->parent ? group->parent->enabled : true;
>>
>> Since this is only the init path, if the user later enables PSI
>> accounting for a parent, should it not re-evaluate it for the groups
>> down the hierarchy?
>>
>> Looking at "cgroup_pressure_write()", I could not spot it calling
>> "css_populate_dir()". Should it not walk the hierarchy and do a
>> "cgroup_file_show()" considering the changes in you patch?
>>
>> (P.S. I'm not too familiar with this piece of code so please do let me
>>   know if I missed something obvious)
> 
> Perhaps my description in the commit log was not clear enough. This patch is intended to make child cgroups inherit the state of the parent node *during initialization*.

Ah! i see. Thank you for clarifying :)

> The cgroup.pressure interface remains the same as before, only changing the enable state at the current level.
> 
> In production environments, the overhead of PSI could be significant on some machines (with many deep levels of cgroups). For certain tasks (such as /sys/fs/cgroup/offline/pod_xxx), we may not need to monitor their PSI metrics. With this patch, after disabling /sys/fs/cgroup/offline/cgroup.pressure, any subsequently deployed offline pods will not enable PSI. Although users can disable PSI by traversing all cgroups under /sys/fs/cgroup/offline/, this may not be convenient enough.
> 
> Thanks.

-- 
Thanks and Regards,
Prateek