[PATCH 4/4] cgroup: Do not report unavailable v1 controllers in /proc/cgroups

Michal Koutný posted 4 patches 1 year, 5 months ago
Posted by Michal Koutný 1 year, 5 months ago
This is a followup to CONFIG-urability of cpuset and memory controllers
for v1 hierarchies. Make the output in /proc/cgroups reflect that
!CONFIG_CPUSETS_V1 is like !CONFIG_CPUSETS and
!CONFIG_MEMCG_V1 is like !CONFIG_MEMCG.

The intended effect is that hiding the unavailable controllers will hint
users not to try mounting them on v1.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/cgroup/cgroup-v1.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 784337694a4be..e28d5f0d20ed0 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -681,11 +681,14 @@ int proc_cgroupstats_show(struct seq_file *m, void *v)
 	 * cgroup_mutex contention.
 	 */
 
-	for_each_subsys(ss, i)
+	for_each_subsys(ss, i) {
+		if (cgroup1_subsys_absent(ss))
+			continue;
 		seq_printf(m, "%s\t%d\t%d\t%d\n",
 			   ss->legacy_name, ss->root->hierarchy_id,
 			   atomic_read(&ss->root->nr_cgrps),
 			   cgroup_ssid_enabled(i));
+	}
 
 	return 0;
 }
-- 
2.46.0

Re: [PATCH 4/4] cgroup: Do not report unavailable v1 controllers in /proc/cgroups
Posted by Ben Hutchings 7 months ago
On Mon, 2024-09-09 at 18:32 +0200, Michal Koutný wrote:
> This is a followup to CONFIG-urability of cpuset and memory controllers
> for v1 hierarchies. Make the output in /proc/cgroups reflect that
> !CONFIG_CPUSETS_V1 is like !CONFIG_CPUSETS and
> !CONFIG_MEMCG_V1 is like !CONFIG_MEMCG.
> 
> The intended effect is that hiding the unavailable controllers will hint
> users not to try mounting them on v1.

This change can cause problems for the OpenJDK JVM, as reported in
<https://bugs.debian.org/1108294>.

Since OpenJDK version 11, the JVM can detect and adapt to cpuset and
memory limits.  It supports both the cgroups v1 and v2 API, but before
version 25 it always relied on /proc/cgroups to detect whether those
controllers were enabled.

The result of this patch is that if CONFIG_MEMCG_V1 is disabled the JVM
can easily trigger OOM when otherwise it would trim its memory usage
through garbage collection.  (For cpusets, I'm not sure of the impact
but I think it might make bad decisions about the size of thread pools.)
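
To illustrate the failure mode, here is a minimal user-space sketch,
assuming a detector that trusts only /proc/cgroups. It is not OpenJDK's
actual code; the function name controller_enabled() is hypothetical. When
the "memory" line is hidden, such a check concludes that the controller is
unavailable, even though it may be usable through v2.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `name` is listed in /proc/cgroups-style
 * text with a non-zero "enabled" column, 0 otherwise (also when the line
 * is missing altogether).
 * Line format: subsys_name <tab> hierarchy <tab> num_cgroups <tab> enabled
 */
static int controller_enabled(const char *text, const char *name)
{
	const char *line = text;

	assert(text != NULL && name != NULL);

	while (*line) {
		size_t len = strcspn(line, "\n");
		char buf[128], subsys[64];
		int hierarchy, num_cgroups, enabled;

		/* Parse one line at a time; overlong lines are skipped. */
		if (len < sizeof(buf)) {
			memcpy(buf, line, len);
			buf[len] = '\0';
			/* The '#' header line fails the numeric conversions. */
			if (sscanf(buf, "%63s %d %d %d", subsys, &hierarchy,
				   &num_cgroups, &enabled) == 4 &&
			    strcmp(subsys, name) == 0)
				return enabled != 0;
		}

		line += len;
		if (*line == '\n')
			line++;
	}
	return 0;
}
```

Fed the file content from a kernel without CONFIG_MEMCG_V1, this returns 0
for "memory", which is the answer a /proc/cgroups-only detector would act on.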

Although the fix in OpenJDK 25 can probably be backported to older
versions, this issue primarily affects container workloads so fixing
this in distribution packages would not be sufficient.

The obvious compatibility fix for this at the kernel level is to enable
CONFIG_{CPUSETS,MEMCG}_V1.  But since the v1 API has long been
deprecated and is not actually needed by OpenJDK, I would prefer not to
do that.
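
For comparison, a v2-only check does not need /proc/cgroups at all: the
cgroup.controllers file at the root of a v2 hierarchy (conventionally
mounted at /sys/fs/cgroup) holds a space-separated list of available
controllers. A minimal sketch of such a check follows; the function name
v2_controller_available() is hypothetical, and I am not claiming this is
what the OpenJDK 25 fix does.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `name` is one of the whitespace-separated
 * tokens in `controllers` (the content of a cgroup.controllers file),
 * 0 otherwise. Only whole tokens match, so "cpu" does not match "cpuset".
 */
static int v2_controller_available(const char *controllers, const char *name)
{
	const char *p = controllers;
	size_t name_len;

	assert(controllers != NULL && name != NULL);
	name_len = strlen(name);

	while (*p) {
		size_t len;

		p += strspn(p, " \t\n");	/* skip separators */
		len = strcspn(p, " \t\n");	/* length of the next token */
		if (len > 0 && len == name_len && strncmp(p, name, len) == 0)
			return 1;
		p += len;
	}
	return 0;
}
```

The caller would read the file into a buffer first; the sketch deliberately
leaves file I/O out so the matching logic stays easy to check.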

Would you consider reverting this change for the sake of compatibility?

Ben.

> Signed-off-by: Michal Koutný <mkoutny@suse.com>
> ---
>  kernel/cgroup/cgroup-v1.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> index 784337694a4be..e28d5f0d20ed0 100644
> --- a/kernel/cgroup/cgroup-v1.c
> +++ b/kernel/cgroup/cgroup-v1.c
> @@ -681,11 +681,14 @@ int proc_cgroupstats_show(struct seq_file *m, void *v)
>  	 * cgroup_mutex contention.
>  	 */
>  
> -	for_each_subsys(ss, i)
> +	for_each_subsys(ss, i) {
> +		if (cgroup1_subsys_absent(ss))
> +			continue;
>  		seq_printf(m, "%s\t%d\t%d\t%d\n",
>  			   ss->legacy_name, ss->root->hierarchy_id,
>  			   atomic_read(&ss->root->nr_cgrps),
>  			   cgroup_ssid_enabled(i));
> +	}
>  
>  	return 0;
>  }

-- 
Ben Hutchings
73.46% of all statistics are made up.
Re: [PATCH 4/4] cgroup: Do not report unavailable v1 controllers in /proc/cgroups
Posted by Michal Koutný 7 months ago
Hello Ben.

On Wed, Jul 09, 2025 at 08:22:09PM +0200, Ben Hutchings <ben@decadent.org.uk> wrote:
> Would you consider reverting this change for the sake of compatibility?

As you write, it's not fatally broken and it may be "just" an issue of
container images that got no fresh rebuild. (And I think it should be
generally discouraged running containers with stale deps in them.)

The original patch would mainly serve legacy userspace (host) setups on
top of contemporary kernel (besides API purity reasons). Admittedly,
these should be rare and eventually extinct in contrast with your
example where it's a containerized userspace (which typically could do
no cgroup setup) that may still have some user demand.

So, I'd be more confident with the revert if such an adjustment was
carried downstream by some distro and had proven its viability first. Do
you know of any in the wild?

I appreciate your report,
Michal
Re: [PATCH 4/4] cgroup: Do not report unavailable v1 controllers in /proc/cgroups
Posted by Ben Hutchings 7 months ago
On Fri, 2025-07-11 at 15:10 +0200, Michal Koutný wrote:
> Hello Ben.
> 
> On Wed, Jul 09, 2025 at 08:22:09PM +0200, Ben Hutchings <ben@decadent.org.uk> wrote:
> > Would you consider reverting this change for the sake of compatibility?
> 
> As you write, it's not fatally broken and it may be "just" an issue of
> container images that got no fresh rebuild. (And I think it should be
> generally discouraged running containers with stale deps in them.)
> 
> The original patch would mainly serve legacy userspace (host) setups on
> top of contemporary kernel (besides API purity reasons). Admittedly,
> these should be rare and eventually extinct in contrast with your
> example where it's a containerized userspace (which typically could do
> no cgroup setup) that may still have some user demand.
> 
> So, I'd be more confident with the revert if such an adjustment was
> carried downstream by some distro and proven its viability first. Do you
> know of any in the wild?

The revert has just gone into Debian unstable, targeting the upcoming
stable release.  So at this point I can't confidently state that it
won't also cause regressions.

Ben.

> 
> I appreciate your report,
> Michal

-- 
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed
                                                    - Carolyn Scheppner
Re: [PATCH 4/4] cgroup: Do not report unavailable v1 controllers in /proc/cgroups
Posted by Tejun Heo 7 months ago
On Fri, Jul 11, 2025 at 03:10:44PM +0200, Michal Koutný wrote:
> Hello Ben.
> 
> On Wed, Jul 09, 2025 at 08:22:09PM +0200, Ben Hutchings <ben@decadent.org.uk> wrote:
> > Would you consider reverting this change for the sake of compatibility?
> 
> As you write, it's not fatally broken and it may be "just" an issue of
> container images that got no fresh rebuild. (And I think it should be
> generally discouraged running containers with stale deps in them.)
> 
> The original patch would mainly serve legacy userspace (host) setups on
> top of contemporary kernel (besides API purity reasons). Admittedly,
> these should be rare and eventually extinct in contrast with your
> example where it's a containerized userspace (which typically could do
> no cgroup setup) that may still have some user demand.
> 
> So, I'd be more confident with the revert if such an adjustment was
> carried downstream by some distro and proven its viability first. Do you
> know of any in the wild?

I think we still want to deprecate /proc/cgroups but given that there are
impacted users maybe we can bring it back under a boottime param w/ warning?

Thanks.

-- 
tejun
Re: [PATCH 4/4] cgroup: Do not report unavailable v1 controllers in /proc/cgroups
Posted by Michal Koutný 6 months, 3 weeks ago
On Fri, Jul 11, 2025 at 12:15:07PM -1000, Tejun Heo <tj@kernel.org> wrote:
> I think we still want to deprecate /proc/cgroups but given that there are
> impacted users maybe we can bring it back under a boottime param w/ warning?

Something like below? (I don't change the log level.)

Ben, the affected Java users could modify it at boot time. I saw your
revert is in v6.12, so you may also want a backport of a0ab1453226d8 to
give the users a message. (I realize current->comm in the message would
be even more instructive.)

-- >8 --

From ace88e9e3a77ff3fe86aee4b7a5866b3bfd2df58 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michal=20Koutn=C3=BD?= <mkoutny@suse.com>
Date: Thu, 17 Jul 2025 17:38:47 +0200
Subject: [PATCH] cgroup: Add compatibility option for content of /proc/cgroups
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

/proc/cgroups lists only v1 controllers by default, however, this is
only enforced since the commit af000ce85293b ("cgroup: Do not report
unavailable v1 controllers in /proc/cgroups") and there is software in
the wild that uses content of /proc/cgroups to decide on availability of
v2 (sic) controllers.

Add a boottime param that can bring back the previous behavior for
setups where the check in the software cannot be changed and it causes
e.g. unintended OOMs.

Also, this patch takes out cgrp_v1_visible from cgroup1_subsys_absent()
guard since it's only important to check which hierarchy (v1 vs v2) the
subsys is attached to. This has no effect on the printed message but
the code is cleaner since cgrp_v1_visible is really about mounted
hierarchies, not the content of /proc/cgroups.

Link: https://lore.kernel.org/r/b26b60b7d0d2a5ecfd2f3c45f95f32922ed24686.camel@decadent.org.uk
Fixes: af000ce85293b ("cgroup: Do not report unavailable v1 controllers in /proc/cgroups")
Fixes: a0ab1453226d8 ("cgroup: Print message when /proc/cgroups is read on v2-only system")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  8 ++++++++
 kernel/cgroup/cgroup-v1.c                       | 14 ++++++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 07e22ba5bfe34..f6d317e1674d6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -633,6 +633,14 @@
 			named mounts. Specifying both "all" and "named" disables
 			all v1 hierarchies.
 
+	cgroup_v1_proc=	[KNL] Show also missing controllers in /proc/cgroups
+			Format: { "true" | "false" }
+			/proc/cgroups lists only v1 controllers by default.
+			This compatibility option enables listing also v2
+			controllers (whose v1 code is not compiled!), so that
+			semi-legacy software can check this file to decide
+			about usage of v2 (sic) controllers.
+
 	cgroup_favordynmods= [KNL] Enable or Disable favordynmods.
 			Format: { "true" | "false" }
 			Defaults to the value of CONFIG_CGROUP_FAVOR_DYNMODS.
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index fa24c032ed6fe..2a4a387f867ab 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -32,6 +32,9 @@ static u16 cgroup_no_v1_mask;
 /* disable named v1 mounts */
 static bool cgroup_no_v1_named;
 
+/* Show unavailable controllers in /proc/cgroups */
+static bool proc_show_all;
+
 /*
  * pidlist destructions need to be flushed on cgroup destruction.  Use a
  * separate workqueue as flush domain.
@@ -683,10 +686,11 @@ int proc_cgroupstats_show(struct seq_file *m, void *v)
 	 */
 
 	for_each_subsys(ss, i) {
-		if (cgroup1_subsys_absent(ss))
-			continue;
 		cgrp_v1_visible |= ss->root != &cgrp_dfl_root;
 
+		if (!proc_show_all && cgroup1_subsys_absent(ss))
+			continue;
+
 		seq_printf(m, "%s\t%d\t%d\t%d\n",
 			   ss->legacy_name, ss->root->hierarchy_id,
 			   atomic_read(&ss->root->nr_cgrps),
@@ -1359,3 +1363,9 @@ static int __init cgroup_no_v1(char *str)
 	return 1;
 }
 __setup("cgroup_no_v1=", cgroup_no_v1);
+
+static int __init cgroup_v1_proc(char *str)
+{
+	return (kstrtobool(str, &proc_show_all) == 0);
+}
+__setup("cgroup_v1_proc=", cgroup_v1_proc);
-- 
2.50.0
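
The effect of the new condition can be modelled in user space. This is a
minimal sketch under stated assumptions: struct subsys_model,
list_controllers() and the sample table are hypothetical stand-ins, with
the v1_absent flag modelling the result of cgroup1_subsys_absent(ss) and
the proc_show_all argument modelling the value set by cgroup_v1_proc=.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct cgroup_subsys; only what the sketch needs. */
struct subsys_model {
	const char *legacy_name;
	bool v1_absent;		/* models cgroup1_subsys_absent(ss) */
};

/* Example table: v1 code for cpuset and memory is assumed not compiled in. */
static const struct subsys_model sample[] = {
	{ "cpuset", true },
	{ "cpu",    false },
	{ "memory", true },
};

/*
 * Write one controller name per line into `out`, applying the same skip
 * condition as the patched loop. Returns the number of names written.
 */
static int list_controllers(const struct subsys_model *subsys, size_t n,
			    bool proc_show_all, char *out, size_t out_len)
{
	size_t used = 0;
	int shown = 0;

	assert(subsys != NULL && out != NULL && out_len > 0);
	out[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		int ret;

		if (!proc_show_all && subsys[i].v1_absent)
			continue;

		ret = snprintf(out + used, out_len - used, "%s\n",
			       subsys[i].legacy_name);
		if (ret < 0 || (size_t)ret >= out_len - used) {
			out[used] = '\0';	/* drop the partial entry */
			break;
		}
		used += (size_t)ret;
		shown++;
	}
	return shown;
}
```

With proc_show_all false only "cpu" is listed from the sample table; with it
true (as after booting with cgroup_v1_proc=true) all three names appear.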

Re: [PATCH 4/4] cgroup: Do not report unavailable v1 controllers in /proc/cgroups
Posted by Tejun Heo 6 months, 3 weeks ago
On Fri, Jul 18, 2025 at 11:18:54AM +0200, Michal Koutný wrote:
> From ace88e9e3a77ff3fe86aee4b7a5866b3bfd2df58 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Michal=20Koutn=C3=BD?= <mkoutny@suse.com>
> Date: Thu, 17 Jul 2025 17:38:47 +0200
> Subject: [PATCH] cgroup: Add compatibility option for content of /proc/cgroups
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> /proc/cgroups lists only v1 controllers by default, however, this is
> only enforced since the commit af000ce85293b ("cgroup: Do not report
> unavailable v1 controllers in /proc/cgroups") and there is software in
> the wild that uses content of /proc/cgroups to decide on availability of
> v2 (sic) controllers.
> 
> Add a boottime param that can bring back the previous behavior for
> setups where the check in the software cannot be changed and it causes
> e.g. unintended OOMs.
> 
> Also, this patch takes out cgrp_v1_visible from cgroup1_subsys_absent()
> guard since it's only important to check which hierarchy (v1 vs v2) the
> subsys is attached to. This has no effect on the printed message but
> the code is cleaner since cgrp_v1_visible is really about mounted
> hierarchies, not the content of /proc/cgroups.
> 
> Link: https://lore.kernel.org/r/b26b60b7d0d2a5ecfd2f3c45f95f32922ed24686.camel@decadent.org.uk
> Fixes: af000ce85293b ("cgroup: Do not report unavailable v1 controllers in /proc/cgroups")
> Fixes: a0ab1453226d8 ("cgroup: Print message when /proc/cgroups is read on v2-only system")
> Signed-off-by: Michal Koutný <mkoutny@suse.com>

Applied to cgroup/for-6.17.

Thanks.

-- 
tejun
Re: [PATCH 4/4] cgroup: Do not report unavailable v1 controllers in /proc/cgroups
Posted by Tejun Heo 1 year, 5 months ago
On Mon, Sep 09, 2024 at 06:32:23PM +0200, Michal Koutný wrote:
> This is a followup to CONFIG-urability of cpuset and memory controllers
> for v1 hierarchies. Make the output in /proc/cgroups reflect that
> !CONFIG_CPUSETS_V1 is like !CONFIG_CPUSETS and
> !CONFIG_MEMCG_V1 is like !CONFIG_MEMCG.
> 
> The intended effect is that hiding the unavailable controllers will hint
> users not to try mounting them on v1.
> 
> Signed-off-by: Michal Koutný <mkoutny@suse.com>

Applied to cgroup/for-6.12 w/ Waiman's reviewed-by.

Thanks.

-- 
tejun
Re: [PATCH 4/4] cgroup: Do not report unavailable v1 controllers in /proc/cgroups
Posted by Waiman Long 1 year, 5 months ago
On 9/9/24 12:32, Michal Koutný wrote:
> This is a followup to CONFIG-urability of cpuset and memory controllers
> for v1 hierarchies. Make the output in /proc/cgroups reflect that
> !CONFIG_CPUSETS_V1 is like !CONFIG_CPUSETS and
> !CONFIG_MEMCG_V1 is like !CONFIG_MEMCG.
>
> The intended effect is that hiding the unavailable controllers will hint
> users not to try mounting them on v1.
>
> Signed-off-by: Michal Koutný <mkoutny@suse.com>
> ---
>   kernel/cgroup/cgroup-v1.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> index 784337694a4be..e28d5f0d20ed0 100644
> --- a/kernel/cgroup/cgroup-v1.c
> +++ b/kernel/cgroup/cgroup-v1.c
> @@ -681,11 +681,14 @@ int proc_cgroupstats_show(struct seq_file *m, void *v)
>   	 * cgroup_mutex contention.
>   	 */
>   
> -	for_each_subsys(ss, i)
> +	for_each_subsys(ss, i) {
> +		if (cgroup1_subsys_absent(ss))
> +			continue;
>   		seq_printf(m, "%s\t%d\t%d\t%d\n",
>   			   ss->legacy_name, ss->root->hierarchy_id,
>   			   atomic_read(&ss->root->nr_cgrps),
>   			   cgroup_ssid_enabled(i));
> +	}
>   
>   	return 0;
>   }
Reviewed-by: Waiman Long <longman@redhat.com>