cgroup: Improve cgroup_addrm_files remove files handling

[PATCH] cgroup: Improve cgroup_addrm_files remove files handling

Posted by Wenyu Liu 2 months, 4 weeks ago

For now cgroup_apply_cftypes(cfts, is_add) with `is_add` false
will remove all the cftype files in `cfts` array, this can lead
to some unexpected behaviors in some abnormal situation.

Consider this situation: if we have two cftype arrays A and B
which contain the exact same files, and we add this two cftypes
with cgroup_add_cftypes().

We can correctly add files from A, but adding B will delete all
files previously added from A.

When adding cftype files of B:
cgroup_add_cftypes
  ->cgroup_apply_cftypes
      ->cgroup_addrm_files (failed with -EEXIST)
  ->cgroup_rm_cftypes_locked (this will delete all files added from A)

Even worse thing is that if we add B again at this point,
we can successfully create these files, but there will be two cftys
nodes (A and B) on the ss->cftys list at the same time.

    ss        A|0|1|2|3|...|   B|0|1|2|3|...|
 +------+       |                |
 |      |       +          +-----+
 +------+       |          |
 | cfts |<-->|node|<--->|node|<--->|...|
 +------+

This will lead to all hierarchies that apply this ss controller
to be unable to create any directory:
cgroup_mkdir
  ->cgroup_apply_control_enable
    ->css_populate_dir
      ->cgroup_addrm_files (will return -EEXIST when handling node of B)

Add a new flag __CFTYPE_ADDRM_END to track the end cft if something
wrong with cgroup_addrm_files() add files, make sure we only remove
the cftype files that were successfully added.

Signed-off-by: Wenyu Liu <liuwenyu.0311@bytedance.com>
---
 include/linux/cgroup-defs.h | 1 +
 kernel/cgroup/cgroup.c      | 8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 93318fce31f3..7ad98048ca23 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -144,6 +144,7 @@ enum {
 	__CFTYPE_ONLY_ON_DFL	= (1 << 16),	/* only on default hierarchy */
 	__CFTYPE_NOT_ON_DFL	= (1 << 17),	/* not on default hierarchy */
 	__CFTYPE_ADDED		= (1 << 18),
+	__CFTYPE_ADDRM_END  = (1 << 19),
 };
 
 enum cgroup_attach_lock_mode {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 6ae5f48cf64e..0d7d3079e635 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4453,13 +4453,13 @@ static int cgroup_addrm_files(struct cgroup_subsys_state *css,
 			      struct cgroup *cgrp, struct cftype cfts[],
 			      bool is_add)
 {
-	struct cftype *cft, *cft_end = NULL;
+	struct cftype *cft;
 	int ret = 0;
 
 	lockdep_assert_held(&cgroup_mutex);
 
 restart:
-	for (cft = cfts; cft != cft_end && cft->name[0] != '\0'; cft++) {
+	for (cft = cfts; !(cft->flags & __CFTYPE_ADDRM_END) && cft->name[0] != '\0'; cft++) {
 		/* does cft->flags tell us to skip this file on @cgrp? */
 		if ((cft->flags & __CFTYPE_ONLY_ON_DFL) && !cgroup_on_dfl(cgrp))
 			continue;
@@ -4476,7 +4476,7 @@ static int cgroup_addrm_files(struct cgroup_subsys_state *css,
 			if (ret) {
 				pr_warn("%s: failed to add %s, err=%d\n",
 					__func__, cft->name, ret);
-				cft_end = cft;
+				cft->flags |= __CFTYPE_ADDRM_END;
 				is_add = false;
 				goto restart;
 			}
@@ -4526,7 +4526,7 @@ static void cgroup_exit_cftypes(struct cftype *cfts)
 
 		/* revert flags set by cgroup core while adding @cfts */
 		cft->flags &= ~(__CFTYPE_ONLY_ON_DFL | __CFTYPE_NOT_ON_DFL |
-				__CFTYPE_ADDED);
+				__CFTYPE_ADDED | __CFTYPE_ADDRM_END);
 	}
 }
 
-- 
2.39.3 (Apple Git-146)

Re: [PATCH] cgroup: Improve cgroup_addrm_files remove files handling

Posted by Michal Koutný 2 months, 4 weeks ago

Hi Wenyu.

On Tue, Nov 11, 2025 at 09:44:27PM +0800, Wenyu Liu <liuwenyu.0311@bytedance.com> wrote:
> Consider this situation: if we have two cftype arrays A and B
> which contain the exact same files, and we add this two cftypes
> with cgroup_add_cftypes().

Do you have more details about this situation?
Does this happen with any of the mainline controllers?

Thanks,
Michal

Re: [PATCH] cgroup: Improve cgroup_addrm_files remove files handling

Posted by Wenyu Liu 2 months, 3 weeks ago


在 11/11/25 21:54, Michal Koutný 写道:
> Hi Wenyu.
> 
> On Tue, Nov 11, 2025 at 09:44:27PM +0800, Wenyu Liu <liuwenyu.0311@bytedance.com> wrote:
>> Consider this situation: if we have two cftype arrays A and B
>> which contain the exact same files, and we add this two cftypes
>> with cgroup_add_cftypes().
> 
> Do you have more details about this situation?
> Does this happen with any of the mainline controllers?
> 
> Thanks,
> Michal
And here is a simple test module that will reproduce this problem. I know that using the symbol not exported is not recommended, but judging from the code, it seems to be indeed a bug: it should always return -EXEIST, however this module can be loaded successfully and make creating a new cgroup dirictory failed with the 'File exists' error.
---
 Makefile       |  9 +++++
 cft_add_test.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+)
 create mode 100644 Makefile
 create mode 100644 cft_add_test.c

diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..160a718
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+kernelver ?= $(shell uname -r)
+obj-m += cft_add_test.o
+
+all:
+       $(MAKE) -C /lib/modules/$(kernelver)/build M=$(CURDIR) modules
+clean:
+       $(MAKE) -C /lib/modules/$(kernelver)/build M=$(CURDIR) clean
\ No newline at end of file
diff --git a/cft_add_test.c b/cft_add_test.c
new file mode 100644
index 0000000..473b988
--- /dev/null
+++ b/cft_add_test.c
@@ -0,0 +1,99 @@
+//SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kprobes.h>
+#include <linux/cgroup.h>
+#include <linux/kernfs.h>
+#include <linux/dcache.h>
+
+typedef unsigned long (*kallsyms_lookup_name_t)(const char *name);
+
+static struct kprobe kp = {
+    .symbol_name = "kallsyms_lookup_name",
+};
+
+#define LOOKUP_SYMS(name)                                               \
+do {                                                                    \
+        kallsyms_lookup_name_t kallsyms_lookup_name;                    \
+        register_kprobe(&kp);                                           \
+        kallsyms_lookup_name = (kallsyms_lookup_name_t)kp.addr;         \
+        unregister_kprobe(&kp);                                         \
+        m_##name = (void *)kallsyms_lookup_name(#name);                 \
+        if (!m_##name) {                                                \
+                pr_err("kallsyms loopup failed: %s\n", #name);          \
+                return -ENXIO;                                          \
+        }                                                               \
+} while (0)                                                             \
+
+static int (*m_cgroup_add_dfl_cftypes)(struct cgroup_subsys *ss, struct cftype *cfts);
+static int (*m_cgroup_rm_cftypes)(struct cftype *cfts);
+static struct cgroup_subsys *m_cpu_cgrp_subsys;
+
+static int show_cft_cgroup(struct seq_file *m, void *v)                
+{
+        struct cgroup *cgrp;
+        char path[1024];
+
+        cgrp = seq_css(m)->cgroup;
+        kernfs_path_from_node(cgrp->kn, NULL, path, sizeof(path));
+        seq_printf(m, "cgroup dir:%s\n", path);
+        return 0;
+}
+
+static struct cftype cft_add_files[] = {
+        {
+                .name = "cft_test_0",
+                .seq_show = show_cft_cgroup,
+        },
+        {
+                .name = "cft_test_1",
+                .seq_show = show_cft_cgroup,
+        },
+        {
+                .name = "cft_test_2",
+                .seq_show = show_cft_cgroup,
+        },
+        { } /* terminate */
+};
+
+static struct cftype cft_add_files_duplicate[] = {
+        {
+                .name = "cft_test_0",
+                .seq_show = show_cft_cgroup,
+        },
+        {
+                .name = "cft_test_1",
+                .seq_show = show_cft_cgroup,
+        },
+        { } /* terminate */
+};
+
+static int __init cft_add_test_init(void)
+{
+        int ret = 0;
+
+        LOOKUP_SYMS(cgroup_add_dfl_cftypes);
+        LOOKUP_SYMS(cgroup_rm_cftypes);
+        LOOKUP_SYMS(cpu_cgrp_subsys);
+
+        ret = m_cgroup_add_dfl_cftypes(m_cpu_cgrp_subsys, cft_add_files);
+        if (ret) {
+            pr_err("failed to add cft_add_files\n"); 
+            return ret;
+        }
+
+        ret = m_cgroup_add_dfl_cftypes(m_cpu_cgrp_subsys, cft_add_files_duplicate);
+        if (ret) {
+            //try again, this time will success
+            return m_cgroup_add_dfl_cftypes(m_cpu_cgrp_subsys, cft_add_files_duplicate);
+        } 
+        return ret;
+}
+
+static void __exit cft_add_test_exit(void)
+{
+        m_cgroup_rm_cftypes(cft_add_files);
+        m_cgroup_rm_cftypes(cft_add_files_duplicate);
+}
+
+module_init(cft_add_test_init);
+module_exit(cft_add_test_exit);
+MODULE_LICENSE("GPL v2");
--

Re: [PATCH] cgroup: Improve cgroup_addrm_files remove files handling

Posted by Wenyu Liu 2 months, 4 weeks ago

在 11/11/25 21:54, Michal Koutný 写道:
> Hi Wenyu.
> 
> On Tue, Nov 11, 2025 at 09:44:27PM +0800, Wenyu Liu <liuwenyu.0311@bytedance.com> wrote:
>> Consider this situation: if we have two cftype arrays A and B
>> which contain the exact same files, and we add this two cftypes
>> with cgroup_add_cftypes().
> 
> Do you have more details about this situation?
> Does this happen with any of the mainline controllers?
> 
> Thanks,
> Michal

On our servers, there a kernel module that registered some control files with cpu controller(with some hacky code, finding and use cgroup_add_dfl_cftypes() via kallsyms_lookup_name()).
For some reason the module name was altered between version iterations.
And due to some accidents, unfortunately, the updated module was reloaded onto the server that previously had the old version installed, resulting in the cgroup becoming unavailable.

Not found this happenned with mainline controllers.

Thanks,
Wenyu