[PATCH] cgroup: Migrate tasks to the root css when a controller is rebound

Posted by Tejun Heo 6 days, 9 hours ago
cgroup_apply_control_disable() defers kill_css_finish() while a css is
still populated, relying on css_update_populated() to fire the deferred
kill once the populated count reaches zero.

This deadlocks when a controller is rebound out of a hierarchy. Mounting
an implicit_on_dfl controller such as perf_event as a v1 hierarchy steals
it off the default hierarchy, and rebind_subsystems() kills its
per-cgroup csses while they are still populated. The migration run in the
same step keeps the old css for a controller no longer in the hierarchy's
mask, so no task is migrated off the dying csses. Their populated count
never reaches zero, the deferred kill_css_finish() never fires, and the
next cgroup_lock_and_drain_offline() hangs forever under cgroup_mutex.
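
The failure mode above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct toy_css and the toy_*() helpers are hypothetical names invented for illustration, not kernel code. The model shows a kill that is deferred while tasks remain attached, and that never completes unless something drains the tasks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a css; not the kernel's data structure. */
struct toy_css {
	int nr_populated;	/* tasks still attached */
	bool kill_deferred;	/* kill was requested while populated */
	bool killed;		/* final kill stage has run */
};

/* Models the deferral: finish now if empty, otherwise remember it. */
static void toy_kill(struct toy_css *css)
{
	if (css->nr_populated > 0)
		css->kill_deferred = true;
	else
		css->killed = true;
}

/* Models the populated-count hook: fire a deferred kill at zero. */
static void toy_task_leaves(struct toy_css *css)
{
	assert(css->nr_populated > 0);
	if (--css->nr_populated == 0 && css->kill_deferred) {
		css->kill_deferred = false;
		css->killed = true;
	}
}

/*
 * Models one rebind step. Returns true if the css ends up killed, i.e.
 * a later waiter for the kill to finish would not block forever.
 */
static bool toy_rebind(int nr_tasks, bool migrate_first)
{
	struct toy_css css = { .nr_populated = nr_tasks };

	/* With migration, every task leaves before the kill is issued. */
	while (migrate_first && css.nr_populated > 0)
		toy_task_leaves(&css);

	toy_kill(&css);
	return css.killed;
}
```

In this model, toy_rebind(3, false) leaves the kill deferred with nothing left to trigger it, which corresponds to the hang described above, while toy_rebind(3, true) kills the css synchronously.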

That migration is already a no-op pass over the rebound subtree. Add
cgroup_rebind_ss_mask so find_existing_css_set() resolves the leaving
controllers to the root css. Their tasks are migrated there, the
per-cgroup csses depopulate, and cgroup_apply_control_disable() kills
them synchronously. The deferral stays correct for the rmdir and
controller-disable paths it was meant for.
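
As a rough sketch of the lookup change, assuming a simplified three-way choice per controller: the enum and toy_css_source() below are hypothetical and only mirror the branch order the patch adds to find_existing_css_set(). They do not reproduce the kernel's css_set handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for where a controller's css is taken from. */
enum toy_src {
	TOY_FROM_ROOT,		/* root css of the hierarchy */
	TOY_FROM_CGROUP,	/* effective css of the target cgroup */
	TOY_FROM_OLD_CSET,	/* whatever the old css_set already had */
};

/*
 * Pick the css source for controller @ssid. @root_mask models the
 * hierarchy's subsys_mask and @rebind_mask models cgroup_rebind_ss_mask.
 * The rebind check comes first, as in the patch.
 */
static enum toy_src toy_css_source(uint32_t root_mask, uint32_t rebind_mask,
				   int ssid)
{
	uint32_t bit;

	assert(ssid >= 0 && ssid < 32);
	bit = UINT32_C(1) << ssid;

	if (rebind_mask & bit)
		return TOY_FROM_ROOT;	/* leaving: move tasks to the root css */
	if (root_mask & bit)
		return TOY_FROM_CGROUP;	/* still in this hierarchy */
	return TOY_FROM_OLD_CSET;	/* not in this hierarchy: keep old css */
}
```

With controller 1 already cleared from the hierarchy mask (root_mask 0x5), an empty rebind mask yields TOY_FROM_OLD_CSET, so no task moves. Setting bit 1 in the rebind mask yields TOY_FROM_ROOT instead, which is what lets the per-cgroup csses depopulate.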

Fixes: 1dffd95575eb ("cgroup: Defer kill_css_finish() in cgroup_apply_control_disable()")
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/41cd159c-54e5-45e0-81df-eaf36a6c028e@sirena.org.uk/
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/all/4e986b4ed7e16547805d54b6e67d09120bc4d2f2.camel@web.de/
Signed-off-by: Tejun Heo <tj@kernel.org>
---
Hello, and thanks a lot for all the reproduction information. It made this
much easier to track down.

Bert, Mark, would you mind giving this a try on your setups?

 kernel/cgroup/cgroup.c | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index bdc8deedb4f7..7f4861109e48 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -197,6 +197,14 @@ static u32 cgrp_dfl_implicit_ss_mask;
 /* some controllers can be threaded on the default hierarchy */
 static u32 cgrp_dfl_threaded_ss_mask;
 
+/*
+ * Set across rebind_subsystems() to the controllers leaving a hierarchy.
+ * Guarded by cgroup_mutex. Makes find_existing_css_set() resolve them to the
+ * root css so the affected tasks are migrated there before
+ * cgroup_apply_control_disable() kills the per-cgroup csses.
+ */
+static u32 cgroup_rebind_ss_mask;
+
 /* The list of hierarchy roots */
 LIST_HEAD(cgroup_roots);
 static int cgroup_root_count;
@@ -1083,7 +1091,15 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset,
 	 * won't change, so no need for locking.
 	 */
 	for_each_subsys(ss, i) {
-		if (root->subsys_mask & (1UL << i)) {
+		if (unlikely(cgroup_rebind_ss_mask & (1UL << i))) {
+			/*
+			 * @ss is leaving this hierarchy and its per-cgroup
+			 * csses are about to be killed. Resolve to the
+			 * surviving root css so the tasks are migrated there.
+			 */
+			template[i] = cgroup_css(&root->cgrp, ss);
+			WARN_ON_ONCE(!template[i]);
+		} else if (root->subsys_mask & (1UL << i)) {
 			/*
 			 * @ss is in this hierarchy, so we want the
 			 * effective css from @cgrp.
@@ -1853,11 +1869,17 @@ int rebind_subsystems(struct cgroup_root *dst_root, u32 ss_mask)
 		struct cgroup *scgrp = &cgrp_dfl_root.cgrp;
 
 		/*
-		 * Controllers from default hierarchy that need to be rebound
-		 * are all disabled together in one go.
+		 * Controllers leaving the default hierarchy are disabled
+		 * together. cgroup_rebind_ss_mask makes cgroup_apply_control()
+		 * migrate their tasks to the root css, so the per-cgroup csses
+		 * are unpopulated when cgroup_finalize_control() kills them.
+		 * Clear it before cgroup_finalize_control(), which does no
+		 * css_set lookup.
 		 */
 		cgrp_dfl_root.subsys_mask &= ~dfl_disable_ss_mask;
+		cgroup_rebind_ss_mask = dfl_disable_ss_mask;
 		WARN_ON(cgroup_apply_control(scgrp));
+		cgroup_rebind_ss_mask = 0;
 		cgroup_finalize_control(scgrp, 0);
 	}
 
@@ -1871,9 +1893,14 @@ int rebind_subsystems(struct cgroup_root *dst_root, u32 ss_mask)
 		WARN_ON(!css || cgroup_css(dcgrp, ss));
 
 		if (src_root != &cgrp_dfl_root) {
-			/* disable from the source */
+			/*
+			 * Disable from the source, migrating its tasks to the
+			 * root css first (see cgroup_rebind_ss_mask).
+			 */
 			src_root->subsys_mask &= ~(1 << ssid);
+			cgroup_rebind_ss_mask = 1 << ssid;
 			WARN_ON(cgroup_apply_control(scgrp));
+			cgroup_rebind_ss_mask = 0;
 			cgroup_finalize_control(scgrp, 0);
 		}
 
-- 
2.54.0
Re: [PATCH] cgroup: Migrate tasks to the root css when a controller is rebound
Posted by Mark Brown 5 days, 11 hours ago
On Mon, Jun 01, 2026 at 09:02:56AM -1000, Tejun Heo wrote:
> cgroup_apply_control_disable() defers kill_css_finish() while a css is
> still populated, relying on css_update_populated() to fire the deferred
> kill once the populated count reaches zero.

This seems to fix things for me, thanks both!

Tested-by: Mark Brown <broonie@kernel.org>
Re: [PATCH] cgroup: Migrate tasks to the root css when a controller is rebound
Posted by Bert Karwatzki 6 days, 9 hours ago
On Monday, 1 June 2026 at 09:02 -1000, Tejun Heo wrote:
> cgroup_apply_control_disable() defers kill_css_finish() while a css is
> still populated, relying on css_update_populated() to fire the deferred
> kill once the populated count reaches zero.

I'll try this right away, but I found out another thing. My real problem seems
to be the perf_event test. Whichever test runs after perf_event hangs, no
matter which test that is:

cgroup_fj_function_perf_event: pass  (0.206s)
cgroup_core01: HANG 

Bert Karwatzki
Re: [PATCH] cgroup: Migrate tasks to the root css when a controller is rebound
Posted by Bert Karwatzki 6 days, 8 hours ago
On Monday, 1 June 2026 at 21:07 +0200, Bert Karwatzki wrote:
> On Monday, 1 June 2026 at 09:02 -1000, Tejun Heo wrote:
> > cgroup_apply_control_disable() defers kill_css_finish() while a css is
> > still populated, relying on css_update_populated() to fire the deferred
> > kill once the populated count reaches zero.

Your fix works for me. No more hangs after cgroup_fj_function_perf_event is run.
Let's hope this solves Mark's problems, too.

Tested-by: Bert Karwatzki <spasswolf@web.de>

Bert Karwatzki