From: Chen Ridong <chenridong@huawei.com>
A hung task can occur during LTP cgroup testing when repeatedly
mounting/unmounting perf_event and net_prio controllers with
systemd.unified_cgroup_hierarchy=1. The hang manifests in
cgroup_lock_and_drain_offline() during root destruction.
Call Trace:
cgroup_lock_and_drain_offline+0x14c/0x1e8
cgroup_destroy_root+0x3c/0x2c0
css_free_rwork_fn+0x248/0x338
process_one_work+0x16c/0x3b8
worker_thread+0x22c/0x3b0
kthread+0xec/0x100
ret_from_fork+0x10/0x20
Root Cause:
CPU0                                    CPU1
mount perf_event                        umount net_prio
cgroup1_get_tree                        cgroup_kill_sb
rebind_subsystems                       // root destruction enqueues
                                        // cgroup_destroy_wq
// kill all perf_event css
// one perf_event css A is dying
// css A offline enqueues cgroup_destroy_wq
// root destruction will be executed first
                                        css_free_rwork_fn
                                          cgroup_destroy_root
                                            cgroup_lock_and_drain_offline
                                            // some perf descendants are dying
                                            // cgroup_destroy_wq max_active = 1
                                            // waiting for css A to die
Problem scenario:
1. CPU0 mounts perf_event (rebind_subsystems)
2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
3. A dying perf_event css's offline work is queued on cgroup_destroy_wq
   behind the root destruction work
4. Root destruction waits for that offline to complete, but the offline
   work cannot run until root destruction finishes, because
   cgroup_destroy_wq has max_active = 1 (see the sketch below)
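
To make the ordering problem concrete, below is a minimal userspace C
sketch (an analogy only, not kernel code; every name in it, such as
css_offline_work and root_destroy_work, is invented for illustration).
A single worker thread serves a FIFO queue, mirroring cgroup_destroy_wq
with max_active = 1, and the item at the head waits on an item queued
behind it:

/*
 * Hypothetical userspace analogy (not kernel code): a FIFO work queue
 * served by a single worker, like cgroup_destroy_wq with max_active = 1.
 * The first queued item waits for the second, which can never run.
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define MAX_WORK 8

static void (*work[MAX_WORK])(void);
static int head, tail;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qcond = PTHREAD_COND_INITIALIZER;
static sem_t css_a_offlined;

static void queue_work(void (*fn)(void))
{
	pthread_mutex_lock(&qlock);
	work[tail++ % MAX_WORK] = fn;
	pthread_cond_signal(&qcond);
	pthread_mutex_unlock(&qlock);
}

/* Stands in for the offline work of the dying css A. */
static void css_offline_work(void)
{
	sem_post(&css_a_offlined);
}

/* Stands in for cgroup_destroy_root() -> cgroup_lock_and_drain_offline(). */
static void root_destroy_work(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += 2;		/* report instead of hanging forever */
	if (sem_timedwait(&css_a_offlined, &ts))
		printf("hung: waiting for work queued behind this item\n");
}

/* Single worker: one item at a time, strictly in queue order. */
static void *worker(void *arg)
{
	for (;;) {
		void (*fn)(void);

		pthread_mutex_lock(&qlock);
		while (head == tail)
			pthread_cond_wait(&qcond, &qlock);
		fn = work[head++ % MAX_WORK];
		pthread_mutex_unlock(&qlock);
		fn();
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	sem_init(&css_a_offlined, 0, 0);
	pthread_create(&t, NULL, worker, NULL);
	queue_work(root_destroy_work);	/* "root destruction" runs first... */
	queue_work(css_offline_work);	/* ...so "css A offline" cannot run */
	sleep(3);
	return 0;
}

Compiled with gcc -pthread, this prints the "hung" message after the
timeout; in the kernel there is no timeout, so the task hangs for good.
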
Solution:
Move cgroup_lock_and_drain_offline() from cgroup_destroy_root() into
cgroup_kill_sb(), at the start of the unmount path.
This ensures:
1. cgroup_lock_and_drain_offline() will not be called within
cgroup_destroy_wq context.
2. No new dying csses for the subsystem being unmounted can appear in
   cgrp_dfl_root between the start of the unmount and the subsystem
   rebinding (the second sketch below illustrates the resulting ordering).
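
In the same hypothetical analogy (reusing worker(), queue_work(),
css_offline_work() and css_a_offlined from the sketch above), the fix
corresponds to draining in the unmount caller's own context before the
destruction work is ever queued, so no item on the single-active queue
waits on another:

/* Root destruction no longer waits; the caller drained first. */
static void root_destroy_fixed(void)
{
	printf("root destroyed\n");
}

int main(void)
{
	pthread_t t;

	sem_init(&css_a_offlined, 0, 0);
	pthread_create(&t, NULL, worker, NULL);
	queue_work(css_offline_work);	/* offline work already pending */
	sem_wait(&css_a_offlined);	/* drain in the caller's context, as
					 * cgroup_kill_sb() now does */
	queue_work(root_destroy_fixed);	/* nothing left to wait for */
	sleep(1);
	return 0;
}
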
Fixes: 334c3679ec4b ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
Reported-by: Gao Yingjie <gaoyingjie@uniontech.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cgroup.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 312c6a8b55bb..7a71410b350e 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1346,8 +1346,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 
 	trace_cgroup_destroy_root(root);
 
-	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
-
+	cgroup_lock();
 	BUG_ON(atomic_read(&root->nr_cgrps));
 	BUG_ON(!list_empty(&cgrp->self.children));
 
@@ -2336,6 +2335,7 @@ static void cgroup_kill_sb(struct super_block *sb)
 	 *
 	 * And don't kill the default root.
 	 */
+	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
 	if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root &&
 	    !percpu_ref_is_dying(&root->cgrp.self.refcnt))
 		percpu_ref_kill(&root->cgrp.self.refcnt);
--
2.34.1
On 2025/7/22 17:24, Chen Ridong wrote:
> [patch quoted above]

Sorry, this is a mistake, I will send a new one.

Best regards,
Ridong