[v1] cgroup/cpuset: remove kernfs active break

[PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by Chen Ridong 1 year, 1 month ago

From: Chen Ridong <chenridong@huawei.com>

A warning was found:

WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828
CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G
RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202
RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000
RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04
RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180
R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08
R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0
FS:  00007f84cd0be740(0000) GS:ffff8887ddc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 kernfs_drain+0x15e/0x2f0
 __kernfs_remove+0x165/0x300
 kernfs_remove_by_name_ns+0x7b/0xc0
 cgroup_rm_file+0x154/0x1c0
 cgroup_addrm_files+0x1c2/0x1f0
 css_clear_dir+0x77/0x110
 kill_css+0x4c/0x1b0
 cgroup_destroy_locked+0x194/0x380
 cgroup_rmdir+0x2a/0x140

It can be explained by:
rmdir 				echo 1 > cpuset.cpus
				kernfs_fop_write_iter // active=0
cgroup_rm_file
kernfs_remove_by_name_ns	kernfs_get_active // active=1
__kernfs_remove					  // active=0x80000002
kernfs_drain			cpuset_write_resmask
wait_event
//waiting (active == 0x80000001)
				kernfs_break_active_protection
				// active = 0x80000001
// continue
				kernfs_unbreak_active_protection
				// active = 0x80000002
...
kernfs_should_drain_open_files
// warning occurs
				kernfs_put_active

This warning is caused by 'kernfs_break_active_protection' when it is
writing to cpuset.cpus, and the cgroup is removed concurrently.

The commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside
get_online_cpus()") made cpuset_hotplug_workfn asynchronous, which grabs
the cgroup_mutex. To avoid deadlock. the commit 76bb5ab8f6e3 ("cpuset:
break kernfs active protection in cpuset_write_resmask()") added
'kernfs_break_active_protection' in the cpuset_write_resmask. This could
lead to this warning.

After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
processing synchronous"), the cpuset_write_resmask no longer needs to
wait the hotplug to finish, which means that cpuset_write_resmask won't
grab the cgroup_mutex. So the deadlock doesn't exist anymore. Therefore,
remove kernfs_break_active_protection operation in the
'cpuset_write_resmask'

Fixes: 76bb5ab8f6e3 ("cpuset: break kernfs active protection in cpuset_write_resmask()")
Reported-by: Ji Fa <jifa@huawei.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset.c | 25 -------------------------
 1 file changed, 25 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 7ea559fb0cbf..0f910c828973 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3124,29 +3124,6 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 	int retval = -ENODEV;
 
 	buf = strstrip(buf);
-
-	/*
-	 * CPU or memory hotunplug may leave @cs w/o any execution
-	 * resources, in which case the hotplug code asynchronously updates
-	 * configuration and transfers all tasks to the nearest ancestor
-	 * which can execute.
-	 *
-	 * As writes to "cpus" or "mems" may restore @cs's execution
-	 * resources, wait for the previously scheduled operations before
-	 * proceeding, so that we don't end up keep removing tasks added
-	 * after execution capability is restored.
-	 *
-	 * cpuset_handle_hotplug may call back into cgroup core asynchronously
-	 * via cgroup_transfer_tasks() and waiting for it from a cgroupfs
-	 * operation like this one can lead to a deadlock through kernfs
-	 * active_ref protection.  Let's break the protection.  Losing the
-	 * protection is okay as we check whether @cs is online after
-	 * grabbing cpuset_mutex anyway.  This only happens on the legacy
-	 * hierarchies.
-	 */
-	css_get(&cs->css);
-	kernfs_break_active_protection(of->kn);
-
 	cpus_read_lock();
 	mutex_lock(&cpuset_mutex);
 	if (!is_cpuset_online(cs))
@@ -3179,8 +3156,6 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 out_unlock:
 	mutex_unlock(&cpuset_mutex);
 	cpus_read_unlock();
-	kernfs_unbreak_active_protection(of->kn);
-	css_put(&cs->css);
 	flush_workqueue(cpuset_migrate_mm_wq);
 	return retval ?: nbytes;
 }
-- 
2.34.1

Re: [PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by Michal Koutný 1 year, 1 month ago

On Fri, Dec 20, 2024 at 01:31:06AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0

I assume it's this
	WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);

> It can be explained by:
> rmdir 				echo 1 > cpuset.cpus
> 				kernfs_fop_write_iter // active=0
> cgroup_rm_file
> kernfs_remove_by_name_ns	kernfs_get_active // active=1
> __kernfs_remove					  // active=0x80000002
> kernfs_drain			cpuset_write_resmask
> wait_event
> //waiting (active == 0x80000001)
> 				kernfs_break_active_protection
> 				// active = 0x80000001
> // continue
> 				kernfs_unbreak_active_protection
> 				// active = 0x80000002
> ...
> kernfs_should_drain_open_files
> // warning occurs
> 				kernfs_put_active

Thanks for this breakdown.

> To avoid deadlock. the commit 76bb5ab8f6e3 ("cpuset:
> break kernfs active protection in cpuset_write_resmask()") added
> 'kernfs_break_active_protection' in the cpuset_write_resmask. This could
> lead to this warning.

The deadlock cycle included cpuset_hotplug_work and since that was
removed in the said commit, there shouldn't be same deadlock possible.

Ridong, have you run your patch with CONFIG_LOCKDEP to check that
eventuality?

> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
> processing synchronous"), the cpuset_write_resmask no longer needs to
> wait the hotplug to finish, which means that cpuset_write_resmask won't
> grab the cgroup_mutex. So the deadlock doesn't exist anymore. Therefore,
> remove kernfs_break_active_protection operation in the
> 'cpuset_write_resmask'
> 
> Fixes: 76bb5ab8f6e3 ("cpuset: break kernfs active protection in cpuset_write_resmask()")

This commit alone isn't sufficient to cause the warning you observed,
right?

As I read kernfs_break_active_protection() comment, I don't see cpuset
code violating its conditions:
a) it's broken/unbroken from withing a kernfs file operation handler,
b) it pins the needed struct cpuset independently of kernfs_node (it's
   ok to be removed)

All in all -- I think the particular break/unbreak pair is unncecessary
nowadays and the warning implemented with hiding/showing kernfs files
didn't take temporary breakage into account (only based on quick
searching and vague memories).

Thanks,
Michal

Re: [PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by Chen Ridong 1 year, 1 month ago


On 2025/1/3 0:02, Michal Koutný wrote:
> On Fri, Dec 20, 2024 at 01:31:06AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
>> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
> 
> I assume it's this
> 	WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
> 
Right.

>> It can be explained by:
>> rmdir 				echo 1 > cpuset.cpus
>> 				kernfs_fop_write_iter // active=0
>> cgroup_rm_file
>> kernfs_remove_by_name_ns	kernfs_get_active // active=1
>> __kernfs_remove					  // active=0x80000002
>> kernfs_drain			cpuset_write_resmask
>> wait_event
>> //waiting (active == 0x80000001)
>> 				kernfs_break_active_protection
>> 				// active = 0x80000001
>> // continue
>> 				kernfs_unbreak_active_protection
>> 				// active = 0x80000002
>> ...
>> kernfs_should_drain_open_files
>> // warning occurs
>> 				kernfs_put_active
> 
> Thanks for this breakdown.
> 
>> To avoid deadlock. the commit 76bb5ab8f6e3 ("cpuset:
>> break kernfs active protection in cpuset_write_resmask()") added
>> 'kernfs_break_active_protection' in the cpuset_write_resmask. This could
>> lead to this warning.
> 
> The deadlock cycle included cpuset_hotplug_work and since that was
> removed in the said commit, there shouldn't be same deadlock possible.
> 
> Ridong, have you run your patch with CONFIG_LOCKDEP to check that
> eventuality?
> 

Yes, I tested.

>> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>> processing synchronous"), the cpuset_write_resmask no longer needs to
>> wait the hotplug to finish, which means that cpuset_write_resmask won't
>> grab the cgroup_mutex. So the deadlock doesn't exist anymore. Therefore,
>> remove kernfs_break_active_protection operation in the
>> 'cpuset_write_resmask'
>>
>> Fixes: 76bb5ab8f6e3 ("cpuset: break kernfs active protection in cpuset_write_resmask()")
> 
> This commit alone isn't sufficient to cause the warning you observed,
> right?

I think the commit 76bb5ab8f6e3 ("cpuset: break kernfs active protection
in cpuset_write_resmask()") is causing the warning I observed.

This warning was observed when removing a cpuset cgroup and writing to
cpuset.cpus concurrently. Unlike the cgroup_kn_lock_live functions,
which break active protection and grab the cgroup_mutex immediately to
avoid concurrent removal, writing to 'cpuset_write_resmask' cannot avoid
concurrent removal of the cgroup directory. Therefore, this could cause
the warning.

> As I read kernfs_break_active_protection() comment, I don't see cpuset
> code violating its conditions:
> a) it's broken/unbroken from withing a kernfs file operation handler,
> b) it pins the needed struct cpuset independently of kernfs_node (it's
>    ok to be removed)
> 
I am not sure if it is safe to call
kernfs_unbreak_active_protection(atomic_inc(&kn->active)); after the
'kn' has been removed. I don't know much about this. However, I have not
seen any Use-After-Free (UAF) issues so far.

I would be grateful if you could provide more information.

Best regards
Ridong

> All in all -- I think the particular break/unbreak pair is unncecessary
> nowadays and the warning implemented with hiding/showing kernfs files
> didn't take temporary breakage into account (only based on quick
> searching and vague memories).
> 
> Thanks,
> Michal

Re: [PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by Michal Koutný 9 months, 1 week ago

On Fri, Jan 03, 2025 at 10:22:33AM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
> I think the commit 76bb5ab8f6e3 ("cpuset: break kernfs active protection
> in cpuset_write_resmask()") is causing the warning I observed.

I was considering
bdb2fd7fc56e1 ("kernfs: Skip kernfs_drain_open_files() more aggressively") 
in conjunction (the warning didn't exist back then).

> writing to 'cpuset_write_resmask' cannot avoid concurrent removal of
> the cgroup directory. Therefore, this could cause the warning.
> 
> > As I read kernfs_break_active_protection() comment, I don't see cpuset
> > code violating its conditions:
> > a) it's broken/unbroken from withing a kernfs file operation handler,
> > b) it pins the needed struct cpuset independently of kernfs_node (it's
> >    ok to be removed)
> > 
> I am not sure if it is safe to call
> kernfs_unbreak_active_protection(atomic_inc(&kn->active)); after the
> 'kn' has been removed. 

Thit'd render the break/unbreak mechanism useless if unbreak cannot be
safely used. Users of unbreak know that they may get an inactive
reference. IOW in this part of the race:

                                                                         kernfs_unbreak_active_protection
                                                                         // active = 0x80000002
    ...
    kernfs_should_drain_open_files
    WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
                                                                         kernfs_put_active

the WARN_ON_ONCE seems misplaced if there are expected users of
inactivated reference.

For your concern about atomic_inc(&kn->active)); after the 'kn' has been
removed -- that's a different reference tracking (kn->count) and that
should be enshured by generic VFS due to existence of inode that pins
inode->i_private form kerfs_init_node().

All in all, the patch makes sense as a code cleanup (the deadlock is
gone already) but it doesn't tackle any reference underflow (I'm
bringing this up again because of CVE-2025-21634).

If anything, the warning in kernfs_should_drain_open_files() should be
reviewed. 

WDYT?

Michal

Re: [PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by Chen Ridong 9 months, 1 week ago


On 2025/5/1 2:33, Michal Koutný wrote:
> On Fri, Jan 03, 2025 at 10:22:33AM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
>> I think the commit 76bb5ab8f6e3 ("cpuset: break kernfs active protection
>> in cpuset_write_resmask()") is causing the warning I observed.
> 
> 
> I was considering
> bdb2fd7fc56e1 ("kernfs: Skip kernfs_drain_open_files() more aggressively") 
> in conjunction (the warning didn't exist back then).
> 

Thank you, Michal, I added two fixes:
Fixes: bdb2fd7fc56e ("kernfs: Skip kernfs_drain_open_files() more
aggressively")
Fixes: 76bb5ab8f6e3 ("cpuset: break kernfs active protection in
cpuset_write_resmask()")

The patch "kernfs: Relax constraint in draining guard" makes sense to me.

Thanks,
Ridong

> 
>> writing to 'cpuset_write_resmask' cannot avoid concurrent removal of
>> the cgroup directory. Therefore, this could cause the warning.
>>
>>> As I read kernfs_break_active_protection() comment, I don't see cpuset
>>> code violating its conditions:
>>> a) it's broken/unbroken from withing a kernfs file operation handler,
>>> b) it pins the needed struct cpuset independently of kernfs_node (it's
>>>    ok to be removed)
>>>
>> I am not sure if it is safe to call
>> kernfs_unbreak_active_protection(atomic_inc(&kn->active)); after the
>> 'kn' has been removed. 
> 
> Thit'd render the break/unbreak mechanism useless if unbreak cannot be
> safely used. Users of unbreak know that they may get an inactive
> reference. IOW in this part of the race:
> 
>                                                                          kernfs_unbreak_active_protection
>                                                                          // active = 0x80000002
>     ...
>     kernfs_should_drain_open_files
>     WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
>                                                                          kernfs_put_active
> 
> the WARN_ON_ONCE seems misplaced if there are expected users of
> inactivated reference.
> 
> For your concern about atomic_inc(&kn->active)); after the 'kn' has been
> removed -- that's a different reference tracking (kn->count) and that
> should be enshured by generic VFS due to existence of inode that pins
> inode->i_private form kerfs_init_node().
> 
> All in all, the patch makes sense as a code cleanup (the deadlock is
> gone already) but it doesn't tackle any reference underflow (I'm
> bringing this up again because of CVE-2025-21634).
> 
> If anything, the warning in kernfs_should_drain_open_files() should be
> reviewed. 
> 
> WDYT?
> 
> Michal

Re: [PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by Waiman Long 1 year, 1 month ago

On 12/19/24 8:31 PM, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> A warning was found:
>
> WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828
> CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G
> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
> RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202
> RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000
> RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04
> RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180
> R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08
> R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0
> FS:  00007f84cd0be740(0000) GS:ffff8887ddc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>   kernfs_drain+0x15e/0x2f0
>   __kernfs_remove+0x165/0x300
>   kernfs_remove_by_name_ns+0x7b/0xc0
>   cgroup_rm_file+0x154/0x1c0
>   cgroup_addrm_files+0x1c2/0x1f0
>   css_clear_dir+0x77/0x110
>   kill_css+0x4c/0x1b0
>   cgroup_destroy_locked+0x194/0x380
>   cgroup_rmdir+0x2a/0x140
Were you using cgroup v1 or v2 when this warning happened?
>
> It can be explained by:
> rmdir 				echo 1 > cpuset.cpus
> 				kernfs_fop_write_iter // active=0
> cgroup_rm_file
> kernfs_remove_by_name_ns	kernfs_get_active // active=1
> __kernfs_remove					  // active=0x80000002
> kernfs_drain			cpuset_write_resmask
> wait_event
> //waiting (active == 0x80000001)
> 				kernfs_break_active_protection
> 				// active = 0x80000001
> // continue
> 				kernfs_unbreak_active_protection
> 				// active = 0x80000002
> ...
> kernfs_should_drain_open_files
> // warning occurs
> 				kernfs_put_active
>
> This warning is caused by 'kernfs_break_active_protection' when it is
> writing to cpuset.cpus, and the cgroup is removed concurrently.
>
> The commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside
> get_online_cpus()") made cpuset_hotplug_workfn asynchronous, which grabs
> the cgroup_mutex. To avoid deadlock. the commit 76bb5ab8f6e3 ("cpuset:
> break kernfs active protection in cpuset_write_resmask()") added
> 'kernfs_break_active_protection' in the cpuset_write_resmask. This could
> lead to this warning.
>
> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
> processing synchronous"), the cpuset_write_resmask no longer needs to
> wait the hotplug to finish, which means that cpuset_write_resmask won't
> grab the cgroup_mutex. So the deadlock doesn't exist anymore. Therefore,
> remove kernfs_break_active_protection operation in the
> 'cpuset_write_resmask'

The hotplug operation itself is now being done synchronously, but task 
transfer (cgroup_transfer_tasks()) because of lacking online CPUs is 
still being done asynchronously. So kernfs_break_active_protection() 
will still be needed for cgroup v1.

Cheers,
Longman

Re: [PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by chenridong 1 year, 1 month ago


On 2024/12/20 10:55, Waiman Long wrote:
> On 12/19/24 8:31 PM, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> A warning was found:
>>
>> WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828
>> CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G
>> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
>> RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202
>> RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000
>> RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04
>> RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180
>> R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08
>> R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0
>> FS:  00007f84cd0be740(0000) GS:ffff8887ddc00000(0000)
>> knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>>   kernfs_drain+0x15e/0x2f0
>>   __kernfs_remove+0x165/0x300
>>   kernfs_remove_by_name_ns+0x7b/0xc0
>>   cgroup_rm_file+0x154/0x1c0
>>   cgroup_addrm_files+0x1c2/0x1f0
>>   css_clear_dir+0x77/0x110
>>   kill_css+0x4c/0x1b0
>>   cgroup_destroy_locked+0x194/0x380
>>   cgroup_rmdir+0x2a/0x140
> Were you using cgroup v1 or v2 when this warning happened?

I was using cgroup v1.

>>
>> It can be explained by:
>> rmdir                 echo 1 > cpuset.cpus
>>                 kernfs_fop_write_iter // active=0
>> cgroup_rm_file
>> kernfs_remove_by_name_ns    kernfs_get_active // active=1
>> __kernfs_remove                      // active=0x80000002
>> kernfs_drain            cpuset_write_resmask
>> wait_event
>> //waiting (active == 0x80000001)
>>                 kernfs_break_active_protection
>>                 // active = 0x80000001
>> // continue
>>                 kernfs_unbreak_active_protection
>>                 // active = 0x80000002
>> ...
>> kernfs_should_drain_open_files
>> // warning occurs
>>                 kernfs_put_active
>>
>> This warning is caused by 'kernfs_break_active_protection' when it is
>> writing to cpuset.cpus, and the cgroup is removed concurrently.
>>
>> The commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside
>> get_online_cpus()") made cpuset_hotplug_workfn asynchronous, which grabs
>> the cgroup_mutex. To avoid deadlock. the commit 76bb5ab8f6e3 ("cpuset:
>> break kernfs active protection in cpuset_write_resmask()") added
>> 'kernfs_break_active_protection' in the cpuset_write_resmask. This could
>> lead to this warning.
>>
>> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>> processing synchronous"), the cpuset_write_resmask no longer needs to
>> wait the hotplug to finish, which means that cpuset_write_resmask won't
>> grab the cgroup_mutex. So the deadlock doesn't exist anymore. Therefore,
>> remove kernfs_break_active_protection operation in the
>> 'cpuset_write_resmask'
> 
> The hotplug operation itself is now being done synchronously, but task
> transfer (cgroup_transfer_tasks()) because of lacking online CPUs is
> still being done asynchronously. So kernfs_break_active_protection()
> will still be needed for cgroup v1.
> 
> Cheers,
> Longman
> 
> 

Thank you, Longman.
IIUC, The commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
processing synchronous") deleted the 'flush_work(&cpuset_hotplug_work)'
in the cpuset_write_resmask. And I do not see any process within the
cpuset_write_resmask that will grab cgroup_mutex, except for
'flush_work(&cpuset_hotplug_work)'.

Although cgroup_transfer_tasks() is asynchronous, the
cpuset_write_resmask will not wait any work that will grab cgroup_mutex.
Consequently, the deadlock does not exist anymore.

Did I miss something?

Best regards
Ridong

Re: [PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by Waiman Long 1 year, 1 month ago

On 12/19/24 11:07 PM, chenridong wrote:
>
> On 2024/12/20 10:55, Waiman Long wrote:
>> On 12/19/24 8:31 PM, Chen Ridong wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> A warning was found:
>>>
>>> WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828
>>> CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G
>>> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
>>> RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202
>>> RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000
>>> RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04
>>> RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180
>>> R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08
>>> R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0
>>> FS:  00007f84cd0be740(0000) GS:ffff8887ddc00000(0000)
>>> knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Call Trace:
>>>    kernfs_drain+0x15e/0x2f0
>>>    __kernfs_remove+0x165/0x300
>>>    kernfs_remove_by_name_ns+0x7b/0xc0
>>>    cgroup_rm_file+0x154/0x1c0
>>>    cgroup_addrm_files+0x1c2/0x1f0
>>>    css_clear_dir+0x77/0x110
>>>    kill_css+0x4c/0x1b0
>>>    cgroup_destroy_locked+0x194/0x380
>>>    cgroup_rmdir+0x2a/0x140
>> Were you using cgroup v1 or v2 when this warning happened?
> I was using cgroup v1.
Thanks for the confirmation.
>
>>> It can be explained by:
>>> rmdir                 echo 1 > cpuset.cpus
>>>                  kernfs_fop_write_iter // active=0
>>> cgroup_rm_file
>>> kernfs_remove_by_name_ns    kernfs_get_active // active=1
>>> __kernfs_remove                      // active=0x80000002
>>> kernfs_drain            cpuset_write_resmask
>>> wait_event
>>> //waiting (active == 0x80000001)
>>>                  kernfs_break_active_protection
>>>                  // active = 0x80000001
>>> // continue
>>>                  kernfs_unbreak_active_protection
>>>                  // active = 0x80000002
>>> ...
>>> kernfs_should_drain_open_files
>>> // warning occurs
>>>                  kernfs_put_active
>>>
>>> This warning is caused by 'kernfs_break_active_protection' when it is
>>> writing to cpuset.cpus, and the cgroup is removed concurrently.
>>>
>>> The commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside
>>> get_online_cpus()") made cpuset_hotplug_workfn asynchronous, which grabs
>>> the cgroup_mutex. To avoid deadlock. the commit 76bb5ab8f6e3 ("cpuset:
>>> break kernfs active protection in cpuset_write_resmask()") added
>>> 'kernfs_break_active_protection' in the cpuset_write_resmask. This could
>>> lead to this warning.
>>>
>>> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>>> processing synchronous"), the cpuset_write_resmask no longer needs to
>>> wait the hotplug to finish, which means that cpuset_write_resmask won't
>>> grab the cgroup_mutex. So the deadlock doesn't exist anymore. Therefore,
>>> remove kernfs_break_active_protection operation in the
>>> 'cpuset_write_resmask'
>> The hotplug operation itself is now being done synchronously, but task
>> transfer (cgroup_transfer_tasks()) because of lacking online CPUs is
>> still being done asynchronously. So kernfs_break_active_protection()
>> will still be needed for cgroup v1.
>>
>> Cheers,
>> Longman
>>
>>
> Thank you, Longman.
> IIUC, The commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
> processing synchronous") deleted the 'flush_work(&cpuset_hotplug_work)'
> in the cpuset_write_resmask. And I do not see any process within the
> cpuset_write_resmask that will grab cgroup_mutex, except for
> 'flush_work(&cpuset_hotplug_work)'.
>
> Although cgroup_transfer_tasks() is asynchronous, the
> cpuset_write_resmask will not wait any work that will grab cgroup_mutex.
> Consequently, the deadlock does not exist anymore.
>
> Did I miss something?

Right. The flush_work() call is still needed for a different work 
function. cpuset_write_resmask() will not need to grab cgroup_mutex, but 
the asynchronously executed cgroup_transfer_tasks() will. I will work on 
a patch to fix that issue.

Cheers,
Longman

Re: [PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by Chen Ridong 1 year, 1 month ago


On 2024/12/20 12:16, Waiman Long wrote:
> On 12/19/24 11:07 PM, chenridong wrote:
>>
>> On 2024/12/20 10:55, Waiman Long wrote:
>>> On 12/19/24 8:31 PM, Chen Ridong wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>
>>>> A warning was found:
>>>>
>>>> WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828
>>>> CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G
>>>> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
>>>> RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202
>>>> RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000
>>>> RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04
>>>> RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180
>>>> R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08
>>>> R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0
>>>> FS:  00007f84cd0be740(0000) GS:ffff8887ddc00000(0000)
>>>> knlGS:0000000000000000
>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0
>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> Call Trace:
>>>>    kernfs_drain+0x15e/0x2f0
>>>>    __kernfs_remove+0x165/0x300
>>>>    kernfs_remove_by_name_ns+0x7b/0xc0
>>>>    cgroup_rm_file+0x154/0x1c0
>>>>    cgroup_addrm_files+0x1c2/0x1f0
>>>>    css_clear_dir+0x77/0x110
>>>>    kill_css+0x4c/0x1b0
>>>>    cgroup_destroy_locked+0x194/0x380
>>>>    cgroup_rmdir+0x2a/0x140
>>> Were you using cgroup v1 or v2 when this warning happened?
>> I was using cgroup v1.
> Thanks for the confirmation.
>>
>>>> It can be explained by:
>>>> rmdir                 echo 1 > cpuset.cpus
>>>>                  kernfs_fop_write_iter // active=0
>>>> cgroup_rm_file
>>>> kernfs_remove_by_name_ns    kernfs_get_active // active=1
>>>> __kernfs_remove                      // active=0x80000002
>>>> kernfs_drain            cpuset_write_resmask
>>>> wait_event
>>>> //waiting (active == 0x80000001)
>>>>                  kernfs_break_active_protection
>>>>                  // active = 0x80000001
>>>> // continue
>>>>                  kernfs_unbreak_active_protection
>>>>                  // active = 0x80000002
>>>> ...
>>>> kernfs_should_drain_open_files
>>>> // warning occurs
>>>>                  kernfs_put_active
>>>>
>>>> This warning is caused by 'kernfs_break_active_protection' when it is
>>>> writing to cpuset.cpus, and the cgroup is removed concurrently.
>>>>
>>>> The commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside
>>>> get_online_cpus()") made cpuset_hotplug_workfn asynchronous, which
>>>> grabs
>>>> the cgroup_mutex. To avoid deadlock. the commit 76bb5ab8f6e3 ("cpuset:
>>>> break kernfs active protection in cpuset_write_resmask()") added
>>>> 'kernfs_break_active_protection' in the cpuset_write_resmask. This
>>>> could
>>>> lead to this warning.
>>>>
>>>> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>>>> processing synchronous"), the cpuset_write_resmask no longer needs to
>>>> wait the hotplug to finish, which means that cpuset_write_resmask won't
>>>> grab the cgroup_mutex. So the deadlock doesn't exist anymore.
>>>> Therefore,
>>>> remove kernfs_break_active_protection operation in the
>>>> 'cpuset_write_resmask'
>>> The hotplug operation itself is now being done synchronously, but task
>>> transfer (cgroup_transfer_tasks()) because of lacking online CPUs is
>>> still being done asynchronously. So kernfs_break_active_protection()
>>> will still be needed for cgroup v1.
>>>
>>> Cheers,
>>> Longman
>>>
>>>
>> Thank you, Longman.
>> IIUC, The commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>> processing synchronous") deleted the 'flush_work(&cpuset_hotplug_work)'
>> in the cpuset_write_resmask. And I do not see any process within the
>> cpuset_write_resmask that will grab cgroup_mutex, except for
>> 'flush_work(&cpuset_hotplug_work)'.
>>
>> Although cgroup_transfer_tasks() is asynchronous, the
>> cpuset_write_resmask will not wait any work that will grab cgroup_mutex.
>> Consequently, the deadlock does not exist anymore.
>>
>> Did I miss something?
> 
> Right. The flush_work() call is still needed for a different work
> function. cpuset_write_resmask() will not need to grab cgroup_mutex, but
> the asynchronously executed cgroup_transfer_tasks() will. I will work on
> a patch to fix that issue.
> 
> Cheers,
> Longman

If flush_work() is added back, this warning still exists. Do you have a
idea to fix this warning?

Best regards
Ridong

Re: [PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by Waiman Long 1 year, 1 month ago

On 12/20/24 1:11 AM, Chen Ridong wrote:
>
> On 2024/12/20 12:16, Waiman Long wrote:
>> On 12/19/24 11:07 PM, chenridong wrote:
>>> On 2024/12/20 10:55, Waiman Long wrote:
>>>> On 12/19/24 8:31 PM, Chen Ridong wrote:
>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>>
>>>>> A warning was found:
>>>>>
>>>>> WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828
>>>>> CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G
>>>>> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
>>>>> RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202
>>>>> RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000
>>>>> RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04
>>>>> RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180
>>>>> R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08
>>>>> R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0
>>>>> FS:  00007f84cd0be740(0000) GS:ffff8887ddc00000(0000)
>>>>> knlGS:0000000000000000
>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0
>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Call Trace:
>>>>>     kernfs_drain+0x15e/0x2f0
>>>>>     __kernfs_remove+0x165/0x300
>>>>>     kernfs_remove_by_name_ns+0x7b/0xc0
>>>>>     cgroup_rm_file+0x154/0x1c0
>>>>>     cgroup_addrm_files+0x1c2/0x1f0
>>>>>     css_clear_dir+0x77/0x110
>>>>>     kill_css+0x4c/0x1b0
>>>>>     cgroup_destroy_locked+0x194/0x380
>>>>>     cgroup_rmdir+0x2a/0x140
>>>> Were you using cgroup v1 or v2 when this warning happened?
>>> I was using cgroup v1.
>> Thanks for the confirmation.
>>>>> It can be explained by:
>>>>> rmdir                 echo 1 > cpuset.cpus
>>>>>                   kernfs_fop_write_iter // active=0
>>>>> cgroup_rm_file
>>>>> kernfs_remove_by_name_ns    kernfs_get_active // active=1
>>>>> __kernfs_remove                      // active=0x80000002
>>>>> kernfs_drain            cpuset_write_resmask
>>>>> wait_event
>>>>> //waiting (active == 0x80000001)
>>>>>                   kernfs_break_active_protection
>>>>>                   // active = 0x80000001
>>>>> // continue
>>>>>                   kernfs_unbreak_active_protection
>>>>>                   // active = 0x80000002
>>>>> ...
>>>>> kernfs_should_drain_open_files
>>>>> // warning occurs
>>>>>                   kernfs_put_active
>>>>>
>>>>> This warning is caused by 'kernfs_break_active_protection' when it is
>>>>> writing to cpuset.cpus, and the cgroup is removed concurrently.
>>>>>
>>>>> The commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside
>>>>> get_online_cpus()") made cpuset_hotplug_workfn asynchronous, which
>>>>> grabs
>>>>> the cgroup_mutex. To avoid deadlock. the commit 76bb5ab8f6e3 ("cpuset:
>>>>> break kernfs active protection in cpuset_write_resmask()") added
>>>>> 'kernfs_break_active_protection' in the cpuset_write_resmask. This
>>>>> could
>>>>> lead to this warning.
>>>>>
>>>>> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>>>>> processing synchronous"), the cpuset_write_resmask no longer needs to
>>>>> wait the hotplug to finish, which means that cpuset_write_resmask won't
>>>>> grab the cgroup_mutex. So the deadlock doesn't exist anymore.
>>>>> Therefore,
>>>>> remove kernfs_break_active_protection operation in the
>>>>> 'cpuset_write_resmask'
>>>> The hotplug operation itself is now being done synchronously, but task
>>>> transfer (cgroup_transfer_tasks()) because of lacking online CPUs is
>>>> still being done asynchronously. So kernfs_break_active_protection()
>>>> will still be needed for cgroup v1.
>>>>
>>>> Cheers,
>>>> Longman
>>>>
>>>>
>>> Thank you, Longman.
>>> IIUC, The commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>>> processing synchronous") deleted the 'flush_work(&cpuset_hotplug_work)'
>>> in the cpuset_write_resmask. And I do not see any process within the
>>> cpuset_write_resmask that will grab cgroup_mutex, except for
>>> 'flush_work(&cpuset_hotplug_work)'.
>>>
>>> Although cgroup_transfer_tasks() is asynchronous, the
>>> cpuset_write_resmask will not wait any work that will grab cgroup_mutex.
>>> Consequently, the deadlock does not exist anymore.
>>>
>>> Did I miss something?
>> Right. The flush_work() call is still needed for a different work
>> function. cpuset_write_resmask() will not need to grab cgroup_mutex, but
>> the asynchronously executed cgroup_transfer_tasks() will. I will work on
>> a patch to fix that issue.
>>
>> Cheers,
>> Longman
> If flush_work() is added back, this warning still exists. Do you have a
> idea to fix this warning?

I was wrong. The flush_work() call isn't needed in this case and we 
shouldn't need to break kernfs protection. However, your patch 
description isn't quite right.

> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
> processing synchronous"), the cpuset_write_resmask no longer needs to
> wait the hotplug to finish, which means that cpuset_write_resmask won't
> grab the cgroup_mutex. So the deadlock doesn't exist anymore.

cpuset_write_resmask() never needs to grab the cgroup_mutex. The act of calling flush_work() can create a multiple processes circular locking dependency that involve cgroup_mutex which can cause a deadlock. After making cpuset hotplug synchronous, concurrent hotplug and cpuset operations are no longer possible. However, concurrent task transfer out of a previously empty CPU cpuset and adding CPU back to that cpuset is possible. This will result in what the comment said "keep removing tasks added
after execution capability is restored". That should be rare though and we should probably add a check in cgroup_transfer_tasks() to detect such a case and break out of it.

Cheers,
Longman

Re: [PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by Chen Ridong 1 year, 1 month ago


On 2024/12/20 23:13, Waiman Long wrote:
> On 12/20/24 1:11 AM, Chen Ridong wrote:
>>
>> On 2024/12/20 12:16, Waiman Long wrote:
>>> On 12/19/24 11:07 PM, chenridong wrote:
>>>> On 2024/12/20 10:55, Waiman Long wrote:
>>>>> On 12/19/24 8:31 PM, Chen Ridong wrote:
>>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>>>
>>>>>> A warning was found:
>>>>>>
>>>>>> WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828
>>>>>> CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G
>>>>>> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
>>>>>> RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202
>>>>>> RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000
>>>>>> RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04
>>>>>> RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180
>>>>>> R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08
>>>>>> R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0
>>>>>> FS:  00007f84cd0be740(0000) GS:ffff8887ddc00000(0000)
>>>>>> knlGS:0000000000000000
>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0
>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>> Call Trace:
>>>>>>     kernfs_drain+0x15e/0x2f0
>>>>>>     __kernfs_remove+0x165/0x300
>>>>>>     kernfs_remove_by_name_ns+0x7b/0xc0
>>>>>>     cgroup_rm_file+0x154/0x1c0
>>>>>>     cgroup_addrm_files+0x1c2/0x1f0
>>>>>>     css_clear_dir+0x77/0x110
>>>>>>     kill_css+0x4c/0x1b0
>>>>>>     cgroup_destroy_locked+0x194/0x380
>>>>>>     cgroup_rmdir+0x2a/0x140
>>>>> Were you using cgroup v1 or v2 when this warning happened?
>>>> I was using cgroup v1.
>>> Thanks for the confirmation.
>>>>>> It can be explained by:
>>>>>> rmdir                 echo 1 > cpuset.cpus
>>>>>>                   kernfs_fop_write_iter // active=0
>>>>>> cgroup_rm_file
>>>>>> kernfs_remove_by_name_ns    kernfs_get_active // active=1
>>>>>> __kernfs_remove                      // active=0x80000002
>>>>>> kernfs_drain            cpuset_write_resmask
>>>>>> wait_event
>>>>>> //waiting (active == 0x80000001)
>>>>>>                   kernfs_break_active_protection
>>>>>>                   // active = 0x80000001
>>>>>> // continue
>>>>>>                   kernfs_unbreak_active_protection
>>>>>>                   // active = 0x80000002
>>>>>> ...
>>>>>> kernfs_should_drain_open_files
>>>>>> // warning occurs
>>>>>>                   kernfs_put_active
>>>>>>
>>>>>> This warning is caused by 'kernfs_break_active_protection' when it is
>>>>>> writing to cpuset.cpus, and the cgroup is removed concurrently.
>>>>>>
>>>>>> The commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside
>>>>>> get_online_cpus()") made cpuset_hotplug_workfn asynchronous, which
>>>>>> grabs
>>>>>> the cgroup_mutex. To avoid deadlock. the commit 76bb5ab8f6e3
>>>>>> ("cpuset:
>>>>>> break kernfs active protection in cpuset_write_resmask()") added
>>>>>> 'kernfs_break_active_protection' in the cpuset_write_resmask. This
>>>>>> could
>>>>>> lead to this warning.
>>>>>>
>>>>>> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>>>>>> processing synchronous"), the cpuset_write_resmask no longer needs to
>>>>>> wait the hotplug to finish, which means that cpuset_write_resmask
>>>>>> won't
>>>>>> grab the cgroup_mutex. So the deadlock doesn't exist anymore.
>>>>>> Therefore,
>>>>>> remove kernfs_break_active_protection operation in the
>>>>>> 'cpuset_write_resmask'
>>>>> The hotplug operation itself is now being done synchronously, but task
>>>>> transfer (cgroup_transfer_tasks()) because of lacking online CPUs is
>>>>> still being done asynchronously. So kernfs_break_active_protection()
>>>>> will still be needed for cgroup v1.
>>>>>
>>>>> Cheers,
>>>>> Longman
>>>>>
>>>>>
>>>> Thank you, Longman.
>>>> IIUC, The commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>>>> processing synchronous") deleted the 'flush_work(&cpuset_hotplug_work)'
>>>> in the cpuset_write_resmask. And I do not see any process within the
>>>> cpuset_write_resmask that will grab cgroup_mutex, except for
>>>> 'flush_work(&cpuset_hotplug_work)'.
>>>>
>>>> Although cgroup_transfer_tasks() is asynchronous, the
>>>> cpuset_write_resmask will not wait any work that will grab
>>>> cgroup_mutex.
>>>> Consequently, the deadlock does not exist anymore.
>>>>
>>>> Did I miss something?
>>> Right. The flush_work() call is still needed for a different work
>>> function. cpuset_write_resmask() will not need to grab cgroup_mutex, but
>>> the asynchronously executed cgroup_transfer_tasks() will. I will work on
>>> a patch to fix that issue.
>>>
>>> Cheers,
>>> Longman
>> If flush_work() is added back, this warning still exists. Do you have a
>> idea to fix this warning?
> 
> I was wrong. The flush_work() call isn't needed in this case and we
> shouldn't need to break kernfs protection. However, your patch
> description isn't quite right.
> 
>> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>> processing synchronous"), the cpuset_write_resmask no longer needs to
>> wait the hotplug to finish, which means that cpuset_write_resmask won't
>> grab the cgroup_mutex. So the deadlock doesn't exist anymore.
> 
> cpuset_write_resmask() never needs to grab the cgroup_mutex. The act of
> calling flush_work() can create a multiple processes circular locking
> dependency that involve cgroup_mutex which can cause a deadlock. After
> making cpuset hotplug synchronous, concurrent hotplug and cpuset
> operations are no longer possible. However, concurrent task transfer out
> of a previously empty CPU cpuset and adding CPU back to that cpuset is
> possible. This will result in what the comment said "keep removing tasks
> added
> after execution capability is restored". That should be rare though and
> we should probably add a check in cgroup_transfer_tasks() to detect such
> a case and break out of it.
> 
> Cheers,
> Longman

Hi, Longman, sorry the confused message. Do you mean this patch is
acceptable if I update the message?

I don't think we need to add a check in the cgroup_transfer_tasks
function. Because no process(except for writing cpuset.cpus, which has
been reoved) will need 'kn->active' to involve cgroup_transfer_tasks now.

Best regards,
Ridong

Re: [PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by Waiman Long 1 year, 1 month ago

On 12/22/24 9:12 PM, Chen Ridong wrote:
>
> On 2024/12/20 23:13, Waiman Long wrote:
>> On 12/20/24 1:11 AM, Chen Ridong wrote:
>>> On 2024/12/20 12:16, Waiman Long wrote:
>>>> On 12/19/24 11:07 PM, chenridong wrote:
>>>>> On 2024/12/20 10:55, Waiman Long wrote:
>>>>>> On 12/19/24 8:31 PM, Chen Ridong wrote:
>>>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>>>>
>>>>>>> A warning was found:
>>>>>>>
>>>>>>> WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828
>>>>>>> CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G
>>>>>>> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
>>>>>>> RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202
>>>>>>> RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000
>>>>>>> RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04
>>>>>>> RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180
>>>>>>> R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08
>>>>>>> R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0
>>>>>>> FS:  00007f84cd0be740(0000) GS:ffff8887ddc00000(0000)
>>>>>>> knlGS:0000000000000000
>>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0
>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>> Call Trace:
>>>>>>>      kernfs_drain+0x15e/0x2f0
>>>>>>>      __kernfs_remove+0x165/0x300
>>>>>>>      kernfs_remove_by_name_ns+0x7b/0xc0
>>>>>>>      cgroup_rm_file+0x154/0x1c0
>>>>>>>      cgroup_addrm_files+0x1c2/0x1f0
>>>>>>>      css_clear_dir+0x77/0x110
>>>>>>>      kill_css+0x4c/0x1b0
>>>>>>>      cgroup_destroy_locked+0x194/0x380
>>>>>>>      cgroup_rmdir+0x2a/0x140
>>>>>> Were you using cgroup v1 or v2 when this warning happened?
>>>>> I was using cgroup v1.
>>>> Thanks for the confirmation.
>>>>>>> It can be explained by:
>>>>>>> rmdir                 echo 1 > cpuset.cpus
>>>>>>>                    kernfs_fop_write_iter // active=0
>>>>>>> cgroup_rm_file
>>>>>>> kernfs_remove_by_name_ns    kernfs_get_active // active=1
>>>>>>> __kernfs_remove                      // active=0x80000002
>>>>>>> kernfs_drain            cpuset_write_resmask
>>>>>>> wait_event
>>>>>>> //waiting (active == 0x80000001)
>>>>>>>                    kernfs_break_active_protection
>>>>>>>                    // active = 0x80000001
>>>>>>> // continue
>>>>>>>                    kernfs_unbreak_active_protection
>>>>>>>                    // active = 0x80000002
>>>>>>> ...
>>>>>>> kernfs_should_drain_open_files
>>>>>>> // warning occurs
>>>>>>>                    kernfs_put_active
>>>>>>>
>>>>>>> This warning is caused by 'kernfs_break_active_protection' when it is
>>>>>>> writing to cpuset.cpus, and the cgroup is removed concurrently.
>>>>>>>
>>>>>>> The commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside
>>>>>>> get_online_cpus()") made cpuset_hotplug_workfn asynchronous, which
>>>>>>> grabs
>>>>>>> the cgroup_mutex. To avoid deadlock. the commit 76bb5ab8f6e3
>>>>>>> ("cpuset:
>>>>>>> break kernfs active protection in cpuset_write_resmask()") added
>>>>>>> 'kernfs_break_active_protection' in the cpuset_write_resmask. This
>>>>>>> could
>>>>>>> lead to this warning.
>>>>>>>
>>>>>>> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>>>>>>> processing synchronous"), the cpuset_write_resmask no longer needs to
>>>>>>> wait the hotplug to finish, which means that cpuset_write_resmask
>>>>>>> won't
>>>>>>> grab the cgroup_mutex. So the deadlock doesn't exist anymore.
>>>>>>> Therefore,
>>>>>>> remove kernfs_break_active_protection operation in the
>>>>>>> 'cpuset_write_resmask'
>>>>>> The hotplug operation itself is now being done synchronously, but task
>>>>>> transfer (cgroup_transfer_tasks()) because of lacking online CPUs is
>>>>>> still being done asynchronously. So kernfs_break_active_protection()
>>>>>> will still be needed for cgroup v1.
>>>>>>
>>>>>> Cheers,
>>>>>> Longman
>>>>>>
>>>>>>
>>>>> Thank you, Longman.
>>>>> IIUC, The commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>>>>> processing synchronous") deleted the 'flush_work(&cpuset_hotplug_work)'
>>>>> in the cpuset_write_resmask. And I do not see any process within the
>>>>> cpuset_write_resmask that will grab cgroup_mutex, except for
>>>>> 'flush_work(&cpuset_hotplug_work)'.
>>>>>
>>>>> Although cgroup_transfer_tasks() is asynchronous, the
>>>>> cpuset_write_resmask will not wait any work that will grab
>>>>> cgroup_mutex.
>>>>> Consequently, the deadlock does not exist anymore.
>>>>>
>>>>> Did I miss something?
>>>> Right. The flush_work() call is still needed for a different work
>>>> function. cpuset_write_resmask() will not need to grab cgroup_mutex, but
>>>> the asynchronously executed cgroup_transfer_tasks() will. I will work on
>>>> a patch to fix that issue.
>>>>
>>>> Cheers,
>>>> Longman
>>> If flush_work() is added back, this warning still exists. Do you have a
>>> idea to fix this warning?
>> I was wrong. The flush_work() call isn't needed in this case and we
>> shouldn't need to break kernfs protection. However, your patch
>> description isn't quite right.
>>
>>> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>>> processing synchronous"), the cpuset_write_resmask no longer needs to
>>> wait the hotplug to finish, which means that cpuset_write_resmask won't
>>> grab the cgroup_mutex. So the deadlock doesn't exist anymore.
>> cpuset_write_resmask() never needs to grab the cgroup_mutex. The act of
>> calling flush_work() can create a multiple processes circular locking
>> dependency that involve cgroup_mutex which can cause a deadlock. After
>> making cpuset hotplug synchronous, concurrent hotplug and cpuset
>> operations are no longer possible. However, concurrent task transfer out
>> of a previously empty CPU cpuset and adding CPU back to that cpuset is
>> possible. This will result in what the comment said "keep removing tasks
>> added
>> after execution capability is restored". That should be rare though and
>> we should probably add a check in cgroup_transfer_tasks() to detect such
>> a case and break out of it.
>>
>> Cheers,
>> Longman
> Hi, Longman, sorry the confused message. Do you mean this patch is
> acceptable if I update the message?
Sorry for the late reply. Yes, the patch is acceptable, but the patch 
description isn't quite right. Please sent out a v2.
>
> I don't think we need to add a check in the cgroup_transfer_tasks
> function. Because no process(except for writing cpuset.cpus, which has
> been reoved) will need 'kn->active' to involve cgroup_transfer_tasks now.

I agree that we don't need to add a check in cgroup_transfer_tasks().

Cheers,
Longman

Re: [PATCH v1] cgroup/cpuset: remove kernfs active break

Posted by Chen Ridong 1 year, 1 month ago


On 2025/1/2 23:38, Waiman Long wrote:
> 
> On 12/22/24 9:12 PM, Chen Ridong wrote:
>>
>> On 2024/12/20 23:13, Waiman Long wrote:
>>> On 12/20/24 1:11 AM, Chen Ridong wrote:
>>>> On 2024/12/20 12:16, Waiman Long wrote:
>>>>> On 12/19/24 11:07 PM, chenridong wrote:
>>>>>> On 2024/12/20 10:55, Waiman Long wrote:
>>>>>>> On 12/19/24 8:31 PM, Chen Ridong wrote:
>>>>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>>>>>
>>>>>>>> A warning was found:
>>>>>>>>
>>>>>>>> WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828
>>>>>>>> CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G
>>>>>>>> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
>>>>>>>> RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202
>>>>>>>> RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000
>>>>>>>> RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04
>>>>>>>> RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180
>>>>>>>> R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08
>>>>>>>> R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0
>>>>>>>> FS:  00007f84cd0be740(0000) GS:ffff8887ddc00000(0000)
>>>>>>>> knlGS:0000000000000000
>>>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0
>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>> Call Trace:
>>>>>>>>      kernfs_drain+0x15e/0x2f0
>>>>>>>>      __kernfs_remove+0x165/0x300
>>>>>>>>      kernfs_remove_by_name_ns+0x7b/0xc0
>>>>>>>>      cgroup_rm_file+0x154/0x1c0
>>>>>>>>      cgroup_addrm_files+0x1c2/0x1f0
>>>>>>>>      css_clear_dir+0x77/0x110
>>>>>>>>      kill_css+0x4c/0x1b0
>>>>>>>>      cgroup_destroy_locked+0x194/0x380
>>>>>>>>      cgroup_rmdir+0x2a/0x140
>>>>>>> Were you using cgroup v1 or v2 when this warning happened?
>>>>>> I was using cgroup v1.
>>>>> Thanks for the confirmation.
>>>>>>>> It can be explained by:
>>>>>>>> rmdir                 echo 1 > cpuset.cpus
>>>>>>>>                    kernfs_fop_write_iter // active=0
>>>>>>>> cgroup_rm_file
>>>>>>>> kernfs_remove_by_name_ns    kernfs_get_active // active=1
>>>>>>>> __kernfs_remove                      // active=0x80000002
>>>>>>>> kernfs_drain            cpuset_write_resmask
>>>>>>>> wait_event
>>>>>>>> //waiting (active == 0x80000001)
>>>>>>>>                    kernfs_break_active_protection
>>>>>>>>                    // active = 0x80000001
>>>>>>>> // continue
>>>>>>>>                    kernfs_unbreak_active_protection
>>>>>>>>                    // active = 0x80000002
>>>>>>>> ...
>>>>>>>> kernfs_should_drain_open_files
>>>>>>>> // warning occurs
>>>>>>>>                    kernfs_put_active
>>>>>>>>
>>>>>>>> This warning is caused by 'kernfs_break_active_protection' when
>>>>>>>> it is
>>>>>>>> writing to cpuset.cpus, and the cgroup is removed concurrently.
>>>>>>>>
>>>>>>>> The commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside
>>>>>>>> get_online_cpus()") made cpuset_hotplug_workfn asynchronous, which
>>>>>>>> grabs
>>>>>>>> the cgroup_mutex. To avoid deadlock. the commit 76bb5ab8f6e3
>>>>>>>> ("cpuset:
>>>>>>>> break kernfs active protection in cpuset_write_resmask()") added
>>>>>>>> 'kernfs_break_active_protection' in the cpuset_write_resmask. This
>>>>>>>> could
>>>>>>>> lead to this warning.
>>>>>>>>
>>>>>>>> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>>>>>>>> processing synchronous"), the cpuset_write_resmask no longer
>>>>>>>> needs to
>>>>>>>> wait the hotplug to finish, which means that cpuset_write_resmask
>>>>>>>> won't
>>>>>>>> grab the cgroup_mutex. So the deadlock doesn't exist anymore.
>>>>>>>> Therefore,
>>>>>>>> remove kernfs_break_active_protection operation in the
>>>>>>>> 'cpuset_write_resmask'
>>>>>>> The hotplug operation itself is now being done synchronously, but
>>>>>>> task
>>>>>>> transfer (cgroup_transfer_tasks()) because of lacking online CPUs is
>>>>>>> still being done asynchronously. So kernfs_break_active_protection()
>>>>>>> will still be needed for cgroup v1.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Longman
>>>>>>>
>>>>>>>
>>>>>> Thank you, Longman.
>>>>>> IIUC, The commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>>>>>> processing synchronous") deleted the
>>>>>> 'flush_work(&cpuset_hotplug_work)'
>>>>>> in the cpuset_write_resmask. And I do not see any process within the
>>>>>> cpuset_write_resmask that will grab cgroup_mutex, except for
>>>>>> 'flush_work(&cpuset_hotplug_work)'.
>>>>>>
>>>>>> Although cgroup_transfer_tasks() is asynchronous, the
>>>>>> cpuset_write_resmask will not wait any work that will grab
>>>>>> cgroup_mutex.
>>>>>> Consequently, the deadlock does not exist anymore.
>>>>>>
>>>>>> Did I miss something?
>>>>> Right. The flush_work() call is still needed for a different work
>>>>> function. cpuset_write_resmask() will not need to grab
>>>>> cgroup_mutex, but
>>>>> the asynchronously executed cgroup_transfer_tasks() will. I will
>>>>> work on
>>>>> a patch to fix that issue.
>>>>>
>>>>> Cheers,
>>>>> Longman
>>>> If flush_work() is added back, this warning still exists. Do you have a
>>>> idea to fix this warning?
>>> I was wrong. The flush_work() call isn't needed in this case and we
>>> shouldn't need to break kernfs protection. However, your patch
>>> description isn't quite right.
>>>
>>>> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>>>> processing synchronous"), the cpuset_write_resmask no longer needs to
>>>> wait the hotplug to finish, which means that cpuset_write_resmask won't
>>>> grab the cgroup_mutex. So the deadlock doesn't exist anymore.
>>> cpuset_write_resmask() never needs to grab the cgroup_mutex. The act of
>>> calling flush_work() can create a multiple processes circular locking
>>> dependency that involve cgroup_mutex which can cause a deadlock. After
>>> making cpuset hotplug synchronous, concurrent hotplug and cpuset
>>> operations are no longer possible. However, concurrent task transfer out
>>> of a previously empty CPU cpuset and adding CPU back to that cpuset is
>>> possible. This will result in what the comment said "keep removing tasks
>>> added
>>> after execution capability is restored". That should be rare though and
>>> we should probably add a check in cgroup_transfer_tasks() to detect such
>>> a case and break out of it.
>>>
>>> Cheers,
>>> Longman
>> Hi, Longman, sorry the confused message. Do you mean this patch is
>> acceptable if I update the message?
> Sorry for the late reply. Yes, the patch is acceptable, but the patch
> description isn't quite right. Please sent out a v2.

Thank you, I will update.

Best regards,
Ridong

>>
>> I don't think we need to add a check in the cgroup_transfer_tasks
>> function. Because no process(except for writing cpuset.cpus, which has
>> been reoved) will need 'kn->active' to involve cgroup_transfer_tasks now.
> 
> I agree that we don't need to add a check in cgroup_transfer_tasks().
> 
> Cheers,
> Longman
>