Later patches will introduce a new `task` parameter to
cgroup_attach_lock(), which requires moving the cgroup_attach_lock()
call after the task lookup in cgroup_procs_write_start().
Between looking up the threadgroup leader by PID and acquiring the
cgroup attach lock, the threadgroup leadership may change (for
example via another thread's exec()), which could lead to migrating
the wrong set of threads. Therefore, after acquiring the cgroup
attach lock, re-check that the task is still the threadgroup leader
and, if it is not, retry the lookup.
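
For illustration, the race window looks roughly like this (a
simplified sketch of one possible interleaving, not literal code
from this patch):

	cgroup.procs writer		another thread in the group
	-------------------		---------------------------
	rcu_read_lock();
	tsk = find_task_by_vpid(pid);
	get_task_struct(tsk);
	rcu_read_unlock();
					exec() -> de_thread() takes
					over group leadership; tsk is
					no longer the leader
	cgroup_attach_lock(*lock_mode);
	/* the leadership re-check catches
	 * the stale tsk and retries */
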
Signed-off-by: Yi Tao <escape@linux.alibaba.com>
---
kernel/cgroup/cgroup.c | 61 ++++++++++++++++++++++++++----------------
1 file changed, 38 insertions(+), 23 deletions(-)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 2b88c7abaa00..756807164091 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2994,29 +2994,13 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
 	if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
 		return ERR_PTR(-EINVAL);
 
-	/*
-	 * If we migrate a single thread, we don't care about threadgroup
-	 * stability. If the thread is `current`, it won't exit(2) under our
-	 * hands or change PID through exec(2). We exclude
-	 * cgroup_update_dfl_csses and other cgroup_{proc,thread}s_write
-	 * callers by cgroup_mutex.
-	 * Therefore, we can skip the global lock.
-	 */
-	lockdep_assert_held(&cgroup_mutex);
-
-	if (pid || threadgroup)
-		*lock_mode = CGRP_ATTACH_LOCK_GLOBAL;
-	else
-		*lock_mode = CGRP_ATTACH_LOCK_NONE;
-
-	cgroup_attach_lock(*lock_mode);
-
+retry_find_task:
 	rcu_read_lock();
 	if (pid) {
 		tsk = find_task_by_vpid(pid);
 		if (!tsk) {
 			tsk = ERR_PTR(-ESRCH);
-			goto out_unlock_threadgroup;
+			goto out_unlock_rcu;
 		}
 	} else {
 		tsk = current;
@@ -3033,15 +3017,46 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
 	 */
 	if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) {
 		tsk = ERR_PTR(-EINVAL);
-		goto out_unlock_threadgroup;
+		goto out_unlock_rcu;
 	}
 
 	get_task_struct(tsk);
-	goto out_unlock_rcu;
+	rcu_read_unlock();
+
+	/*
+	 * If we migrate a single thread, we don't care about threadgroup
+	 * stability. If the thread is `current`, it won't exit(2) under our
+	 * hands or change PID through exec(2). We exclude
+	 * cgroup_update_dfl_csses and other cgroup_{proc,thread}s_write
+	 * callers by cgroup_mutex.
+	 * Therefore, we can skip the global lock.
+	 */
+	lockdep_assert_held(&cgroup_mutex);
+
+	if (pid || threadgroup)
+		*lock_mode = CGRP_ATTACH_LOCK_GLOBAL;
+	else
+		*lock_mode = CGRP_ATTACH_LOCK_NONE;
+
+	cgroup_attach_lock(*lock_mode);
+
+	if (threadgroup) {
+		if (!thread_group_leader(tsk)) {
+			/*
+			 * a race with de_thread from another thread's exec()
+			 * may strip us of our leadership, if this happens,
+			 * there is no choice but to throw this task away and
+			 * try again; this is
+			 * "double-double-toil-and-trouble-check locking".
+			 */
+			cgroup_attach_unlock(*lock_mode);
+			put_task_struct(tsk);
+			goto retry_find_task;
+		}
+	}
+
+	return tsk;
 
-out_unlock_threadgroup:
-	cgroup_attach_unlock(*lock_mode);
-	*lock_mode = CGRP_ATTACH_LOCK_NONE;
 out_unlock_rcu:
 	rcu_read_unlock();
 	return tsk;
--
2.32.0.3.g01195cf9f
On 9/10/25 2:59 AM, Yi Tao wrote:
> Later patches will introduce a new `task` parameter to
> cgroup_attach_lock(), which requires moving the cgroup_attach_lock()
> call after the task lookup in cgroup_procs_write_start().
[...]
> +	if (threadgroup) {
> +		if (!thread_group_leader(tsk)) {

Nit: You can combine the 2 conditions together to avoid excessive
indent:

	if (threadgroup && !thread_group_leader(tsk)) {

> +			/*
> +			 * a race with de_thread from another thread's exec()

Should be "de_thread()" to signal that it is a function.

> +			 * may strip us of our leadership, if this happens,
> +			 * there is no choice but to throw this task away and
> +			 * try again; this is
> +			 * "double-double-toil-and-trouble-check locking".

"double-double-toil-and-trouble-check" is a new term in the kernel
source tree; I would suggest something simpler to avoid confusion.

Cheers,
Longman
On Wed, Sep 10, 2025 at 02:59:34PM +0800, Yi Tao wrote:
> Later patches will introduce a new `task` parameter to
> cgroup_attach_lock(), which requires moving the cgroup_attach_lock()
> call after the task lookup in cgroup_procs_write_start().
>
> Between looking up the threadgroup leader by PID and acquiring the
> cgroup attach lock, the threadgroup leadership may change (for
> example via another thread's exec()), which could lead to migrating
> the wrong set of threads. Therefore, after acquiring the cgroup
> attach lock, re-check that the task is still the threadgroup leader
> and, if it is not, retry the lookup.
>
> [...]
>
> Signed-off-by: Yi Tao <escape@linux.alibaba.com>

Applied to cgroup/for-6.18 with minor comment adjustments.

Thanks.

--
tejun